James Darnley
728651df06
avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter
Yorkfield:
- mmx2: 2.53x (504 vs. 199 cycles)
- sse2: 3.83x (504 vs. 131 cycles)
Nehalem:
- mmx2: 2.42x (365 vs. 151 cycles)
- sse2: 3.56x (365 vs. 103 cycles)
Skylake:
- mmx2: 1.81x (308 vs. 170 cycles)
- sse2: 2.84x (308 vs. 108 cycles)
- avx: 2.93x (308 vs. 105 cycles)
8 years ago
James Darnley
add21d0bb3
avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
Yorkfield:
- mmx2: 2.45x (279 vs. 114 cycles)
- sse2: 3.36x (279 vs. 83 cycles)
Nehalem:
- mmx2: 2.10x (192 vs. 92 cycles)
- sse2: 2.84x (192 vs. 68 cycles)
Skylake:
- mmx2: 1.75x (170 vs. 97 cycles)
- sse2: 2.47x (170 vs. 69 cycles)
- avx: 2.47x (170 vs. 69 cycles)
8 years ago
James Darnley
58ca2ef62e
whitespace changes after last commit
8 years ago
James Darnley
f33714a694
avcodec/h264: clean up and expand x86 function definitions
8 years ago
James Darnley
13d71c28cc
avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
Yorkfield:
- sse2:
- complex: 4.13x faster (1514 vs. 367 cycles)
- simple: 4.38x faster (1836 vs. 419 cycles)
Skylake:
- sse2:
- complex: 3.61x faster ( 936 vs. 260 cycles)
- simple: 3.97x faster (1126 vs. 284 cycles)
- avx (versus sse2):
- complex: 1.07x faster (260 vs. 244 cycles)
- simple: 1.03x faster (284 vs. 274 cycles)
8 years ago
James Darnley
1dae7ffa0b
avcodec/h264: mmx 4:2:2 idct add8 function
2.87 times faster (1830 vs. 638 cycles)
8 years ago
James Darnley
815ea8c6cc
avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter
2.1 times faster (401 vs. 194 cycles)
8 years ago
Michael Niedermayer
bc26fe8927
avcodec/h264: Use ptrdiff_t for (bi)weight functions
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
9 years ago
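The commit message does not state the motivation, but a common reason for passing strides as `ptrdiff_t` is that a stride is a pointer offset: it can be negative, and it should have pointer width on 64-bit targets. The following is a minimal sketch under that assumption. `weight_block` and its parameters are hypothetical and are not the FFmpeg prototype, and the rounding shown is only one plausible formulation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical weighting routine, not the actual FFmpeg function.
 * stride is ptrdiff_t because it is added to a pointer and may be negative.
 * Assumes a non-negative weight so the right shift is well defined. */
void weight_block(uint8_t *block, ptrdiff_t stride, int width, int height,
                  int log2_denom, int weight, int offset)
{
    /* Rounding term for the division by 2^log2_denom. */
    int round = log2_denom > 0 ? 1 << (log2_denom - 1) : 0;

    for (int y = 0; y < height; y++) {
        /* y * stride is computed in ptrdiff_t, so negative strides work. */
        uint8_t *row = block + y * stride;
        for (int x = 0; x < width; x++) {
            int v = ((row[x] * weight + round) >> log2_denom) + offset;
            /* Clip to the 8-bit sample range. */
            row[x] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
        }
    }
}
```

Calling it with `block` pointing at the last row and a negative `stride` walks the picture bottom-up, which is the case where a narrower or unsigned stride type tends to cause trouble.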
James Darnley
7042a55c55
avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
2.6 times faster (366 vs. 142 cycles)
9 years ago
Diego Biurrun
5ab03e41e5
x86: h264dsp: Fix link failure with optimizations disabled
With optimizations disabled compilers have trouble doing dead code
elimination on 'if (foo && 0)' expressions, while 'if (0 && foo)'
still works, so use the latter to avoid problems.
Bug-Id: 707
11 years ago
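The operand ordering described in that message can be sketched as follows. `HAVE_OPTIMIZED`, `select_impl` and the enum are hypothetical names used only for illustration. In the situation the commit describes, the guarded branch would reference a symbol that is not built at all, so the branch has to disappear even without optimization. Here it only returns a tag so that the sketch links everywhere.

```c
/* Hypothetical build-time switch; 0 models "optimized code not built". */
#define HAVE_OPTIMIZED 0

enum impl { IMPL_GENERIC, IMPL_OPTIMIZED };

enum impl select_impl(int cpu_has_feature)
{
    /* Constant first: the left operand is a compile-time 0, so the whole
     * condition is trivially false before the runtime value is looked at.
     * Writing (cpu_has_feature && HAVE_OPTIMIZED) is logically the same,
     * but per the commit message some compilers do not drop that form
     * when optimizations are disabled. */
    if (HAVE_OPTIMIZED && cpu_has_feature)
        return IMPL_OPTIMIZED;
    return IMPL_GENERIC;
}
```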
Diego Biurrun
b42f49e42f
x86: dsputil: Eliminate some unnecessary dsputil_x86.h #includes
11 years ago
Anton Khirnov
a03a642d5c
h264: do not use 422 functions for monochrome
Fixes invalid memory access.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
11 years ago
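The diff is not shown here, so the following is only a guess at the general shape of such a bug, with hypothetical names throughout. In H.264, `chroma_format_idc` is 0 for monochrome, 1 for 4:2:0, 2 for 4:2:2 and 3 for 4:4:4. A selector that treats "not 4:2:0" as "4:2:2" would hand 4:2:2 routines to a stream that has no chroma planes at all.

```c
/* chroma_format_idc values as defined by H.264:
 * 0 = monochrome, 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4 */
enum chroma_funcs { CHROMA_FUNCS_DEFAULT, CHROMA_FUNCS_422 };

/* Hypothetical unguarded selector: anything that is not 4:2:0 gets the
 * 4:2:2 routines, which wrongly includes monochrome (idc == 0). */
enum chroma_funcs pick_unguarded(int chroma_format_idc)
{
    return chroma_format_idc == 1 ? CHROMA_FUNCS_DEFAULT : CHROMA_FUNCS_422;
}

/* Hypothetical guarded selector: 4:2:2 routines only for real 4:2:2. */
enum chroma_funcs pick_guarded(int chroma_format_idc)
{
    return chroma_format_idc == 2 ? CHROMA_FUNCS_422 : CHROMA_FUNCS_DEFAULT;
}
```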
Diego Biurrun
e998b56362
x86: avcodec: Consistently structure CPU extension initialization
12 years ago
Diego Biurrun
3ac7fa81b2
Consistently use "cpu_flags" as variable/parameter name for CPU flags
12 years ago
Diego Biurrun
1399931d07
x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h
The header is not (anymore) MMX-specific.
12 years ago
Diego Biurrun
f2e9d44a57
x86: Drop unnecessary ff_ name prefixes from static functions
12 years ago
Diego Biurrun
c9f933b5b6
Add av_cold attributes to arch-specific init functions
12 years ago
Diego Biurrun
88bd7fdc82
Drop DCTELEM typedef
It does not help as an abstraction and adds dsputil dependencies.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
12 years ago
Ronald S. Bultje
ce58642ed0
x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
12 years ago
Ronald S. Bultje
6f40e9f070
x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
12 years ago
Diego Biurrun
89145fbbfe
x86: h264dsp: Fix linking with yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
12 years ago
Diego Biurrun
26301caaa1
x86: mmx2 ---> mmxext in asm constructs
12 years ago
Diego Biurrun
d8eda37080
x86: mmx2 ---> mmxext in function names
12 years ago
Michael Niedermayer
6add8eb2ce
x86/h264dsp_init: put a HAVE_YASM back
Should fix compilation on open solaris
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Diego Biurrun
e0c6cce447
x86: Replace checks for CPU extensions and flags by convenience macros
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
13 years ago
Diego Biurrun
a84ac7a860
x86: h264dsp: drop some unnecessary ifdefs around prototype declarations
13 years ago
Carl Eugen Hoyos
a26789cf9f
Fix compilation with yasm-0.6.2.
13 years ago
Diego Biurrun
17337f54c0
x86: Split inline and external assembly #ifdefs
13 years ago
Diego Biurrun
29cfdd3767
x86: avcodec: Appropriately name files containing only init functions
13 years ago
Mans Rullgard
c318626ce2
x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
This puts x86-specific things in the x86/ subdirectory where they
belong.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Diego Biurrun
239fdf1b4a
x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
13 years ago
Diego Biurrun
81905088a1
x86: h264dsp: K&R formatting cosmetics
13 years ago
Diego Biurrun
6376a3ad24
x86: h264dsp: Remove unused variable ff_pb_3_1
13 years ago
Diego Biurrun
8728b381cb
x86: h264dsp: Adjust YASM #ifdefs
This fixes compilation with YASM disabled.
13 years ago
Ronald S. Bultje
b829b4ce29
h264: convert loop filter strength dsp function to yasm.
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
13 years ago
Ronald S. Bultje
a5bbb1242c
h264_loopfilter: port x86 simd to cpuflags.
13 years ago
Diego Biurrun
fe07c9c6b5
x86: Only use optimizations with cmov if the CPU supports the instruction
13 years ago
Michael Niedermayer
915ec91e6b
libavcodec/x86/h264dsp_mmx.c: add forgotten HAVE_YASM
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Reimar Döffinger
b223035511
Detect and check for CMOV.
Some MMX-only CPUs do not have support for CMOV.
All SSE/MMX2 CPUs should be fine, thus no check was
added to those functions.
See also https://sourceforge.net/tracker/?func=detail&aid=3358347&group_id=205275&atid=992986
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
13 years ago
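The check described in that message amounts to gating MMX-only code paths on a second flag. A minimal sketch, assuming a bitmask of CPU flags; the flag names, their values and both helper functions are hypothetical and do not reproduce the project's actual flag definitions.

```c
/* Hypothetical flag bits for illustration only. */
#define FLAG_MMX  (1 << 0)
#define FLAG_CMOV (1 << 1)
#define FLAG_SSE  (1 << 2)

/* MMX-only routines that contain CMOV need both flags, because some
 * MMX-only CPUs lack the instruction. */
int can_use_mmx_cmov_code(int cpu_flags)
{
    return (cpu_flags & FLAG_MMX) && (cpu_flags & FLAG_CMOV);
}

/* Per the commit message, SSE/MMX2-class CPUs are expected to have CMOV,
 * so no extra check is made for those code paths. */
int can_use_sse_code(int cpu_flags)
{
    return (cpu_flags & FLAG_SSE) != 0;
}
```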
Ronald S. Bultje
c2d337429c
H264: change weight/biweight functions to take a height argument.
Neon parts by Mans Rullgard <mans@mansr.com>.
14 years ago
Ronald S. Bultje
229d263cc9
Support for lossless and inter H264 4:2:2.
14 years ago
Baptiste Coudurier
76741b0e56
h264: 4:2:2 intra decoding support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Baptiste Coudurier
231a6df9ea
h264dec: h264: 4:2:2 intra decoding
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
14 years ago
Jason Garrett-Glaser
b5bbc84fe2
H.264: add filter_mb_fast support for >8-bit decoding
Much faster high bit depth deblocking.
14 years ago
Daniel Kang
84e70ef004
h264: Add x86 assembly for 10-bit weight/biweight H.264 functions.
Mainly ported from 8-bit H.264 weight/biweight.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
14 years ago
Carl Eugen Hoyos
5fb67d8039
Fix compilation with old yasm.
14 years ago
Daniel Kang
f3aa65af3a
h264/10bit: add HAVE_ALIGNED_STACK checks.
Fixes regression in 836f47d34b in ICC-10.x, since ICC<=11.0 doesn't
align stack upon function calls.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
14 years ago
Daniel Kang
348493db60
Update 8-bit H.264 IDCT function names to reflect bit-depth.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
14 years ago
Daniel Kang
836f47d34b
Add IDCT functions for 10-bit H.264.
Ports the majority of IDCT functions for 10-bit H.264.
Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
14 years ago
Gil Pedersen
257de5fb25
h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64.
This fixes linking errors due to undefined symbols on x86_64 OS X.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
14 years ago