James Almer
1faedb9a11
x86/opusdsp: enable the functions on all FMA3 CPUs
...
It's not using ymm registers, so limiting it to CPUs with fast AVX
is not necessary.
Signed-off-by: James Almer <jamrial@gmail.com>
5 years ago
James Almer
80444e23ac
x86/opusdsp: clear the high bits from some gprs
...
Fixes checkasm on systems like win64.
Reviewed-by: Lynne
Signed-off-by: James Almer <jamrial@gmail.com>
5 years ago
James Almer
58d167bcd5
avcodec/Makefile: add missing pngdsp dependency to the lscr decoder
...
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
James Almer
b41d8ab2e6
x86/v210dec: use named registers
...
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
James Almer
abf1aa87ab
x86/v210dec: don't reserve more xmm regs than needed
...
Prevents pointless register saving on win64 for the sse3 and avx
versions of the function.
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
James Almer
b0e29357ba
x86/v210dec: remove duplicate load instruction
...
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
James Darnley
46f1718cd9
avcodec/x86/v210: fix operands of vpblendd used in new avx2 code
...
Assembly failed when using yasm rather than nasm.
6 years ago
Michael Stoner
ebd6fb23c5
libavcodec Adding ff_v210_planar_unpack AVX2
...
Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck
AVX2 is 1.4x faster than AVX
6 years ago
Lynne
4b7166c9d5
x86/opusdsp: replace loads with shuffles
...
Has a slight speedup.
Can't be carried over to aarch64, since it has no shufps-like instruction.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
Lynne
b43b8d337d
x86/opusdsp: fix WIN64 return value
...
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
Lynne
605e330310
x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis
...
58893 decicycles in deemphasis_c, 130548 runs, 524 skips
9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips -> 6.21x speedup
24866 decicycles in postfilter_c, 65386 runs, 150 skips
5268 decicycles in postfilter_fma3, 65505 runs, 31 skips -> 4.72x speedup
Total decoder speedup: ~14%
Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;
for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
    state = y[3];
    y += 4;
    x += 4;
}
6 years ago
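The unrolling quoted in the deemphasis commit follows from substituting the first-order recurrence y[i] = x[i] + c*y[i-1] into itself four times. A minimal C sketch of that equivalence is given here; the function names and the coefficient parameter `c` (standing in for CELT_EMPH_COEFF) are assumptions for illustration, not the decoder's actual code, and `len` is assumed to be a multiple of 4.

```c
/* Reference form: one sample per step, y[i] = x[i] + c * y[i-1]. */
static float deemphasis_scalar(float *y, const float *x,
                               float state, float c, int len)
{
    for (int i = 0; i < len; i++) {
        y[i]  = x[i] + c * state;
        state = y[i];
    }
    return state;
}

/* Unrolled form from the commit message: four outputs per iteration,
 * each expressed only in terms of the inputs and the carried state.
 * len is assumed to be a multiple of 4. */
static float deemphasis_unrolled4(float *y, const float *x,
                                  float state, float c, int len)
{
    const float c1 = c, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    for (int i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}
```

Because the four outputs in the unrolled form no longer depend on each other, they can be computed as independent lanes, which is presumably what makes the loop suitable for SIMD. The two forms may differ by small rounding errors for arbitrary floating-point inputs.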
Lynne
5468c1d075
celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled
...
The entire function was defined away before.
6 years ago
Lynne
4a2c651620
x86/opus_dsp: rename to celt_pvq
...
It's only used in the encoder and in CELT's PVQ.
6 years ago
James Almer
d5d699ab6e
avcodec/h264dsp: change loop filter stride argument to ptrdiff_t
6 years ago
Janne Grunau
156ea66c91
h264/x86: sign extend int stride in deblock functions
...
Fixes checkasm errors after adding the h264 deblock tests.
6 years ago
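The stride fix above concerns a 32-bit int argument being used as a 64-bit offset. On x86-64 such an argument occupies the low half of a 64-bit register, and the upper half is not guaranteed to hold a meaningful value, so assembly that uses the full register for addressing has to sign extend it first. A rough C model of that step, with a hypothetical function name and the register modelled as a plain uint64_t:

```c
#include <stdint.h>

/* Model of sign extending the low 32 bits of a 64-bit register.
 * The upper 32 bits of reg are treated as garbage and discarded.
 * The uint32_t -> int32_t conversion relies on the usual
 * two's-complement behaviour of mainstream compilers. */
static int64_t sketch_sign_extend_32(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}
```

With a negative stride such as -16 in the low half, the result is -16 regardless of what the upper half contained.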
Martin Vignali
9a22e6fa1d
avcodec/proresdsp indent after prev commit
6 years ago
Martin Vignali
c097a32e93
avcodec/proresdec : rename dsp part for 10b and check dspinit for supported bits per raw sample
...
based on patch by Kieran Kunhya
6 years ago
Rostislav Pehlivanov
29eb1c51d7
mdct15: simplify x86 exptab permutation
...
Removes an unneeded copy and does the 5-point permute in-place.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
7 years ago
Rostislav Pehlivanov
a72d0fb973
mdct15: simplify the fft15 x86 SIMD
...
Saves 1 gpr and 2 instructions and simplifies the macros a bit.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
7 years ago
Martin Storsjö
347aa8f723
x86: Don't declare a non-static function as inline
...
This fixes building with clang in msvc mode, which does support
gcc style inline assembly.
7 years ago
Kieran Kunhya
f9d3841ae6
mpeg4video: Add support for MPEG-4 Simple Studio Profile.
...
This is a profile supporting > 8-bit video and has a higher quality DCT
7 years ago
Aurelien Jacobs
f1e490b1ad
sbcenc: add MMX optimizations
...
This was originally based on libsbc, and was fully integrated into ffmpeg.
Rough speed test:
C version: speed= 592x
MMX version: speed= 785x
7 years ago
Rostislav Pehlivanov
50945482a7
h264_idct: enable unmacro on newer NASM versions
...
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
7 years ago
Martin Vignali
8f9c38b196
avcodec/utvideoenc : add SIMD (avx) for sub_left_prediction
...
asm code by Henrik Gramner
7 years ago
James Almer
6e80079a28
avcodec: increase AV_INPUT_BUFFER_PADDING_SIZE to 64
...
AVX-512 support has been introduced, and even if no functions currently
use zmm registers (able to load as much as 64 bytes of consecutive data
per instruction), they will be added eventually.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
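The padding change above exists so that a wide load starting near the end of the valid data stays inside the allocation. A minimal sketch of how a caller might allocate an input buffer with such padding, assuming a local constant and helper name that are hypothetical and only mirror the 64-byte value from the commit subject:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the padding constant named in the commit. */
#define SKETCH_PADDING_SIZE 64

/* Copy size bytes from src into a fresh buffer that has
 * SKETCH_PADDING_SIZE zeroed bytes after the payload.
 * Returns NULL on allocation failure; the caller frees the result. */
static uint8_t *sketch_alloc_padded(const uint8_t *src, size_t size)
{
    uint8_t *buf = malloc(size + SKETCH_PADDING_SIZE);
    if (!buf)
        return NULL;
    memcpy(buf, src, size);
    memset(buf + size, 0, SKETCH_PADDING_SIZE);
    return buf;
}
```

A 64-byte read that begins at the last payload byte then touches only the payload and the zeroed padding.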
James Almer
438f884fc4
x86/lossless_videodsp: rename ff_add_left_pred_int16_sse4 to ff_add_left_pred_int16_unaligned_ssse3
...
SSSE3_FAST is the proper check for it.
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
James Almer
a4fc63c0f9
x86/lossless_videodsp: don't overread the dst buffer in ff_add_left_pred_unaligned_avx2
...
Fixes valgrind
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Martin Vignali
630967ef63
avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred
7 years ago
Martin Vignali
4353c35067
avcodec/x86/lossless_videodsp : add avx2 version for add_left_pred
7 years ago
Martin Vignali
cfbcea1cca
avcodec/x86/lossless_videodsp.asm : make macro for add_left_pred_unaligned in order to add avx2 version
7 years ago
Martin Vignali
be6d1f9632
avcodec/x86/bswapdsp : use macro for 128 bits constants loading in xmm or ymm
7 years ago
Mikulas Patocka
fbdd78fa3e
avcodec/fft: fix INTERL macro on 3dnow
...
The commit b7c16a3f2c ("x86: fft: Port to cpuflags") breaks the opus
decoder in ffmpeg when compiling for 3dnow. The output is audible, but
there's a lot of noise.
The reason for the breakage is that the commit unintentionally changed the
INTERL macro so that it is empty when compiling for 3dnow. This patch
fixes it.
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Martin Vignali
515555af6c
avcodec/x86/exrdsp : use ymm constant for pb_80
...
Speed seems to be similar, but the code is simpler.
7 years ago
James Almer
beb63baa69
x86/utvideodsp: reuse shared constants
...
Remove the broadcast instructions as well now that they are wide
enough.
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
James Almer
ebf352116b
x86/constants: make pb_80 32 byte wide
...
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Martin Vignali
ba98f8463f
avcodec/huffyuvdspenc : add diff_int16 AVX2 func
7 years ago
Martin Vignali
d189a426fa
avcodec/huffyuvdspenc : reorganize diff_int16
7 years ago
Martin Vignali
e641c94190
avcodec/huffyuvdsp : add add_int16 AVX2 func
7 years ago
Martin Vignali
6955e8842e
avcodec/huffyuvdsp : reorganize add_int16 asm
7 years ago
Martin Vignali
7f9b67bcb6
avcodec/huffyuvdsp(enc) : move duplicate macro to a template file
7 years ago
Martin Vignali
caf51a573d
avcodec/x86/utvideodsp.asm : cosmetic
...
better func separator
and add comment for the restore rgb planes10 declaration
7 years ago
Martin Vignali
b5ebe38443
avcodec/utvideodsp : add avx2 version for the dsp
7 years ago
Martin Vignali
48b7c45b0c
avcodec/x86/utvideodsp : make macro for func
7 years ago
James Almer
aea0f06db7
x86/jpeg2000dsp: add ff_ict_float_{fma3,fma4}
...
jpeg2000_ict_float_c: 2296.0
jpeg2000_ict_float_sse: 628.0
jpeg2000_ict_float_avx: 317.0
jpeg2000_ict_float_fma3: 262.0
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Michael Niedermayer
58cf31cee7
avcodec/x86/mpegvideodsp: Fix signedness bug in need_emu
...
Fixes: out of array read
Fixes: 3516/attachment-311488.dat
Found-by: Insu Yun, Georgia Tech.
Tested-by: wuninsu@gmail.com
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
7 years ago
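The need_emu fix above is described only as a signedness bug. The following C sketch shows one common shape of that class of bug in a bounds check; it is an illustration under assumed names and an assumed check, not the code from the patch. When the block is wider than the picture, `width - w` is negative, and casting it to unsigned turns it into a very large limit.

```c
/* Naive check (hypothetical): folds "ix < 0" and "ix >= width - w"
 * into a single unsigned comparison. If width < w, the limit wraps
 * to a huge unsigned value and in-range-looking ix values pass. */
static int sketch_need_emu_naive(int ix, int width, int w)
{
    return (unsigned)ix >= (unsigned)(width - w);
}

/* Guarded check (hypothetical): reject the width < w case explicitly
 * before relying on the unsigned comparison. */
static int sketch_need_emu_checked(int ix, int width, int w)
{
    return width < w || (unsigned)ix >= (unsigned)(width - w);
}
```

For ix = 0, width = 4 and w = 8 the naive version reports that no edge emulation is needed, which would let a block read run past the picture, while the guarded version reports that it is.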
Thomas Köppe
43171a2a73
Fix missing used attribute for inline assembly variables
...
Variables used in inline assembly need to be marked with __attribute__((used)).
Static constants already were, via the define of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are const only used in inline assembly, and therefore
appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks
variables as used.
This change makes FFMPEG work with Clang's ThinLTO.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
7 years ago
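The idea in the commit above can be sketched with a small macro that combines alignment with the `used` attribute, so a constant referenced only from inline assembly is not discarded as dead. The macro and variable names here are hypothetical and the attribute syntax is the GCC/Clang one; the actual DECLARE_ASM_ALIGNED definition may differ.

```c
#include <stdint.h>

/* Hypothetical macro: declare v of type t, aligned to n bytes and
 * marked used so the compiler and linker keep it even when no C code
 * references it (GCC/Clang attribute syntax). */
#define SKETCH_ASM_ALIGNED(n, t, v) t __attribute__((used, aligned(n))) v

/* Example constant of the kind an inline asm block might load. */
SKETCH_ASM_ALIGNED(16, static const uint16_t, sketch_mask)[8] = {
    0x00FF, 0x00FF, 0x00FF, 0x00FF, 0x00FF, 0x00FF, 0x00FF, 0x00FF
};

/* Helper to check the alignment of an address. */
static int sketch_is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

Without the `used` attribute, an optimizer that cannot see references from inside an asm string is free to drop such a variable, which matches the failure mode the commit message describes.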
Martin Vignali
0380b72d35
libavcodec/lossless_video_dsp : cosmetic add better separator for each function, in order to make reading of the asm file easier
7 years ago
Martin Vignali
da62128ea1
libavcodec/lossless_videodsp : add add_bytes avx2 version
7 years ago
James Almer
783535a4cd
x86/bswapdsp: add missing preprocessor wrappers for AVX2 functions
...
Fixes build with old nasm/yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Martin Vignali
e9930883a2
libavcodec/bswapdsp : add AVX2 func for bswap_buf (swap uint32_t)
7 years ago
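For reference, the operation that a bswap_buf routine performs on uint32_t data can be sketched in scalar C as follows. The names are hypothetical and this is not the project's C fallback, only a plain statement of the byte swap that the SIMD versions would be expected to match.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t sketch_bswap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Byte swap w consecutive 32-bit words from src into dst. */
static void sketch_bswap_buf(uint32_t *dst, const uint32_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = sketch_bswap32(src[i]);
}
```

A scalar version like this is also handy as the comparison baseline when testing a vectorized implementation on lengths that are not a multiple of the vector width.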