FFmpeg

Commit Graph

Author	SHA1	Message	Date
Rémi Denis-Courmont	adc87a5f7c	lavc/opusdsp: rewrite R-V V postfilter This uses a more traditional approach allowing processing of up to period minus two elements per iteration. This also allows the algorithm to work for all and any vector length. As the T-Head C908 device under test can load 16 elements per loop, there is unsurprisingly a little performance drop when the period is minimal and the parallelism is capped at 13 elements: Before: postfilter_15_c: 21222.2 postfilter_15_rvv_f32: 22007.7 postfilter_512_c: 20189.7 postfilter_512_rvv_f32: 22004.2 postfilter_1022_c: 20189.7 postfilter_1022_rvv_f32: 22004.2 After: postfilter_15_c: 20189.5 postfilter_15_rvv_f32: 7057.2 postfilter_512_c: 20189.5 postfilter_512_rvv_f32: 5667.2 postfilter_1022_c: 20192.7 postfilter_1022_rvv_f32: 5667.2	1 year ago
Rémi Denis-Courmont	bfc69297c5	lavc/opusdsp: RISC-V V (512-bit) postfilter This adds a variant of the postfilter for use with 512-bit vectors. Half a vector is enough to perform the scalar product. Normally a whole vector would be used anyhow. Indeed fractional multipliers are no faster than the unit multiplier. But in this particular function, a full vector makes up 16 samples, which would be loaded at each iteration of the outer loop. The minimum guaranteed CELT postfilter period is only 15. Accounting for the edges, we can only safely preload up to 13 samples. The fractional multiplier is thus used to cap the selected vector length to a safe value of 8 elements or 256 bits. Likewise, we have the 1024-bit variant with the quarter multiplier. In theory, a 2048-bit one would be possible with the eighth multiplier, but that length is not even defined in the specifications as of yet, nor is it supported by any emulator - forget actual hardware.	2 years ago
Rémi Denis-Courmont	97d34befea	lavc/opusdsp: RISC-V V (256-bit) postfilter This adds a variant of the postfilter for use with 256-bit vectors. As a single vector is then large enough to perform the scalar product, the group multiplier is reduced to just one at run-time. The different vector type is passed via register. Unfortunately, there is no VSETIVL instruction, so the constant vector size (5) also needs to be passed via a register.	2 years ago
Rémi Denis-Courmont	8009581912	lavc/opusdsp: RISC-V V (128-bit) postfilter This is implemented for a vector size of 128-bit. Since the scalar product in the inner loop covers 5 samples or 160 bits, we need a group multiplier of 2. To avoid reconfiguring the vector type, the outer loop, which loads multiple input samples, sticks to the same multiplier. Consequently, the outer loop loads 8 samples per iteration. This is safe since the minimum period of the CELT codec is 15 samples. The same code would also work, albeit needlessly inefficiently, with a vector length of 256 bits. A proper implementation will follow instead.	2 years ago
Rémi Denis-Courmont	c1bb19e263	lavu/fixeddsp: RISC-V V butterflies_fixed	2 years ago
Rémi Denis-Courmont	04d092e7d5	lavc/audiodsp: RISC-V F vector_clipf RV64G supports MIN & MAX instructions natively only on floating point registers, not general purpose ones. The latter would require the Zbb extension. Due to that, it is actually faster to perform the clipping "properly" in FPU. Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech): audiodsp.vector_clipf_c: 29551.5 audiodsp.vector_clipf_rvf: 17871.0 Also tried unrolling with 2 or 8 elements but it gets worse either way.	2 years ago
Diego Biurrun	9a9e2f1c8a	dsputil: Split audio operations off into a separate context	11 years ago
Ben Avison	9d8ecdd8ca	vc-1: Add platform-specific start code search routine to VC1DSPContext. Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Mason Carter	832e190632	vc1: arm: Add NEON assembly For: ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon ff_put_pixels8x8_neon ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00) Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans Rullgard. Signed-off-by: Martin Storsjö <martin@martin.st>	11 years ago
Diego Biurrun	73b704ac60	arm: Add some missing header #includes	12 years ago
Mans Rullgard	b692d246ea	vp8: arm: separate ARMv6 functions from NEON This is a preparation for complete ARMv6 optimisations. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Mans Rullgard	d526c5338d	ARM: allow runtime masking of CPU features This allows masking CPU features with the -cpuflags avconv option which is useful for testing different optimisations without rebuilding. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Michael Niedermayer	c266eb1928	arm: Fix 10l typo Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Ronald S. Bultje	bd66f073fe	vp8: change int stride to ptrdiff_t stride. On 64bit platforms with 32bit int, this means we won't have to sign-extend the integer anymore.	13 years ago
Diego Biurrun	32f3c541bc	doxygen: Do not include license boilerplates in Doxygen comment blocks.	13 years ago
Ronald S. Bultje	a5dfeb612e	VP8: armv6 optimizations. From 52.503s (~40fps) to 27.973sec (~80fps) decoding of 480p sintel trailer, i.e. a ~2x speedup overall, on a Nexus S. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
Mans Rullgard	ef15d71c1f	VP8: ARM NEON optimisations for dsp functions This adds NEON optimised versions of all functions in VP8DSPContext. Based on initial work by Rob Clark. Signed-off-by: Mans Rullgard <mans@mansr.com> (cherry picked from commit `a1c1d3c003`)	14 years ago
Mans Rullgard	a1c1d3c003	VP8: ARM NEON optimisations for dsp functions This adds NEON optimised versions of all functions in VP8DSPContext. Based on initial work by Rob Clark. Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
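
The vector-length reasoning in the per-size postfilter commits (8009581912, 97d34befea, bfc69297c5) can be checked with a little arithmetic. The following is a minimal C sketch, not code from FFmpeg: the helper names `group_elems` and `fits_postfilter` are hypothetical, and the sketch assumes 32-bit float samples, the 5-sample scalar product, and the 13-sample safe preload limit stated in those commit messages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an FFmpeg function): number of 32-bit elements
 * held by one register group, for a vector register of vlen_bits bits and
 * a group multiplier of lmul_num/lmul_den (e.g. 1/2 for the half multiplier). */
static unsigned group_elems(unsigned vlen_bits, unsigned lmul_num,
                            unsigned lmul_den)
{
    assert(lmul_den != 0);
    return vlen_bits * lmul_num / lmul_den / 32;
}

/* Hypothetical helper: a group is usable if it covers the 5-sample scalar
 * product and does not exceed the 13 samples that can safely be preloaded
 * when the CELT postfilter period is at its minimum of 15. */
static bool fits_postfilter(unsigned elems)
{
    const unsigned taps = 5;         /* scalar product width, per the commits */
    const unsigned max_preload = 13; /* safe preload limit, per the commits */

    return elems >= taps && elems <= max_preload;
}
```

Under these assumptions, the four configurations named in the commits (128-bit with multiplier 2, 256-bit with multiplier 1, 512-bit with multiplier 1/2, 1024-bit with multiplier 1/4) all come out at 8 elements per group, whereas a full 512-bit vector would hold 16 elements and exceed the safe limit. The later rewrite in adc87a5f7c removes the need for per-size variants altogether.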

4 Commits (9b41cc04300e8d00ae3a6326639e975712e21bb6)