FFmpeg

Commit Graph

Author	SHA1	Message	Date
Rémi Denis-Courmont	f576a0835b	lavc/aacpsdsp: rework R-V V hybrid_synthesis_deint. Given the size of the data set, strided memory accesses cannot be avoided. We can still do better than the current code. ps_hybrid_synthesis_deint_c: 12065.5; ps_hybrid_synthesis_deint_rvv_i32: 13650.2 (before); ps_hybrid_synthesis_deint_rvv_i64: 8181.0 (after).	1 year ago
Rémi Denis-Courmont	eb508702a8	lavc/aacpsdsp: rework R-V V add_squares. Segmented loads may be slower than not using them, so this advantageously uses a unit-strided load and narrowing shifts instead. Before: ps_add_squares_c: 60757.7; ps_add_squares_rvv_f32: 22242.5. After: ps_add_squares_c: 60516.0; ps_add_squares_rvv_i64: 17067.7.	1 year ago
Rémi Denis-Courmont	b6585eb04c	lavu: add/use flag for RISC-V Zba extension. The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.	1 year ago
Rémi Denis-Courmont	c03f9654c9	lavc/aacpsdsp: RISC-V V stereo_interpolate[0]	2 years ago
Rémi Denis-Courmont	a15edb0bc0	lavc/aacpsdsp: RISC-V V hybrid_synthesis_deint	2 years ago
Rémi Denis-Courmont	09f907999f	lavc/aacpsdsp: RISC-V V hybrid_analysis_ileave	2 years ago
Rémi Denis-Courmont	15c3a0bd6e	lavc/aacpsdsp: RISC-V V hybrid_analysis. This starts with one-time initialisation of the 26 constant factors like `08edacc248`. That is done with the scalar instruction set. While the formula can readily be vectorised, the gains would (probably) be more than lost in transferring the results back to FP registers (or suitably reshuffling them into vector registers). Note that the main loop could likely be scheduled slightly better by expanding the filter macro and interleaving loads with arithmetic. It is not clear yet if that would be relevant for vector processing (as opposed to traditional SIMD). We could also use fewer vectors, but there is not much point in sparing them (they are all callee-clobbered).	2 years ago
Rémi Denis-Courmont	e180326a0b	lavc/aacpsdsp: RISC-V V mul_pair_single	2 years ago
Rémi Denis-Courmont	b0cacf4c3f	lavc/aacpsdsp: RISC-V V add_squares	2 years ago
Rémi Denis-Courmont	c1bb19e263	lavu/fixeddsp: RISC-V V butterflies_fixed	2 years ago
Rémi Denis-Courmont	04d092e7d5	lavc/audiodsp: RISC-V F vector_clipf. RV64G supports MIN and MAX instructions natively only on floating-point registers, not general-purpose ones; the latter would require the Zbb extension. Due to that, it is actually faster to perform the clipping "properly" in the FPU. Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech): audiodsp.vector_clipf_c: 29551.5; audiodsp.vector_clipf_rvf: 17871.0. Unrolling by 2 or 8 elements was also tried, but it gets worse either way.	2 years ago
Diego Biurrun	9a9e2f1c8a	dsputil: Split audio operations off into a separate context	11 years ago
Ben Avison	9d8ecdd8ca	vc-1: Add platform-specific start code search routine to VC1DSPContext. Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	11 years ago
Mason Carter	832e190632	vc1: arm: Add NEON assembly. For: ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon, ff_put_pixels8x8_neon, ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00). Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans Rullgard. Signed-off-by: Martin Storsjö <martin@martin.st>	11 years ago
Diego Biurrun	73b704ac60	arm: Add some missing header #includes	12 years ago
Mans Rullgard	b692d246ea	vp8: arm: separate ARMv6 functions from NEON. This is a preparation for complete ARMv6 optimisations. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Mans Rullgard	d526c5338d	ARM: allow runtime masking of CPU features. This allows masking CPU features with the -cpuflags avconv option, which is useful for testing different optimisations without rebuilding. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Michael Niedermayer	c266eb1928	arm: Fix 10l typo Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Ronald S. Bultje	bd66f073fe	vp8: change int stride to ptrdiff_t stride. On 64-bit platforms with 32-bit int, this means we won't have to sign-extend the integer anymore.	13 years ago
Diego Biurrun	32f3c541bc	doxygen: Do not include license boilerplates in Doxygen comment blocks.	13 years ago
Ronald S. Bultje	a5dfeb612e	VP8: armv6 optimizations. From 52.503s (~40fps) to 27.973s (~80fps) decoding of the 480p Sintel trailer, i.e. a ~2x speedup overall, on a Nexus S. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
Mans Rullgard	ef15d71c1f	VP8: ARM NEON optimisations for dsp functions. This adds NEON optimised versions of all functions in VP8DSPContext. Based on initial work by Rob Clark. Signed-off-by: Mans Rullgard <mans@mansr.com> (cherry picked from commit `a1c1d3c003`)	14 years ago
Mans Rullgard	a1c1d3c003	VP8: ARM NEON optimisations for dsp functions. This adds NEON optimised versions of all functions in VP8DSPContext. Based on initial work by Rob Clark. Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago

9 Commits (161d0aa2a8d18f1f8a01cbc4c1061eadcbe592e5)
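The hybrid_synthesis_deint rework (f576a0835b) says strided memory accesses cannot be avoided. The scalar C sketch below is modelled on what the portable reference is assumed to do; the array shapes and the name `hybrid_synthesis_deint_sketch` are assumptions, not taken from this log. It shows where the stride comes from: reads walk each band's interleaved re/im pairs contiguously, while each successive write to `out[c][n][i]` lands 64 floats after the previous one.

```c
/* Assumed shapes: out[2][38][64] = two planes x time slots x bands,
 * in[64][32][2] = bands x time slots x interleaved (re, im) pairs.
 * Hypothetical sketch, not FFmpeg's actual source. */
static void hybrid_synthesis_deint_sketch(float out[2][38][64],
                                          float (*in)[32][2],
                                          int i, int len)
{
    for (; i < 64; i++) {                /* band index, starting at i */
        for (int n = 0; n < len; n++) {  /* time slot */
            /* Reads from in[i] are contiguous; writes stride by 64 floats. */
            out[0][n][i] = in[i][n][0];  /* real parts to plane 0 */
            out[1][n][i] = in[i][n][1];  /* imaginary parts to plane 1 */
        }
    }
}
```

Swapping the two loops would make the writes contiguous and the reads strided instead; either way one side of the copy is strided, which is consistent with the commit's remark.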
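Commit eb508702a8 replaces segmented loads in add_squares with a unit-strided load plus narrowing shifts. The following is a scalar C emulation of that idea, not the RISC-V assembly: each interleaved re/im pair is read as one 64-bit word and split into its two 32-bit halves with shifts by 0 and by 32, which is what a narrowing shift does per vector lane. Both function names are hypothetical, and the sketch assumes 32-bit IEEE `float`.

```c
#include <stdint.h>
#include <string.h>

/* Plain reference: dst[i] += re^2 + im^2 for each interleaved pair. */
static void add_squares_ref(float *dst, const float (*src)[2], int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i][0] * src[i][0] + src[i][1] * src[i][1];
}

/* Emulation of "one wide load, then split": read the pair as a single
 * 64-bit element and extract the halves with shifts by 0 and 32.
 * Which half holds re depends on endianness, but re^2 + im^2 is
 * symmetric, so the sum does not. */
static void add_squares_split(float *dst, const float (*src)[2], int n)
{
    for (int i = 0; i < n; i++) {
        uint64_t pair;
        memcpy(&pair, src[i], sizeof(pair));              /* 8-byte load */
        uint32_t lo_bits = (uint32_t)pair;                /* shift by 0  */
        uint32_t hi_bits = (uint32_t)(pair >> 32);        /* shift by 32 */
        float lo, hi;
        memcpy(&lo, &lo_bits, sizeof(lo));                /* reinterpret */
        memcpy(&hi, &hi_bits, sizeof(hi));
        dst[i] += lo * lo + hi * hi;
    }
}
```

In the vector version, the benefit is that the load is a plain unit-stride access; the split into real and imaginary parts happens in registers rather than in the memory access pattern.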
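Commit b6585eb04c stops assuming that Zbb or V implies Zba. A minimal dispatch sketch shows the pattern: an optimised routine is selected only when every extension it uses is reported. The flag names and values are hypothetical and are not FFmpeg's real `AV_CPU_FLAG_*` constants; the two routines are stand-ins.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only. */
enum {
    CPU_FLAG_V   = 1 << 0, /* V: vector extension */
    CPU_FLAG_ZBA = 1 << 1, /* Zba: address generation (sh1add/sh2add/sh3add, ...) */
    CPU_FLAG_ZBB = 1 << 2, /* Zbb: basic bit manipulation */
};

typedef int (*impl_fn)(void);

/* Stand-ins for a portable C routine and a vector routine that also
 * uses Zba instructions; they only report which one was chosen. */
static int impl_c(void)          { return 0; }
static int impl_vector_zba(void) { return 1; }

/* True only if all bits in `needed` are present. */
static bool has_all(int cpu_flags, int needed)
{
    return (cpu_flags & needed) == needed;
}

/* Checking V alone would repeat the "V implies Zba" assumption. */
static impl_fn select_impl(int cpu_flags)
{
    if (has_all(cpu_flags, CPU_FLAG_V | CPU_FLAG_ZBA))
        return impl_vector_zba;
    return impl_c;
}
```

With this shape, a setup that reports V without Zba (the QEMU case in the commit) falls back to the C routine instead of executing an unsupported instruction.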
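Commit 04d092e7d5 argues that clipping is cheaper in the FPU on RV64G, because integer min/max would need Zbb. For reference, a plain C clamp is shown below; the name and the `(dst, src, len, min, max)` argument order are assumptions of this sketch. It uses only float comparisons, so a compiler would typically keep the data in floating-point registers throughout.

```c
/* Clamp each element of src into [min, max]. Hypothetical sketch.
 * NaN inputs pass through unchanged, since both comparisons are false. */
static void vector_clipf_sketch(float *dst, const float *src, int len,
                                float min, float max)
{
    for (int i = 0; i < len; i++) {
        float v = src[i];
        if (v < min)
            v = min;
        if (v > max)
            v = max;
        dst[i] = v;
    }
}
```

This is a portable baseline rather than the RVF assembly from the commit; the benchmark in that commit compares such a C loop against the hand-written FPU version.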
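Commit 9d8ecdd8ca adds a platform-specific start code search routine to VC1DSPContext. As a point of reference, a byte-at-a-time scan for the `00 00 01` start-code prefix could look like this. The function name and its `size_t` interface are hypothetical and do not reflect the actual VC1DSPContext member.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 00 00 01 start-code prefix in buf,
 * or size if none is found. A simple portable scan; an optimised
 * routine could instead look for zero bytes several at a time. */
static size_t find_start_code(const uint8_t *buf, size_t size)
{
    for (size_t i = 0; i + 2 < size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    }
    return size;
}
```

Returning `size` on failure keeps the result usable as an end offset by a caller that splits the buffer at start codes.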
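Commit d526c5338d makes ARM CPU features maskable at run time through the -cpuflags option. The idea can be sketched as intersecting the detected features with a user-supplied mask before choosing an implementation. The flag names, values and helpers below are hypothetical, not FFmpeg's or Libav's API.

```c
/* Hypothetical feature bits for illustration only. */
enum {
    FLAG_ARMV6 = 1 << 0,
    FLAG_VFP   = 1 << 1,
    FLAG_NEON  = 1 << 2,
};

enum impl { IMPL_C, IMPL_ARMV6, IMPL_NEON };

/* Effective features = what the CPU reports, restricted to what the
 * user allowed at run time. */
static int effective_cpu_flags(int detected, int user_mask)
{
    return detected & user_mask;
}

/* Pick the best implementation present in the effective set. */
static enum impl pick_impl(int flags)
{
    if (flags & FLAG_NEON)
        return IMPL_NEON;
    if (flags & FLAG_ARMV6)
        return IMPL_ARMV6;
    return IMPL_C;
}
```

Masking out NEON on a NEON-capable CPU then exercises the ARMv6 path in the same binary, which is the "testing different optimisations without rebuilding" use case described in the commit.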
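Commit bd66f073fe switches strides from `int` to `ptrdiff_t`. A hypothetical 8x8 block copy (the name `copy_block8` is illustrative only) shows the resulting style. On typical 64-bit ABIs the stride already has pointer width, so it can be used in address arithmetic without first sign-extending a 32-bit value, and negative (bottom-up) strides still work.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy an 8x8 block of bytes. Strides are in bytes and may be negative.
 * Offsets are computed per row to avoid forming out-of-range pointers. */
static void copy_block8(uint8_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * dst_stride + x] = src[y * src_stride + x];
}
```

With `int` strides, the same code is still correct, but on such platforms the compiler has to widen the stride to pointer width before each use; that per-call overhead is what the commit removes.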
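Working from the two timings quoted in a5dfeb612e, the "~2x" overall figure corresponds to a speedup of a little under 1.9x, i.e. the decode time drops by roughly 47%:

$$
\frac{52.503\,\mathrm{s}}{27.973\,\mathrm{s}} \approx 1.88,
\qquad
\frac{52.503\,\mathrm{s} - 27.973\,\mathrm{s}}{52.503\,\mathrm{s}} \approx 0.467
$$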