Rémi Denis-Courmont
616fdeaea3
lavc/riscv: depend on RVB and simplify accordingly
...
There is no known (real) hardware with V and without the complete B
extension. B was indeed required in the RISC-V application profile from
2022, earlier than V. There should not be any relevant hardware in the
future either.
In practice, various R-V Vector optimisations in FFmpeg already depend on
every constituent of the B extension anyhow, so FFmpeg would not work well
on such hardware in any case.
5 months ago
Rémi Denis-Courmont
658439934b
lavc/vp8dsp: R-V V vp8_idct_add
...
T-Head C908 (cycles):
vp8_idct_add_c: 312.2
vp8_idct_add_rvv_i32: 117.0
7 months ago
Rémi Denis-Courmont
121fb846b9
lavc/vp7dsp: add R-V V vp7_idct_dc_add4uv
...
This is almost the same story as vp7_idct_add4y. We just have to use
strided loads of 2 64-bit elements to account for the different data
layout in memory.
T-Head C908:
vp7_idct_dc_add4uv_c: 7.5
vp7_idct_dc_add4uv_rvv_i64: 2.0
vp8_idct_dc_add4uv_c: 6.2
vp8_idct_dc_add4uv_rvv_i32: 2.2 (before)
vp8_idct_dc_add4uv_rvv_i64: 2.0
SpacemiT X60:
vp7_idct_dc_add4uv_c: 6.7
vp7_idct_dc_add4uv_rvv_i64: 2.2
vp8_idct_dc_add4uv_c: 5.7
vp8_idct_dc_add4uv_rvv_i32: 2.5 (before)
vp8_idct_dc_add4uv_rvv_i64: 2.0
7 months ago
Rémi Denis-Courmont
91b5ea7bb9
lavc/vp8dsp: R-V V vp8_luma_dc_wht
...
This is not great as transposition is poorly supported, but it works:
vp8_luma_dc_wht_c: 2.5
vp8_luma_dc_wht_rvv_i32: 1.7
7 months ago
Rémi Denis-Courmont
4e56455d36
lavc/vp8dsp: avoid one multiplication on RISC-V
...
Use shifts rather than multiply, and save one instruction.
7 months ago
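As a rough illustration of the shift-for-multiply idea in 4e56455d36, here is a minimal C sketch. The constant 20 and both function names are hypothetical; the commit message does not say which multiplication was replaced, and whether shifts actually save an instruction depends on the surrounding code.

```c
#include <stdint.h>

/* Hypothetical reference: a plain multiplication by a constant. */
static inline int32_t mul20_plain(int32_t x)
{
    return x * 20;
}

/* Same result built from two shifts and one add: 20 * x = 16 * x + 4 * x.
 * The arithmetic is done on unsigned values so that shifting a negative
 * input does not invoke undefined behaviour in C. */
static inline int32_t mul20_shift(int32_t x)
{
    uint32_t u = (uint32_t)x;

    return (int32_t)((u << 4) + (u << 2));
}
```

Both helpers return the same value for any input whose product fits in 32 bits.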
Rémi Denis-Courmont
5ebb071d79
lavc/vp8dsp: disable EPEL HV on RV128
...
RV128 is mostly scifi at this point, so we can just disable it here
(the EPEL HV prologue/epilogue do not save 128-bit registers).
7 months ago
sunyuechi
63697d3350
lavc/vp8dsp: R-V V put_epel hv
...
C908:
vp8_put_epel4_h4v4_c: 20.0
vp8_put_epel4_h4v4_rvv_i32: 11.0
vp8_put_epel4_h4v6_c: 25.2
vp8_put_epel4_h4v6_rvv_i32: 13.5
vp8_put_epel4_h6v4_c: 22.2
vp8_put_epel4_h6v4_rvv_i32: 14.5
vp8_put_epel4_h6v6_c: 29.0
vp8_put_epel4_h6v6_rvv_i32: 15.7
vp8_put_epel8_h4v4_c: 73.0
vp8_put_epel8_h4v4_rvv_i32: 22.2
vp8_put_epel8_h4v6_c: 90.5
vp8_put_epel8_h4v6_rvv_i32: 26.7
vp8_put_epel8_h6v4_c: 85.0
vp8_put_epel8_h6v4_rvv_i32: 27.2
vp8_put_epel8_h6v6_c: 104.7
vp8_put_epel8_h6v6_rvv_i32: 29.5
vp8_put_epel16_h4v4_c: 145.5
vp8_put_epel16_h4v4_rvv_i32: 26.5
vp8_put_epel16_h4v6_c: 190.7
vp8_put_epel16_h4v6_rvv_i32: 47.5
vp8_put_epel16_h6v4_c: 173.7
vp8_put_epel16_h6v4_rvv_i32: 33.2
vp8_put_epel16_h6v6_c: 222.2
vp8_put_epel16_h6v6_rvv_i32: 35.5
Amended to disable unsupported RV128.
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
8 months ago
Rémi Denis-Courmont
9d3f561721
lavc/vp8dsp: restrict RVI optimisations
...
They are actually awfully slow if the CPU does not support misaligned
accesses natively, so only use them if misaligned accesses are fast.
8 months ago
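The restriction described in 9d3f561721 can be pictured with the C sketch below. It is not the FFmpeg code: the function names and the `fast_misaligned` parameter are assumptions made for illustration. The word-sized copy is selected only when the caller reports that misaligned accesses are fast; otherwise a byte-wise fallback is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*put_pixels4_fn)(uint8_t *dst, ptrdiff_t dst_stride,
                               const uint8_t *src, ptrdiff_t src_stride,
                               int h);

/* Byte-wise copy: never issues a misaligned access. */
static void put_pixels4_bytes(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = src[x];
        dst += dst_stride;
        src += src_stride;
    }
}

/* Word-wise copy: memcpy() of 4 bytes typically becomes a single 32-bit
 * load and store when the target handles misaligned accesses natively. */
static void put_pixels4_word(uint8_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *src, ptrdiff_t src_stride,
                             int h)
{
    for (int y = 0; y < h; y++) {
        uint32_t w;

        memcpy(&w, src, sizeof(w));
        memcpy(dst, &w, sizeof(w));
        dst += dst_stride;
        src += src_stride;
    }
}

/* Hypothetical selector: fast_misaligned stands in for a CPU flag check. */
static put_pixels4_fn select_put_pixels4(int fast_misaligned)
{
    return fast_misaligned ? put_pixels4_word : put_pixels4_bytes;
}
```

Both variants produce identical output; only the access pattern differs.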
Rémi Denis-Courmont
cdcb4b98b7
lavc/riscv: use ff_rv_vlen_least()
8 months ago
sunyuechi
6e77af1c22
lavc/vp8dsp: R-V V put_epel v
...
C908:
vp8_put_epel4_v4_c: 11.0
vp8_put_epel4_v4_rvv_i32: 5.0
vp8_put_epel4_v6_c: 16.5
vp8_put_epel4_v6_rvv_i32: 6.2
vp8_put_epel8_v4_c: 43.7
vp8_put_epel8_v4_rvv_i32: 11.2
vp8_put_epel8_v6_c: 68.7
vp8_put_epel8_v6_rvv_i32: 13.2
vp8_put_epel16_v4_c: 92.5
vp8_put_epel16_v4_rvv_i32: 13.7
vp8_put_epel16_v6_c: 135.7
vp8_put_epel16_v6_rvv_i32: 16.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
8 months ago
sunyuechi
109daea619
lavc/vp8dsp: R-V V put_epel h
...
C908:
vp8_put_epel4_h4_c: 10.7
vp8_put_epel4_h4_rvv_i32: 5.0
vp8_put_epel4_h6_c: 15.0
vp8_put_epel4_h6_rvv_i32: 6.2
vp8_put_epel8_h4_c: 43.2
vp8_put_epel8_h4_rvv_i32: 11.2
vp8_put_epel8_h6_c: 57.5
vp8_put_epel8_h6_rvv_i32: 13.5
vp8_put_epel16_h4_c: 92.5
vp8_put_epel16_h4_rvv_i32: 13.7
vp8_put_epel16_h6_c: 139.0
vp8_put_epel16_h6_rvv_i32: 16.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
8 months ago
sunyuechi
538f217bbb
lavc/vp8dsp: R-V V put_bilin_hv
...
C908:
vp8_put_bilin4_hv_c: 561.0
vp8_put_bilin4_hv_rvv_i32: 232.7
vp8_put_bilin8_hv_c: 2162.7
vp8_put_bilin8_hv_rvv_i32: 506.7
vp8_put_bilin16_hv_c: 4769.7
vp8_put_bilin16_hv_rvv_i32: 556.7
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
8 months ago
sunyuechi
bb5039b3cb
lavc/vp8dsp: R-V V put_bilin_h v
...
C908:
vp8_put_bilin4_h_c: 367.0
vp8_put_bilin4_h_rvv_i32: 137.7
vp8_put_bilin4_v_c: 377.0
vp8_put_bilin4_v_rvv_i32: 137.7
vp8_put_bilin8_h_c: 1431.0
vp8_put_bilin8_h_rvv_i32: 297.5
vp8_put_bilin8_v_c: 1449.0
vp8_put_bilin8_v_rvv_i32: 297.5
vp8_put_bilin16_h_c: 2839.0
vp8_put_bilin16_h_rvv_i32: 344.7
vp8_put_bilin16_v_c: 2857.0
vp8_put_bilin16_v_rvv_i32: 344.7
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
8 months ago
sunyuechi
0b8e5e5a00
lavc/vp8dsp: R-V put_vp8_pixels
...
C908:
vp8_put_pixels4_c: 78.0
vp8_put_pixels4_rvi: 33.7
vp8_put_pixels8_c: 278.0
vp8_put_pixels8_rvi: 55.0
vp8_put_pixels16_c: 999.0
vp8_put_pixels16_rvi: 86.7
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
8 months ago
sunyuechi
d897bbb48d
lavc/vp8dsp: R-V V vp8_idct_dc_add4uv
...
C908:
vp8_idct_dc_add4uv_c: 387.7
vp8_idct_dc_add4uv_rvv_i32: 134.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
e74e18cae4
lavc/vp8dsp: R-V V vp8_idct_dc_add4y
...
C908:
vp8_idct_dc_add4y_c: 368.5
vp8_idct_dc_add4y_rvv_i32: 134.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
c12053cefc
lavc/vp8dsp: R-V V vp8_idct_dc_add
...
C908:
vp8_idct_dc_add_c: 102.2
vp8_idct_dc_add_rvv_i32: 42.0
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
89189dd9e7
lavc/rv34dsp: R-V V rv34_idct_dc_add
...
C908:
rv34_idct_dc_add_c: 134.7
rv34_idct_dc_add_rvv_i32: 45.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
ee08974f90
lavc/rv34dsp: R-V V rv34_inv_transform_dc
...
C908:
rv34_inv_transform_dc_c: 35.5
rv34_inv_transform_dc_rvv_i32: 27.0
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
fdebde817c
lavc/blockdsp: R-V V clear_blocks
...
C908:
blockdsp.clear_blocks_c: 128.2
blockdsp.clear_blocks_rvv_i64: 102.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
0748d2bbc7
lavc/blockdsp: R-V V clear_block
...
C908:
blockdsp.clear_block_c: 47.2
blockdsp.clear_block_rvv_i64: 28.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
Rémi Denis-Courmont
9bc5676e40
lavc/g722dsp: add RISC-V V DSP function
1 year ago
Rémi Denis-Courmont
b6585eb04c
lavu: add/use flag for RISC-V Zba extension
...
The code was blindly assuming that Zbb or V implied Zba. While the
former is practically always true, the latter broke some QEMU setups,
as V was introduced earlier than Zba.
1 year ago
Rémi Denis-Courmont
c1bb19e263
lavu/fixeddsp: RISC-V V butterflies_fixed
2 years ago
Rémi Denis-Courmont
04d092e7d5
lavc/audiodsp: RISC-V F vector_clipf
...
RV64G supports MIN & MAX instructions natively only on floating point
registers, not general purpose ones. The latter would require the Zbb
extension. Due to that, it is actually faster to perform the clipping
"properly" in FPU.
Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech):
audiodsp.vector_clipf_c: 29551.5
audiodsp.vector_clipf_rvf: 17871.0
Also tried unrolling with 2 or 8 elements but it gets worse either way.
2 years ago
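For readers unfamiliar with the function, clipping "properly" in floating point as described in 04d092e7d5 amounts to a per-element minimum and maximum. The C sketch below is an assumption-laden illustration, not the RISC-V assembly from the commit; the name `vector_clipf_float` is hypothetical, and whether a compiler lowers the comparisons to the floating-point MIN and MAX instructions is not guaranteed.

```c
/* Clamp each input sample to [min, max] using floating-point comparisons
 * only. Assumes min <= max and no NaN inputs. */
static void vector_clipf_float(float *dst, const float *src, int len,
                               float min, float max)
{
    for (int i = 0; i < len; i++) {
        float v = src[i];

        v = v < min ? min : v; /* maximum of v and min */
        v = v > max ? max : v; /* minimum of v and max */
        dst[i] = v;
    }
}
```

The loop handles one element per iteration, in line with the commit's note that unrolling did not help on the tested hardware.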
Diego Biurrun
9a9e2f1c8a
dsputil: Split audio operations off into a separate context
11 years ago
Ben Avison
9d8ecdd8ca
vc-1: Add platform-specific start code search routine to VC1DSPContext.
...
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Mason Carter
832e190632
vc1: arm: Add NEON assembly
...
For:
ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon
ff_put_pixels8x8_neon
ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00)
Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans
Rullgard.
Signed-off-by: Martin Storsjö <martin@martin.st>
11 years ago
Diego Biurrun
73b704ac60
arm: Add some missing header #includes
12 years ago
Mans Rullgard
b692d246ea
vp8: arm: separate ARMv6 functions from NEON
...
This is a preparation for complete ARMv6 optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
d526c5338d
ARM: allow runtime masking of CPU features
...
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
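The runtime masking from d526c5338d can be sketched as follows. Everything in this block is hypothetical (flag names, the stand-in detection function, the `cpu_mask` variable); it only illustrates the idea of ANDing detected features with a user-supplied mask before choosing an implementation, and is not the libavutil API.

```c
/* Hypothetical feature bits. */
enum {
    FLAG_ARMV6 = 1 << 0,
    FLAG_NEON  = 1 << 1,
};

enum impl { IMPL_C, IMPL_ARMV6, IMPL_NEON };

/* Mask set from a command-line option; all bits enabled by default. */
static unsigned cpu_mask = ~0u;

/* Stand-in for real detection: pretend the CPU supports both features. */
static unsigned detect_cpu_flags(void)
{
    return FLAG_ARMV6 | FLAG_NEON;
}

/* Detected features, filtered by the runtime mask. */
static unsigned get_cpu_flags(void)
{
    return detect_cpu_flags() & cpu_mask;
}

/* Pick the best implementation permitted by the masked flags. */
static enum impl pick_impl(unsigned flags)
{
    if (flags & FLAG_NEON)
        return IMPL_NEON;
    if (flags & FLAG_ARMV6)
        return IMPL_ARMV6;
    return IMPL_C;
}
```

Clearing `FLAG_NEON` in `cpu_mask` makes `pick_impl(get_cpu_flags())` fall back to the ARMv6 variant without rebuilding, which is the testing workflow the commit message describes.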
Michael Niedermayer
c266eb1928
arm: Fix 10l typo
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Ronald S. Bultje
bd66f073fe
vp8: change int stride to ptrdiff_t stride.
...
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
13 years ago
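A minimal C sketch of what a `ptrdiff_t` stride looks like in a row loop, in the spirit of bd66f073fe. The function `copy_rows` is a hypothetical example, not a VP8 DSP function. Because the stride already has pointer width, adding it to a pointer needs no widening from `int` on 64-bit targets, and negative strides still work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy h rows of w bytes. Strides are ptrdiff_t, so they can be added to
 * the pointers directly, including when they are negative. */
static void copy_rows(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride,
                      int w, int h)
{
    for (int y = 0; y < h; y++) {
        memcpy(dst, src, (size_t)w);
        dst += dst_stride;
        src += src_stride;
    }
}
```

Passing a negative `src_stride` with `src` pointing at the last row copies the rows in reverse order.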
Diego Biurrun
32f3c541bc
doxygen: Do not include license boilerplates in Doxygen comment blocks.
13 years ago
Ronald S. Bultje
a5dfeb612e
VP8: armv6 optimizations.
...
From 52.503s (~40fps) to 27.973s (~80fps) decoding of the 480p Sintel
trailer, i.e. a ~2x speedup overall, on a Nexus S.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Mans Rullgard
ef15d71c1f
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a1c1d3c003)
14 years ago
Mans Rullgard
a1c1d3c003
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago