Rémi Denis-Courmont
cdcb4b98b7
lavc/riscv: use ff_rv_vlen_least()
8 months ago
sunyuechi
a7ad76fbbf
lavc/me_cmp: R-V V nsse
...
C908:
nsse_0_c: 1990.0
nsse_0_rvv_i32: 572.0
nsse_1_c: 910.0
nsse_1_rvv_i32: 456.0
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
9b90d0d36a
lavc/me_cmp: R-V V vsse vsad intra
...
C908:
vsad_4_c: 681.0
vsad_4_rvv_i32: 182.2
vsad_5_c: 278.0
vsad_5_rvv_i32: 145.2
vsse_4_c: 595.0
vsse_4_rvv_i32: 125.2
vsse_5_c: 281.0
vsse_5_rvv_i32: 101.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
925b55a5e8
lavc/me_cmp: R-V V vsse vsad
...
C908:
vsad_0_c: 936.0
vsad_0_rvv_i32: 236.2
vsad_1_c: 424.0
vsad_1_rvv_i32: 190.2
vsse_0_c: 877.0
vsse_0_rvv_i32: 204.2
vsse_1_c: 439.0
vsse_1_rvv_i32: 140.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
9cb8f262f2
lavc/me_cmp: R-V V sse
...
C908:
sse_0_c: 614.7
sse_0_rvv_i32: 138.2
sse_1_c: 302.7
sse_1_rvv_i32: 107.2
sse_2_c: 175.7
sse_2_rvv_i32: 104.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
37463d7979
lavc/me_cmp: R-V V pix_abs_y2
...
C908:
pix_abs_0_2_c: 904.0
pix_abs_0_2_rvv_i32: 172.2
pix_abs_1_2_c: 460.0
pix_abs_1_2_rvv_i32: 168.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
f1ec475f66
lavc/me_cmp: R-V V pix_abs_x2
...
C908:
pix_abs_0_1_c: 767.0
pix_abs_0_1_rvv_i32: 196.2
pix_abs_1_1_c: 388.0
pix_abs_1_1_rvv_i32: 185.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
b41e115dde
lavc/me_cmp: R-V V pix_abs
...
C908:
pix_abs_0_0_c: 534.0
pix_abs_0_0_rvv_i32: 136.2
pix_abs_1_0_c: 287.7
pix_abs_1_0_rvv_i32: 125.2
sad_0_c: 534.0
sad_0_rvv_i32: 136.2
sad_1_c: 287.7
sad_1_rvv_i32: 125.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
d897bbb48d
lavc/vp8dsp: R-V V vp8_idct_dc_add4uv
...
C908:
vp8_idct_dc_add4uv_c: 387.7
vp8_idct_dc_add4uv_rvv_i32: 134.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
e74e18cae4
lavc/vp8dsp: R-V V vp8_idct_dc_add4y
...
C908:
vp8_idct_dc_add4y_c: 368.5
vp8_idct_dc_add4y_rvv_i32: 134.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
c12053cefc
lavc/vp8dsp: R-V V vp8_idct_dc_add
...
C908:
vp8_idct_dc_add_c: 102.2
vp8_idct_dc_add_rvv_i32: 42.0
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
89189dd9e7
lavc/rv34dsp: R-V V rv34_idct_dc_add
...
C908:
rv34_idct_dc_add_c: 134.7
rv34_idct_dc_add_rvv_i32: 45.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
ee08974f90
lavc/rv34dsp: R-V V rv34_inv_transform_dc
...
C908:
rv34_inv_transform_dc_c: 35.5
rv34_inv_transform_dc_rvv_i32: 27.0
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
fdebde817c
lavc/blockdsp: R-V V clear_blocks
...
C908:
blockdsp.clear_blocks_c: 128.2
blockdsp.clear_blocks_rvv_i64: 102.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
sunyuechi
0748d2bbc7
lavc/blockdsp: R-V V clear_block
...
C908:
blockdsp.clear_block_c: 47.2
blockdsp.clear_block_rvv_i64: 28.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
11 months ago
Rémi Denis-Courmont
9bc5676e40
lavc/g722dsp: add RISC-V V DSP function
1 year ago
Rémi Denis-Courmont
b6585eb04c
lavu: add/use flag for RISC-V Zba extension
...
The code was blindly assuming that Zbb or V implied Zba. While the
former is practically always true, the latter broke some QEMU setups,
as V was introduced earlier than Zba.
1 year ago
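A minimal sketch of the idea behind the Zba flag commit above: each extension gets its own bit, and a routine is selected only if every extension it actually uses is reported, rather than inferring Zba from V or Zbb. The flag names and the helper are hypothetical and are not the identifiers used in lavu.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    SKETCH_FLAG_RVV = 1 << 0, /* vector extension */
    SKETCH_FLAG_ZBA = 1 << 1, /* address generation (sh1add, sh2add, sh3add) */
    SKETCH_FLAG_ZBB = 1 << 2, /* basic bit manipulation */
};

/* Nonzero only if both V and Zba are reported; Zbb is deliberately
 * not treated as evidence that Zba is present. */
static int can_use_rvv_with_zba(int flags)
{
    const int needed = SKETCH_FLAG_RVV | SKETCH_FLAG_ZBA;

    return (flags & needed) == needed;
}
```

With this shape, a QEMU setup that reports V without Zba simply falls back to another implementation instead of hitting an illegal instruction.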
Rémi Denis-Courmont
c1bb19e263
lavu/fixeddsp: RISC-V V butterflies_fixed
2 years ago
Rémi Denis-Courmont
04d092e7d5
lavc/audiodsp: RISC-V F vector_clipf
...
RV64G supports MIN & MAX instructions natively only on floating point
registers, not general purpose ones. The latter would require the Zbb
extension. Due to that, it is actually faster to perform the clipping
"properly" in FPU.
Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech):
audiodsp.vector_clipf_c: 29551.5
audiodsp.vector_clipf_rvf: 17871.0
Also tried unrolling by 2 or 8 elements, but it gets worse either way.
2 years ago
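For illustration, clipping done entirely on floating-point values, as the vector_clipf commit above describes, might look like the scalar sketch below. This is not the committed RISC-V assembly; the function name and signature are hypothetical, and whether a compiler lowers the comparisons to FP min/max instructions is an assumption that depends on the toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference; name and signature are illustrative. */
static void clipf_sketch(float *dst, const float *src, size_t len,
                         float min, float max)
{
    assert(min <= max);

    for (size_t i = 0; i < len; i++) {
        float v = src[i];

        /* Both comparisons operate on float values, so a compiler can
         * keep the samples in floating-point registers throughout. */
        v = v < min ? min : v;
        v = v > max ? max : v;
        dst[i] = v;
    }
}
```

The loop handles one element per iteration, matching the note in the commit that unrolling did not help on the benchmarked core.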
Diego Biurrun
9a9e2f1c8a
dsputil: Split audio operations off into a separate context
11 years ago
Ben Avison
9d8ecdd8ca
vc-1: Add platform-specific start code search routine to VC1DSPContext.
...
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Mason Carter
832e190632
vc1: arm: Add NEON assembly
...
For:
ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon
ff_put_pixels8x8_neon
ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00)
Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans
Rullgard.
Signed-off-by: Martin Storsjö <martin@martin.st>
11 years ago
Diego Biurrun
73b704ac60
arm: Add some missing header #includes
12 years ago
Mans Rullgard
b692d246ea
vp8: arm: separate ARMv6 functions from NEON
...
This is a preparation for complete ARMv6 optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
d526c5338d
ARM: allow runtime masking of CPU features
...
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
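The runtime masking described in the commit above can be sketched as a bitwise AND between the detected feature bits and a user-supplied mask, applied before an implementation is chosen. The flag names, `masked_flags()` and `pick_impl()` are hypothetical and only illustrate the mechanism; they are not the library's API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    SKETCH_CPU_ARMV6 = 1 << 0,
    SKETCH_CPU_NEON  = 1 << 1,
};

/* Drop every detected feature that the mask does not allow. */
static int masked_flags(int detected, int mask)
{
    return detected & mask;
}

/* Pick the best implementation still permitted by the flags. */
static const char *pick_impl(int flags)
{
    if (flags & SKETCH_CPU_NEON)
        return "neon";
    if (flags & SKETCH_CPU_ARMV6)
        return "armv6";
    return "c";
}
```

Clearing the NEON bit in the mask on a NEON-capable CPU makes `pick_impl()` return the ARMv6 path, which is how different optimisations can be compared without rebuilding.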
Michael Niedermayer
c266eb1928
arm: Fix 10l typo
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Ronald S. Bultje
bd66f073fe
vp8: change int stride to ptrdiff_t stride.
...
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
13 years ago
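As a rough illustration of the stride change in the commit above: with a `ptrdiff_t` stride, the offset is already pointer-sized, so no widening from a 32-bit `int` is needed before it is added to the address. The helper below is hypothetical and is not taken from the VP8 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum the first pixel of each of `rows` rows.
 * The stride may be negative to walk the rows bottom-up. */
static unsigned sum_column(const uint8_t *src, ptrdiff_t stride, int rows)
{
    unsigned sum = 0;

    assert(rows >= 0);
    for (ptrdiff_t y = 0; y < rows; y++)
        sum += src[y * stride]; /* pointer-sized offset, no int widening */
    return sum;
}
```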
Diego Biurrun
32f3c541bc
doxygen: Do not include license boilerplates in Doxygen comment blocks.
13 years ago
Ronald S. Bultje
a5dfeb612e
VP8: armv6 optimizations.
...
From 52.503s (~40fps) to 27.973s (~80fps) decoding of 480p sintel
trailer, i.e. a ~2x speedup overall, on a Nexus S.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Mans Rullgard
ef15d71c1f
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a1c1d3c003)
14 years ago
Mans Rullgard
a1c1d3c003
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago