Rémi Denis-Courmont
210877c5fd
sws/riscv: depend on RVB and simplify accordingly
5 months ago
Rémi Denis-Courmont
417957ec5e
sws/range_convert: R-V V to/from JPEG
...
C908 X60
chrRangeFromJpeg_8_c: 2.7 2.5
chrRangeFromJpeg_8_rvv_i32: 1.7 1.5
chrRangeFromJpeg_24_c: 7.5 6.7
chrRangeFromJpeg_24_rvv_i32: 1.7 1.5
chrRangeFromJpeg_128_c: 55.2 34.7
chrRangeFromJpeg_128_rvv_i32: 6.5 3.0
chrRangeFromJpeg_144_c: 44.0 39.2
chrRangeFromJpeg_144_rvv_i32: 7.7 4.5
chrRangeFromJpeg_256_c: 78.2 69.5
chrRangeFromJpeg_256_rvv_i32: 12.2 6.0
chrRangeFromJpeg_512_c: 172.2 138.5
chrRangeFromJpeg_512_rvv_i32: 24.5 11.7
chrRangeToJpeg_8_c: 4.7 4.2
chrRangeToJpeg_8_rvv_i32: 2.0 1.7
chrRangeToJpeg_24_c: 13.7 12.2
chrRangeToJpeg_24_rvv_i32: 2.0 1.5
chrRangeToJpeg_128_c: 72.0 63.7
chrRangeToJpeg_128_rvv_i32: 6.7 3.2
chrRangeToJpeg_144_c: 80.7 71.7
chrRangeToJpeg_144_rvv_i32: 8.5 4.7
chrRangeToJpeg_256_c: 143.2 127.2
chrRangeToJpeg_256_rvv_i32: 13.5 6.5
chrRangeToJpeg_512_c: 285.7 253.7
chrRangeToJpeg_512_rvv_i32: 27.0 13.0
lumRangeFromJpeg_8_c: 1.7 1.5
lumRangeFromJpeg_8_rvv_i32: 1.2 1.0
lumRangeFromJpeg_24_c: 4.2 3.7
lumRangeFromJpeg_24_rvv_i32: 1.2 1.0
lumRangeFromJpeg_128_c: 21.7 19.2
lumRangeFromJpeg_128_rvv_i32: 3.7 1.7
lumRangeFromJpeg_144_c: 24.7 22.0
lumRangeFromJpeg_144_rvv_i32: 4.7 2.7
lumRangeFromJpeg_256_c: 43.7 39.0
lumRangeFromJpeg_256_rvv_i32: 7.5 3.2
lumRangeFromJpeg_512_c: 87.0 77.2
lumRangeFromJpeg_512_rvv_i32: 14.5 6.7
lumRangeToJpeg_8_c: 2.7 2.2
lumRangeToJpeg_8_rvv_i32: 1.0 1.0
lumRangeToJpeg_24_c: 7.2 6.5
lumRangeToJpeg_24_rvv_i32: 1.2 1.0
lumRangeToJpeg_128_c: 37.7 33.7
lumRangeToJpeg_128_rvv_i32: 3.7 2.0
lumRangeToJpeg_144_c: 42.5 37.7
lumRangeToJpeg_144_rvv_i32: 4.7 2.7
lumRangeToJpeg_256_c: 75.0 66.7
lumRangeToJpeg_256_rvv_i32: 7.5 3.5
lumRangeToJpeg_512_c: 149.5 133.0
lumRangeToJpeg_512_rvv_i32: 14.7 7.0
7 months ago
Rémi Denis-Courmont
7a3369398f
sws/input: R-V V 32-bit RGB to halved UV
...
T-Head C908:
abgr_to_uv_half_8_c: 2.2
abgr_to_uv_half_8_rvv_i32: 3.5
abgr_to_uv_half_128_c: 44.0
abgr_to_uv_half_128_rvv_i32: 13.0
abgr_to_uv_half_1080_c: 245.0
abgr_to_uv_half_1080_rvv_i32: 107.2
abgr_to_uv_half_1920_c: 406.2
abgr_to_uv_half_1920_rvv_i32: 188.7
bgra_to_uv_half_8_c: 2.2
bgra_to_uv_half_8_rvv_i32: 3.5
bgra_to_uv_half_128_c: 26.5
bgra_to_uv_half_128_rvv_i32: 13.0
bgra_to_uv_half_1080_c: 219.7
bgra_to_uv_half_1080_rvv_i32: 107.0
bgra_to_uv_half_1920_c: 406.7
bgra_to_uv_half_1920_rvv_i32: 188.7
SpacemiT X60:
abgr_to_uv_half_8_c: 2.2
abgr_to_uv_half_8_rvv_i32: 3.0
abgr_to_uv_half_128_c: 28.2
abgr_to_uv_half_128_rvv_i32: 5.7
abgr_to_uv_half_1080_c: 235.5
abgr_to_uv_half_1080_rvv_i32: 47.7
abgr_to_uv_half_1920_c: 418.2
abgr_to_uv_half_1920_rvv_i32: 84.0
bgra_to_uv_half_8_c: 2.0
bgra_to_uv_half_8_rvv_i32: 3.0
bgra_to_uv_half_128_c: 23.7
bgra_to_uv_half_128_rvv_i32: 5.7
bgra_to_uv_half_1080_c: 195.5
bgra_to_uv_half_1080_rvv_i32: 47.7
bgra_to_uv_half_1920_c: 346.5
bgra_to_uv_half_1920_rvv_i32: 84.0
7 months ago
Rémi Denis-Courmont
e2f069905e
sws/input: R-V V 32-bit RGB to UV
7 months ago
Rémi Denis-Courmont
f5555cb106
sws/input: R-V V 32-bit RGB to Y
...
T-Head C908:
abgr_to_y_8_c: 2.5
abgr_to_y_8_rvv_i32: 2.2
abgr_to_y_128_c: 37.0
abgr_to_y_128_rvv_i32: 8.5
abgr_to_y_1080_c: 327.0
abgr_to_y_1080_rvv_i32: 69.5
abgr_to_y_1920_c: 552.0
abgr_to_y_1920_rvv_i32: 122.2
bgra_to_y_8_c: 2.5
bgra_to_y_8_rvv_i32: 2.2
bgra_to_y_128_c: 37.2
bgra_to_y_128_rvv_i32: 8.5
bgra_to_y_1080_c: 310.2
bgra_to_y_1080_rvv_i32: 69.5
bgra_to_y_1920_c: 568.2
bgra_to_y_1920_rvv_i32: 122.5
SpacemiT X60:
abgr_to_y_8_c: 2.5
abgr_to_y_8_rvv_i32: 2.0
abgr_to_y_128_c: 33.0
abgr_to_y_128_rvv_i32: 3.7
abgr_to_y_1080_c: 276.0
abgr_to_y_1080_rvv_i32: 31.5
abgr_to_y_1920_c: 493.7
abgr_to_y_1920_rvv_i32: 55.5
bgra_to_y_8_c: 2.2
bgra_to_y_8_rvv_i32: 2.0
bgra_to_y_128_c: 33.0
bgra_to_y_128_rvv_i32: 3.7
bgra_to_y_1080_c: 276.0
bgra_to_y_1080_rvv_i32: 31.5
bgra_to_y_1920_c: 490.7
bgra_to_y_1920_rvv_i32: 55.5
7 months ago
Rémi Denis-Courmont
e0f4d185f1
sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half
...
T-Head C908:
rgb24_to_uv_half_4_c: 2.0
rgb24_to_uv_half_4_rvv_i32: 3.5
rgb24_to_uv_half_64_c: 27.0
rgb24_to_uv_half_64_rvv_i32: 12.5
rgb24_to_uv_half_540_c: 223.7
rgb24_to_uv_half_540_rvv_i32: 105.2
rgb24_to_uv_half_640_c: 265.5
rgb24_to_uv_half_640_rvv_i32: 123.7
rgb24_to_uv_half_960_c: 414.5
rgb24_to_uv_half_960_rvv_i32: 249.5
SpacemiT X60:
rgb24_to_uv_half_4_c: 1.7
rgb24_to_uv_half_4_rvv_i32: 4.2
rgb24_to_uv_half_64_c: 24.0
rgb24_to_uv_half_64_rvv_i32: 8.7
rgb24_to_uv_half_540_c: 199.2
rgb24_to_uv_half_540_rvv_i32: 72.5
rgb24_to_uv_half_640_c: 235.7
rgb24_to_uv_half_640_rvv_i32: 85.2
rgb24_to_uv_half_960_c: 353.5
rgb24_to_uv_half_960_rvv_i32: 127.5
7 months ago
Rémi Denis-Courmont
3ef5867e4b
sws/input: R-V V rgb24ToUV and bgr24ToUV
...
T-Head C908:
rgb24_to_uv_8_c: 2.7
rgb24_to_uv_8_rvv_i32: 3.2
rgb24_to_uv_128_c: 41.0
rgb24_to_uv_128_rvv_i32: 12.7
rgb24_to_uv_1080_c: 342.5
rgb24_to_uv_1080_rvv_i32: 105.7
rgb24_to_uv_1280_c: 406.0
rgb24_to_uv_1280_rvv_i32: 124.2
rgb24_to_uv_1920_c: 626.0
rgb24_to_uv_1920_rvv_i32: 186.0
SpacemiT X60:
rgb24_to_uv_8_c: 2.5
rgb24_to_uv_8_rvv_i32: 3.0
rgb24_to_uv_128_c: 36.5
rgb24_to_uv_128_rvv_i32: 5.7
rgb24_to_uv_1080_c: 304.2
rgb24_to_uv_1080_rvv_i32: 49.0
rgb24_to_uv_1280_c: 360.5
rgb24_to_uv_1280_rvv_i32: 57.5
rgb24_to_uv_1920_c: 540.7
rgb24_to_uv_1920_rvv_i32: 86.2
7 months ago
Rémi Denis-Courmont
79dfdac4db
sws/input: R-V V rgb24ToY & bgr24ToY
...
T-Head C908:
rgb24_to_y_8_c: 2.0
rgb24_to_y_8_rvv_i32: 2.7
rgb24_to_y_128_c: 26.2
rgb24_to_y_128_rvv_i32: 9.2
rgb24_to_y_1080_c: 219.5
rgb24_to_y_1080_rvv_i32: 76.2
rgb24_to_y_1280_c: 276.2
rgb24_to_y_1280_rvv_i32: 89.7
rgb24_to_y_1920_c: 389.7
rgb24_to_y_1920_rvv_i32: 134.2
SpacemiT X60:
rgb24_to_y_8_c: 1.7
rgb24_to_y_8_rvv_i32: 2.2
rgb24_to_y_128_c: 23.2
rgb24_to_y_128_rvv_i32: 4.2
rgb24_to_y_1080_c: 195.0
rgb24_to_y_1080_rvv_i32: 33.7
rgb24_to_y_1280_c: 231.0
rgb24_to_y_1280_rvv_i32: 40.0
rgb24_to_y_1920_c: 346.2
rgb24_to_y_1920_rvv_i32: 59.7
7 months ago
Rémi Denis-Courmont
463c573e6b
lavc/huffyuvdsp: optimise RVV vtype for add_hfyu_left_pred_bgr32
...
T-Head C908:
add_hfyu_left_pred_bgr32_c: 237.5
add_hfyu_left_pred_bgr32_rvv_i32: 173.5 (before)
add_hfyu_left_pred_bgr32_rvv_i32: 110.0 (after)
7 months ago
Rémi Denis-Courmont
90a779bed6
lavc/huffyuvdsp: basic R-V V add_hfyu_left_pred_bgr32
...
Better performance can probably be achieved with a more intricate
unrolled loop, but this is a start:
add_hfyu_left_pred_bgr32_c: 15084.0
add_hfyu_left_pred_bgr32_rvv_i32: 10280.2
This would actually be cleaner with the RISC-V P extension, but that is
not ratified yet (I think?) and usually not supported if V is supported.
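For readers unfamiliar with the routine, the sketch below shows what a scalar left prediction over 32-bit pixels looks like. It is an illustration under assumptions, not the code from this commit: the function name, the parameter layout, and the exact semantics (each byte accumulates the same byte of the previous output pixel, modulo 256) are assumed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar left prediction over 4-byte pixels.
 * left[] carries the running value of each of the 4 byte lanes and is
 * updated in place so a caller can continue on the next chunk. */
static void left_pred_bgr32_sketch(uint8_t *dst, const uint8_t *src,
                                   size_t w, uint8_t left[4])
{
    for (size_t i = 0; i < w; i++) {
        for (int c = 0; c < 4; c++) {
            /* Wrap-around addition modulo 256, lane by lane. */
            left[c] = (uint8_t)(left[c] + src[4 * i + c]);
            dst[4 * i + c] = left[c];
        }
    }
}
```

Each output pixel depends on the previous one, and this serial dependency is presumably why a plain vector loop gains less here than in the other routines.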
1 year ago
Rémi Denis-Courmont
424c8ceb08
lavc/huffyuvdsp: R-V V add_int16
...
add_int16_128_c: 2390.5
add_int16_128_rvv_i32: 832.0
add_int16_rnd_width_c: 2390.2
add_int16_rnd_width_rvv_i32: 832.5
1 year ago
Rémi Denis-Courmont
b6585eb04c
lavu: add/use flag for RISC-V Zba extension
...
The code was blindly assuming that Zbb or V implied Zba. While the
former is practically always true, the latter broke some QEMU setups,
as V was introduced earlier than Zba.
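The idea of testing each extension on its own can be sketched as below. The flag names and bit values are hypothetical and chosen for illustration only; they are not the identifiers libavutil defines.

```c
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    CPU_ZBA = 1 << 0, /* address generation instructions */
    CPU_ZBB = 1 << 1, /* basic bit manipulation */
    CPU_RVV = 1 << 2, /* vector extension */
};

/* A routine that uses both vector and Zba instructions must see both
 * bits set; neither CPU_RVV nor CPU_ZBB is taken to imply CPU_ZBA. */
static bool can_use_rvv_zba_routine(unsigned flags)
{
    const unsigned need = CPU_RVV | CPU_ZBA;
    return (flags & need) == need;
}
```

With a dedicated bit, a setup that reports V without Zba simply falls back to another implementation instead of executing unsupported instructions.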
1 year ago
Rémi Denis-Courmont
453aba71e6
lavc/vorbisdsp: RISC-V V inverse_coupling
...
This uses the following vectorisation:
for (i = 0; i < blocksize; i++) {
float a = ang[i]; /* both updates read the original angle */
ang[i] = mag[i] - copysignf(fmaxf(a, 0.f), mag[i]);
mag[i] = mag[i] - copysignf(fminf(a, 0.f), mag[i]);
}
2 years ago
Rémi Denis-Courmont
c1bb19e263
lavu/fixeddsp: RISC-V V butterflies_fixed
2 years ago
Rémi Denis-Courmont
04d092e7d5
lavc/audiodsp: RISC-V F vector_clipf
...
RV64G supports MIN & MAX instructions natively only on floating point
registers, not general purpose ones. The latter would require the Zbb
extension. Due to that, it is actually faster to perform the clipping
"properly" in FPU.
Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech):
audiodsp.vector_clipf_c: 29551.5
audiodsp.vector_clipf_rvf: 17871.0
Also tried unrolling with 2 or 8 elements but it gets worse either way.
2 years ago
Diego Biurrun
9a9e2f1c8a
dsputil: Split audio operations off into a separate context
11 years ago
Ben Avison
9d8ecdd8ca
vc-1: Add platform-specific start code search routine to VC1DSPContext.
...
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
11 years ago
Mason Carter
832e190632
vc1: arm: Add NEON assembly
...
For:
ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon
ff_put_pixels8x8_neon
ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00)
Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans
Rullgard.
Signed-off-by: Martin Storsjö <martin@martin.st>
11 years ago
Diego Biurrun
73b704ac60
arm: Add some missing header #includes
12 years ago
Mans Rullgard
b692d246ea
vp8: arm: separate ARMv6 functions from NEON
...
This is a preparation for complete ARMv6 optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
d526c5338d
ARM: allow runtime masking of CPU features
...
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
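Runtime masking can be pictured as an AND between the detected features and a user-supplied mask, applied before any function pointers are chosen. The sketch below uses hypothetical flag names and a hypothetical helper; it does not reproduce the actual option parsing or the real flag definitions.

```c
/* Hypothetical feature bits, for illustration only. */
enum {
    CPU_ARMV6 = 1 << 0,
    CPU_VFP   = 1 << 1,
    CPU_NEON  = 1 << 2,
};

/* Features the DSP init code is allowed to use: what the CPU reports,
 * restricted to what the user has left enabled. */
static unsigned effective_cpu_flags(unsigned detected, unsigned mask)
{
    return detected & mask;
}
```

For example, passing a mask with the NEON bit cleared makes the init code pick the ARMv6 or C versions on a NEON-capable device, which allows comparing implementations with a single binary.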
13 years ago
Michael Niedermayer
c266eb1928
arm: Fix 10l typo
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Ronald S. Bultje
bd66f073fe
vp8: change int stride to ptrdiff_t stride.
...
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
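A minimal sketch of the pattern, with a hypothetical helper that is not taken from the VP8 code: when the stride already has pointer width, it can be used in address arithmetic directly, including for negative strides.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum one byte per row, walking rows with a pointer-sized stride.
 * With `int stride` on an LP64 target, the compiler would have to
 * sign-extend the 32-bit value before adding it to the pointer. */
static unsigned sum_column(const uint8_t *p, ptrdiff_t stride, int rows)
{
    unsigned sum = 0;
    for (int y = 0; y < rows; y++)
        sum += p[y * stride]; /* y is converted to ptrdiff_t here */
    return sum;
}
```

The same function handles bottom-up buffers: pass a pointer to the last row and a negative stride.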
13 years ago
Diego Biurrun
32f3c541bc
doxygen: Do not include license boilerplates in Doxygen comment blocks.
13 years ago
Ronald S. Bultje
a5dfeb612e
VP8: armv6 optimizations.
...
From 52.503s (~40fps) to 27.973s (~80fps) decoding of 480p sintel
trailer, i.e. a ~2x speedup overall, on a Nexus S.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
13 years ago
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago
Mans Rullgard
ef15d71c1f
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a1c1d3c003)
14 years ago
Mans Rullgard
a1c1d3c003
VP8: ARM NEON optimisations for dsp functions
...
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
14 years ago