James Almer
17c3cc5bb6
swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2
...
Fixes old yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
03546f49a3
swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
287d139b77
checkasm/sw_rgb: fix alignment of buffers for rgb_to_yuv tests
...
src is apparently not guaranteed to be >8 byte aligned, but align to 16
nonetheless as the x86 asm will do unaligned loads anyway.
dst is guaranteed to be 32 byte aligned for the Y plane, but 16 byte for UV.
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
e8cef5e152
swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103
...
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
c578bb9864
swscale/x86/input: add AVX2 optimized uyvytoyuv422
...
uyvytoyuv422_c: 23991.8
uyvytoyuv422_sse2: 2817.8
uyvytoyuv422_avx: 2819.3
uyvytoyuv422_avx2: 1972.3
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
e9cfd53257
swscale/x86/input: add AVX2 optimized RGB32 to YUV functions
...
abgr_to_uv_8_c: 43.3
abgr_to_uv_8_sse2: 14.3
abgr_to_uv_8_avx: 15.3
abgr_to_uv_8_avx2: 18.8
abgr_to_uv_128_c: 650.3
abgr_to_uv_128_sse2: 110.8
abgr_to_uv_128_avx: 112.3
abgr_to_uv_128_avx2: 64.8
abgr_to_uv_1080_c: 5456.3
abgr_to_uv_1080_sse2: 888.8
abgr_to_uv_1080_avx: 900.8
abgr_to_uv_1080_avx2: 518.3
abgr_to_uv_1920_c: 9692.3
abgr_to_uv_1920_sse2: 1593.8
abgr_to_uv_1920_avx: 1613.3
abgr_to_uv_1920_avx2: 864.8
abgr_to_y_8_c: 23.3
abgr_to_y_8_sse2: 12.8
abgr_to_y_8_avx: 13.3
abgr_to_y_8_avx2: 17.3
abgr_to_y_128_c: 308.3
abgr_to_y_128_sse2: 67.3
abgr_to_y_128_avx: 66.8
abgr_to_y_128_avx2: 44.8
abgr_to_y_1080_c: 2371.3
abgr_to_y_1080_sse2: 512.8
abgr_to_y_1080_avx: 505.8
abgr_to_y_1080_avx2: 314.3
abgr_to_y_1920_c: 4177.3
abgr_to_y_1920_sse2: 915.8
abgr_to_y_1920_avx: 926.8
abgr_to_y_1920_avx2: 519.3
bgra_to_uv_8_c: 37.3
bgra_to_uv_8_sse2: 13.3
bgra_to_uv_8_avx: 14.8
bgra_to_uv_8_avx2: 19.8
bgra_to_uv_128_c: 563.8
bgra_to_uv_128_sse2: 111.3
bgra_to_uv_128_avx: 112.3
bgra_to_uv_128_avx2: 64.8
bgra_to_uv_1080_c: 4691.8
bgra_to_uv_1080_sse2: 893.8
bgra_to_uv_1080_avx: 899.8
bgra_to_uv_1080_avx2: 517.8
bgra_to_uv_1920_c: 8332.8
bgra_to_uv_1920_sse2: 1590.8
bgra_to_uv_1920_avx: 1605.8
bgra_to_uv_1920_avx2: 867.3
bgra_to_y_8_c: 22.3
bgra_to_y_8_sse2: 12.8
bgra_to_y_8_avx: 12.8
bgra_to_y_8_avx2: 17.3
bgra_to_y_128_c: 291.3
bgra_to_y_128_sse2: 67.8
bgra_to_y_128_avx: 69.3
bgra_to_y_128_avx2: 45.3
bgra_to_y_1080_c: 2357.3
bgra_to_y_1080_sse2: 508.3
bgra_to_y_1080_avx: 518.3
bgra_to_y_1080_avx2: 399.8
bgra_to_y_1920_c: 4202.8
bgra_to_y_1920_sse2: 906.8
bgra_to_y_1920_avx: 907.3
bgra_to_y_1920_avx2: 526.3
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
d5fe99dc5f
swscale/x86/input: add AVX2 optimized RGB24 to YUV functions
...
rgb24_to_uv_8_c: 39.3
rgb24_to_uv_8_sse2: 14.3
rgb24_to_uv_8_ssse3: 13.3
rgb24_to_uv_8_avx: 12.8
rgb24_to_uv_8_avx2: 14.3
rgb24_to_uv_128_c: 582.8
rgb24_to_uv_128_sse2: 127.3
rgb24_to_uv_128_ssse3: 107.3
rgb24_to_uv_128_avx: 111.3
rgb24_to_uv_128_avx2: 62.3
rgb24_to_uv_1080_c: 4981.3
rgb24_to_uv_1080_sse2: 1048.3
rgb24_to_uv_1080_ssse3: 876.8
rgb24_to_uv_1080_avx: 887.8
rgb24_to_uv_1080_avx2: 492.3
rgb24_to_uv_1280_c: 5906.8
rgb24_to_uv_1280_sse2: 1263.3
rgb24_to_uv_1280_ssse3: 1048.3
rgb24_to_uv_1280_avx: 1045.8
rgb24_to_uv_1280_avx2: 579.8
rgb24_to_uv_1920_c: 8665.3
rgb24_to_uv_1920_sse2: 1888.8
rgb24_to_uv_1920_ssse3: 1571.8
rgb24_to_uv_1920_avx: 1558.8
rgb24_to_uv_1920_avx2: 869.3
rgb24_to_y_8_c: 20.3
rgb24_to_y_8_sse2: 11.8
rgb24_to_y_8_ssse3: 10.3
rgb24_to_y_8_avx: 10.3
rgb24_to_y_8_avx2: 10.8
rgb24_to_y_128_c: 284.8
rgb24_to_y_128_sse2: 83.3
rgb24_to_y_128_ssse3: 66.8
rgb24_to_y_128_avx: 64.8
rgb24_to_y_128_avx2: 39.3
rgb24_to_y_1080_c: 2451.3
rgb24_to_y_1080_sse2: 696.3
rgb24_to_y_1080_ssse3: 516.8
rgb24_to_y_1080_avx: 518.8
rgb24_to_y_1080_avx2: 301.8
rgb24_to_y_1280_c: 2892.8
rgb24_to_y_1280_sse2: 816.8
rgb24_to_y_1280_ssse3: 623.3
rgb24_to_y_1280_avx: 616.3
rgb24_to_y_1280_avx2: 350.8
rgb24_to_y_1920_c: 4338.8
rgb24_to_y_1920_sse2: 1210.8
rgb24_to_y_1920_ssse3: 928.3
rgb24_to_y_1920_avx: 920.3
rgb24_to_y_1920_avx2: 534.8
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
6743c2fc6a
checkasm/sw_rgb: test rgb32/rgb32_1 to yuv
...
Test all four pixel formats, but only bench the two native endian ones for a
given target.
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
James Almer
91b9af0058
x86/aacencdsp: add AVX version of quantize_bands
...
quant_bands_signed_c: 1928.0
quant_bands_signed_sse2: 406.0
quant_bands_signed_avx: 207.0
quant_bands_unsigned_c: 1702.0
quant_bands_unsigned_sse2: 404.0
quant_bands_unsigned_avx: 209.0
Signed-off-by: James Almer <jamrial@gmail.com>
5 months ago
Rémi Denis-Courmont
7a3369398f
sws/input: R-V V 32-bit RGB to halved UV
...
T-Head C908:
abgr_to_uv_half_8_c: 2.2
abgr_to_uv_half_8_rvv_i32: 3.5
abgr_to_uv_half_128_c: 44.0
abgr_to_uv_half_128_rvv_i32: 13.0
abgr_to_uv_half_1080_c: 245.0
abgr_to_uv_half_1080_rvv_i32: 107.2
abgr_to_uv_half_1920_c: 406.2
abgr_to_uv_half_1920_rvv_i32: 188.7
bgra_to_uv_half_8_c: 2.2
bgra_to_uv_half_8_rvv_i32: 3.5
bgra_to_uv_half_128_c: 26.5
bgra_to_uv_half_128_rvv_i32: 13.0
bgra_to_uv_half_1080_c: 219.7
bgra_to_uv_half_1080_rvv_i32: 107.0
bgra_to_uv_half_1920_c: 406.7
bgra_to_uv_half_1920_rvv_i32: 188.7
SpacemiT X60:
abgr_to_uv_half_8_c: 2.2
abgr_to_uv_half_8_rvv_i32: 3.0
abgr_to_uv_half_128_c: 28.2
abgr_to_uv_half_128_rvv_i32: 5.7
abgr_to_uv_half_1080_c: 235.5
abgr_to_uv_half_1080_rvv_i32: 47.7
abgr_to_uv_half_1920_c: 418.2
abgr_to_uv_half_1920_rvv_i32: 84.0
bgra_to_uv_half_8_c: 2.0
bgra_to_uv_half_8_rvv_i32: 3.0
bgra_to_uv_half_128_c: 23.7
bgra_to_uv_half_128_rvv_i32: 5.7
bgra_to_uv_half_1080_c: 195.5
bgra_to_uv_half_1080_rvv_i32: 47.7
bgra_to_uv_half_1920_c: 346.5
bgra_to_uv_half_1920_rvv_i32: 84.0
5 months ago
Rémi Denis-Courmont
e2f069905e
sws/input: R-V V 32-bit RGB to UV
5 months ago
Rémi Denis-Courmont
f5555cb106
sws/input: R-V V 32-bit RGB to Y
...
T-Head C908:
abgr_to_y_8_c: 2.5
abgr_to_y_8_rvv_i32: 2.2
abgr_to_y_128_c: 37.0
abgr_to_y_128_rvv_i32: 8.5
abgr_to_y_1080_c: 327.0
abgr_to_y_1080_rvv_i32: 69.5
abgr_to_y_1920_c: 552.0
abgr_to_y_1920_rvv_i32: 122.2
bgra_to_y_8_c: 2.5
bgra_to_y_8_rvv_i32: 2.2
bgra_to_y_128_c: 37.2
bgra_to_y_128_rvv_i32: 8.5
bgra_to_y_1080_c: 310.2
bgra_to_y_1080_rvv_i32: 69.5
bgra_to_y_1920_c: 568.2
bgra_to_y_1920_rvv_i32: 122.5
SpacemiT X60:
abgr_to_y_8_c: 2.5
abgr_to_y_8_rvv_i32: 2.0
abgr_to_y_128_c: 33.0
abgr_to_y_128_rvv_i32: 3.7
abgr_to_y_1080_c: 276.0
abgr_to_y_1080_rvv_i32: 31.5
abgr_to_y_1920_c: 493.7
abgr_to_y_1920_rvv_i32: 55.5
bgra_to_y_8_c: 2.2
bgra_to_y_8_rvv_i32: 2.0
bgra_to_y_128_c: 33.0
bgra_to_y_128_rvv_i32: 3.7
bgra_to_y_1080_c: 276.0
bgra_to_y_1080_rvv_i32: 31.5
bgra_to_y_1920_c: 490.7
bgra_to_y_1920_rvv_i32: 55.5
5 months ago
Andreas Rheinhardt
8b62fb231a
swscale/x86/rgb2rgb: Detemplatize
...
Every function in rgb2rgb_template.c is only compiled exactly
once; there is no overlap at all between the MMXEXT and the
SSE2 functions, so detemplatize it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
5421dee0e7
swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
c1c35380a7
swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM
...
The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
f7305eb3b3
swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE
...
The ff_nv12ToUV_* functions don't use non-temporal stores
at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
fca796ac3b
tests/checkasm/sw_rgb: Be more strict about clobbering MMX state
...
The MMXEXT versions of the rgb2rgb functions tested here
always emit emms on their own. Therefore one can use
a stricter test to ensure that it stays that way.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
3af6136669
avcodec/dnxhdenc: Simplify padding
...
It is unnecessary to first pad to 32bits; the memset later
will pad everything will with zeroes anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
b0e0b3c58a
avcodec/dnxhdenc: Move PutBitContext from ctx to stack
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
542abee213
avcodec/cbs_h266_syntax_template: Use correct format specifier
...
H266RawSliceHeader.num_entry_points is an uint32_t.
Fixes -Wformat warnings:
https://fate.ffmpeg.org/log.cgi?slot=aarch64-osx-clang-1200.0.32.29&time=20240604151047&log=compile
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
8f199cfb5b
avformat/evc: Fix format specifiers
...
Fixes -Wformat warnings; see e.g.
https://fate.ffmpeg.org/log.cgi?slot=aarch64-osx-clang-1200.0.32.29&time=20240604151047&log=compile
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
5f31a4fd16
avformat/vvc: Don't use uint8_t iterators, fix shadowing
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
1c4362cce9
avformat/vvc: Fix comment
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
fa77dc8c44
avformat/vvc: Reindent after the previous commit
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
8b6c7e7cda
avformat/vvc: Fix crash on allocation failure, avoid allocations
...
This is the VVC version of 8b5d155301
.
(Hint: This ensures that the order of NALU arrays is OPI-VPS-SPS-PPS-
Prefix-SEI-Suffix-SEI, regardless of the order in the original
extradata. I hope this is right.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
4482b3353d
avformat/vvc: Don't use ff_copy_bits()
...
There is no benefit in using it: The fast path of copying
is not taken because of misalignment; furthermore we are
only dealing with a few byte here anyway, so simply copy
the bytes manually, avoiding the dependency on bitstream.c
in lavf (which also contains a function that is completely
unused in lavf).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
52fb49a8a3
avformat/vvc: Use put_bytes_output()
...
The PutBitContext has just been flushed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Andreas Rheinhardt
dd8fb0aaae
avcodec/hevc/Makefile: Move rules for lavc/* files to lavc/Makefile
...
If any of these files (say A) would be changed in such a way
that A acquires a new dependency on another file B, building B
would need to be added to all the rules that lead to A being built.
Yet currently the rules for several files are spread over
the lavc Makefile and the Makefile of the lavc/hevc subdir, making
it more likely to be forgotten. So move the rules for these files
to the lavc/Makefile.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
5 months ago
Rémi Denis-Courmont
daac101e61
lavc/aacencdsp: fix rounding in R-V V quantize_bands
...
We need to round toward zero here.
5 months ago
Rémi Denis-Courmont
658439934b
lavc/vp8dsp: R-V V vp8_idct_add
...
T-Head C908 (cycles):
vp8_idct_add_c: 312.2
vp8_idct_add_rvv_i32: 117.0
5 months ago
Rémi Denis-Courmont
e0f4d185f1
sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half
...
T-Head C908:
rgb24_to_uv_half_4_c: 2.0
rgb24_to_uv_half_4_rvv_i32: 3.5
rgb24_to_uv_half_64_c: 27.0
rgb24_to_uv_half_64_rvv_i32: 12.5
rgb24_to_uv_half_540_c: 223.7
rgb24_to_uv_half_540_rvv_i32: 105.2
rgb24_to_uv_half_640_c: 265.5
rgb24_to_uv_half_640_rvv_i32: 123.7
rgb24_to_uv_half_960_c: 414.5
rgb24_to_uv_half_960_rvv_i32: 249.5
SpacemiT X60:
rgb24_to_uv_half_4_c: 1.7
rgb24_to_uv_half_4_rvv_i32: 4.2
rgb24_to_uv_half_64_c: 24.0
rgb24_to_uv_half_64_rvv_i32: 8.7
rgb24_to_uv_half_540_c: 199.2
rgb24_to_uv_half_540_rvv_i32: 72.5
rgb24_to_uv_half_640_c: 235.7
rgb24_to_uv_half_640_rvv_i32: 85.2
rgb24_to_uv_half_960_c: 353.5
rgb24_to_uv_half_960_rvv_i32: 127.5
5 months ago
Rémi Denis-Courmont
3ef5867e4b
sws/input: R-V V rgb24ToUV and bgr24ToUV
...
T-Head C908:
rgb24_to_uv_8_c: 2.7
rgb24_to_uv_8_rvv_i32: 3.2
rgb24_to_uv_128_c: 41.0
rgb24_to_uv_128_rvv_i32: 12.7
rgb24_to_uv_1080_c: 342.5
rgb24_to_uv_1080_rvv_i32: 105.7
rgb24_to_uv_1280_c: 406.0
rgb24_to_uv_1280_rvv_i32: 124.2
rgb24_to_uv_1920_c: 626.0
rgb24_to_uv_1920_rvv_i32: 186.0
SpacemiT X60:
rgb24_to_uv_8_c: 2.5
rgb24_to_uv_8_rvv_i32: 3.0
rgb24_to_uv_128_c: 36.5
rgb24_to_uv_128_rvv_i32: 5.7
rgb24_to_uv_1080_c: 304.2
rgb24_to_uv_1080_rvv_i32: 49.0
rgb24_to_uv_1280_c: 360.5
rgb24_to_uv_1280_rvv_i32: 57.5
rgb24_to_uv_1920_c: 540.7
rgb24_to_uv_1920_rvv_i32: 86.2
5 months ago
Rémi Denis-Courmont
79dfdac4db
sws/input: R-V V rgb24ToY & bgr24ToY
...
T-Head C908:
rgb24_to_y_8_c: 2.0
rgb24_to_y_8_rvv_i32: 2.7
rgb24_to_y_128_c: 26.2
rgb24_to_y_128_rvv_i32: 9.2
rgb24_to_y_1080_c: 219.5
rgb24_to_y_1080_rvv_i32: 76.2
rgb24_to_y_1280_c: 276.2
rgb24_to_y_1280_rvv_i32: 89.7
rgb24_to_y_1920_c: 389.7
rgb24_to_y_1920_rvv_i32: 134.2
SpacemiT X60:
rgb24_to_y_8_c: 1.7
rgb24_to_y_8_rvv_i32: 2.2
rgb24_to_y_128_c: 23.2
rgb24_to_y_128_rvv_i32: 4.2
rgb24_to_y_1080_c: 195.0
rgb24_to_y_1080_rvv_i32: 33.7
rgb24_to_y_1280_c: 231.0
rgb24_to_y_1280_rvv_i32: 40.0
rgb24_to_y_1920_c: 346.2
rgb24_to_y_1920_rvv_i32: 59.7
5 months ago
Wenbin Chen
7560db937d
libavfi/dnn: enable LibTorch xpu device option support
...
Add xpu device support to libtorch backend.
To enable xpu support you need to add
"-Wl,--no-as-needed -lintel-ext-pt-gpu -Wl,--as-needed" to
"--extra-libs" when configure ffmpeg.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
5 months ago
Nuo Mi
f68f40736f
avcodec/vvcdec: support mv wraparound
...
A 360 video specific tool
see https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9503377
passed files:
DMVR_A_Huawei_3.bit
WRAP_D_InterDigital_4.bit
WRAP_A_InterDigital_4.bit
WRAP_B_InterDigital_4.bit
WRAP_C_InterDigital_4.bit
ERP_A_MediaTek_3.bit
5 months ago
Nuo Mi
685174069f
avcodec/vvcdec: misc, reindent inter.c
5 months ago
Nuo Mi
a4013e748a
avcodec/vvcdec: refact out emulated_edge_no_wrap
...
prepare for refrence wraparound
5 months ago
Nuo Mi
8abdf0a28e
avcodec/vvcdec: misc, move src offset inside emulated_edge
5 months ago
Nuo Mi
2d98786fee
avcodec/vvcdec: refact, remove emulated_edge_dmvr and emulated_edge_bilinear to simplify code
5 months ago
Lynne
714596bcbf
aacdec_usac: zero out alpha values for the current frame
5 months ago
Lynne
c2d459cb51
aacdec_usac: fix stereo alpha values for transients
...
Typo.
Also added comments and fixed the branch underneath.
5 months ago
Lynne
7223523335
aacdec_usac: use correct TNS values
...
The standard slightly modified the maximum TNS bands allowed.
5 months ago
Lynne
9b41cc0430
aacdec_usac: do not round noise amplitude values
...
Use floating point division instead of integer division.
5 months ago
Lynne
a18d0659f4
aacdec_usac: skip coeff decoding if the number to be decoded is 0
...
Yet another thing not mentioned in the spec.
5 months ago
Lynne
1ad9a4008b
aacdec_usac: decouple TNS active from TNS data present flag
...
The issue was that in case of common TNS parameters, TNS was
entirely skipped, as tns.present was set to 0.
5 months ago
Lynne
c0fdb0cdfd
aacdec_usac: do not continue parsing bitstream on core_mode == 1
...
Although LPD is not functional yet, the bitstream ends at that point.
5 months ago
Lynne
8ecaa64b9b
aacdec_usac: respect tns_on_lr flag
...
This was left out, and due to av_unused, forgotten about.
5 months ago
Lynne
25b848a0bd
aacdec_usac: correctly set and use the layout map
5 months ago
Lynne
ae495b56ff
aacdec_usac: remove fallback for custom maps with invalid position
...
Not needed as every possible index is mapped.
5 months ago
Lynne
91ab17e2fe
aacdec_usac: tag LFE channels as such in the channel map
...
Missed.
5 months ago