James Almer
cd04ebe033
swscale/output: add V30X output support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
James Almer
57db8e0571
swscale/output: add VYU444 output support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
James Almer
eac9af382a
swscale/output: add UYVA output support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
James Almer
6cd52c1080
swscale/output: add AYUV output support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
James Almer
5f1bf3cd65
swscale/output: add missing yuv2packed1 and yuv2packed2 support for VUY{X,A}
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
Martin Storsjö
77e6293735
arm: Consistently use proper interworking function returns
...
Use "bx lr", or "pop {lr}", which do proper mode switching
between thumb and arm modes. A plain "mov pc, lr" does not switch
from thumb mode to arm mode (while in arm mode, it does switch
mode for a thumb caller).
This is normally not an issue, as CONFIG_THUMB only is enabled if
the C compiler defaults to thumb; but stick to patterns that can
do mode switching if needed, for consistency.
Signed-off-by: Martin Storsjö <martin@martin.st>
3 months ago
Niklas Haas
ec9985b54f
swscale/internal: constify and expose ff_swscale()
...
Used as an intermediate entry point for the new swscale context. The extra
constification is a consistency measure, as I want to move the memcpy of
stride and plane pointers to the functions that actually need to mutate them.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
403a20b2e6
swscale/rgb2xyz: expose these functions internally
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
775de8c19d
swscale/rgb2xyz: follow convention on image pointers and strides
...
Instead of taking an int16_t pointer and a stride in halfwords, follow the
usual convention of treating all planes and strides as byte-addressed.
This does not have any immediate effect but makes these functions more
reusable without unintended "gotchas".
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
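The byte-addressed convention described in 775de8c19d can be illustrated with a minimal sketch. The helper name and the halving operation below are hypothetical stand-ins, not the swscale code; the only point is that the row pointer is advanced in bytes first and reinterpreted as 16-bit samples afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the swscale function: halves every 16-bit sample
 * of an image whose plane pointer and stride are both expressed in bytes. */
static void halve_samples16(uint8_t *plane, ptrdiff_t stride_bytes,
                            int width, int height)
{
    for (int y = 0; y < height; y++) {
        /* Step in bytes first, then view the row as 16-bit samples. */
        uint16_t *row = (uint16_t *)(plane + y * stride_bytes);
        for (int x = 0; x < width; x++)
            row[x] >>= 1;
    }
}
```

With this shape the caller never has to remember whether a stride counts bytes or halfwords, which appears to be the "gotcha" the commit message refers to.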
Niklas Haas
9d8f5141cf
swscale/rgb2xyz: add explicit width parameter
...
This fixes an 11-year-old bug in the rgb2xyz functions, when used with a
negative stride. The current loop bounds turned it into a no-op.
Additionally, this increases performance on highly cropped images, whose
stride may be substantially higher than the effective width.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
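A simplified illustration of the problem fixed in 9d8f5141cf, assuming the old loop bound was derived from the stride (the actual loop is not shown in this log). All function names are hypothetical and the per-sample operation is a placeholder for the real conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder per-sample operation standing in for the real conversion. */
static uint16_t invert16(uint16_t v) { return (uint16_t)(0xFFFF - v); }

/* Bound derived from the stride: with a negative stride the bound is
 * negative, so the inner loop never executes. With a large positive
 * stride it also walks over padding. Returns the number of samples touched. */
static int process_by_stride(uint8_t *plane, ptrdiff_t stride, int height)
{
    int touched = 0;
    for (int y = 0; y < height; y++) {
        uint16_t *row = (uint16_t *)(plane + y * stride);
        for (ptrdiff_t x = 0; x < stride / 2; x++) {
            row[x] = invert16(row[x]);
            touched++;
        }
    }
    return touched;
}

/* Explicit width: the bound no longer depends on the sign or size of the
 * stride, only on the number of samples that are actually visible. */
static int process_by_width(uint8_t *plane, ptrdiff_t stride,
                            int width, int height)
{
    int touched = 0;
    for (int y = 0; y < height; y++) {
        uint16_t *row = (uint16_t *)(plane + y * stride);
        for (int x = 0; x < width; x++) {
            row[x] = invert16(row[x]);
            touched++;
        }
    }
    return touched;
}
```

For a bottom-up image the caller passes a pointer to the last row and a negative stride; only the second variant processes any samples in that case.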
Niklas Haas
ea228fc415
swscale/rgb2xyz: minor style fixes
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
James Almer
04612351ab
swscale/input: add V30X input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
James Almer
ea05edc9e0
swscale/input: add VYU444 input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
James Almer
ec7f5e314d
swscale/input: add UYVA input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
James Almer
bb37d3c33e
swscale/input: add AYUV input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
3 months ago
jinbo
e6ecc1e757
swscale: Fix conflicting types for loongarch
...
Build breaks after c1a0e65763
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
3 months ago
Niklas Haas
477445722c
swscale/ppc: fix altivec build failure
...
Fixes: c1a0e65763
3 months ago
Martin Storsjö
b9145fcab2
swscale: Fix aarch64 and i386 compilation failures
...
This unbreaks builds after c1a0e65763, which broke with errors like
src/libswscale/aarch64/rgb2rgb.c:66:25: error: incompatible function pointer types assigning to 'void (*)(const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, const int32_t *)' (aka 'void (*)(const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, const int *)') from 'void (const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, int32_t *)' (aka 'void (const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, int *)') [-Wincompatible-function-pointer-types]
66 | ff_rgb24toyv12 = rgb24toyv12;
| ^ ~~~~~~~~~~~
and
src/libswscale/aarch64/swscale_unscaled.c:213:29: error: incompatible function pointer types assigning to 'SwsFunc' (aka 'int (*)(struct SwsContext *, const unsigned char *const *, const int *, int, int, unsigned char *const *, const int *)') from 'int (SwsContext *, const uint8_t *const *, const int *, int, int, const uint8_t **, const int *)' (aka 'int (struct SwsContext *, const unsigned char *const *, const int *, int, int, const unsigned char **, const int *)') [-Wincompatible-function-pointer-types]
213 | c->convert_unscaled = nv24_to_yuv420p_neon_wrapper;
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Martin Storsjö <martin@martin.st>
3 months ago
Niklas Haas
73b3344edd
swscale/input: parametrize ff_sws_init_input_funcs() pointers
...
Following the precedent set by ff_sws_init_output_funcs().
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
20b350b284
swscale/internal: add typedefs for input reading functions
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
b90d522d2c
swscale/internal: forward typedef SwsContext
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
c1a0e65763
swscale/internal: constify SwsFunc
...
I want to move away from having random leaf processing functions mutate
plane pointers, and while we're at it, we might as well make the strides
and tables const as well.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
286bdc9cdc
swscale/internal: turn cascaded_tmp into an array
...
Slightly more convenient to access from the new wrapping code.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
61369484f6
swscale/internal: expose ff_update_palette() internally
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
aee19ee431
swscale/internal: rename NB_SWS_DITHER for consistency
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Niklas Haas
41ce370b65
tests/swscale: fix minor typos
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
3 months ago
Michael Niedermayer
38e224c2ba
*/version.h: bump after release/7.1 branch
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
4 months ago
Michael Niedermayer
e1094ac45d
*/version.h: bump minor versions for release/7.1
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
4 months ago
Zhao Zhili
e18b46d95f
swscale/aarch64: Fix rgb24toyv12 only works with aligned width
...
Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.
Co-authored-by: johzzy <hellojinqiang@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
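The wrapper idea from e18b46d95f can be sketched as follows. This is an assumption-laden simplification: the real function is two-dimensional and converts RGB to planar YUV, whereas the hypothetical routines here just invert one row of bytes. What carries over is the split into a 16-aligned part for the fast path and a remainder for the scalar path.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a SIMD routine that only handles widths that are a
 * multiple of 16 (hypothetical, written in plain C here). */
static void invert_fast16(const uint8_t *src, uint8_t *dst, int width)
{
    assert(width % 16 == 0);
    for (int x = 0; x < width; x++)
        dst[x] = (uint8_t)(255 - src[x]);
}

/* Scalar fallback that accepts any width. */
static void invert_c(const uint8_t *src, uint8_t *dst, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = (uint8_t)(255 - src[x]);
}

/* Wrapper: the largest multiple of 16 goes to the fast routine and the
 * remaining columns go to the scalar fallback. */
static void invert_wrapper(const uint8_t *src, uint8_t *dst, int width)
{
    int aligned = width & ~15;

    if (aligned)
        invert_fast16(src, dst, aligned);
    if (width > aligned)
        invert_c(src + aligned, dst + aligned, width - aligned);
}
```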
Michael Niedermayer
bd80c97391
swscale/output: Fix undefined integer overflow in yuv2rgba64_2_c_template()
...
Fixes: signed integer overflow: -1082982400 + -1083218484 cannot be represented in type 'int'
Fixes: 70657/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6707819712675840
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
4 months ago
Michael Niedermayer
44c5641ae8
swscale/swscale: Use unsigned operation to avoid undefined behavior
...
I have not checked that the constant is correct; this just fixes the undefined behavior
Fixes: signed integer overflow: -646656 * 3517 cannot be represented in type 'int'
Fixes: 70559/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5209368631508992
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
4 months ago
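The general technique named in 44c5641ae8, doing the arithmetic in an unsigned type so that overflow wraps instead of being undefined, might look like the sketch below. The function name is hypothetical, and the sketch assumes a 32-bit int. It is not the expression that was changed in swscale.

```c
#include <stdint.h>

/* Multiplies two 32-bit values with wrap-around semantics.
 * Signed overflow is undefined behavior in C, whereas unsigned arithmetic
 * is defined to wrap modulo 2^32. Converting the result back to int32_t
 * is implementation-defined when it does not fit, which is still not
 * undefined behavior; common compilers wrap in two's complement. */
static int32_t mul_wrap(int32_t a, int32_t b)
{
    uint32_t r = (uint32_t)a * (uint32_t)b;
    return (int32_t)r;
}
```

As the commit message itself notes, this removes the undefined behavior without saying anything about whether the wrapped value is the numerically intended one.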
Ramiro Polla
c0666d8bed
swscale/aarch64/rgb2rgb: add neon implementation for rgb24toyv12
...
                          A55                A76
rgb24toyv12_16_200_c:     36890.6            17275.5
rgb24toyv12_16_200_neon:  12460.1 ( 2.96x)    5360.8 ( 3.22x)
rgb24toyv12_128_60_c:     83205.1            39884.8
rgb24toyv12_128_60_neon:  27468.4 ( 3.03x)   13552.5 ( 2.94x)
rgb24toyv12_512_16_c:     88111.6            42346.8
rgb24toyv12_512_16_neon:  29126.6 ( 3.03x)   14411.2 ( 2.94x)
rgb24toyv12_1920_4_c:     82068.1            39620.0
rgb24toyv12_1920_4_neon:  27011.6 ( 3.04x)   13492.2 ( 2.94x)
5 months ago
Ramiro Polla
caaec2ea95
swscale/x86/rgb2rgb: disable rgb24toyv12_mmxext for x86_64
...
The mmxext implementation is slower than the C version in x86_64.
                            m32                m64
rgb24toyv12_16_200_c:       24942.7            14812.6
rgb24toyv12_16_200_mmxext:  17857.2 ( 1.40x)   17400.4 ( 0.85x)
rgb24toyv12_128_60_c:       56892.9            35616.9
rgb24toyv12_128_60_mmxext:  40730.9 ( 1.40x)   39610.4 ( 0.90x)
rgb24toyv12_512_16_c:       58402.7            37209.4
rgb24toyv12_512_16_mmxext:  44842.4 ( 1.30x)   41136.2 ( 0.90x)
rgb24toyv12_1920_4_c:       54827.4            34737.4
rgb24toyv12_1920_4_mmxext:  51169.9 ( 1.07x)   34818.9 ( 1.00x)
5 months ago
Ramiro Polla
3604b2403c
swscale/rgb2rgb: improve chroma conversion in ff_rgb24toyv12_c
...
The current code subsamples by dropping 3/4 pixels to calculate the
chroma components. This commit calculates the average of 4 rgb pixels
before calculating the chroma components, putting it in line with the
mmxext implementation.
5 months ago
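The difference between the two subsampling strategies in 3604b2403c can be sketched on a single 2x2 block. Which pixel the old code kept and how the new code rounds are assumptions made for illustration, and the type and function names are hypothetical. The chroma computation that follows is left out, since its coefficients are not given in this log.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } RGB;

/* Subsample by dropping: keep one pixel of the 2x2 block (assumed here to
 * be the first one) and ignore the other three. */
static RGB pick_first(const RGB px[4])
{
    return px[0];
}

/* Subsample by averaging: average the four pixels per channel, rounding
 * to nearest, before any chroma is derived from the result. */
static RGB average4(const RGB px[4])
{
    RGB out;
    out.r = (uint8_t)((px[0].r + px[1].r + px[2].r + px[3].r + 2) >> 2);
    out.g = (uint8_t)((px[0].g + px[1].g + px[2].g + px[3].g + 2) >> 2);
    out.b = (uint8_t)((px[0].b + px[1].b + px[2].b + px[3].b + 2) >> 2);
    return out;
}
```

On a block with one black and three white pixels, dropping yields pure black while averaging yields a light gray, so the chroma derived from the average reflects all four pixels.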
Ramiro Polla
d8848325a6
swscale/aarch64/rgb2rgb: add deinterleaveBytes neon implementation
...
                                  A55                A76
deinterleave_bytes_c:             70342.0            34497.5
deinterleave_bytes_neon:          21594.5 ( 3.26x)    5535.2 ( 6.23x)
deinterleave_bytes_aligned_c:     71340.8            34651.2
deinterleave_bytes_aligned_neon:   8616.8 ( 8.28x)    3996.2 ( 8.67x)
5 months ago
Ramiro Polla
4c824ad391
swscale/x86/rgb2rgb: fix deinterleaveBytes writing past the end of the buffers
5 months ago
Ramiro Polla
f17a6bd200
swscale/x86/rgb2rgb: fix deinterleaveBytes for unaligned dst pointers
5 months ago
Rémi Denis-Courmont
27d28b68da
swscale/rgb2rgb: enable R-V V deinterleaveBytes
...
T-Head C908:
deinterleave_bytes_c: 100328.3 ( 1.00x)
deinterleave_bytes_rvv_i32: 19331.3 ( 5.19x)
deinterleave_bytes_aligned_c: 100337.5 ( 1.00x)
deinterleave_bytes_aligned_rvv_i32: 15748.0 ( 6.37x)
SpacemiT X60:
deinterleave_bytes_c: 95230.6 ( 1.00x)
deinterleave_bytes_rvv_i32: 9790.3 ( 9.73x)
deinterleave_bytes_aligned_c: 96564.1 ( 1.00x)
deinterleave_bytes_aligned_rvv_i32: 7780.1 (12.41x)
5 months ago
Ramiro Polla
420d443600
swscale/aarch64: cosmetics fix (spaces inside curly braces)
5 months ago
Ramiro Polla
52887683e9
swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
...
                         A55                A76
nv24_yuv420p_128_c:       4956.1             1267.0
nv24_yuv420p_128_neon:    3109.1 ( 1.59x)     640.0 ( 1.98x)
nv24_yuv420p_1920_c:     35728.4            11736.2
nv24_yuv420p_1920_neon:   8011.1 ( 4.46x)    2436.0 ( 4.82x)
nv42_yuv420p_128_c:       4956.4             1270.5
nv42_yuv420p_128_neon:    3074.6 ( 1.61x)     639.5 ( 1.99x)
nv42_yuv420p_1920_c:     35685.9            11732.5
nv42_yuv420p_1920_neon:   7995.1 ( 4.46x)    2437.2 ( 4.81x)
5 months ago
Ramiro Polla
88a563ad18
swscale: export ff_copyPlane so it may be used by simd code
5 months ago
Ramiro Polla
4eb5594295
swscale: add nv24/nv42 to yuv420p unscaled converter
5 months ago
Martin Storsjö
cfe0a36352
libswscale: aarch64: Fix the indentation of some macro invocations
...
Signed-off-by: Martin Storsjö <martin@martin.st>
5 months ago
Martin Storsjö
507c2a5774
libswscale: arm: Don't assume aligned output in yuv2rgb functions
...
This fixes failures in recently added checkasm tests.
While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.
Signed-off-by: Martin Storsjö <martin@martin.st>
5 months ago
Ramiro Polla
181cd260db
swscale/aarch64/yuv2rgb: add neon yuv42{0,2}p -> gbrp unscaled colorspace converters
...
checkasm --bench on a Raspberry Pi 5 Model B Rev 1.0:
yuv420p_gbrp_128_c: 1243.0
yuv420p_gbrp_128_neon: 453.5
yuv420p_gbrp_1920_c: 18165.5
yuv420p_gbrp_1920_neon: 6700.0
yuv422p_gbrp_128_c: 1463.5
yuv422p_gbrp_128_neon: 471.5
yuv422p_gbrp_1920_c: 21343.7
yuv422p_gbrp_1920_neon: 6743.5
5 months ago
Ramiro Polla
8744764a4c
swscale/x86/yuv2rgb: add ssse3 yuv42{0,2}p -> gbrp unscaled colorspace converters
...
Note: this implementation is limited to x86_64 due to general purpose
register pressure.
checkasm --bench on an Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
yuv420p_gbrp_8_c: 118.5
yuv420p_gbrp_8_ssse3: 93.3
yuv420p_gbrp_128_c: 1068.3
yuv420p_gbrp_128_ssse3: 319.3
yuv420p_gbrp_1080_c: 8841.8
yuv420p_gbrp_1080_ssse3: 2211.8
yuv420p_gbrp_1920_c: 15903.8
yuv420p_gbrp_1920_ssse3: 3814.3
yuv422p_gbrp_8_c: 144.8
yuv422p_gbrp_8_ssse3: 93.8
yuv422p_gbrp_128_c: 1395.8
yuv422p_gbrp_128_ssse3: 313.0
yuv422p_gbrp_1080_c: 11551.5
yuv422p_gbrp_1080_ssse3: 2240.8
yuv422p_gbrp_1920_c: 20585.3
yuv422p_gbrp_1920_ssse3: 5249.5
yuva420p_gbrp_8_c: 117.5
yuva420p_gbrp_8_ssse3: 92.0
yuva420p_gbrp_128_c: 1593.0
yuva420p_gbrp_128_ssse3: 319.3
yuva420p_gbrp_1080_c: 8694.5
yuva420p_gbrp_1080_ssse3: 2186.0
yuva420p_gbrp_1920_c: 15946.5
yuva420p_gbrp_1920_ssse3: 3805.3
5 months ago
Ramiro Polla
4545205a26
swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters
5 months ago
Ramiro Polla
af5adf57e3
swscale/yuv2rgb: prepare YUV2RGBFUNC macro for multi-planar rgb
...
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.
There is no difference in performance.
5 months ago
Ramiro Polla
24063e7827
swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar rgb
...
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.
There is no difference in performance.
5 months ago
Niklas Haas
6b40be941a
swscale/options: relax src/dst_h/v_chr_pos value range
...
When dealing with 4x subsampling ratios (log2 == 2), such as can arise
with 4:1:1 or 4:1:0, a value range of 512 is not enough to cover the
range of possible scenarios.
For example, bottom-sited chroma in 4:1:0 would require an offset of 768
(three luma rows). Simply double the limit to 1024. I don't see any
place in initFilter() that would experience overflow as a result of this
change, especially since get_local_pos() right-shifts it by the
subsampling ratio again.
5 months ago
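The arithmetic behind 6b40be941a can be written out as a small sketch. It assumes, as the "768 (three luma rows)" figure implies, that chroma positions are expressed in units of 1/256 of a luma sample. The constant and function names are hypothetical.

```c
/* Assumed unit: 256 steps per luma sample, inferred from the commit
 * message's statement that three luma rows correspond to 768. */
enum { CHR_POS_PER_LUMA_SAMPLE = 256 };

/* Offset of bottom-sited chroma for a vertical subsampling of
 * 2^log2_subsample luma rows per chroma row: the chroma sample sits on
 * the last luma row of its group, i.e. (rows - 1) luma rows down. */
static int bottom_sited_chr_pos(int log2_subsample)
{
    int luma_rows = 1 << log2_subsample;
    return (luma_rows - 1) * CHR_POS_PER_LUMA_SAMPLE;
}
```

For 4:1:0 (log2_subsample == 2) this gives three luma rows, which exceeds the old limit of 512 and fits within the new limit of 1024.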