FFmpeg

Commit Graph

Author	SHA1	Message	Date
Michael Niedermayer	ba209e3d51	swscale/input: Use more unsigned intermediates Same principle as previous commit, with sufficiently huge rgb2yuv table values this produces wrong results and undefined behavior. The unsigned produces the same incorrect results. That is probably ok as these cases with huge values seem not to occur in any real use case. Fixes: signed integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Jeremy Dorfman	ce566281f9	swscale/input: Use unsigned intermediates in rgb64ToUV_c_template Large rgb2yuv tables and high pixel values cause the intermediate int32_t of rur + gug + bu*b to exceed INT_MAX, which is undefined behavior. This causes libswscale built with LLVM -fsanitize=undefined to assert. Using unsigned integers instead has defined behavior and produces identical results, and makes rgb64ToUV_c_template match rgb64ToY_c_template. Fixes: signed integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Andreas Rheinhardt	b616b04704	swscale/utils: Remove obsolete 3DNow reference swscale does not use 3DNow any more since commit `608319a311`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Michael Niedermayer	b74f89caae	swscale/output: Bias 16bps output calculations to improve non overflowing range for GBRP16/GBRPF32 Fixes: integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Michael Niedermayer	0f0afc7fb5	swscale/output: Bias 16bps output calculations to improve non overflowing range Fixes: integer overflow Fixes: ./ffmpeg -f rawvideo -video_size 66x64 -pixel_format yuva420p10le -i ~/videos/overflow_input_w66h64.yuva420p10le -filter_complex "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=full:out_range=full,format=rgba64[out]" -pixel_format rgba64 -map '[out]' -y overflow_w66h64.png Found-by: Drew Dunne <asdunne@google.com> Tested-by: Drew Dunne <asdunne@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Hubert Mazur	2537fdc510	sw_scale: Add specializations for hscale 16 to 19 Provide arm64 neon optimized implementations for hscale16To19 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_19__fs_4_dstW_512_c: 6216.0 hscale_16_to_19__fs_4_dstW_512_neon: 2257.0 hscale_16_to_19__fs_8_dstW_512_c: 10417.7 hscale_16_to_19__fs_8_dstW_512_neon: 3112.5 hscale_16_to_19__fs_12_dstW_512_c: 14890.5 hscale_16_to_19__fs_12_dstW_512_neon: 3899.0 hscale_16_to_19__fs_16_dstW_512_c: 19006.5 hscale_16_to_19__fs_16_dstW_512_neon: 5341.2 hscale_16_to_19__fs_32_dstW_512_c: 36629.5 hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 (Note, the checkasm tests for these functions haven't been merged since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	9ccf8c5bfc	sw_scale: Add specializations for hscale 16 to 15 Add arm64 neon implementations for hscale 16 to 15 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_15__fs_4_dstW_512_c: 6703.5 hscale_16_to_15__fs_4_dstW_512_neon: 2298.0 hscale_16_to_15__fs_8_dstW_512_c: 10983.0 hscale_16_to_15__fs_8_dstW_512_neon: 3216.5 hscale_16_to_15__fs_12_dstW_512_c: 15526.0 hscale_16_to_15__fs_12_dstW_512_neon: 3993.0 hscale_16_to_15__fs_16_dstW_512_c: 20183.5 hscale_16_to_15__fs_16_dstW_512_neon: 5369.7 hscale_16_to_15__fs_32_dstW_512_c: 39315.2 hscale_16_to_15__fs_32_dstW_512_neon: 9511.2 hscale_16_to_15__fs_40_dstW_512_c: 48995.7 hscale_16_to_15__fs_40_dstW_512_neon: 11570.0 (Note, the checkasm tests for these functions haven't been merged since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	1e9cfa5bb0	sw_scale: Add specializations for hscale 8 to 19 Add arm64 neon implementations for hscale 8 to 19 with filter sizes 4, 4X and 8. Both implementations are based on very similar ones dedicated to hscale 8 to 15. The major changes refer to saving the data - instead of writing the result as int16_t it is done with int32_t. These functions are heavily inspired on patches provided by J. Swinney and M. Storsjö for hscale8to15 which were slightly adapted for hscale8to19. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool shown below. hscale_8_to_19__fs_4_dstW_512_c: 5663.2 hscale_8_to_19__fs_4_dstW_512_neon: 1259.7 hscale_8_to_19__fs_8_dstW_512_c: 9306.0 hscale_8_to_19__fs_8_dstW_512_neon: 2020.2 hscale_8_to_19__fs_12_dstW_512_c: 12932.7 hscale_8_to_19__fs_12_dstW_512_neon: 2462.5 hscale_8_to_19__fs_16_dstW_512_c: 16844.2 hscale_8_to_19__fs_16_dstW_512_neon: 4671.2 hscale_8_to_19__fs_32_dstW_512_c: 32803.7 hscale_8_to_19__fs_32_dstW_512_neon: 5474.2 hscale_8_to_19__fs_40_dstW_512_c: 40948.0 hscale_8_to_19__fs_40_dstW_512_neon: 6669.7 Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	cb803a0072	swscale: aarch64: Fix yuv2rgb with negative strides Treat the 32 bit stride registers as signed. Alternatively, we could make the stride arguments ptrdiff_t instead of int, and changing all of the assembly to operate on these registers with their full 64 bit width, but that's a much larger and more intrusive change (and risks missing some operation, which would clamp the intermediates to 32 bit still). Fixes: https://trac.ffmpeg.org/ticket/9985 Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Marvin Scholz	4aa04c255d	swscale: document some missing arguments	2 years ago
Marvin Scholz	aba8cf654f	swscale: Fix bogus doxy comment #ifdefs The intention here was probably to document this as use of conditionals does not make sense in a comment. Fixes doxy warning: warning: explicit link request to 'if' could not be resolved	2 years ago
Chema Gonzalez	bf64a75c5a	libswscale: force a minimum size of the slide for bayer sources Bayer sources are read in groups of 2 lines (e.g. for a BGGR flavor, the first row contains only B and G samples, while the second row contains only G and R samples). They need to be read as a whole. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2 years ago
Rémi Denis-Courmont	a1bfb5290e	sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2 This is currently 64-bit only because the stack spilling code would not assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory). This could be added later in the unlikely that someone wants it.	2 years ago
Rémi Denis-Courmont	9181835a24	sws/rgb2rgb: RISC-V V interleaveBytes	2 years ago
Rémi Denis-Courmont	66a03f4053	sws/rgb2rgb: RISC-V V shuffle_bytes_xxxx functions	2 years ago
Andreas Rheinhardt	888a02a126	swscale/output: Don't call av_pix_fmt_desc_get() in a loop Up until now, libswscale/output.c used a macro to write an output pixel which involved a call to av_pix_fmt_desc_get() to find out whether the input pixel format is BE or LE despite this being known at compile-time (there are templates per pixfmt). Even worse, these calls are made in a loop, so that e.g. there are eight calls to av_pix_fmt_desc_get() for every pixel processed in yuv2rgba64_X_c_template() for 64bit RGB formats. This commit modifies these macros to ensure that isBE() is evaluated at compile-time. This saved 41184B of .text for me (GCC 11.2, -O3). Of course, it also improved performance. E.g. ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \ -threads 1 -t 1:00 -f null - (which uses yuv2rgba64le_X_c, which is an invocation of yuv2rgba64_X_c_template() mentioned above), performance improved from 95589 to 41387 decicycles for one call to yuv2packedX; for the be variant the numbers went down from 76087 to 43024 decicycles. Reviewed-by: Anton Khirnov <anton@khirnov.net> Reviewed-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	4d7a1a4619	swscale/input: Avoid calls to av_pix_fmt_desc_get() Up until now, libswscale/input.c used a macro to read an input pixel which involved a call to av_pix_fmt_desc_get() to find out whether the input pixel format is BE or LE despite this being known at compile-time (there are templates per pixfmt). Even worse, these calls are made in a loop, so that e.g. there are six calls to av_pix_fmt_desc_get() for every pair of UV pixel processed in rgb64ToUV_half_c_template(). This commit modifies these macros to ensure that isBE() is evaluated at compile-time. This saved 9743B of .text for me (GCC 11.2, -O3). For a simple RGB64LE->YUV420P transformation like ffmpeg -f lavfi -i haldclutsrc,format=rgba64le -pix_fmt yuv420p \ -threads 1 -t 1:00 -f null - the amount of decicycles spent in rgb64LEToUV_half_c (which is created via the template mentioned above) decreases from 19751 to 5341; for RGBA64BE the number went down from 11945 to 5393. For shared builds (where the call to av_pix_fmt_desc_get() is indirect) the old numbers are 15230 for RGBA64BE and 27502 for RGBA64LE, whereas the numbers with this patch are indistinguishable from the numbers from a static build. Also make the macros that are touched conform to the usual convention of using uppercase names while just at it. Reviewed-by: Anton Khirnov <anton@khirnov.net> Reviewed-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Hao Chen	925ac0da32	swscale/la: Add output_lasx.c file. ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -pix_fmt rgb24 -y /dev/null -an before: 150fps after: 183fps Signed-off-by: Hao Chen <chenhao@loongson.cn> Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Hao Chen	74d09b068d	swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -pix_fmt rgb24 -y /dev/null -an before: 178fps after: 210fps Signed-off-by: Hao Chen <chenhao@loongson.cn> Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Hao Chen	38cacce22a	swscale/la: Optimize hscale functions with lasx. ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -y /dev/null -an before: 101fps after: 138fps Signed-off-by: Hao Chen <chenhao@loongson.cn> Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Philip Langdale	09a8e5debb	swscale/output: add support for Y210LE and Y212LE	2 years ago
Philip Langdale	68181623e9	swscale/output: add support for XV30LE	2 years ago
Philip Langdale	366f073c62	swscale/output: add support for XV36LE	2 years ago
Philip Langdale	caf8d4d256	swscale/output: add support for P012 This generalises the existing P010 support.	2 years ago
Andreas Rheinhardt	d2428d80ce	swscale/input: Remove spec-incompliant ';' These macros are definitions, not only declarations and therefore should not contain a semicolon. Such a semicolon is actually spec-incompliant, but compilers happen to accept them. Reviewed-by: Philip Langdale <philipl@overt.org> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Philip Langdale	4a59eba227	swscale/input: add support for Y212LE	2 years ago
Philip Langdale	198b5b90d5	swscale/input: add support for XV30LE	2 years ago
Philip Langdale	5bdd726115	swscale/input: add support for P012 As we now have three of these formats, I added macros to generate the conversion functions.	2 years ago
Philip Langdale	8d9462844a	swscale/input: add support for XV36LE	2 years ago
Philip Langdale	45726aa117	libswscale: add support for VUYX format As we already have support for VUYA, I figured I should do the small amount of work to support VUYX as well. That means a little refactoring to share code.	2 years ago
Andreas Rheinhardt	de33506e4b	swscale/x86/rgb_2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext Fixes FATE-failures with the the filter-2xbr filter-3xbr filter-4xbr filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x filter-paletteuse-bayer filter-paletteuse-bayer0 filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten when using SSSE3). Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Timo Rothenpieler	aca569aad2	swscale/input: add rgbaf16 input support This is by no means perfect, since at least ddagrab will return scRGB data with values outside of 0.0f to 1.0f for HDR values. Its primary purpose is to be able to work with the format at all.	2 years ago
Timo Rothenpieler	f2de911818	swscale: add opaque parameter to input functions	2 years ago
Andreas Rheinhardt	8bec225c3c	swscale/x86/yuv2yuvX: Remove unused ff_yuv2yuvX_mmx() Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Alan Kelly	a38293e444	libswscale: Enable hscale_avx2 for all input sizes. ff_shuffle_filter_coefficients shuffles the tail as required. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2 years ago
Alan Kelly	a6724285fd	sws: allow avx2 hscale to process inputs of any size. The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2 years ago
Alan Kelly	51a34e8525	sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Swinney, Jonathan	0d7caa5b09	swscale/aarch64: add vscale specializations This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance. On AWS c7g (Graviton 3, Neoverse V1) instances: before after yuv2yuvX_2_0_512_accurate_neon: 558.8 268.9 yuv2yuvX_4_0_512_accurate_neon: 637.5 434.9 yuv2yuvX_8_0_512_accurate_neon: 1144.8 806.2 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1853.7 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	3e708722a2	swscale/aarch64: vscale optimization Use scalar times vector multiply accumlate instructions instead of vector times vector to remove the need for replicating load instructions which are slightly slower. On AWS c7g (Graviton 3, Neoverse V1) instances: yuv2yuvX_8_0_512_accurate_neon: 1144.8 987.4 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	4dcd191a50	checkasm: updated tests for sw_scale Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differs slightly). Test with and without SWS_ACCURATE_RND - if this flag isn't set, the output must match the C reference exactly, otherwise it is allowed to be off by 2. Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND is set - apparently this discrepancy hasn't been noticed in other exact tests before. Add a test for yuv2plane1. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	75ffca7eef	libswscale/aarch64: add another hscale specialization This specialization handles the case where filtersize is 4 mod 8, e.g. 12, 20, etc. Aarch64 was previously using the c function for this case. This implementation speeds up that case significantly. hscale_8_to_15__fs_12_dstW_512_c: 6234.1 hscale_8_to_15__fs_12_dstW_512_neon: 1505.6 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Timo Rothenpieler	b77fff47d0	configure: always enable gnu_windres if available Use the appropiate Makefile variable to ensure the resource file is only built into shared libraries instead.	2 years ago
James Almer	68e017c487	swscale/output: fix reading chroma values when generating vuya output Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
James Almer	1974813261	swscale/output: add VUYA output support Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
James Almer	f0abd07996	swscale/input: add VUYA input support Reviewed-by: Philip Langdale <philipl@overt.org> Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
Andreas Rheinhardt	da668fa7d2	swscale/rgb2rgb: Don't cast const away Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Matthieu Bouron	0a6bb7da55	swscale: add NV16 input/output Signed-off-by: Anton Khirnov <anton@khirnov.net>	2 years ago
Michael Niedermayer	fd26b07e8b	Bump versions after 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Michael Niedermayer	6f1b144358	Bump Versions for 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Andreas Rheinhardt	81d3472031	swscale/x86/swscale: Simplify macro This is possible now that it is no longer used by MMX. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago

1 2 3 4 5 ...

2460 Commits (bda06c60fed8e45481ca2dad040ed9d2f521ca15)