FFmpeg


Author	SHA1	Message	Date
xufuji456	cc86343b96	lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d When building for the iOS platform on arm64, the compiler emits a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b" Signed-off-by: xufuji456 <839789740@qq.com> Signed-off-by: Martin Storsjö <martin@martin.st>	12 months ago
Rémi Denis-Courmont	6d60cc7baf	sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p In my personal opinion, we should not need to support unaligned YUY2 pixel maps. They should always be aligned to at least 32 bits, and the current code assumes just 16 bits. However, checkasm does test for unaligned input bitmaps. QEMU accepts it, but real hardware does not. In this particular case, we can at the same time improve performance and handle unaligned inputs, so do just that. uyvytoyuv422_c: 104379.0; uyvytoyuv422_c: 104060.0; uyvytoyuv422_rvv_i32: 25284.0 (before); uyvytoyuv422_rvv_i32: 19303.2 (after)	1 year ago
Rémi Denis-Courmont	5b8b5ec9c5	sws/rgb2rgb: rework R-V V YUY2 to 4:2:2 planar This saves three scratch registers and three instructions per line. The performance gains are mostly negligible. The main point is to free up registers for further rework.	1 year ago
Niklas Haas	736284e7b9	swscale/yuv2rgb: fix sws_getCoefficients for colorspace=0 The documentation states that invalid entries default to SWS_CS_DEFAULT. A value of 0 is not a valid SWS_CS_*, yet the code incorrectly hard-codes it to BT.709 coefficients instead of SWS_CS_DEFAULT.	1 year ago
Niklas Haas	d043e5c54c	swscale: don't omit ff_sws_init_range_convert for high-bit This was a complete hack seemingly designed to work around a different bug, which was fixed in the previous commit. As such, there is no longer any reason to omit it, as omitting it simply breaks changing the color range in sws_setColorspaceDetails for no reason.	1 year ago
Niklas Haas	cedf589c09	swscale: fix sws_setColorspaceDetails after sws_init_context More commonly, this fixes the case of sws_setColorspaceDetails after sws_getContext, since the latter implies sws_init_context. The problem here is that sws_init_context sets up the range conversion and fast path tables based on the values of srcRange/dstRange at init time. This may result in locking in a "wrong" path (either using the unscaled fast path when range conversion is later required, or using the scaled slow path when range conversion is no longer required). There are two ways out: 1. Always initialize range conversion and unscaled converters, even if they will be unused, and extend the runtime check. 2. Redo initialization if the values change after sws_setColorspaceDetails. I opted for approach 1 because it was simpler and easier to reason about. Reword the av_log message to make it clear that this special converter is not necessarily used, depending on whether or not there is range conversion or YUV matrix conversion going on.	1 year ago
Michael Niedermayer	47e784f881	Bump versions after 6.1 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	1 year ago
Michael Niedermayer	9d3a7d30c4	Bump versions prior to 6.1 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	1 year ago
Martin Storsjö	a76b409dd0	aarch64: Reindent all assembly to 8/24 column indentation libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally uses a layered indentation style to visually show how different unrolled/interleaved phases fit together. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	93cda5a9c2	aarch64: Lowercase UXTW/SXTW and similar flags Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	184103b310	aarch64: Consistently use lowercase for vector element specifiers Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Rémi Denis-Courmont	19baf4e009	swscale/rgb2rgb: R-V V deinterleaveBytes	1 year ago
Rémi Denis-Courmont	ede3215115	swscale/rgb2rgb: fix extra iteration in R-V V interleave There was an additional iteration doing nothing for each line, due to checking the selected vector length instead of the available vector length.	1 year ago
Rémi Denis-Courmont	d14130aea3	swscale/rgb2rgb: unroll R-V V interleave_bytes	1 year ago
Rémi Denis-Courmont	6269c4a440	swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422	1 year ago
Rémi Denis-Courmont	e50f8e861b	swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422 We can make do with callee-clobbered registers only now. As an added bonus, this makes the code XLEN-independent.	1 year ago
Rémi Denis-Courmont	be37a2e364	swscale/rgb2rgb: rework RISC-V V uyvytoyuv422 This avoids using relatively slow register strides.	1 year ago
Rémi Denis-Courmont	1a4bd76ea5	swscale/rgb2rgb: remove R-V V shuffle_bytes_3012 This is slower than the Zbb version on real hardware due to register strides. Proper support for vector byte-swap requires the Zvbb extension, but it's much too early for me to worry about it.	1 year ago
Rémi Denis-Courmont	c4a144c29d	swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210	1 year ago
Paul B Mahol	29b673bdcf	swscale: add GBRAP14 format support	1 year ago
Andreas Rheinhardt	f8503b4c33	avutil/internal: Don't auto-include emms.h Instead include emms.h wherever it is needed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	1 year ago
L. E. Segovia	ddc1cd5cdd	configure: Set WIN32_LEAN_AND_MEAN at configure time Including winsock2.h or windows.h without WIN32_LEAN_AND_MEAN causes bzlib.h to be parsed as nonsense, due to an instance of #define small char in rpcndr.h. See: https://stackoverflow.com/a/27794577 Signed-off-by: L. E. Segovia <amy@amyspark.me> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Rémi Denis-Courmont	c2b38619c0	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012} This avoids strided loads. Before: shuffle_bytes_1230_rvv_i32: 308.7 shuffle_bytes_3012_rvv_i32: 308.7 After: shuffle_bytes_1230_rvv_i32: 46.7 shuffle_bytes_3012_rvv_i32: 46.7	1 year ago
Rémi Denis-Courmont	15982554e6	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103} This avoids strided loads. Before: shuffle_bytes_0321_rvv_i32: 307.7 shuffle_bytes_2103_rvv_i32: 308.7 After: shuffle_bytes_0321_rvv_i32: 59.7 shuffle_bytes_2103_rvv_i32: 61.5	1 year ago
Rémi Denis-Courmont	d3948e4db5	swscale: inline ff_shuffle_bytes_3210_rvv No functional changes.	1 year ago
Rémi Denis-Courmont	b6585eb04c	lavu: add/use flag for RISC-V Zba extension The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.	1 year ago
Khem Raj	a7b3c0203f	libswscale/riscv: fix syntax of vsetvli Add the missing LMUL operand, which Clang complains about but GCC assumes to be 'm1' if not specified. Works around a build failure with Clang: | src/libswscale/riscv/rgb2rgb_rvv.S:88:25: error: operand must be e[8|16|32|64|128|256|512|1024],m[1|2|4|8|f2|f4|f8],[ta|tu],[ma|mu] | vsetvli t4, t3, e8, ta, ma | ^ Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	1 year ago
Lynne	b3fb73af6b	swscale: bump minor for implementing support for the new pixfmts	1 year ago
Lynne	934525eae0	lsws: add in/out support for the new 12-bit 2-plane 422 and 444 pixfmts	1 year ago
Jin Bo	cb4ae8baee	swscale/la: Add following builtin optimized functions yuv420_rgb24_lsx yuv420_bgr24_lsx yuv420_rgba32_lsx yuv420_argb32_lsx yuv420_bgra32_lsx yuv420_abgr32_lsx ./configure --disable-lasx ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -pix_fmt rgb24 -y /dev/null -an before: 184fps after: 207fps Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Lu Wang	4501b1dfd7	swscale/la: Optimize the functions of the swscale series with lsx. ./configure --disable-lasx ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -pix_fmt bgra -y /dev/null -an before: 91fps after: 160fps Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Lynne	a62a3930c2	swscale/ppc: remove hScale8To19_vsx Fails checkasm on a Power9 system.	2 years ago
Michael Niedermayer	47ac3e6065	version.h: Bump minor post 6.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Michael Niedermayer	62efa096af	version.h: Bump minor for 6.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
James Almer	5bad485603	Bump major versions of all libraries Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
Tomas Härdin	a678b0c252	sws/utils.c: Do not uselessly call initFilter() when unscaling	2 years ago
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so the impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2 years ago
Andreas Rheinhardt	1ff9c07fa6	swscale/utils: Fix indentation Forgotten after `c1eb3e7fec`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	b2d1a25816	swscale/utils: Derive range from YUVJ-pix-fmt only once Currently, it is done once per slice-thread, leading to one warning per slice-thread in case a YUVJ pixel format has been originally used. This also fixes the anomaly that said parameters are only updated for the user-facing context (whose values are retrievable via av_opt_get()) if slice-threading is not in use. Fixes ticket #9860. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	ff39dcb129	swscale/utils: Move functions to avoid forward declarations Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	baccc1c541	swscale/utils: Avoid calling ff_thread_once() unnecessarily Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	8ee0711228	swscale/utils: Don't allocate AVFrames for slice contexts Only the parent context's AVFrames are ever used. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	64ed1d40df	swscale/utils: Factor initializing single slice context out Initializing slice threads currently uses the function (sws_init_context()) that is also used for initializing user-facing contexts with the only difference being that nb_threads is set to one before initializing the slice contexts. Yet sws_init_context() also initializes lots of stuff that is not slice-dependent, i.e. (src|dst)Range. This currently only works because the code sets these fields to the same values for all slice contexts. This is not nice; even worse, it entails that log messages are printed once per slice context (and therefore fill the screen). This commit lays the groundwork to fix this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Michael Niedermayer	ba209e3d51	swscale/input: Use more unsigned intermediates Same principle as the previous commit: with sufficiently huge rgb2yuv table values, this produces wrong results and undefined behavior. The unsigned version produces the same incorrect results. That is probably OK, as these cases with huge values do not seem to occur in any real use case. Fixes: signed integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Jeremy Dorfman	ce566281f9	swscale/input: Use unsigned intermediates in rgb64ToUV_c_template Large rgb2yuv tables and high pixel values cause the intermediate int32_t of ru*r + gu*g + bu*b to exceed INT_MAX, which is undefined behavior. This causes libswscale built with LLVM -fsanitize=undefined to assert. Using unsigned integers instead has defined behavior and produces identical results, and makes rgb64ToUV_c_template match rgb64ToY_c_template. Fixes: signed integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Andreas Rheinhardt	b616b04704	swscale/utils: Remove obsolete 3DNow reference swscale does not use 3DNow any more since commit `608319a311`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Michael Niedermayer	b74f89caae	swscale/output: Bias 16bps output calculations to improve non overflowing range for GBRP16/GBRPF32 Fixes: integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Michael Niedermayer	0f0afc7fb5	swscale/output: Bias 16bps output calculations to improve non overflowing range Fixes: integer overflow Fixes: ./ffmpeg -f rawvideo -video_size 66x64 -pixel_format yuva420p10le -i ~/videos/overflow_input_w66h64.yuva420p10le -filter_complex "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=full:out_range=full,format=rgba64[out]" -pixel_format rgba64 -map '[out]' -y overflow_w66h64.png Found-by: Drew Dunne <asdunne@google.com> Tested-by: Drew Dunne <asdunne@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Hubert Mazur	2537fdc510	sw_scale: Add specializations for hscale 16 to 19 Provide arm64 neon optimized implementations for hscale16To19 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_19__fs_4_dstW_512_c: 6216.0 hscale_16_to_19__fs_4_dstW_512_neon: 2257.0 hscale_16_to_19__fs_8_dstW_512_c: 10417.7 hscale_16_to_19__fs_8_dstW_512_neon: 3112.5 hscale_16_to_19__fs_12_dstW_512_c: 14890.5 hscale_16_to_19__fs_12_dstW_512_neon: 3899.0 hscale_16_to_19__fs_16_dstW_512_c: 19006.5 hscale_16_to_19__fs_16_dstW_512_neon: 5341.2 hscale_16_to_19__fs_32_dstW_512_c: 36629.5 hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 (Note, the checkasm tests for these functions haven't been merged since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	9ccf8c5bfc	sw_scale: Add specializations for hscale 16 to 15 Add arm64 neon implementations for hscale 16 to 15 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_15__fs_4_dstW_512_c: 6703.5 hscale_16_to_15__fs_4_dstW_512_neon: 2298.0 hscale_16_to_15__fs_8_dstW_512_c: 10983.0 hscale_16_to_15__fs_8_dstW_512_neon: 3216.5 hscale_16_to_15__fs_12_dstW_512_c: 15526.0 hscale_16_to_15__fs_12_dstW_512_neon: 3993.0 hscale_16_to_15__fs_16_dstW_512_c: 20183.5 hscale_16_to_15__fs_16_dstW_512_neon: 5369.7 hscale_16_to_15__fs_32_dstW_512_c: 39315.2 hscale_16_to_15__fs_32_dstW_512_neon: 9511.2 hscale_16_to_15__fs_40_dstW_512_c: 48995.7 hscale_16_to_15__fs_40_dstW_512_neon: 11570.0 (Note, the checkasm tests for these functions haven't been merged since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
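Commits ce566281f9 and ba209e3d51 replace signed intermediates with unsigned ones because ru*r + gu*g + bu*b can exceed INT_MAX. The C sketch below illustrates only the general technique; the helper name weighted_sum_u32 and the coefficient values are hypothetical rather than libswscale's code, and it assumes a 32-bit int.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute ru*r + gu*g + bu*b with unsigned
 * intermediates. Unsigned arithmetic wraps modulo 2^32, which is well
 * defined, whereas the same expression evaluated in signed int is
 * undefined behavior once it exceeds INT32_MAX.
 */
static uint32_t weighted_sum_u32(int32_t ru, int32_t gu, int32_t bu,
                                 uint16_t r, uint16_t g, uint16_t b)
{
    /* Converting the signed coefficients to uint32_t is well defined
     * (modulo 2^32), so every multiply and add below is as well. */
    return (uint32_t)ru * r + (uint32_t)gu * g + (uint32_t)bu * b;
}
```

When nothing overflows, the unsigned result has the same bit pattern as the two's complement signed result, which is why such a change can leave the output identical while removing the undefined behavior.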
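Commit 6d60cc7baf notes that checkasm exercises unaligned UYVY/YUYV inputs, which QEMU tolerates but real RISC-V hardware does not. As a point of reference, a scalar C model of one line of UYVY to 4:2:2 planar conversion can use byte loads only, so source alignment never matters. This is a hypothetical sketch (uyvy_line_to_planar422 is not an FFmpeg function), assuming the U0 Y0 V0 Y1 byte order and an even luma width.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar model: split one line of packed UYVY into Y, U
 * and V planes (4:2:2, so U and V have half the luma width). Every
 * access is a single byte, so any source alignment is fine.
 */
static void uyvy_line_to_planar422(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                                   const uint8_t *src, size_t width)
{
    for (size_t x = 0; x < width / 2; x++) {
        const uint8_t *p = src + 4 * x; /* 4 bytes = 2 pixels */
        udst[x]         = p[0];         /* shared U */
        ydst[2 * x]     = p[1];         /* Y of left pixel */
        vdst[x]         = p[2];         /* shared V */
        ydst[2 * x + 1] = p[3];         /* Y of right pixel */
    }
}
```

A vector version gets its speed from loading many macropixels at once; the byte-granular model simply shows which output plane each input byte belongs to.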
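Several of the RISC-V commits above (1a4bd76ea5, c4a144c29d, 15982554e6, c2b38619c0) concern the shuffle_bytes_ABCD family. A generic scalar model makes those names easier to read. It assumes that each digit gives the source byte index for the corresponding output byte within a 4-byte group, so 3210 is a per-word byte swap; shuffle_bytes_model is a hypothetical helper, not the libswscale API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical generic model of shuffle_bytes_ABCD: for each 4-byte
 * group, output byte k is copied from source byte order[k] of the same
 * group. Trailing bytes that do not fill a whole group are ignored.
 */
static void shuffle_bytes_model(const uint8_t *src, uint8_t *dst,
                                size_t size, const uint8_t order[4])
{
    for (size_t i = 0; i + 4 <= size; i += 4) {
        /* Copy the group first so that in-place use (src == dst) works. */
        uint8_t g[4] = { src[i], src[i + 1], src[i + 2], src[i + 3] };
        for (int k = 0; k < 4; k++)
            dst[i + k] = g[order[k]];
    }
}
```

For example, order {3, 2, 1, 0} reverses every group and {1, 2, 3, 0} rotates it by one byte. A vector implementation that reads each group with strided loads pays for the stride, which is what the reworks listed above avoid.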
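Commit ede3215115 attributes a wasted iteration per line to testing the selected vector length instead of the available vector length. A plain C model of a vsetvli strip-mining loop shows the difference. vsetvl_model mimics only vl = min(AVL, VLMAX), one of the choices the RVV specification permits, and both loop shapes are assumptions about the general pattern rather than the actual assembly.

```c
#include <stddef.h>

/* Simplified vsetvli: grant vl = min(avl, vlmax). */
static size_t vsetvl_model(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Loop ends as soon as the remaining (available) length reaches zero. */
static unsigned iters_checking_avl(size_t n, size_t vlmax)
{
    unsigned iters = 0;
    size_t avl = n;
    while (avl != 0) {
        avl -= vsetvl_model(avl, vlmax);
        iters++;
    }
    return iters;
}

/* Loop ends only after a granted vl of zero: one extra, empty pass. */
static unsigned iters_checking_vl(size_t n, size_t vlmax)
{
    unsigned iters = 0;
    size_t avl = n, vl;
    do {
        vl = vsetvl_model(avl, vlmax);
        avl -= vl;
        iters++;
    } while (vl != 0);
    return iters;
}
```

With 10 elements and VLMAX = 4, the first loop runs 3 times and the second 4 times; the last pass processes nothing, matching the "additional iteration doing nothing" described in the commit.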
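Commit 2537fdc510 adds NEON versions of hscale16To19 for filter sizes 4, 8 and X4. The computation they accelerate can be modeled in C as a horizontal FIR filter. This sketch assumes 14-bit fixed-point coefficients (1 << 14 == 1.0) and a right shift of 11 to reach 19 bits; hscale16to19_model is a hypothetical name, so check libswscale's C reference before relying on these details.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of horizontal scaling from 16-bit input to 19-bit
 * output: each output sample i is a weighted sum of filter_size input
 * samples starting at filter_pos[i], clamped to [0, 2^19 - 1].
 */
static void hscale16to19_model(int32_t *dst, size_t dst_w, const uint16_t *src,
                               const int16_t *filter, const int32_t *filter_pos,
                               size_t filter_size)
{
    const int64_t max19 = (1 << 19) - 1; /* largest 19-bit value */

    for (size_t i = 0; i < dst_w; i++) {
        int64_t acc = 0; /* 64-bit accumulator gives ample headroom */
        for (size_t j = 0; j < filter_size; j++)
            acc += (int64_t)src[filter_pos[i] + j] * filter[i * filter_size + j];
        if (acc < 0)
            acc = 0;  /* clamp first so only non-negative values are shifted */
        acc >>= 11;   /* ~16-bit x 14-bit = ~30 bits; >> 11 leaves 19 bits */
        dst[i] = (int32_t)(acc > max19 ? max19 : acc);
    }
}
```

The inner loop over filter_size is what the NEON specializations unroll; fixed sizes such as 4 and 8 let the whole sum stay in registers.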


2503 Commits (a30adf9f96254f4870066c98a6dbf13fc74515a3)