FFmpeg

Commit Graph

Author	SHA1	Message	Date
Niklas Haas	67adb30322	swscale: rename SwsContext to SwsInternal And preserve the public SwsContext as separate name. The motivation here is that I want to turn SwsContext into a public struct, while keeping the internal implementation hidden. Additionally, I also want to be able to use multiple internal implementations, e.g. for GPU devices. This commit does not include any functional changes. For the most part, it is a simple rename. The only complications arise from the public facing API functions, which preserve their current type (and hence require an additional unwrapping step internally), and the checkasm test framework, which directly accesses SwsInternal. For consistency, the affected functions that need to maintain a distionction have generally been changed to refer to the SwsContext as sws, and the SwsInternal as c. In an upcoming commit, I will provide a backing definition for the public SwsContext, and update `sws_internal()` to dereference the internal struct instead of merely casting it. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	3 months ago
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than REP in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessary, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2 years ago
Michael Niedermayer	b74f89caae	swscale/output: Bias 16bps output calculations to improve non overflowing range for GBRP16/GBRPF32 Fixes: integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2 years ago
Andreas Rheinhardt	a05f22eaf3	swscale/x86/swscale: Remove obsolete and harmful MMX(EXT) functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT, SSE and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2). So given that the only systems that benefit from these functions are truely ancient 32bit x86s they are removed. Moreover, some of the removed code was buggy/not bitexact and lead to failures involving the f32le and f32be versions of gray, gbrp and gbrap on x86-32 when SSE2 was not disabled. See e.g. https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx Notice that yuv2yuvX_mmx is not removed, because it is used by SSE3 and AVX2 as fallback in case of unaligned data and also for tail processing. I don't know why yuv2yuvX_mmxext isn't being used for this; an earlier version [1] of `554c2bc708` used it, but the version that was eventually applied does not. [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	3 years ago
Mark Reid	9e445a5be2	swscale/x86/output.asm: add x86-optimized planer gbr yuv2anyX functions changes since v2: * fixed label changes since v1: * remove vex intruction on sse4 path * some load/pack marcos use less intructions * fixed some typos yuv2gbrp_full_X_4_512_c: 12757.6 yuv2gbrp_full_X_4_512_sse2: 8946.6 yuv2gbrp_full_X_4_512_sse4: 5138.6 yuv2gbrp_full_X_4_512_avx2: 3889.6 yuv2gbrap_full_X_4_512_c: 15368.6 yuv2gbrap_full_X_4_512_sse2: 11916.1 yuv2gbrap_full_X_4_512_sse4: 6294.6 yuv2gbrap_full_X_4_512_avx2: 3477.1 yuv2gbrp9be_full_X_4_512_c: 14381.6 yuv2gbrp9be_full_X_4_512_sse2: 9139.1 yuv2gbrp9be_full_X_4_512_sse4: 5150.1 yuv2gbrp9be_full_X_4_512_avx2: 2834.6 yuv2gbrp9le_full_X_4_512_c: 12990.1 yuv2gbrp9le_full_X_4_512_sse2: 9118.1 yuv2gbrp9le_full_X_4_512_sse4: 5132.1 yuv2gbrp9le_full_X_4_512_avx2: 2833.1 yuv2gbrp10be_full_X_4_512_c: 14401.6 yuv2gbrp10be_full_X_4_512_sse2: 9133.1 yuv2gbrp10be_full_X_4_512_sse4: 5126.1 yuv2gbrp10be_full_X_4_512_avx2: 2837.6 yuv2gbrp10le_full_X_4_512_c: 12718.1 yuv2gbrp10le_full_X_4_512_sse2: 9106.1 yuv2gbrp10le_full_X_4_512_sse4: 5120.1 yuv2gbrp10le_full_X_4_512_avx2: 2826.1 yuv2gbrap10be_full_X_4_512_c: 18535.6 yuv2gbrap10be_full_X_4_512_sse2: 33617.6 yuv2gbrap10be_full_X_4_512_sse4: 6264.1 yuv2gbrap10be_full_X_4_512_avx2: 3422.1 yuv2gbrap10le_full_X_4_512_c: 16724.1 yuv2gbrap10le_full_X_4_512_sse2: 11787.1 yuv2gbrap10le_full_X_4_512_sse4: 6282.1 yuv2gbrap10le_full_X_4_512_avx2: 3441.6 yuv2gbrp12be_full_X_4_512_c: 13723.6 yuv2gbrp12be_full_X_4_512_sse2: 9128.1 yuv2gbrp12be_full_X_4_512_sse4: 7997.6 yuv2gbrp12be_full_X_4_512_avx2: 2844.1 yuv2gbrp12le_full_X_4_512_c: 12257.1 yuv2gbrp12le_full_X_4_512_sse2: 9107.6 yuv2gbrp12le_full_X_4_512_sse4: 5142.6 yuv2gbrp12le_full_X_4_512_avx2: 2837.6 yuv2gbrap12be_full_X_4_512_c: 18511.1 yuv2gbrap12be_full_X_4_512_sse2: 12156.6 yuv2gbrap12be_full_X_4_512_sse4: 6251.1 yuv2gbrap12be_full_X_4_512_avx2: 3444.6 yuv2gbrap12le_full_X_4_512_c: 16687.1 yuv2gbrap12le_full_X_4_512_sse2: 11785.1 yuv2gbrap12le_full_X_4_512_sse4: 6243.6 yuv2gbrap12le_full_X_4_512_avx2: 3446.1 yuv2gbrp14be_full_X_4_512_c: 13690.6 yuv2gbrp14be_full_X_4_512_sse2: 9120.6 yuv2gbrp14be_full_X_4_512_sse4: 5138.1 yuv2gbrp14be_full_X_4_512_avx2: 2843.1 yuv2gbrp14le_full_X_4_512_c: 14995.6 yuv2gbrp14le_full_X_4_512_sse2: 9119.1 yuv2gbrp14le_full_X_4_512_sse4: 5126.1 yuv2gbrp14le_full_X_4_512_avx2: 2843.1 yuv2gbrp16be_full_X_4_512_c: 12367.1 yuv2gbrp16be_full_X_4_512_sse2: 8233.6 yuv2gbrp16be_full_X_4_512_sse4: 4820.1 yuv2gbrp16be_full_X_4_512_avx2: 2666.6 yuv2gbrp16le_full_X_4_512_c: 10904.1 yuv2gbrp16le_full_X_4_512_sse2: 8214.1 yuv2gbrp16le_full_X_4_512_sse4: 4824.1 yuv2gbrp16le_full_X_4_512_avx2: 2629.1 yuv2gbrap16be_full_X_4_512_c: 26569.6 yuv2gbrap16be_full_X_4_512_sse2: 10884.1 yuv2gbrap16be_full_X_4_512_sse4: 5488.1 yuv2gbrap16be_full_X_4_512_avx2: 3272.1 yuv2gbrap16le_full_X_4_512_c: 14010.1 yuv2gbrap16le_full_X_4_512_sse2: 10562.1 yuv2gbrap16le_full_X_4_512_sse4: 5463.6 yuv2gbrap16le_full_X_4_512_avx2: 3255.1 yuv2gbrpf32be_full_X_4_512_c: 14524.1 yuv2gbrpf32be_full_X_4_512_sse2: 8552.6 yuv2gbrpf32be_full_X_4_512_sse4: 4636.1 yuv2gbrpf32be_full_X_4_512_avx2: 2474.6 yuv2gbrpf32le_full_X_4_512_c: 13060.6 yuv2gbrpf32le_full_X_4_512_sse2: 9682.6 yuv2gbrpf32le_full_X_4_512_sse4: 4298.1 yuv2gbrpf32le_full_X_4_512_avx2: 2453.1 yuv2gbrapf32be_full_X_4_512_c: 18629.6 yuv2gbrapf32be_full_X_4_512_sse2: 11363.1 yuv2gbrapf32be_full_X_4_512_sse4: 15201.6 yuv2gbrapf32be_full_X_4_512_avx2: 3727.1 yuv2gbrapf32le_full_X_4_512_c: 16677.6 yuv2gbrapf32le_full_X_4_512_sse2: 10221.6 yuv2gbrapf32le_full_X_4_512_sse4: 5693.6 yuv2gbrapf32le_full_X_4_512_avx2: 3656.6 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	3 years ago
James Almer	621e2625e0	swscale/x86/output: add missing AVX2 support preprocessor wrappers Fixes compilation with old yasm Signed-off-by: James Almer <jamrial@gmail.com>	4 years ago
Nelson Gomez	bc01337db4	swscale/x86/output: add AVX2 version of yuv2nv12cX 256 bits is just wide enough to fit all the operands needed to vectorize the software implementation, but AVX2 is needed to for a couple of instructions like cross-lane permutation. Output is bit-for-bit identical to C. Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com>	5 years ago
Vittorio Giovara	41ed7ab45f	cosmetics: Fix spelling mistakes Signed-off-by: Diego Biurrun <diego@biurrun.de>	9 years ago
Michael Niedermayer	f6492a2ea8	swscale/x86/output: Fix yuv2planeX_16* with unaligned destination Reviewed-by: BBB Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Michael Niedermayer	d07f6e5f1c	swscale/x86/output: Move code into yuv2planeX_mainloop Reviewed-by: BBB Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Thilo Borgmann	d814a839ac	Reinstate proper FFmpeg license for all files.	12 years ago
Diego Biurrun	87af05c575	x86: SPLATD: port to cpuflags	12 years ago
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	12 years ago
Diego Biurrun	4b60fac419	x86: PALIGNR: port to cpuflags	12 years ago
Diego Biurrun	652f518594	x86: mmx2 ---> mmxext in comments and messages	12 years ago
Diego Biurrun	04581c8c77	x86: yasm: Use complete source path for macro helper %includes This is more consistent with the way we handle C #includes and it simplifies the build system.	12 years ago
Diego Biurrun	6860b4081d	x86: include x86inc.asm in x86util.asm This is necessary to allow refactoring some x86util macros with cpuflags.	12 years ago
Carl Eugen Hoyos	52be5428c0	Add some missing _EXTERNAL suffixes to yasm source files.	13 years ago
Henrik Gramner	729f90e268	x86inc improvements for 64-bit Add support for all x86-64 registers Prefer caller-saved register over callee-saved on WIN64 Support up to 15 function arguments Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	13 years ago
Ronald S. Bultje	dccb2cd3f9	swscale: make %rep unconditional. Fixes pre-processing with latest versions of nasm.	13 years ago
Ronald S. Bultje	8249a23fc1	swscale: remove now unnecessary hack.	13 years ago
Carl Eugen Hoyos	186923758d	Fix compilation with old yasm that does not support AVX.	13 years ago
Ronald S. Bultje	771bab7f57	swscale: fix crashes in yuv2yuvX on x86-32. They were introduced in an earlier commit that introduced use of named arguments. One cause was a typo, a second cause appears to be a bug in x264asm that I work around by not using named arguments.	13 years ago
Ronald S. Bultje	3e23badd83	swscale: convert yuv2yuvX() to using named arguments.	13 years ago
Ronald S. Bultje	8c433d8a03	swscale: rename "dstw" to "w" to prevent name collisions. "dstw" can collide with the word-version of the "dst" argument, causing all kind of weird stuff down the pipe.	13 years ago
Ronald S. Bultje	ef66a0ed2e	swscale: use named registers in yuv2yuv1_plane() place. Most of the function had been converted before, but I forgot this particular location.	13 years ago
Ronald S. Bultje	783487ae44	swscale: sign-extend integer function argument to qword on x86-64.	13 years ago
Ronald S. Bultje	ef1c785f11	swscale: make yuv2yuv1 use named registers.	13 years ago
Ronald S. Bultje	3b15a6d742	config.asm: change %ifdef directives to %if directives. This allows combining multiple conditionals in a single statement.	13 years ago
Ronald S. Bultje	3c172a4106	swscale: change yuv2yuvX code to use cpuflag().	13 years ago
Carl Eugen Hoyos	ef3a19d595	Fix compilation with yasm-0.6.2	13 years ago
Ronald S. Bultje	6ea64339c5	swscale: split scale.asm. scale.asm keeps horizontal scaling functions, whereas output.asm gets the vertical scaling/output functions.	13 years ago

47 Commits (afc152f564fbeca4d2ff62195e3f0b6244e28cb3)