There is an issue with the constants used in YUV to YUV range conversion,
where the upper bound is not respected when converting to mpeg range.
With this commit, the constants are calculated at runtime, depending on
the bit depth. This approach also allows us to more easily understand how
the constants are derived.
For bit depths <= 14, the number of fixed point bits has been set to 14
for all conversions, to simplify the code.
For bit depths > 14, the number of fixed points bits has been raised and
set to 18, to allow for the conversion to be accurate enough for the mpeg
range to be respected.
The convert functions now take the conversion constants (coeff and offset)
as function arguments.
For bit depths <= 14, coeff is unsigned 16-bit and offset is 32-bit.
For bit depths > 14, coeff is unsigned 32-bit and offset is 64-bit.
x86_64:
chrRangeFromJpeg8_1920_c: 2127.4 2125.0 (1.00x)
chrRangeFromJpeg16_1920_c: 2325.2 2127.2 (1.09x)
chrRangeToJpeg8_1920_c: 3166.9 3168.7 (1.00x)
chrRangeToJpeg16_1920_c: 2152.4 3164.8 (0.68x)
lumRangeFromJpeg8_1920_c: 1263.0 1302.5 (0.97x)
lumRangeFromJpeg16_1920_c: 1080.5 1299.2 (0.83x)
lumRangeToJpeg8_1920_c: 1886.8 2112.2 (0.89x)
lumRangeToJpeg16_1920_c: 1077.0 1906.5 (0.56x)
aarch64 A55:
chrRangeFromJpeg8_1920_c: 28835.2 28835.6 (1.00x)
chrRangeFromJpeg16_1920_c: 28839.8 32680.8 (0.88x)
chrRangeToJpeg8_1920_c: 23074.7 23075.4 (1.00x)
chrRangeToJpeg16_1920_c: 17318.9 24996.0 (0.69x)
lumRangeFromJpeg8_1920_c: 15389.7 15384.5 (1.00x)
lumRangeFromJpeg16_1920_c: 15388.2 17306.7 (0.89x)
lumRangeToJpeg8_1920_c: 19227.8 19226.6 (1.00x)
lumRangeToJpeg16_1920_c: 15387.0 21146.3 (0.73x)
aarch64 A76:
chrRangeFromJpeg8_1920_c: 6324.4 6268.1 (1.01x)
chrRangeFromJpeg16_1920_c: 6339.9 11521.5 (0.55x)
chrRangeToJpeg8_1920_c: 9656.0 9612.8 (1.00x)
chrRangeToJpeg16_1920_c: 6340.4 11651.8 (0.54x)
lumRangeFromJpeg8_1920_c: 4422.0 4420.8 (1.00x)
lumRangeFromJpeg16_1920_c: 4420.9 5762.0 (0.77x)
lumRangeToJpeg8_1920_c: 5949.1 5977.5 (1.00x)
lumRangeToJpeg16_1920_c: 4446.8 5946.2 (0.75x)
NOTE: all simd optimizations for range_convert have been disabled.
they will be re-enabled when they are fixed for each architecture.
NOTE2: the same issue still exists in rgb2yuv conversions, which is not
addressed in this commit.
As part of a larger, ongoing effort to modernize and partially rewrite
libswscale, it was decided and generally agreed upon to introduce a new
public API for libswscale. This API is designed to be less stateful, more
explicitly defined, and considerably easier to use than the existing one.
Most of the API work has been already accomplished in the previous commits,
this commit merely introduces the ability to use sws_scale_frame()
dynamically, without prior sws_init_context() calls. Instead, the new API
takes frame properties from the frames themselves, and the implementation is
based on the new SwsGraph API, which we simply reinitialize as needed.
This high-level wrapper also recreates the logic that used to live inside
vf_scale for scaling interlaced frames, enabling it to be reused more easily
by end users.
Finally, this function is designed to simply copy refs directly when nothing
needs to be done, substantially improving throughput of the noop fast path.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a purely cosmetic commit aimed at replacing accesses to
SwsInternal.opts by direct access to SwsContext wherever convenient.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a preliminary step to separating these into a new struct. This
commit contains no functional changes, it is a pure search-and-replace.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Same as it's done in lumRangeToJpeg16_c(). Plenty of allowed input values can
overflow here.
Fixes: src/libswscale/swscale.c:198:47: runtime error: signed integer overflow: 475328 * 4663 cannot be represented in type 'int'
Signed-off-by: James Almer <jamrial@gmail.com>
This commit also fixes the issue that the call to ff_sws_init_range_convert()
from sws_init_swscale() was not setting up the arch-specific optimizations.
And preserve the public SwsContext as separate name. The motivation here
is that I want to turn SwsContext into a public struct, while keeping the
internal implementation hidden. Additionally, I also want to be able to
use multiple internal implementations, e.g. for GPU devices.
This commit does not include any functional changes. For the most part, it is
a simple rename. The only complications arise from the public facing API
functions, which preserve their current type (and hence require an additional
unwrapping step internally), and the checkasm test framework, which directly
accesses SwsInternal.
For consistency, the affected functions that need to maintain a distionction
have generally been changed to refer to the SwsContext as *sws, and the
SwsInternal as *c.
In an upcoming commit, I will provide a backing definition for the public
SwsContext, and update `sws_internal()` to dereference the internal struct
instead of merely casting it.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
I want to pull options out of SwsInternal, so we need to make this field
a dedicated int that gets updated as appropriate in ff_swscale().
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Used as an intermediate entry point for the new swscale context. The extra
constification is a consistency measure, as I want to move the memcpy of
stride and plane pointers to the functions that actually need to mutate them.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of taking an int16_t pointer and a stride in halfwords, follow the
usual convention of treating all planes and strides as byte-addressed.
This does not have any immediate effect but makes these functions more
reusable without unintended "gotchas".
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This fixes an 11-year-old bug in the rgb2xyz functions, when used with a
negative stride. The current loop bounds turned it into a no-op.
Additionally, this increases performance on highly cropped images, whose
stride may be substantially higher than the effective width.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
I have not checked that the constant is correct, this just fixes the undefined behavior
Fixes: signed integer overflow: -646656 * 3517 cannot be represented in type 'int
Fixes: 70559/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5209368631508992
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is unlikely to make a difference
Fixes: CID1591896 Unintentional integer overflow
Fixes: CID1591901 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while just at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
More commonly, this fixes the case of sws_setColorspaceDetails after
sws_getContext, since the latter implies sws_init_context.
The problem here is that sws_init_context sets up the range conversion
and fast path tables based on the values of srcRange/dstRange at init
time. This may result in locking in a "wrong" path (either using
unscaled fast path when range conversion later required, or using
scaled slow path when range conversion becomes no longer required).
There are two way outs:
1. Always initialize range conversion and unscaled converters, even if
they will be unused, and extend the runtime check.
2. Re-do initialization if the values change after
sws_setColorspaceDetails.
I opted for approach 1 because it was simpler and easier to reason
about.
Reword the av_log message to make it clear that this special converter
is not necessarily used, depending on whether or not there is range
conversion or YUV matrix conversion going on.
This is more spec-compliant because it does not rely
on dead-code elimination by the compiler. Especially
MSVC has problems with this, as can be seen in
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
or
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html
This commit does not eliminate every instance where we rely
on dead code elimination: It only tackles branching to
the initialization of arch-specific dsp code, not e.g. all
uses of CONFIG_ and HAVE_ checks. But maybe it is already
enough to compile FFmpeg with MSVC with whole-programm-optimizations
enabled (if one does not disable too many components).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some of these were made possible by moving several common macros to
libavutil/macros.h.
While just at it, also improve the other headers a bit.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This makes output consistent with a similar warning just few
lines above where this flag is checked in the same way.
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
SSE2 is x86 specific, yet due to the call to av_get_cpu_flags()
compilers were unable to optimize the checks (and the call) away
on other arches.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In this case the current code tries to warn once; to do so, it uses
ordinary static ints to store whether the warning has already been
emitted. This is both a data race (and therefore undefined behaviour)
as well as a race condition, because it is really possible for multiple
threads to be the one thread to emit the warning. This is actually
common since the introduction of the new multithreaded scaling API.
This commit fixes this by using atomic integers for the state;
furthermore, these are not static anymore, but rather contained
in the user-facing SwsContext (i.e. the parent SwsContext in case
of slice-threading).
Given that these atomic variables are not intended for synchronization
at all (but only for atomicity, i.e. only to output the warning once),
the atomic operations use memory_order_relaxed.
This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and
yuv444 filter-overlay FATE-tests.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Call the scaler function directly rather than through a function
pointer. Drop the now-unused return value from ff_getSwsFunc() and
rename the function to reflect its new role.
This will be useful in the following commits, where it will become
important that the amount of output is different for scaled vs unscaled
case.