Up until now, libswscale/output.c used a macro to write
an output pixel which involved a call to av_pix_fmt_desc_get()
to find out whether the input pixel format is BE or LE
despite this being known at compile-time (there are templates
per pixfmt). Even worse, these calls are made in a loop,
so that e.g. there are eight calls to av_pix_fmt_desc_get()
for every pixel processed in yuv2rgba64_X_c_template()
for 64bit RGB formats.
This commit modifies these macros to ensure that isBE()
is evaluated at compile-time. This saved 41184B of .text
for me (GCC 11.2, -O3). Of course, it also improved performance.
E.g. ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
-threads 1 -t 1:00 -f null - (which uses yuv2rgba64le_X_c,
which is an invocation of yuv2rgba64_X_c_template() mentioned above),
performance improved from 95589 to 41387 decicycles for one call
to yuv2packedX; for the be variant the numbers went down from
76087 to 43024 decicycles.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
As we already have support for VUYA, I figured I should do the small
amount of work to support VUYX as well. That means a little refactoring
to share code.
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Extracting information from SwsContext in assembly is difficult, and
rearranging SwsContext just for asm access didn't look good. These
functions only need a couple of fields from it anyway, so just make
them parameters in their own right.
Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com>
Fixes: signed integer overflow: 1169365504 + 981452800 cannot be represented in type 'int'
Fixes: ticket8293
Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 524280 * 4432 cannot be represented in type 'int'
Fixes: ticket8322
Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The implementation is pretty straight-forward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.
Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.
To make the best use of existing code, I generalised the wrapper
that currently does yuv420p10 to p010 to support any mixture of
input and output sizes between 10 and 16 bits. This had the side
effect of yielding a working code path for all yuv420p1x formats
to p01x.
Fixes filter-pixfmts-scale test failing on big-endian systems due to
alpSrc not being cast to (const int32_t**).
Also fixes distortions in the output alpha channel values by copying the
alpha channel code from the rgba64 case found elsewhere in output.c.
Fixes ticket 6555.
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>