Provide optimized implementation for vsse_intra16 for arm64.
Performance tests are shown below.
- vsse_4_c: 155.2
- vsse_4_neon: 36.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
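For orientation, here is a minimal scalar sketch of the kind of computation
being vectorized. It assumes that vsse_intra16 sums the squared differences
between vertically adjacent pixels over a 16-pixel-wide block. The name
vsse_intra16_sketch and its simplified signature are hypothetical and do not
match the actual FFmpeg function prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (assumption, not the FFmpeg prototype):
 * sum of squared differences between each pixel and the pixel directly
 * below it, over a block 16 pixels wide and h rows high. */
static int vsse_intra16_sketch(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;

    assert(h >= 1);
    for (int y = 1; y < h; y++) {      /* h rows give h - 1 row pairs */
        for (int x = 0; x < 16; x++) {
            int d = s[x] - s[x + stride];
            score += d * d;
        }
        s += stride;                   /* advance to the next row */
    }
    return score;
}
```

A NEON version can process a whole 16-pixel row pair per iteration, which is
the kind of parallelism the numbers above reflect.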
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation for vsad_intra16 function for arm64.
Performance comparison tests are shown below.
- vsad_4_c: 177.5
- vsad_4_neon: 23.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsse16 for arm64.
Performance comparison tests are shown below.
- vsse_0_c: 257.7
- vsse_0_neon: 59.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsad16 function for arm64.
Performance comparison tests are shown below.
- vsad_0_c: 285.2
- vsad_0_neon: 39.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
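As a rough illustration, the non-intra variant can be sketched in scalar C as
follows. This assumes that vsad16 takes two blocks and sums the absolute
vertical gradient of their difference, which this message does not spell out.
The name vsad16_sketch and the simplified signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (assumption, not the FFmpeg prototype):
 * for each pair of vertically adjacent rows, sum the absolute value of
 * (s1 - s2) on the upper row minus (s1 - s2) on the lower row. */
static int vsad16_sketch(const uint8_t *s1, const uint8_t *s2,
                         ptrdiff_t stride, int h)
{
    int score = 0;

    assert(h >= 1);
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s1[x] - s2[x] - s1[x + stride] + s2[x + stride]);
        s1 += stride;                  /* advance both blocks one row */
        s2 += stride;
    }
    return score;
}
```

Two identical blocks score 0 in this sketch, since their difference has no
vertical gradient.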
Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add "slice" intra refresh type to h264_qsv and hevc_qsv. This type means
horizontal refresh by slices without overlapping. Also update the doc.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
The macro replacements in the latest Loongson MMI commit were incorrect.
They cause heavy green tinting when playing HEVC videos. I've
compared it with the older MMI implementation and found that
several lines were replaced with the wrong macros.
Signed-off-by: Qi Tiezheng <qitiezheng@360.cn>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
AV_PIX_FMT_VUYX is used in FFmpeg and MFX_FOURCC_AYUV is used in the SDK.
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Currently AVBR is disabled and VBR is the default method if maxrate is
not specified on Linux, but AVBR is the default one if maxrate is not
specified on Windows. In order to make the user experience consistent across
Linux and Windows, use VBR by default on Windows if maxrate is not
specified. Users need to set both avbr_accuracy and avbr_convergence to
non-zero values explicitly and must not specify maxrate if AVBR is expected.
In addition, AVBR works for H264 and HEVC only in the SDK.
$ ffmpeg.exe -v verbose -f lavfi -i yuvtestsrc -vf "format=nv12" -c:v vp9_qsv -f null -
The FFV1 decoder only uses the last frame's data to conceal
errors. The encoder does not have this problem and therefore
only uses the current frame and none of the ThreadFrames.
So only allocate them for the decoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
By using a symbol table one can already bake in applying
a LUT on the return value of get_vlc2(). So change the
symbol table for the vec2 and vec4 tables to avoid
using the symbol_to_vec2/4 LUTs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It allows replacing tables of big codes (uint16_t and uint32_t)
with tables of smaller symbols (mostly uint8_t).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These codes are already ordered from left-to-right in the tree,
so one can just use ff_init_vlc_static_from_lengths().
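To illustrate why the lengths alone suffice for codes ordered from left to
right in the tree, here is a minimal standalone sketch. It does not call or
reproduce ff_init_vlc_static_from_lengths(); the helper codes_from_lengths
is hypothetical and only shows how each code follows from a running
left-aligned counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive prefix codes from code lengths that are
 * listed in left-to-right tree order. Returns 1 on success and 0 if a
 * length does not fit at the current position in the tree. */
static int codes_from_lengths(const uint8_t *lens, int nb, uint32_t *codes)
{
    uint64_t next = 0;                 /* next free code, left-aligned to 32 bits */

    for (int i = 0; i < nb; i++) {
        assert(lens[i] >= 1 && lens[i] <= 32);
        uint64_t step = UINT64_C(1) << (32 - lens[i]);

        /* The position must be a multiple of the step and inside the tree. */
        if ((next & (step - 1)) || next >= (UINT64_C(1) << 32))
            return 0;
        codes[i] = (uint32_t)(next >> (32 - lens[i]));
        next    += step;               /* move right past this leaf */
    }
    return 1;
}
```

For the lengths {1, 2, 3, 3} this yields the codes 0, 10, 110 and 111 in
binary, so no explicit codes table is needed when the entries are already in
this order.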
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead reuse the destination RL VLC as scratch space.
This is possible, because the (implicit) codes here are already
ordered from left-to-right in the tree and because the codelengths
are increasing, which implies that mapping from VLC entries to the
corresponding entries used to initialize the VLC is monotonically
increasing. This means that one can reuse the right end of the
destination RL VLC to store the tables used to initialize the VLC
with.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is possible because the codes are already ordered
from left to right in the tree. It avoids having to create
the codes ourselves and will enable the codes table
to be removed altogether once the encoder stops using it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Right now, it is nearly ordered by "left codes in the tree first";
the only exception is the escape value which has been put at the
end. This commit moves it to the place it should have according
to the above order. This is in preparation for further commits.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This state is not refcounted, so make sure it always has a well-defined
owner.
Remove the block added in 091341f2ab, as
this commit also solves that issue in a more general way.
Mention:
- that it is legacy and optional (every hwaccel that uses it can also
work with hwcontext, though some optional information can only be
signalled through hwaccel_context)
- that it can be used for encoders (only qsvenc currently)
- ownership and lifetime
Inside a function, the second ';' in ";;" is just a null statement,
but it is actually illegal outside of functions. Compilers
nevertheless accept it without warning, except when in -pedantic
mode when e.g. Clang emits a -Wextra-semi warning. Therefore
remove the unnecessary ';'.
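A small illustration of the difference, with hypothetical names. The
file-scope case is kept in a comment so that the snippet itself stays valid
ISO C.

```c
/* Inside a function the second ';' is a null statement, which is legal. */
static int null_statement_demo(void)
{
    int x = 1;;                        /* declaration, then an empty statement */
    return x;
}

/* At file scope there are no statements, so the same pattern is not
 * valid ISO C and may trigger a warning in pedantic mode:
 *
 *     static const int table_size = 4;;
 *
 * The fix is simply to drop the extra ';'. */
static const int table_size = 4;
```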
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Creating CFHD RL VLC tables works by first extending
the codes by the sign, followed by creating a VLC,
followed by deriving the RL VLC from this VLC (which
is then discarded). Extending the codes uses stack arrays.
The tables used to initialize the VLC are already sorted
from left-to-right in the tree. This means that the
corresponding VLC entries are generally also ascending,
but not always: Entries from subtables always follow
the corresponding main table although it is possible
for the right-most node to fit into the main table.
This suggests that one can try to use the final destination
buffer as scratch buffer for the tables with sign included.
Unfortunately it works for neither of the tables if one
uses the right-most part of the RL VLC buffer as scratch buffer;
using the left-most part of the RL VLC buffer as scratch buffer
might work if one traverses the VLC entries from end to start.
But it works only for the little RL VLC (table 9), not for table 18.
Therefore this patch uses the RL VLC buffer for table 9
as scratch buffer for creating the bigger table 18.
Afterwards the left part of the buffer for table 9 is
used as scratch buffer to create table 9.
This fixes the cfhd part of ticket #9399 (if it is not already fixed).
Notice that I do not consider the previous stack usage excessive.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The VLC is only used to initialize RL VLC.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
cfhddata.c initializes a RL VLC table via code tables and
corresponding tables for length, run and level. code and length
tables are used to initialize a VLC, no symbol table is used.
Afterwards the symbols of said VLC are just the indices of
the corresponding entries in the code and length table that
were used for initialization; they can therefore be used
to get the matching level and run entry and they are not used
for anything else. Therefore one can just permute these tables
without changing the resulting RL VLC tables.
This commit does just this. It permutes these tables so that
the code tables are ordered from left to right in the resulting
tree and then switches to ff_init_vlc_from_lengths(), which
allows removing the codes table altogether.
Given that these tables are constructed on the stack, this
also reduces stack usage, potentially fixing part of #9399.
(The size of the tables on the stack decreases from 4752 to
2640.)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
cfhd.c checked for level being equal to a certain codebook-
dependent constant and to run being two. The first check is
actually redundant, as all codebooks contain only one (real)
entry with run == 2 (as is usual with VLCs, this one real entry
has several corresponding entries in the table). But given
that no entry has a run of zero (except incomplete entries
which just signal that one needs to do another round of parsing),
one can actually use that as sentinel. This patch does so.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The struct is quite small and the decoder and the encoder use different
fields from it, so benefits from reusing it are small.
This allows making the buf field const.
The function contains only two assignments, setting DVVideoContext.avctx
and AVCodecContext.chroma_sample_location. However, the decoder does not
use the former, and the encoder should not be setting the latter.
Therefore move the first assignment to dvenc and the second to dvdec.
Make the encoder warn if the user-signalled chroma sample location does
not match the supported one, and return an error on higher compliance
levels.
Fixes: out of array access
Fixes: 50014/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SPEEDHQ_fuzzer-4748914632294400
Alternatively the buffer size can be increased
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>