Previously, the code allowed overwriting on 16-aligned blocks, which was suitable when there were
no picture's virtual boundaries because both CTU sizes and strides were 16-aligned. However, with
picture's virtual boundaries, each CTU is divided into four ALF blocks, leading to potential issues
with overwriting later CTUs.
In cases involving picture virtual boundaries, each ALF block is 8-pixel aligned.
For luma, we consistently ensure an 8-aligned width. For chroma in 4:2:0 format,
we need to account for a 4-aligned width.
now that we are reading ext_mapping_idc as the upper 8 bits of
el_bit_depth_minus8 we need to use get_ue_golomb_long rather than
get_ue_golomb_31 for reading it
These two fields are coded together into a single 16 bit integer with upper 8
bits for ext_mapping_idc and lower 8 bits for el_bit_depth_minus8.
Furthermore ext_mapping_idc has two components, upper 3 bits and lower 5 bits.
Co-authored-by: Niklas Haas <git@haasn.dev>
Signed-off-by: Niklas Haas <git@haasn.dev>
These tables are supposed to contain the number of bits needed
to encode a given (run, level) pair. Yet the number of bits
for pairs needing the escape code was wrong (it only contained
the escape code and not the bits needed for run and level).
Furthermore, H.261 (a format with explicit end-of-block codes)
does not work well together with the RLTable API from rl.c:
The EOB code is the first one in ff_h261_rl_tcoeff's VLC table
and has a run value of zero. Therefore the result of get_rl_index()
is off by one for run == 0 and level values with explicit
(run, level) pair.
Fixing this necessitated changing the ref files of the
vsynth*-h261-trellis tests. Both filesizes as well as PSNR
decreased. If one used a qscale value of 11 for this test,
one would have received files with about the same size as
before this patch (with qscale 12), but with better PSNR.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The RLTable API in rl.c is not well designed for codecs with
an explicit end-of-block code. ff_h261_rl_tcoeff's vlc has
the EOB code as first element (presumably so that the decoder
can check for it via "if (level == 0)") and this implies
that the indices returned by get_rl_index() are off by one
for run == 0 which is therefore explicitly checked.
This commit changes this by adding a simple LUT for the
values not requiring escaping. It is easy to directly
include the sign bit into this, so this has also been done.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These must not be modified (even when they are initialized at runtime
and therefore modifiable).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is a better place for it; no non-h263-based decoder needs
these functions any more (both H.261 and the error resilience
code recently stopped doing so).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is an MPEG-4-only value; it is always five for the MPEG-4
encoder, so just hardcode this value and move the MpegEncContext
field to Mpeg4DecContext.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The RV10 and RV20 decoders use ff_h263_decode_mb() and also the
H.263 DSP and VLCs. Despite not calling ff_h263_decode_frame(),
it is nevertheless beneficial to call ff_h263_decode_init()
to reduce code duplication.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The error resilience code does not make up block coefficients
and therefore zeroes them in order to disable the IDCT.
But this can be done in a simpler manner, namely by setting
block_last_index to a negative value. Doing so also has
the advantage that the dct_unquantize functions are never even
called for those codecs that do not use ff_mpv_reconstruct_mb()
for ordinary decoding (namely RV-30/40 and the VC-1 family).
This approach would not work for intra macroblocks (there is always
at least one coefficient for them and therefore there is no check
for block_last_index for them), but this does not happen at all.
Add an assert for this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Don't use a LUT to negate followed by a conditional ordinary
negation immediately thereafter. Instead fold the two.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is beneficial for performance: When concatenating
the file from the vsynth1-h261 fate-test 100 times,
performance (measured by timing the codec's decode callback)
improved by 9.6%.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary because ff_mpeg1_dc_scale_table is the default
for both dc_scale_tables.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_init_block_index() sets MpegEncContext.dest and
MpegEncContext.block_index. The latter is unused by
ff_mpv_reconstruct_mb() (which is what this code is
preparatory for) and dest is overwritten a few lines below.
So don't initialize block_index at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
inter_scantable is only used by the dct_unquantize_h263_inter
functions, yet this is not used by the MPEG-4 decoder at all
(in case H.263 quantization is used, the unquantization already
happens in mpeg4_decode_block()).
Also move the common initialization of ff_permute_scantable()
out of the if.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The WMV2 decoder does not support lowres, so one can optimize
the WMV2 specific code away in the lowres version of this function.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are only two mpegvideo decoders that use another
(software) pixel format than YUV420: MPEG-1/2 and
the MPEG-4 studio profile. Neither of these use this part
of the code, so one can optimize the 422 code away when
this code is compiled for the decoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
h261_resync() can be completely removed, because
h261_decode_gob_header() checks for a GOB header itself if
gob_start_code_skipped is zero.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
last_resync_gb is never initialized, causing NULL + 0
in align_get_bits(). In addition to that, the loop is never
entered.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>