'li.s' is a synthesized instruction, it does not work properly
when compiled with clang on mips, and A segfault occurred.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Failed fate case: fate-h264-conformance-caba2_sony_e
Clang is more strict in the use of register constraint.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Clang report following error in aacsbr_mips.c,ac3dsp_mips.c and aacdec_mips.c:
"couldn't allocate output register for constraint 'r'"
Use 'f' constraint for float variable.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This option is directly copy-pasted from the SVT1-HEVC wrapper and has
no place in the options for an AV1 encoder.
AV1 has no H.264/5 IDR frames nor anything like them.
All this option does is change all real keyframes to an intra-only
AV1 frame, which is not seekable. Hence, any streams encoded with
this option enabled will not be seekable.
instead of get_ue_golomb(). The difference between the two is that the
latter also has to take into account the case in which the read code is
more than 9 bits (four preceding zeroes + at most five value bits) long,
leading to more code.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
get_ue_golomb_31() reads nine bits and an array with 512 entries to
parse golomb codes. The longest golomb codes that fit into 9 bits use
four leading zeroes and five value bits and can encode numbers in the
0..30 range. 31 meanwhile is encoded on 11 bits and if the nine bits
read coincide with the first nine bits of the encoding of 31,
get_ue_golomb_31() returns 31 (and skips 11 bits).
But looking at the first nine bits only makes it impossible to distinguish
31 from 32..34. Therefore the documentation of get_ue_golomb_31() simply
states that the return value is undefined if the value of the encountered
exp golomb code was outside the 0..31 range.
But actually get_ue_golomb_31() does not behave that bad: If the returned
value is in the range of 0..30, then this is the actually encountered value,
so that this function can be used without any problems to parse and validate
parameters whose legal values are a subset of the 0..30 range.
This commit documents this fact.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This happened in get_ue_golomb() if the cached bitstream reader was in
use, because there was no check to handle the case of the read value
not being in the supported range.
For consistency with the uncached bitstream reader and for compliance
with the documentation, every value not in the 0-8190 range is treated as
error although the cached bitstream reader could actually read values in
the range 0..65534 without problems.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This patch adds support for PPM marker for JPEG2000
decoder. It allows the samples p1_03.j2k and p1_05.j2k
to be decoded.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Said error message is not very informative and lacked a proper logging
context; furthermore, many callers already provided more descriptive
error messages of their own. So just drop this one.
Suggested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This reverts commit 489c5db079.
Treating EQUAL_MULTI_ROWS in the same way as the arbitrary-size cases is
just wrong. Consider 9 rows, 4 slices - we pick 4 slices with sizes
{ 3, 2, 2, 2 }, which EQUAL_MULTI_ROWS does not allow. It isn't possible
to split the frame into 4 slices at all with the EQUAL_MULTI_ROWS
structure - the closest options are 3 slices with sizes { 3, 3, 3 } or 5
slices with sizes { 2, 2, 2, 2, 1 }.
The codeblock decoder checks whether the mqc decoder
has decoded the right number of bytes. However, this
check does not account for the fact that the mqc encoder's
flush routine adds 2 bytes of data which does not have to be
read by the decoder. The check is modified to account for
this. This patch solves issue #4827
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Later the decorrelate_stereo call is guarded by channels == 2
and non-zero decorr_left_weight. Make sure decorr_shift is in
the expected shift range for that case.
Fixes: shift exponent 128 is too large for 32-bit type 'int'
Fixes: 23860/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ALAC_fuzzer-5751138914402304
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Alexander Strasser <eclipse7@gmx.net>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
GCC complains:
warning: listing the stack pointer register ‘$29’ in a clobber
list is deprecated [-Wdeprecated]
Actually stack pointer was restored at the end of the inline assembly
so there is no reason to add it to the clobber list.
Also use $sp insted of $29 to make our intention much more clear.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Apply optimized functions according to cpuflags.
MSA is usually put after MMI as it's generally faster than MMI.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
To enable runtime detection for MIPS, we need to refine ffbuild
part to support buildding these feature together.
Firstly, we fixed configure, let it probe native ability of toolchain
to decide wether a feature can to be enabled, also clearly marked
the conflictions between loongson2 & loongson3 and Release 6 & rest.
Secondly, we compile MMI and MSA C sources with their own flags to ensure
their flags won't pollute the whole program and generate illegal code.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This uses av_image_fill_plane_sizes instead of av_image_fill_pointers
when we are getting plane sizes to avoid UB from adding offsets to NULL.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
similar to:
36e51c190b avcodec/libaomenc: use pix_fmt descriptors where useful
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Zern <jzern@google.com>
Fixes: out of array access
Fixes: crash.asf
Found-by: anton listov <greyfarn7@yandex.ru>
Reviewed-by: anton listov <greyfarn7@yandex.ru>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
lzwenc stores a function pointer to either put_bits or put_bits_le;
however, after the recent change, the function pointer's prototype
would depend on BitBuf. BitBuf is defined in put_bits.h, whose
definition depends on whether BITSTREAM_WRITER_LE is #defined or not.
For safety, we set a boolean flag for little/big endian instead,
which also allows the definition to be inlined.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add functions to initialize tile slice structure and make tile slice:
- vaapi_encode_init_tile_slice_structure
- vaapi_encode_make_tile_slice
Tile slice is not allowed to cross the boundary of a tile due to
the constraints of media-driver. Currently adding support for one
slice per tile.
N x N tile encoding is supposed to be supported with the the
capability of ARBITRARY_MACROBLOCKS slice structures.
N X 1 tile encoding should also work in ARBITRARY_ROWS slice
structure.
Signed-off-by: Linjie Fu <linjie.justin.fu@gmail.com>
Wrap current whole-row slice codes into following functions:
- vaapi_encode_make_row_slice()
- vaapi_encode_init_row_slice_structure()
Signed-off-by: Linjie Fu <linjie.justin.fu@gmail.com>
Change BitBuf into uint64_t on 64-bit x86. This means we need to flush the
buffer less often, which is a significant speed win. All other platforms,
including all 32-bit ones, are unchanged. Output bitstream is the same.
All API constraints are kept in place, e.g., you still cannot put_bits()
more than 31 bits at a time. This is so that codecs cannot accidentally
become 64-bit-only or similar.
Benchmarking on transcoding to various formats shows consistently
positive results:
dnxhd 25.60 fps -> 26.26 fps ( +2.6%)
dvvideo 24.88 fps -> 25.17 fps ( +1.2%)
ffv1 14.32 fps -> 14.58 fps ( +1.8%)
huffyuv 58.75 fps -> 63.27 fps ( +7.7%)
jpegls 6.22 fps -> 6.34 fps ( +1.8%)
magicyuv 57.10 fps -> 63.29 fps (+10.8%)
mjpeg 48.65 fps -> 49.01 fps ( +0.7%)
mpeg1video 76.41 fps -> 77.01 fps ( +0.8%)
mpeg2video 75.99 fps -> 77.43 fps ( +1.9%)
mpeg4 80.66 fps -> 81.37 fps ( +0.9%)
prores 12.35 fps -> 12.88 fps ( +4.3%)
prores_ks 16.20 fps -> 16.80 fps ( +3.7%)
rv20 62.80 fps -> 62.99 fps ( +0.3%)
utvideo 68.41 fps -> 76.32 fps (+11.6%)
Note that this includes video decoding and all other encoding work,
such as DCTs. If you isolate the actual bit-writing routines, it is
likely to be much more.
Benchmark details: Transcoding the first 30 seconds of Big Buck Bunny
in 1080p, Haswell 2.1 GHz, GCC 8.3, generally quantizer locked to
5.0. (Exceptions: DNxHD needs fixed bitrate, and JPEG-LS is so slow
that I only took the first 10 seconds, not 30.) All runs were done
ten times and single-threaded, top and bottom two results discarded to
get rid of outliers, arithmetic mean between the remaining six.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Preparatory patch for making the bit buffer different size on different
platforms; make a typedef and make all the hardcoded sizes into expressions
deriving from this size.
No functional change; generated assembler is near-identical.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The JPEG2000 standard reserves marker values 0xFF30
to 0xFF3F to be used as parameterless markers. This
patch adds support to decode codestream with such
markers. This allows decoding of p0_02.j2k.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
VA_ENC_SLICE_STRUCTURE_EQUAL_MULTI_ROWS is added to in the latest
libva (1.8.0) which matches the hardware behaviour:
/** \brief Driver supports any number of rows per slice but they must
* be the same for all slices except for the last one, which must be
* equal or smaller to the previous slices.
*/
And VA_ENC_SLICE_STRUCTURE_EQUAL_ROWS is kind of deprecated for iHD
since it's somehow introduced in [1] which is misleading from what we
actually handles.
[1]<0e6d5441f1>
Signed-off-by: Linjie Fu <linjie.justin.fu@gmail.com>
RGB pixel formats are one occasion where by pixel format we mean
pixel format, primaries, transfer characteristic, and matrix coeffs,
so we have to manually set them as they're set to unspecified by
default, despite there only being a single possible combination.