Apply optimized functions according to cpuflags.
MSA is usually put after MMI as it's generally faster than MMI.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add MMI & MSA runtime detection for MIPS.
Basically there are two code pathes. For systems that
natively support CPUCFG instruction or kernel emulated
that instruction, we'll sense this feature from HWCAP and
report the flags according to values grab from CPUCFG. For
systems that have no CPUCFG (or not export it in HWCAP),
we'll parse /proc/cpuinfo instead.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
That helper grab from kernel code can allow us to inline
newer instructions (not implemented by the assembler) in
a elegant manner.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
To enable runtime detection for MIPS, we need to refine ffbuild
part to support buildding these feature together.
Firstly, we fixed configure, let it probe native ability of toolchain
to decide wether a feature can to be enabled, also clearly marked
the conflictions between loongson2 & loongson3 and Release 6 & rest.
Secondly, we compile MMI and MSA C sources with their own flags to ensure
their flags won't pollute the whole program and generate illegal code.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This uses av_image_fill_plane_sizes instead of av_image_fill_pointers
when we are getting plane sizes to avoid UB from adding offsets to NULL.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This uses av_image_fill_plane_sizes instead of av_image_fill_pointers
when we are getting plane sizes to avoid UB from adding offsets to NULL.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This utility helps avoid undefined behavior when doing things like
checking how much memory we need to allocate for an image before we have
allocated a buffer.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
similar to:
36e51c190b avcodec/libaomenc: use pix_fmt descriptors where useful
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Zern <jzern@google.com>
Fixes: out of array access
Fixes: crash.asf
Found-by: anton listov <greyfarn7@yandex.ru>
Reviewed-by: anton listov <greyfarn7@yandex.ru>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 33986707200000000 + 9195561788997000192 cannot be represented in type 'long'
Fixes: 23790/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-6554232198266880
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
lzwenc stores a function pointer to either put_bits or put_bits_le;
however, after the recent change, the function pointer's prototype
would depend on BitBuf. BitBuf is defined in put_bits.h, whose
definition depends on whether BITSTREAM_WRITER_LE is #defined or not.
For safety, we set a boolean flag for little/big endian instead,
which also allows the definition to be inlined.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add functions to initialize tile slice structure and make tile slice:
- vaapi_encode_init_tile_slice_structure
- vaapi_encode_make_tile_slice
Tile slice is not allowed to cross the boundary of a tile due to
the constraints of media-driver. Currently adding support for one
slice per tile.
N x N tile encoding is supposed to be supported with the the
capability of ARBITRARY_MACROBLOCKS slice structures.
N X 1 tile encoding should also work in ARBITRARY_ROWS slice
structure.
Signed-off-by: Linjie Fu <linjie.justin.fu@gmail.com>
Wrap current whole-row slice codes into following functions:
- vaapi_encode_make_row_slice()
- vaapi_encode_init_row_slice_structure()
Signed-off-by: Linjie Fu <linjie.justin.fu@gmail.com>
Because the newpos variable is set value before use it.
The newpos variable declared at the head partition of crypto_seek.
Make the code clean.
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
Change BitBuf into uint64_t on 64-bit x86. This means we need to flush the
buffer less often, which is a significant speed win. All other platforms,
including all 32-bit ones, are unchanged. Output bitstream is the same.
All API constraints are kept in place, e.g., you still cannot put_bits()
more than 31 bits at a time. This is so that codecs cannot accidentally
become 64-bit-only or similar.
Benchmarking on transcoding to various formats shows consistently
positive results:
dnxhd 25.60 fps -> 26.26 fps ( +2.6%)
dvvideo 24.88 fps -> 25.17 fps ( +1.2%)
ffv1 14.32 fps -> 14.58 fps ( +1.8%)
huffyuv 58.75 fps -> 63.27 fps ( +7.7%)
jpegls 6.22 fps -> 6.34 fps ( +1.8%)
magicyuv 57.10 fps -> 63.29 fps (+10.8%)
mjpeg 48.65 fps -> 49.01 fps ( +0.7%)
mpeg1video 76.41 fps -> 77.01 fps ( +0.8%)
mpeg2video 75.99 fps -> 77.43 fps ( +1.9%)
mpeg4 80.66 fps -> 81.37 fps ( +0.9%)
prores 12.35 fps -> 12.88 fps ( +4.3%)
prores_ks 16.20 fps -> 16.80 fps ( +3.7%)
rv20 62.80 fps -> 62.99 fps ( +0.3%)
utvideo 68.41 fps -> 76.32 fps (+11.6%)
Note that this includes video decoding and all other encoding work,
such as DCTs. If you isolate the actual bit-writing routines, it is
likely to be much more.
Benchmark details: Transcoding the first 30 seconds of Big Buck Bunny
in 1080p, Haswell 2.1 GHz, GCC 8.3, generally quantizer locked to
5.0. (Exceptions: DNxHD needs fixed bitrate, and JPEG-LS is so slow
that I only took the first 10 seconds, not 30.) All runs were done
ten times and single-threaded, top and bottom two results discarded to
get rid of outliers, arithmetic mean between the remaining six.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Preparatory patch for making the bit buffer different size on different
platforms; make a typedef and make all the hardcoded sizes into expressions
deriving from this size.
No functional change; generated assembler is near-identical.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The JPEG2000 standard reserves marker values 0xFF30
to 0xFF3F to be used as parameterless markers. This
patch adds support to decode codestream with such
markers. This allows decoding of p0_02.j2k.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
VA_ENC_SLICE_STRUCTURE_EQUAL_MULTI_ROWS is added to in the latest
libva (1.8.0) which matches the hardware behaviour:
/** \brief Driver supports any number of rows per slice but they must
* be the same for all slices except for the last one, which must be
* equal or smaller to the previous slices.
*/
And VA_ENC_SLICE_STRUCTURE_EQUAL_ROWS is kind of deprecated for iHD
since it's somehow introduced in [1] which is misleading from what we
actually handles.
[1]<0e6d5441f1>
Signed-off-by: Linjie Fu <linjie.justin.fu@gmail.com>
remove the timeout option docs part for HTTP protocol and add
auth_type option part.
Reviewed-by: Gyan Doshi <ffmpeg@gyani.pro>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
When there are potentially annotation (i.e. metadata) fields to write,
au_get_annotations() is called to produce a string with them. To do so,
it uses an AVBPrint which is finalized to create the string. This is
wasteful, because it always leads to an allocation even if the string
actually fits into the internal buffer of the AVBPrint. This commit
changes this by making au_get_annotations() modify an AVBPrint that
resides on the stack of the caller (i.e. of au_write_header()).
Furthermore, the AVBPrint is now checked for truncation; limiting
the allocations implicit in the AVBPrint allowed to offload the overflow
checks. Notice that these were not correct before: The size parameter of
avio_write() is an int, yet the string in the AVBPrint was allowed to
grow bigger than INT_MAX. And if the length of the string was so near
UINT_MAX that the length + 32 overflowed, the old code would write the
first eight bytes of the string and nothing more, leading to an invalid
file.
Finally, the special case in which the metadata dictionary of the
AVFormatContext is empty (in which case one still has to write eight
binary zeroes) is now no longer treated specially, because this case
no longer incurs any allocation.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This av_freep(&key) in conjunction with the fact that the loop condition
checks for key != NULL was equivalent to a av_freep(&key) + a break
immediately thereafter. But given that there is an av_freep(&key)
directly after the loop, the av_freep(&key) is unnecessary and the break
can also be added explicitly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
RGB pixel formats are one occasion where by pixel format we mean
pixel format, primaries, transfer characteristic, and matrix coeffs,
so we have to manually set them as they're set to unspecified by
default, despite there only being a single possible combination.