This function is so extremely simple that it is preferable to make it
inline rather than deal with all the complications arising from it being
an exported symbol.
Keep avpriv_align_put_bits() around until the next major bump to
preserve ABI compatibility.
Before c63c303a1f (the commit which
introduced a typedef for the type of the buffer of a PutBitContext)
skip_put_bits() was as follows:
static inline void skip_put_bits(PutBitContext *s, int n)
{
s->bit_left -= n;
s->buf_ptr -= 4 * (s->bit_left >> 5);
s->bit_left &= 31;
}
If s->bit_left was negative after the first subtraction, then the next
line will divide this by 32 with rounding towards -inf and multiply by
four; the result will be negative, of course.
The aforementioned commit changed this to:
static inline void skip_put_bits(PutBitContext *s, int n)
{
s->bit_left -= n;
s->buf_ptr -= sizeof(BitBuf) * ((unsigned)s->bit_left / BUF_BITS);
s->bit_left &= (BUF_BITS - 1);
}
Casting s->bit_left to unsigned meant that the rounding is still towards
-inf; yet the right side is now always positive (it transformed the
arithmetic shift into a logical shift), so that s->buf_ptr will always
be decremented (by about UINT_MAX / 8 unless n is huge) which leads to
segfaults on further usage and is already undefined pointer arithmetic
before that. This can be reproduced with the mpeg4 encoder with the
AV_CODEC_FLAG2_NO_OUTPUT flag set.
Furthermore, the earlier version as well as the new version share
another bug: s->bit_left will be in the range of 0..(BUF_BITS - 1)
afterwards, although the assumption throughout the other PutBitContext
functions is that it is in the range of 1..BUF_BITS. This might lead to
a shift by BUF_BITS in little-endian mode. This has been fixed, too.
The new version is furthermore able to skip zero bits, too.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Change BitBuf into uint64_t on 64-bit x86. This means we need to flush the
buffer less often, which is a significant speed win. All other platforms,
including all 32-bit ones, are unchanged. Output bitstream is the same.
All API constraints are kept in place, e.g., you still cannot put_bits()
more than 31 bits at a time. This is so that codecs cannot accidentally
become 64-bit-only or similar.
Benchmarking on transcoding to various formats shows consistently
positive results:
dnxhd 25.60 fps -> 26.26 fps ( +2.6%)
dvvideo 24.88 fps -> 25.17 fps ( +1.2%)
ffv1 14.32 fps -> 14.58 fps ( +1.8%)
huffyuv 58.75 fps -> 63.27 fps ( +7.7%)
jpegls 6.22 fps -> 6.34 fps ( +1.8%)
magicyuv 57.10 fps -> 63.29 fps (+10.8%)
mjpeg 48.65 fps -> 49.01 fps ( +0.7%)
mpeg1video 76.41 fps -> 77.01 fps ( +0.8%)
mpeg2video 75.99 fps -> 77.43 fps ( +1.9%)
mpeg4 80.66 fps -> 81.37 fps ( +0.9%)
prores 12.35 fps -> 12.88 fps ( +4.3%)
prores_ks 16.20 fps -> 16.80 fps ( +3.7%)
rv20 62.80 fps -> 62.99 fps ( +0.3%)
utvideo 68.41 fps -> 76.32 fps (+11.6%)
Note that this includes video decoding and all other encoding work,
such as DCTs. If you isolate the actual bit-writing routines, it is
likely to be much more.
Benchmark details: Transcoding the first 30 seconds of Big Buck Bunny
in 1080p, Haswell 2.1 GHz, GCC 8.3, generally quantizer locked to
5.0. (Exceptions: DNxHD needs fixed bitrate, and JPEG-LS is so slow
that I only took the first 10 seconds, not 30.) All runs were done
ten times and single-threaded, top and bottom two results discarded to
get rid of outliers, arithmetic mean between the remaining six.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Preparatory patch for making the bit buffer different size on different
platforms; make a typedef and make all the hardcoded sizes into expressions
deriving from this size.
No functional change; generated assembler is near-identical.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The earlier requirement was for the new buffer to be bigger than the old
one. This has been relaxed to only demand that the new buffer can hold
all the data written so far. This is in preparation for further commits.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
put_bits64() can write up to 64 bits into a bitstream.
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This causes a overall slowdown of 0.1 % (tested with mpeg4 single thread encoding of matrixbench at QP=3)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If flush_put_bits() is called when the 32-bit buffer is empty,
e.g. after writing a multiple of 32 bits, and invalid shift by
32 is performed. Since flush_put_bits() is called infrequently,
this additional check should have negligible performance impact.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Other parts of FFmpeg use NE (native endian) rather than ME (machine).
This makes it consistent.
Originally committed as revision 24169 to svn://svn.ffmpeg.org/ffmpeg/trunk
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
undefined shift behaviour.
Document this, fix the assert and add a put_bits32 to handle writing 32
bits and use that where necessary.
Originally committed as revision 20124 to svn://svn.ffmpeg.org/ffmpeg/trunk
improve plain text doxy readability.
See the thread: "[RFC] Should we use doxygen markup?".
Originally committed as revision 19122 to svn://svn.ffmpeg.org/ffmpeg/trunk