The ARM crc32 instructions will be used only if an architecture is
explicitly specified at compile time that has those instructions.
For example, -march=armv8.1-a or -march=armv8-a+crc, or if the
machine being compiled on has the instructions, -march=native.
Define the macro Z_ARM_CRC32 at compile time to use the ARMv8
(aarch64) crc32x and crc32b instructions. This code does not check
for the presence of the crc32 instructions. Those instructions are
optional for ARMv8.0, though mandatory for ARMv8.1 and later. The
use of the crc32 instructions is about ten times as fast as the
software braided calculation of the CRC-32. This can noticeably
speed up the decompression of gzip streams.
inflateSync() is used to skip invalid deflate data, which means
that the check value that was being computed is no longer useful.
This commit turns off the check value computation, and furthermore
allows a successful return if the compressed data terminated in a
graceful manner. This commit also fixes a bug in the case that
inflateSync() is used before a header is ever processed. In that
case, there is no knowledge of a trailer, so the remainder is
treated as raw.
Use the interleaved method of Kadatch and Jenkins in order to make
use of pipelined instructions through multiple ALUs in a single
core. This also speeds up and simplifies the combination of CRCs,
and updates the functions to pre-calculate and use an operator for
CRC combination.
When the same len2 is used repeatedly, it is faster to use
crc32_combine_gen() to generate an operator, that is then used to
combine CRCs with crc32_combine_op().
There is no assurance that all prefix codes are reachable as
optimal Huffman codes for the numbers of symbols encountered in
a deflate block. This code considers all possible prefix codes,
which might be a larger set than all possible Huffman codes,
depending on the constraints.
This bug was reported by Danilo Ramos of Eideticom, Inc. It has
lain in wait 13 years before being found! The bug was introduced
in zlib 1.2.2.2, with the addition of the Z_FIXED option. That
option forces the use of fixed Huffman codes. For rare inputs with
a large number of distant matches, the pending buffer into which
the compressed data is written can overwrite the distance symbol
table which it overlays. That results in corrupted output due to
invalid distances, and can result in out-of-bound accesses,
crashing the application.
The fix here combines the distance buffer and literal/length
buffers into a single symbol buffer. Now three bytes of pending
buffer space are opened up for each literal or length/distance
pair consumed, instead of the previous two bytes. This assures
that the pending buffer cannot overwrite the symbol table, since
the maximum fixed code compressed length/distance is 31 bits, and
since there are four bytes of pending space for every three bytes
of symbol space.
This isn't the right type anyway to assure that it contains a
pointer. That type would be intptr_t or uintptr_t. However the C99
standard says that those types are optional, so their use would not
be portable. This commit simply uses size_t or whatever configure
decided to use for size_t. That would be the same length as
ptrdiff_t, and so will work just as well. The code checks to see if
the length of the type used is the same as the length of a void
pointer, so there is already protection against the use of the
wrong type. The use of size_t (or ptrdiff_t) will almost always
work, as all modern architectures have an array size that is the
same as the pointer size. Only old segmented architectures would
have to fall back to the slower CRC-32 calculation, where the
amount of memory that can be accessed is larger than the maximum
array size.
If zlib and/or gzip header processing was requested, but a header
was never provided and inflateSync was used successfully, then the
inflate state would be inconsistent, trying to compute a check
value but with no flags set. This commit sets the inflate mode to
raw in this case, since there is no other assumption that can be
made if a header was requested but never seen.
This is a problem in the odd case that the second argument of
LSEEK is a larger type than off_t. Apparently MinGW defines off_t
to be 32 bits, but _lseeki64 has a 64-bit second argument.
Also undo a previous commit to permit MinGW to use _lseeki64.