This starts with one-time initialisation of the 26 constant factors
like 08edacc248. That is done with the scalar instruction set. While
the formula could readily be vectorised, the gains would probably be
more than offset by the cost of transferring the results back to FP
registers (or of suitably reshuffling them into vector registers).
Note that the main loop could likely be scheduled slightly better by
expanding the filter macro and interleaving loads with arithmetic.
It is not clear yet whether that would be relevant for vector processing
(as opposed to traditional SIMD).
We could also use fewer vectors, but there is not much point in sparing
them (they are *all* callee-clobbered).
This uses the following vectorisation:
    for (i = 0; i < blocksize; i++) {
        float m = mag[i], a = ang[i]; /* both updates use the old values */

        ang[i] = m - copysignf(fmaxf(a, 0.f), m);
        mag[i] = m - copysignf(fminf(a, 0.f), m);
    }
RVV defines a total of 12 different extensions, including:
- 5 different instruction subsets:
  - Zve32x: 8-, 16- and 32-bit integers,
  - Zve32f: Zve32x plus single precision floats,
  - Zve64x: Zve32x plus 64-bit integers,
  - Zve64f: Zve32f plus Zve64x,
  - Zve64d: Zve64f plus double precision floats.
- 6 different vector lengths:
  - Zvl32b (embedded only),
  - Zvl64b (embedded only),
  - Zvl128b,
  - Zvl256b,
  - Zvl512b,
  - Zvl1024b.
- and the V extension proper: equivalent to Zve64d and Zvl128b.
In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each data type: up-to-32-bit ints (RVV_I32), floats (RVV_F32),
64-bit ints (RVV_I64) and doubles (RVV_F64).
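A sketch of how the 6 instruction sets map onto the 4 bits. Only the
flag names come from the text; the bit values, the enum of subsets and
rvv_flags() are hypothetical. Note that some bit combinations (e.g.
RVV_F64 without RVV_I64) never occur.

```c
#include <assert.h>

/* Hypothetical bit values for the four flags named above. */
enum {
    RVV_I32 = 1 << 0, /* 8-, 16- and 32-bit integers */
    RVV_F32 = 1 << 1, /* single precision floats */
    RVV_I64 = 1 << 2, /* 64-bit integers */
    RVV_F64 = 1 << 3, /* double precision floats */
};

/* The 6 possible instruction sets, including the empty one. */
enum rvv_subset { RVV_NONE, ZVE32X, ZVE32F, ZVE64X, ZVE64F, ZVE64D };

static unsigned rvv_flags(enum rvv_subset s)
{
    switch (s) {
    case ZVE64D: return RVV_I32 | RVV_F32 | RVV_I64 | RVV_F64;
    case ZVE64F: return RVV_I32 | RVV_F32 | RVV_I64;
    case ZVE64X: return RVV_I32 | RVV_I64;
    case ZVE32F: return RVV_I32 | RVV_F32;
    case ZVE32X: return RVV_I32;
    default:     return 0; /* no vector support */
    }
}
```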
When the vector size is needed, it can be retrieved by reading the
unprivileged, read-only vlenb CSR. This should probably become a
separate helper macro if it is needed at a later point.
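A minimal sketch of such a helper, written as a function rather than a
macro; the name read_vlenb() is hypothetical. It only reads the CSR
when built for RISC-V with a vector extension enabled (__riscv_vector),
and otherwise returns 0.

```c
#include <assert.h>

/* Hypothetical helper: VLEN in bytes, or 0 without vector support. */
static unsigned long read_vlenb(void)
{
#if defined(__riscv) && defined(__riscv_vector)
    unsigned long vlenb;

    /* vlenb holds VLEN / 8 and is readable from user mode. */
    __asm__ ("csrr %0, vlenb" : "=r" (vlenb));
    return vlenb;
#else
    return 0;
#endif
}
```

Since VLEN is a power of two of at least 32 bits, a non-zero result
is a power of two of at least 4.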
RV64G supports MIN & MAX instructions natively only on floating-point
registers, not on general-purpose ones; the latter would require the
Zbb extension. Because of that, it is actually faster to perform the
clipping "properly" in the FPU.
Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech):
    audiodsp.vector_clipf_c: 29551.5
    audiodsp.vector_clipf_rvf: 17871.0
Also tried unrolling by 2 or 8 elements, but it gets worse either way.
This introduces compile-time and run-time CPU detection on RISC-V. In
practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of
the I, F and D extensions, and if it does, it probably won't have
run-time detection. So the flags are essentially always set.
But as things stand, checkasm wants them that way. Compare the ARMV8
flag on AArch64. We are nowhere near running short on CPU flag bits.
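A sketch of combining both kinds of detection. It assumes Linux, where
AT_HWCAP has bit (letter - 'A') set for each single-letter RISC-V
extension, and uses the compiler's __riscv_flen macro for the
compile-time side. The flag names, values and helper functions are
hypothetical.

```c
#include <assert.h>
#if defined(__linux__) && defined(__riscv)
#include <sys/auxv.h>
#endif

/* Hypothetical flag values. */
enum { RV_I = 1 << 0, RV_F = 1 << 1, RV_D = 1 << 2 };

/* Assumed Linux/RISC-V AT_HWCAP layout: one bit per extension letter. */
#define HWCAP_RV(letter) (1UL << ((letter) - 'A'))

static int rv_flags_from_hwcap(unsigned long hwcap)
{
    int flags = 0;

    if (hwcap & HWCAP_RV('I')) flags |= RV_I;
    if (hwcap & HWCAP_RV('F')) flags |= RV_F;
    if (hwcap & HWCAP_RV('D')) flags |= RV_D;
    return flags;
}

static int rv_cpu_flags(void)
{
    int flags = 0;

    /* Compile-time: what the compiler was allowed to use (assumes base I). */
#if defined(__riscv)
    flags |= RV_I;
#if defined(__riscv_flen) && __riscv_flen >= 32
    flags |= RV_F;
#endif
#if defined(__riscv_flen) && __riscv_flen >= 64
    flags |= RV_D;
#endif
#endif
    /* Run-time: what the kernel reports, where available. */
#if defined(__linux__) && defined(__riscv)
    flags |= rv_flags_from_hwcap(getauxval(AT_HWCAP));
#endif
    return flags;
}
```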
When force_original_aspect_ratio and force_divisible_by are both
used, dimensions are now rounded to the nearest allowed multiple of
force_divisible_by rather than first rounding to the nearest integer and
then rounding in a static direction. This results in less distortion of
the aspect ratio.
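A sketch of the two rounding strategies for one dimension. It assumes
positive values and picks "downwards" as the fixed direction of the old
behaviour; it ignores the filter's other constraints (such as staying
inside the requested size). The function names are hypothetical. For
example, scaling 1920x1080 to a width of 1000 gives an exact height of
562.5; with force_divisible_by=4 the old approach yields 560 (off by
2.5), the new one 564 (off by 1.5).

```c
#include <assert.h>

/* Old: round to the nearest integer, then down to a multiple of n.
 * Assumes v > 0. */
static int round_then_floor_multiple(double v, int n)
{
    int rounded = (int)(v + 0.5); /* nearest integer, halves up */
    return rounded / n * n;       /* then down to a multiple of n */
}

/* New: round directly to the nearest multiple of n. Assumes v > 0;
 * may return 0 for v < n / 2. */
static int round_nearest_multiple(double v, int n)
{
    return (int)(v / n + 0.5) * n;
}
```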
Reviewed-by: Thierry Foucu <tfoucu@google.com>
Signed-off-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The general demuxing API uses parsers and decoders. Therefore
FFStream contains pointers to AVCodecContexts and
AVCodecParserContexts, and lavf/internal.h includes lavc/avcodec.h.
Yet only a few files actually use these, and it is best if this
number stays small. Therefore this commit uses opaque structs in
lavf/internal.h for these contexts and stops including avcodec.h.
This also avoids including lavc/codec_desc.h implicitly. All other
headers are still implicitly included as before (mostly through
codec.h).
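The opaque-struct technique in a nutshell: an incomplete struct type is
enough to declare pointers to it, so a header does not need the
defining header. StreamSketch and stream_has_parser() are hypothetical
stand-ins, not FFmpeg's actual FFStream code.

```c
#include <assert.h>
#include <stddef.h>

/* Incomplete types: no definition needed to declare pointers. */
struct AVCodecContext;
struct AVCodecParserContext;

/* Hypothetical stand-in for the relevant part of FFStream. */
typedef struct StreamSketch {
    struct AVCodecContext       *avctx;
    struct AVCodecParserContext *parser;
} StreamSketch;

/* Storing and testing the pointers works without the full definitions;
 * only code that dereferences them needs lavc/avcodec.h. */
static int stream_has_parser(const StreamSketch *st)
{
    return st->parser != NULL;
}
```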
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avcodec_enum_to_chroma_pos() and avcodec_chroma_pos_to_enum()
deal with enum AVChromaLocation, which is defined in lavu.
These functions are therefore replaced by
av_chroma_location_enum_to_pos() and av_chroma_location_pos_to_enum().
This commit provides the necessary deprecations. It also already turns
these functions into wrappers around the corresponding lavu functions,
as not doing so would force one to disable deprecation warnings.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are intended as replacements for avcodec_enum_to_chroma_pos()
and avcodec_chroma_pos_to_enum().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are also frequently used in libavformat.
This change does not cause any breakage as avcodec.h
includes defs.h.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This also tests writing slice data in unaligned mode (some of these
files use CAVLC), updating side data, and parsing ISOBMFF avcc
extradata.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary because an av_shrink_packet() a few lines below
will set the size; furthermore, it is actually harmful, because
av_shrink_packet() does nothing if the size already matches, so the
packet's padding would not be zeroed.
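A self-contained mock of the problem. mock_shrink_packet() follows the
behaviour described above for av_shrink_packet() (return early if the
packet is not larger than the target size, otherwise shrink and zero
the padding); MockPacket, PADDING and the helpers are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define PADDING 16 /* hypothetical stand-in for the real padding size */

typedef struct MockPacket {
    unsigned char data[64 + PADDING];
    int size;
} MockPacket;

static void mock_shrink_packet(MockPacket *pkt, int size)
{
    if (pkt->size <= size) /* size already matches: nothing is zeroed */
        return;
    pkt->size = size;
    memset(pkt->data + size, 0, PADDING);
}

/* Returns 1 if the PADDING bytes after the payload are all zero. */
static int padding_is_zero(const MockPacket *pkt)
{
    for (int i = 0; i < PADDING; i++)
        if (pkt->data[pkt->size + i])
            return 0;
    return 1;
}
```

Setting the size before the shrink call leaves stale bytes in the
padding; letting the shrink call set it zeroes them.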
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is legal, for example, for an ISOBMFF avcc to contain zero parameter
sets. In this case, the annex B that we produce would be empty and
therefore useless. This happens e.g. with mov/frag_overlap.mp4 from the
FATE-suite.
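A sketch that counts the parameter sets in an avcC record, based on the
AVCDecoderConfigurationRecord layout from ISO/IEC 14496-15 (a 6-byte
header whose last byte carries the 5-bit SPS count, the SPS entries,
an 8-bit PPS count, then the PPS entries, each entry prefixed by a
16-bit length). A result of 0 is the case that would yield an empty
annex B. The helper name is hypothetical, and optional trailing fields
of high-profile records are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the number of SPS + PPS, or -1 if the record is truncated. */
static int avcc_count_param_sets(const uint8_t *buf, size_t size)
{
    size_t pos = 6; /* version, profile, compat, level, length size, SPS count */
    int total = 0;

    if (size < 6)
        return -1;
    for (int pass = 0; pass < 2; pass++) { /* pass 0: SPS, pass 1: PPS */
        int count;

        if (pass == 0) {
            count = buf[5] & 0x1f;
        } else {
            if (pos >= size)
                return -1;
            count = buf[pos++];
        }
        for (int i = 0; i < count; i++) {
            size_t len;

            if (size - pos < 2)
                return -1;
            len = (size_t)buf[pos] << 8 | buf[pos + 1];
            pos += 2;
            if (size - pos < len)
                return -1;
            pos += len; /* skip the NAL unit itself */
        }
        total += count;
    }
    return total;
}
```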
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no check for whether these supposedly redundant PPS
are actually redundant. One could check via memcmp, which would
work in practice* (because all content buffers are initially
zero-allocated), but this is not portable, as compilers may
trash padding inside structures as they wish.
If the PPS is not actually redundant, the output is garbage.
This happens with several files from the FATE-suite. E.g.
h264-conformance/CVCANLMA2_Sony_C.jsv no longer decodes correctly,
whereas h264-conformance/CABA3_TOSHIBA_E.264 even fails in
ff_cbs_write_packet(), because the inferred value of
num_ref_idx_l0_active_minus1 does not match the value set in the
slice (this happens when num_ref_idx_l0_default_active_minus1
changes in the PPS; the value in the slice header is inferred from
the original PPS's num_ref_idx_l0_default_active_minus1).
*: Unless slice_group_id is used, i.e. unless slice_group_map_type
is six.
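A small illustration of the padding issue, using a hypothetical struct
(not the real PPS structure) that has padding after its first member on
common ABIs. A member-by-member comparison is unaffected by padding,
whereas memcmp() over the two objects could report a difference here
even though every member matches, because the padding contents are
unspecified.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct; typically has padding bytes after 'flag'. */
typedef struct PPSLike {
    unsigned char flag;
    int value;
} PPSLike;

/* Compares members only, so padding bytes never matter. */
static int pps_like_equal(const PPSLike *a, const PPSLike *b)
{
    return a->flag == b->flag && a->value == b->value;
}
```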
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>