ff_aac_coder_init_mips() modifies a static const structure of
function pointers. This will crash if the binary uses relro
and is a data race in any case.
Furthermore it points to a maintainability issue: The
AACCoefficientsEncoder structures have been constified
in commit fd9212f2ed,
a Libav commit merged in 318778de9e.
Libav did not have the MIPS-specific AAC code and so this was
fine for them; yet FFmpeg had them, but this was not recognized.
Commit 75a099fc73 points to another
maintainability issue: Contrary to ordinary DSP code, this code
here is way more complex and needs to be constantly kept in sync
with the ordinary code which it mimicks and replaces. Said commit
is the only commit actually changing aaccoder.c in the last few
years and the same change has not been performed for the MIPS
clone; before that, it even happened several times that the mips
code was broken due to changes of the generic code (see commits
97437bd17a and
de262d018d or
860dbe0275 or
933309a6ca or
b65ffa316e). This might even lead
to scenarios where someone changing non-dsp aacenc code would
have to modify mips inline asm in order to keep them in sync.
This is obviously a significant burden (if the AAC encoder were
actively developed).
Finally, the code does not even compile here due to errors like
"Error: float register should be even, was 1".
Reviewed-by: Lynne <dev@lynne.ee>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
All versions of MSVC that support C11 (namely >= v19.27)
also support the restrict keyword, therefore av_restrict
is no longer necessary since 75697836b1.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Otherwise aacenc.o gets pulled in by the aacencdsp checkasm
test and it in turn pulls the rest of lavc in.
Besides being bad size-wise this also has the downside that
it pulls in avpriv_(cga|vga16)_font from libavutil which are
marked as being imported from another library when building
libavcodec as a DLL and this breaks checkasm because it links
both lavc and lavu statically.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is decoder-only; furthermore, there is already
an AACContext in use by libfdk-aacenc.
Also make aacdec.h provide the typedef for AACContext;
up until now, this has been done by sbr.h.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Besides simplifying address computations (it saves 432B of .text
in hevcdsp.o alone here) it also fixes undefined behaviour that
occurs if mx or my are 0 (happens when the filters are unused)
because they lead to an array index of -1 in the old code.
This happens in the checkasm-hevc_pel FATE-test.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
They were replaced by TX from libavutil; the tremendous work
to get to this point (both creating TX as well as porting
the users of the components removed in this commit) was
completely performed by Lynne alone.
Removing the subsystems from configure may break some command lines,
because the --disable-fft etc. options are no longer recognized.
Co-authored-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This patch replaces the transform used in AAC with lavu/tx and removes
the limitation on only being able to decode 960-sample files
with the float decoder.
This commit also removes a whole bunch of unnecessary and slow
lifting steps the decoder did to compensate for the poor accuracy
of the old integer transformation code.
Overall float decoder speedup on Zen 3 for 64kbps: 32%
Possible since 48ac225db2.
Furthermore, the current code would not work on mips
in case ff_lsp2polyf() were used outside of lsp.c,
because it is not compiled on mips since commit
3827a86eac at all;
instead it is overridden with a static av_always_inline
function which only works for the callers in lsp.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The patch fixes the bugs that occurred when running
fate-checkasm-hevc_pel on MIPS paltform.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch fixes a bug where the fate-checkasm-motion fails when
h is not a multiple of 8.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Peter Ross <pross@xvid.org>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The latest commit of Loongson MMI macro replaces were incorrect.
It makes a mass of green tints on HEVC videos when playing. I've
compared it with the older MMI implementation, and found out that
several lines have been replaced by wrong macros.
Signed-off-by: Qi Tiezheng <qitiezheng@360.cn>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides except in exactly one field: The corresponding
HEVCLocalContext. This makes it possible to pass the HEVCContext
everywhere where logically a HEVCLocalContext should be used.
This commit stops doing this for lavc/hevcpred as well as
the corresponding mips code; the latter is untested.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Move AC3HeaderInfo into ac3_parser_internal.h and the rest
into a new header ac3defs.h.
This also breaks an include cycle of ac3.h and ac3tab.h
(the latter now only needs ac3defs.h).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The existing x86 assembly for loop filters uses the stride as a
full register without clearing/sign extending the upper half
of the registers on x86_64.
This avoids crashes if the caller would have passed nonzero bits
in the previously undefined upper 32 bits of the parameters.
Signed-off-by: Martin Storsjö <martin@martin.st>
Some of these were made possible by moving several common macros to
libavutil/macros.h.
While just at it, also improve the other headers a bit.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
mips has several headers that are only meant for inclusion in another
non-arch specific file; they do not even try to be standalone. So don't
test them in checkheaders.
Also fix vp9dsp_mips.h, an ordinary header missing some includes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
wsbh is only avilable for MIPS R2+.
Provide a fallback for older processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Loongson3's extention instructions (prefixed with gs) are widely used
in our MMI codebase. However, these instructions are not avilable on
Loongson-2E/F while MMI code should work on these processors.
Previously we introduced mmiutils marcos to provide backward compactbility
but newly commited code didn't follow that. In this patch I revised the
codebase and converted all these instructions into MMI marcos to get
Loongson2 supproted again.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The data width of gsldrc1/gsldlc1 should be 8 bytes wide.
Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Clang is more strict on the type of asm operands, float or double
type variable should use constraint 'f', integer variable should
use constraint 'r'.
Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1.'xor,or,and' to 'pxor,por,pand'. In the case of operating FPR,
gcc supports both of them, clang only supports the second type.
2.'dsrl,srl' to 'ssrld,ssrlw'. In the case of operating FPR, gcc
supports both of them, clang only supports the second type.
Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Using mask to avoid judgment, H264 4K decoding speed
improved about 0.1fps tested on 3A4000
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1. Refined function get_cabac_inline_mips.
2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.
Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The MSA optimization has been refined in commit 93218c2 and ce0a52e.
It is better than MMI version now.
Speed of decoding H264: 4.83x ==> 4.89x (tested on 3A4000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The csa_tables (which always consist of 32 entries of four byte each,
but the type depends upon whether the decoder is fixed or
floating-point) are currently initialized once during decoder
initialization; yet it turns out that this is actually no benefit: The
code used to initialize these tables takes up 153 (fixed point) and 122
(floating point) bytes when compiled with GCC 9.3 with -O3 on x64, so it
is better to just hardcode these tables.
Essentially the same applies to the is_tables: They have a size of 128B
each and the code to initialize them occupies 149 (fixed point) resp.
140 (floating point) bytes. So hardcode them, too.
To make the origin of the tables clear, references to the code used to
create them have been added.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Commit 1af615683e put initializing
the ff_fft_offsets_lut (which is typically used if FFT_FIXED_32)
behind an ff_thread_once() to make ff_fft_init() thread-safe; yet
there is a second place where said table may be initialized which
is not guarded by this AVOnce: ff_fft_init_mips(). MIPS uses this LUT
even for ordinary floating point FFTs, so that ff_fft_init() is not
thread-safe (on MIPS) for both 32bit fixed-point as well as
floating-point FFTs; e.g. ff_mdct_init() inherits this flaw and
therefore initializing e.g. the AAC decoders is not thread-safe (on
MIPS) despite them having FF_CODEC_CAP_INIT_CLEANUP set.
This commit fixes this by moving the AVOnce to fft_init_table.c and
using it to guard all initializations of ff_fft_offsets_lut.
(It is not that bad in practice, because every entry of
ff_fft_offsets_lut is never read during initialization and is only once
ever written to (namely to its final value); but even these are
conflicting actions which are (by definition) data races and lead to
undefined behaviour.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Test case fate-checkasm-h264pred failed in latest community code.
This patch fixed the bug.
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
'li.s' is a synthesized instruction, it does not work properly
when compiled with clang on mips, and A segfault occurred.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Failed fate case: fate-h264-conformance-caba2_sony_e
Clang is more strict in the use of register constraint.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>