Some functions have slightly different indentation styles; try
to match the surrounding code.
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.
Signed-off-by: Martin Storsjö <martin@martin.st>
This patch changes the return instruction in the tr_32x4 macro from
BR to RET.
Function returns should always use the RET instruction instead of BR,
to avoid interfering with branch prediction.
On devices that support BTI, this is observeable as a landing pad is
required when branching with BR. The change fixes
fate-hevc-hdr-vivid-metadata when on hardware with BTI support.
Signed-off-by: Casey Smalley <casey.smalley@arm.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
ret can be given an argument instead.
This is also consistent with how other assembler code
in FFmpeg does it.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This is marginally slower, but correct for all input values.
The previous implementation failed with certain input seeds, e.g.
"checkasm --test=hevc_idct 98".
Signed-off-by: Martin Storsjö <martin@martin.st>
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19.4s to 16.4s,
approximately 15% speedup.
Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts",
running on Apple M1.
Signed-off-by: Josh Dekker <josh@itanimul.li>