Since the horizontal and vertical filters are identical except for a
transposition, this uses a common subprocedure with an ad-hoc ABI.
To preserve return-address stack prediction, a link register has to be
used (c.f. the "Control Transfer Instructions" from the
RISC-V ISA Manual). The alternate/temporary link register T0 is used
here, so that the normal RA is preserved (something Arm cannot do!).
To load the strength value based on `qscale`, the shortest possible
and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
LLA; ADD; LBU sequence would add one more instruction since LLA is a
convenience alias for AUIPC; ADDI. To ensure that this trick works,
relocation relaxation is disabled.
To implement the two signed divisions by a power of two toward zero:
(x / (1 << SHIFT))
the code relies on the small range of integers involved, computing:
(x + (x >> (16 - SHIFT))) >> SHIFT
rather than the more general:
(x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
Thus one ANDI instruction is avoided.
T-Head C908:
h263dsp.h_loop_filter_c: 228.2
h263dsp.h_loop_filter_rvv_i32: 144.0
h263dsp.v_loop_filter_c: 242.7
h263dsp.v_loop_filter_rvv_i32: 114.0
(C is probably worse in real use due to less predictible branches.)
This patch adds MSA (MIPS-SIMD-Arch) optimizations for H263 lpf functions in new file h263dsp_msa.c
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This allows us to remove FF_IDCT_WMV2, which serves no practical purpose
other than to be able to select the WMV2 IDCT for MPEG (or vice versa)
and get corrupt output.
Fate tests for all wmv2-related tests change, because (for some obscure
reason) they forced use of the MPEG IDCT. You would get the same changes
previously by not using -idct simple in the fate test (or replacing it
with -idct auto).
These decoders use a special non-MPEG2 IDCT. Call it directly
instead of going through dsputil. There is never any reason
to use a regular IDCT with these decoders or to use the EA IDCT
with other codecs.
This also fixes the bizarre situation of eamad and eatqi decoding
incorrectly if eatgq is disabled.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The were broken since August of 2010 without anyone noticing until
three weeks ago. Nobody cares about it anymore and hopefully Marvell
will support NEON like in the PXA978 from now on.
It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.
Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk
c is 1.9x faster than previous c (on various x86 cpus), sse is 1.6x faster than previous sse.
Originally committed as revision 14698 to svn://svn.ffmpeg.org/ffmpeg/trunk
include paths in the source files.
mostly from a patch by Ronald S. Bultje, rbultje ronald.bitfreak net
Originally committed as revision 9034 to svn://svn.ffmpeg.org/ffmpeg/trunk
Patch by Zuxy Meng, zuxy <<dot>> meng >>at<< gmail <<dot>> com
Minor non-functional diff-related fixes by me.
Originally committed as revision 5125 to svn://svn.ffmpeg.org/ffmpeg/trunk