The fields can be accessed directly, so these are not needed anymore.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Technically _tzcnt* intrinsics are only available when the BMI
instruction set is present. However the instruction encoding
degrades to "rep bsf" on older processors.
Clang for Windows debatably restricts the _tzcnt* instrinics behind
the __BMI__ architecture define, so check for its presence or
exclude the usage of these intrinics when clang is present.
See also:
https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.htmlhttps://bugs.llvm.org/show_bug.cgi?id=30506http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Matt Oliver <protogonoi@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The extra space got included as part of the expansion of ELF, which
later interfered with gas-preprocessor which earlier only stripped out
leftover lines starting with '#' if the line started with that char.
Signed-off-by: Martin Storsjö <martin@martin.st>
Replace generic with block size specific function.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The initialisation should be common. For libmfx, it was previously
happening in the derivation function and this moves it out. For VAAPI,
it fixes some failures when deriving from a DRM device because this
initialisation did not run.
Replace generic with block size specific function.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Load the specific destination bytes instead of MSA load and pack.
Pack the data to half word before clipping.
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Preload data in band filter 0-8 for better pipeline parallelization.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Refer to "checkasm: use perf API on Linux ARM*" commit for the
rationale.
The implementation is somehow duplicated with checkasm, but so is the
current usage of AV_READ_TIME(). Until these implementations and
heuristics are made consistent, I don't see a way of sharing that code.
Note: when using libavutil/timer.h, it is now important to include
before any other include due to the _GNU_SOURCE requirement.
Load the specific destination bytes instead of MSA load and pack.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
We currently only have exported data symbols within libavcodec, but
the concept is easy to extend to other libraries if necessary.
The attribute declaration needs to be in a private header though,
since we can't use CONFIG_SHARED in public installed headers.
Signed-off-by: Martin Storsjö <martin@martin.st>