Load the specific destination bytes instead of MSA load and pack.
Pack the data to half word before clipping.
Use immediate unsigned saturation for clip to max saving one vector register.
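As a rough scalar model of the clip order described above (an illustration only, not the MSA code itself): negative lanes are first clamped to 0 with a signed max against the immediate 0, then unsigned saturation to 8 bits clamps the top at 255. In MSA this maps to instructions such as `__msa_maxi_s_h(v, 0)` followed by `__msa_sat_u_h(v, 7)`, where the saturation width is an immediate, so no vector register has to hold a 255 constant. The function names below are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of the vector clip to [0, 255]. */
static uint8_t clip_lane_0_255(int16_t x)
{
    /* Step 1: signed max against the immediate 0 (maxi_s.h-style). Without
     * it, a negative lane read as unsigned would saturate to 255, not 0. */
    uint16_t u = (uint16_t)(x < 0 ? 0 : x);

    /* Step 2: unsigned saturation to 8 bits (sat_u.h-style with m = 7).
     * In the vector code the width is an immediate, not a register. */
    return (uint8_t)(u > 255 ? 255 : u);
}

/* Apply the per-lane model to a row of packed 16-bit values. */
static void clip_row_0_255(uint8_t *dst, const int16_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = clip_lane_0_255(src[i]);
}
```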
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Preload data in band filter 0-8 for better pipeline parallelization.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Refer to "checkasm: use perf API on Linux ARM*" commit for the
rationale.
The implementation somewhat duplicates the one in checkasm, but so does the
current usage of AV_READ_TIME(). Until these implementations and
heuristics are made consistent, I don't see a way of sharing that code.
Note: when using libavutil/timer.h, it is now important to include it
before any other header due to the _GNU_SOURCE requirement.
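The perf-based cycle counting referred to above can be sketched roughly as follows (Linux only). This is not the actual libavutil/timer.h code; the helper names are hypothetical, and the start/stop pair only loosely corresponds to the START_TIMER/STOP_TIMER macros. It also shows why include order matters: `_GNU_SOURCE` only takes effect if it is defined before the first system header.

```c
/* _GNU_SOURCE must be defined before the first system header; in strict
 * -std= modes, glibc's <unistd.h> may otherwise not declare syscall(). */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a user-space CPU cycle counter for the calling thread; -1 on failure
 * (e.g. no hardware counters, or a restrictive perf_event_paranoid). */
static int cycle_counter_open(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type           = PERF_TYPE_HARDWARE;
    attr.size           = sizeof(attr);
    attr.config         = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled       = 1;   /* start stopped; enabled explicitly */
    attr.exclude_kernel = 1;   /* count user-space cycles only */
    attr.exclude_hv     = 1;
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Reset and start counting; returns 0 or -1. */
static int cycle_counter_start(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0 ? -1 : 0;
}

/* Stop counting and return the number of cycles, or -1 on failure. */
static int64_t cycle_counter_stop(int fd)
{
    uint64_t count;
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
        return -1;
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

Opening the counter can legitimately fail in VMs or containers, so callers need a fallback, which is one reason the heuristics differ between tools.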
Load the specific destination bytes instead of MSA load and pack.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
We currently only have exported data symbols within libavcodec, but
the concept is easy to extend to other libraries if necessary.
The attribute declaration needs to be in a private header though,
since we can't use CONFIG_SHARED in public installed headers.
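A minimal sketch of such a private-header attribute. It assumes a hypothetical library `mylib`, a hypothetical `BUILDING_MYLIB` define that is set only while compiling the library itself, and a `CONFIG_SHARED` value that configure would normally generate (defaulted here so the snippet compiles standalone). Because the expansion depends on `CONFIG_SHARED`, it cannot live in an installed public header.

```c
/* Stand-in for the configure-generated value (hypothetical default). */
#ifndef CONFIG_SHARED
#define CONFIG_SHARED 0
#endif

/* Hypothetical private-header macro: code outside the DLL references the
 * exported data through __declspec(dllimport); elsewhere it expands to
 * nothing. */
#if defined(_WIN32) && CONFIG_SHARED && !defined(BUILDING_MYLIB)
#    define mylib_export_data __declspec(dllimport)
#else
#    define mylib_export_data
#endif

/* Declaration as another library in the project would see it. */
extern mylib_export_data const int mylib_example_table[4];

/* Definition; in a real build this lives inside the exporting library. */
const int mylib_example_table[4] = { 1, 2, 4, 8 };
```

Functions can be reached through import-library thunks without such an annotation, but on MSVC-style toolchains data symbols generally need `__declspec(dllimport)` when referenced from another DLL, which is why only the data symbols need the attribute.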
Signed-off-by: Martin Storsjö <martin@martin.st>
The toolchain for this target has been unmaintained for many years.
While it has been continuously build-tested on FATE, it hasn't
actually been tested at runtime in many, many years (and back
then, only a few codecs in libavcodec were tested).
So far, keeping support for it has been mostly effortless, but
the compiler does seem to have issues with dllimported data symbols,
ending up as internal compiler errors in some cases. Instead of
jumping through further hoops to work around that, just remove the
target.
Signed-off-by: Martin Storsjö <martin@martin.st>
On Windows, the offset for the relocation isn't stored in
the relocation itself, but as an unsigned immediate in the opcode.
Therefore, negative offsets have to be handled via a separate sub
instruction, just as on MachO.
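The rule above can be modeled in a few lines of C. This is only an arithmetic illustration with hypothetical names; the real change is in the assembler macros. It shows that splitting `symbol + offset` into a non-negative relocation addend plus a separate subtraction yields the same address.

```c
#include <stdint.h>

/* Hypothetical model of materializing "symbol + offset" when the relocation
 * addend must be an unsigned immediate encoded in the instruction. */
typedef struct RelocPlan {
    uint32_t reloc_addend;  /* stored in the opcode, must be non-negative */
    uint32_t sub_after;     /* applied afterwards by a separate sub */
} RelocPlan;

static RelocPlan plan_symbol_offset(int32_t offset)
{
    RelocPlan p = { 0, 0 };
    if (offset >= 0)
        p.reloc_addend = (uint32_t)offset;
    else
        p.sub_after = 0u - (uint32_t)offset;  /* |offset|, also valid for INT32_MIN */
    return p;
}

/* Resulting 32-bit address: sym + addend - sub == sym + offset (mod 2^32). */
static uint32_t resolve(uint32_t sym, RelocPlan p)
{
    return sym + p.reloc_addend - p.sub_after;
}
```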
Signed-off-by: Martin Storsjö <martin@martin.st>
Improved version of VBROADCASTSS that works like the AVX2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
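For readers more familiar with intrinsics than with x86inc macros, two of the emulations listed above can be sketched in C with SSE/SSE2 intrinsics. This is an illustrative translation with hypothetical function names, not the macros from the commit, and it assumes an x86 target with SSE2.

```c
#include <emmintrin.h>  /* SSE2; also provides the SSE float intrinsics */

/* Horizontal sum of four floats with the result placed in every lane. */
static __m128 hsum_ps_all(__m128 v)
{
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* [v1 v0 v3 v2] */
    v = _mm_add_ps(v, t);                                      /* pair sums */
    t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 0, 3, 2));         /* swap halves */
    return _mm_add_ps(v, t);                                   /* total in all lanes */
}

/* blendvps emulation without SSE4.1: take b where the mask's sign bit is
 * set, else a. The arithmetic shift widens each sign bit to a full lane. */
static __m128 blendv_ps_sse2(__m128 a, __m128 b, __m128 mask)
{
    __m128 m = _mm_castsi128_ps(_mm_srai_epi32(_mm_castps_si128(mask), 31));
    return _mm_or_ps(_mm_and_ps(m, b), _mm_andnot_ps(m, a));
}
```

pblendvb follows the same and/andnot/or pattern, except that SSE2 has no per-byte arithmetic shift, so the byte mask can be derived with `_mm_cmplt_epi8(mask, _mm_setzero_si128())`. A vpbroadcastd-like splat of the low dword is `_mm_shuffle_epi32(x, 0)`.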
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
If using the winstore compat library, a fallback LoadLibrary
function does exist, but it only calls LoadPackagedLibrary
(which doesn't work for dynamically loading d3d11 DLLs).
Therefore explicitly check the targeted API family instead.
Make this check a reusable HAVE_* component which other parts
of the libraries can check when necessary as well.
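A sketch of the idea in C, assuming a Windows SDK that provides `<winapifamily.h>`. The macro `HAVE_EXAMPLE_STORE_ONLY` and the function name are hypothetical stand-ins for the configure-generated HAVE_* flag, whose real name the message does not give.

```c
/* Hypothetical HAVE_*-style flag: 1 when targeting a Windows API family
 * (e.g. Windows Store/UWP) outside the desktop partition. */
#if defined(_WIN32)
#    include <winapifamily.h>
#    if !WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
#        define HAVE_EXAMPLE_STORE_ONLY 1
#    endif
#endif
#ifndef HAVE_EXAMPLE_STORE_ONLY
#    define HAVE_EXAMPLE_STORE_ONLY 0
#endif

/* Decide from the targeted API family, not from whether some LoadLibrary
 * symbol happens to exist: the winstore compat fallback would fool that. */
static int can_dynamically_load_d3d11(void)
{
#if defined(_WIN32) && !HAVE_EXAMPLE_STORE_ONLY
    return 1;   /* desktop API family: LoadLibrary can load system DLLs */
#else
    return 0;   /* store-only API family, or not Windows at all */
#endif
}
```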
Signed-off-by: Martin Storsjö <martin@martin.st>
Merged from Libav commit 4d330da006.
Black isn't always just memset(ptr, 0, size). Limited YUV in particular
requires relatively non-obvious values, and filling a frame with
repeating 0 bytes is disallowed in some contexts. With component sizes
larger than 8 bits or packed YUV, this can become relatively complicated. So
having a generic function for this seems helpful.
In order to handle the complex cases in a generic way without destroying
performance, this code attempts to compute a black pixel, and then uses
that value to clear the image data quickly by using a function like
memset.
Common cases like yuv410p10 or rgba can't be handled with a simple
memset, so there is some code to fill memory with 2/4/8 byte patterns.
For the remaining cases, a generic slow fallback is used.
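The pattern-fill step described above could look roughly like this: write the bytes of one black pixel (or pixel group) once, then repeatedly double the already-filled prefix with memcpy(). The function name is hypothetical and this is a sketch, not the libavutil implementation. As an example of non-zero black, limited-range 10-bit YUV uses Y = 16 << 2 = 64 and Cb = Cr = 128 << 2 = 512, so a little-endian 16-bit luma plane repeats the bytes 0x40 0x00.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill dst with a repeating 1/2/4/8-byte pattern: copy it once, then keep
 * doubling the filled prefix, so the work is a few large memcpy() calls.
 * A trailing partial repetition is truncated. */
static void fill_pattern(uint8_t *dst, size_t size, const uint8_t *pat, size_t pat_size)
{
    if (!size || !pat_size)
        return;
    size_t filled = pat_size < size ? pat_size : size;
    memcpy(dst, pat, filled);
    while (filled < size) {
        size_t n = filled < size - filled ? filled : size - filled;
        memcpy(dst + filled, dst, n);   /* source and destination never overlap */
        filled += n;
    }
}
```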
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Merged from Libav commit 45df7adc1d.
Many image formats support embedding of ICC profiles directly in
their bitstreams. Add a new side data type to allow exposing them to
API users.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Reduced msa load-store code.
Removed inline asm of GP load-store for 64 bit.
Updated variable names in GP load-store macros for naming consistency.
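The message does not say what replaced the 64-bit inline assembly. One portable way to express unaligned general-purpose loads and stores in plain C is through memcpy(), which compilers typically lower to a single load or store where the target allows it. The helper names below are hypothetical, not the macros from the commit.

```c
#include <stdint.h>
#include <string.h>

/* Unaligned 32/64-bit accesses without inline asm or type-punning UB. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static inline uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static inline void store_u64(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof(v));
}
```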
Corrected macro descriptions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Rework it to improve performance. The mutex is no longer shared by workers;
instead, each worker has its own mutex and condition variable. This
reduces lock contention between workers. Also use an atomic variable for
the counter.
The interface also allows execute to run a special function on the main
thread, as requested by Ronald.
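The locking scheme described above can be sketched with POSIX threads and C11 atomics: every worker sleeps on its own mutex/condition pair, jobs are claimed from one shared atomic counter, and the calling thread waits on each worker's condition in turn. All names are hypothetical; this is not the actual libavutil slice-thread code, and it assumes a platform with <pthread.h> and <stdatomic.h>.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef void (*job_fn)(void *ctx, int job);

typedef struct Worker {
    pthread_t thread;
    pthread_mutex_t mutex; pthread_cond_t cond;  /* per worker, never shared */
    int busy, quit;              /* guarded by this worker's mutex only */
    struct Pool *pool;
} Worker;

typedef struct Pool {
    Worker *workers;
    int nb_workers, nb_jobs;
    atomic_int next_job;         /* jobs are claimed without any lock */
    job_fn fn; void *ctx;
} Pool;

static void run_jobs(Pool *p)
{
    int job;
    while ((job = atomic_fetch_add(&p->next_job, 1)) < p->nb_jobs)
        p->fn(p->ctx, job);
}

static void *worker_main(void *arg)
{
    Worker *w = arg;
    pthread_mutex_lock(&w->mutex);
    for (;;) {
        while (!w->busy && !w->quit)
            pthread_cond_wait(&w->cond, &w->mutex);
        if (w->quit)
            break;
        pthread_mutex_unlock(&w->mutex);
        run_jobs(w->pool);                 /* no lock held while working */
        pthread_mutex_lock(&w->mutex);
        w->busy = 0;
        pthread_cond_broadcast(&w->cond);  /* wake the waiting caller */
    }
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

/* Set a flag (&w->busy or &w->quit) under the worker's own mutex. */
static void wake_worker(Worker *w, int *flag)
{
    pthread_mutex_lock(&w->mutex);
    *flag = 1;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->mutex);
}

static void pool_init(Pool *p, int nb_workers)
{
    p->workers = calloc(nb_workers > 0 ? nb_workers : 1, sizeof(*p->workers));
    p->nb_workers = 0;
    atomic_init(&p->next_job, 0);
    for (int i = 0; p->workers && i < nb_workers; i++) {
        Worker *w = &p->workers[i];
        w->pool = p;
        pthread_mutex_init(&w->mutex, NULL);
        pthread_cond_init(&w->cond, NULL);
        if (pthread_create(&w->thread, NULL, worker_main, w)) {
            pthread_mutex_destroy(&w->mutex); pthread_cond_destroy(&w->cond);
            break;                         /* fewer threads; the caller helps */
        }
        p->nb_workers = i + 1;
    }
}

static void pool_execute(Pool *p, job_fn fn, void *ctx, int nb_jobs)
{
    p->fn = fn; p->ctx = ctx; p->nb_jobs = nb_jobs;
    atomic_store(&p->next_job, 0);
    for (int i = 0; i < p->nb_workers; i++)
        wake_worker(&p->workers[i], &p->workers[i].busy);
    run_jobs(p);                           /* the calling thread works too */
    for (int i = 0; i < p->nb_workers; i++) {
        Worker *w = &p->workers[i];
        pthread_mutex_lock(&w->mutex);
        while (w->busy)
            pthread_cond_wait(&w->cond, &w->mutex);
        pthread_mutex_unlock(&w->mutex);
    }
}

static void pool_uninit(Pool *p)
{
    for (int i = 0; i < p->nb_workers; i++) {
        wake_worker(&p->workers[i], &p->workers[i].quit);
        pthread_join(p->workers[i].thread, NULL);
        pthread_mutex_destroy(&p->workers[i].mutex); pthread_cond_destroy(&p->workers[i].cond);
    }
    free(p->workers);
}

/* Example job: squares its index into the int array passed as ctx. */
static void square_job(void *ctx, int job) { ((int *)ctx)[job] = job * job; }
```

Because no lock is shared, waking or waiting on one worker never blocks the others; the only state touched concurrently while jobs run is the atomic counter. A hook for running a special function on the main thread could be added as an extra callback that the caller runs before joining in on the jobs; it is left out to keep the sketch short.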
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
Hardware pixel formats do not tell you anything about their actual
contents, but should still score higher than formats with completely
unknown properties, which in turn should score higher than invalid
formats.
Do not return an AVERROR code as a score.
Fixes a hang in libavfilter where format negotiation gets stuck in a
loop because AV_PIX_FMT_NONE scores more highly than all other
possibilities.
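The intended ordering can be illustrated with a tiered score. The enum, the numeric tiers and the helper names below are hypothetical, and libavutil's real scoring is derived from loss flags. The point is only that every category, including an invalid one, maps to a genuine score in a fixed order, and no error code is ever mixed in with the scores.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical format categories, from worst to best. */
typedef enum { FMT_INVALID, FMT_UNKNOWN, FMT_HWACCEL, FMT_NORMAL } FmtKind;

/* Tiered score: invalid < unknown properties < hardware < normal.
 * quality is assumed to lie in [0, 999] so the tiers never overlap. */
static int format_score(FmtKind kind, int quality)
{
    switch (kind) {
    case FMT_NORMAL:  return 3000 + quality;
    case FMT_HWACCEL: return 2000;   /* contents unknown, but usable */
    case FMT_UNKNOWN: return 1000;
    default:          return 0;      /* invalid: lowest tier, not an error code */
    }
}

/* Index of the best-scoring candidate, or -1 if n == 0. */
static int pick_best(const FmtKind *kinds, const int *quality, size_t n)
{
    int best = -1, best_score = INT_MIN;
    for (size_t i = 0; i < n; i++) {
        int s = format_score(kinds[i], quality[i]);
        if (s > best_score) {
            best_score = s;
            best = (int)i;
        }
    }
    return best;
}
```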
Some devices (apparently some phones) support only this opaque
format. Of course this won't work with the CLI, because copying data
directly is not supported.
Automatic frame allocation (setting AVCodecContext.hw_device_ctx) does
not support this mode, even if it's the only supported mode. But since
opaque surfaces are generally less useful, that's probably ok.
Merges Libav commit 5030e3856c.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Makes dealing with formats that cannot be used for staging textures
easier (DXGI_FORMAT_420_OPAQUE). It also saves memory if the staging
texture is never needed, so this is a good thing.
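A minimal sketch of creating the staging resource on demand rather than at init time. The structure, the function names, and the use of malloc() in place of a D3D11 staging texture are all hypothetical simplifications.

```c
#include <stdlib.h>

typedef struct FramesCtx {
    size_t staging_size;
    void *staging;            /* NULL until a CPU transfer needs it */
} FramesCtx;

/* Init no longer allocates staging memory, so formats that cannot have a
 * staging texture still initialize, and unused staging costs nothing. */
static void frames_init(FramesCtx *ctx, size_t staging_size)
{
    ctx->staging_size = staging_size;
    ctx->staging = NULL;
}

/* Called from the transfer path: allocate on first use, then reuse. */
static void *get_staging(FramesCtx *ctx)
{
    if (!ctx->staging)
        ctx->staging = malloc(ctx->staging_size);
    return ctx->staging;      /* NULL if the allocation failed */
}

static void frames_free(FramesCtx *ctx)
{
    free(ctx->staging);
    ctx->staging = NULL;
}
```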
Merges Libav commit 98d73e4174.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
It appears that in this case, frames_uninit is called twice (once by
av_hwframe_ctx_init(), and again by unreffing the frames ctx ref).
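When an uninit callback can be reached twice like this, one common defensive pattern is to make it idempotent by clearing every pointer after releasing it. The message does not show how the double call was resolved; the sketch below is hypothetical and models the textures with malloc()/free() rather than D3D11 COM objects (where the equivalent is calling Release and then setting the interface pointer to NULL).

```c
#include <stdlib.h>

typedef struct HwFramesPriv {
    void *texture;
    void *staging;
} HwFramesPriv;

/* Safe to call more than once: released pointers are reset to NULL, so a
 * second call (e.g. from the final unref after a failed init) is a no-op. */
static void frames_uninit(HwFramesPriv *priv)
{
    free(priv->texture);
    priv->texture = NULL;
    free(priv->staging);
    priv->staging = NULL;
}
```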
Merges Libav commit 086321c612.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>