FFmpeg

Commit Graph

Author	SHA1	Message	Date
Rémi Denis-Courmont	9aeb6aca3a	lavu/floatdsp: RISC-V V vector_fmul_reverse	2 years ago
Rémi Denis-Courmont	47ce9735cc	lavu/floatdsp: RISC-V V butterflies_float	2 years ago
Rémi Denis-Courmont	f4ea45040f	lavu/floatdsp: RISC-V V vector_fmul_add	2 years ago
Rémi Denis-Courmont	d120ab5b91	lavu/floatdsp: RISC-V V vector_dmac_scalar	2 years ago
Rémi Denis-Courmont	c3db27ba95	lavu/floatdsp: RISC-V V vector_fmac_scalar	2 years ago
Rémi Denis-Courmont	da169a210d	lavu/floatdsp: RISC-V V vector_dmul	2 years ago
Rémi Denis-Courmont	7058af9969	lavu/floatdsp: RISC-V V vector_fmul	2 years ago
Rémi Denis-Courmont	89b7ec65a8	lavu/floatdsp: RISC-V V vector_dmul_scalar	2 years ago
Rémi Denis-Courmont	a6c10d05fe	lavu/floatdsp: RISC-V V vector_fmul_scalar This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes.	2 years ago
Rémi Denis-Courmont	39357cad37	lavu/riscv: fallback macros for SH{1, 2, 3}ADD Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility.	2 years ago
Rémi Denis-Courmont	0c0a3deb18	lavu/cpu: CPU flags for the RISC-V Vector extension RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Zve64f plus double precision floats. - 6 different vector lengths: - Zvl32b (embedded only), - Zvl64b (embedded only), - Zvl128b, - Zvl256b, - Zvl512b, - Zvl1024b, - and the V extension proper: equivalent to Zve64f and Zvl128b. In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type set: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64). Whenever the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.	2 years ago
Rémi Denis-Courmont	746f1ff36a	lavu/riscv: initial common header for assembler macros	2 years ago
Rémi Denis-Courmont	b95e2fbd85	lavu/cpu: detect RISC-V base extensions This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.	2 years ago
Andreas Rheinhardt	8be6552aa4	avutil/pixdesc: Add av_chroma_location_(enum_to_pos|pos_to_enum) They are intended as replacements for avcodec_enum_to_chroma_pos() and avcodec_chroma_pos_to_enum(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Paul B Mahol	7bb0afc245	avutil: add RGBA single-float precision packed formats	2 years ago
Paul B Mahol	63bb6d6a9b	avutil: add RGB single-precision float formats	2 years ago
Lynne	f21899db7d	x86/tx_float: enable AVX-only split-radix FFT codelets Sandy Bridge, Ivy Bridge and Bulldozer cores don't support FMA3.	2 years ago
James Almer	d2f482965f	x86/tx_float: fix some symbol names Should fix compilation on MacOS Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
James Almer	0d8f43c74d	x86/tx_float: change a condition in a preprocessor check Fixes compilation with yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
James Almer	750f378bec	x86/tx_float: add missing preprocessor wrapper for AVX2 functions Fixes compilation with old assemblers. Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
Lynne	e7a987d7c9	lavu/tx: remove special -1 inverted lookup mode It was somewhat hacky and unnecessary.	2 years ago
Lynne	74e8541bab	x86/tx_float: generalize iMDCT To support non-aligned buffers during the post-transform step, just iterate backwards over the array. This allows using the 15xN-point FFT, which makes it 2.1 times faster than our old libavcodec implementation.	2 years ago
Lynne	ace42cf581	x86/tx_float: add 15xN PFA FFT AVX SIMD ~4x faster than the C version. The shuffles in the 15pt dim1 are seriously expensive. Not happy with it, but I'm content. Can be easily converted to pure AVX by removing all vpermpd/vpermps instructions.	2 years ago
Lynne	3241e9225c	x86/tx_float: adjust internal ASM call ABI again There are many ways to go about it, and this one seems optimal for both MDCTs and PFA FFTs without requiring excessive instructions or stack usage.	2 years ago
Lynne	7e7baf8ab8	lavu/tx: do not steal lookup tables of subcontexts in the iMDCT As it happens, some still need their contexts.	2 years ago
James Almer	05cff214b9	avutil/channel_layout: mention how the API user should treat channel orders it does not understand In case new orders are added in the future, existing library users can still use the layout simply by ignoring everything but the channel count in it, so make this explicit. Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
Andreas Rheinhardt	187cd27832	avutil/dict: Error out in case of key == NULL Up until now, using NULL as key in av_dict_get() on a non-empty AVDictionary would crash; using NULL as key in av_dict_set() would also crash for a non-empty AVDictionary unless AV_DICT_MULTIKEY was set; in case the dictionary was initially empty or AV_DICT_MULTIKEY was set, it was even possible for av_dict_set() to succeed when adding a NULL key, namely when one uses a value != NULL and the AV_DICT_DONT_STRDUP_VAL flag. Using av_dict_get() on such an AVDictionary will usually lead to crashes, though. Fix this by actually checking for key in both functions; error out if they are NULL. While at it, also stop relying on av_strdup(NULL) to return NULL in av_dict_set(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Lynne	4ba68639ca	x86/tx_float: add asm call versions of the 2pt and 4pt transforms Verified to be working.	2 years ago
Lynne	892548e6a1	x86/tx_float: fully support 128bit regs in LOAD64_LUT The gather path didn't support 128bit registers. It's not faster on Zen 3, but it's here for completeness.	2 years ago
Lynne	af42bb3d61	x86/tx_float: simplify and describe the intra-asm call convention	2 years ago
Philip Langdale	ed83a3a5bd	lavu/pixdesc: favour formats where depth and subsampling exactly match Since introducing the various packed formats used by VAAPI (and p012), we've noticed that there's actually a gap in how av_find_best_pix_fmt_of_2 works. It doesn't actually assign any value to having the same bit depth as the source format, when comparing against formats with a higher bit depth. This usually doesn't matter, because av_get_padded_bits_per_pixel() will account for it. However, as many of these formats use padding internally, we find that av_get_padded_bits_per_pixel() actually returns the same value for the 10 bit, 12 bit, 16 bit flavours, etc. In these tied situations, we end up just picking the first of the two provided formats, even if the second one should be preferred because it matches the actual bit depth. This bug already existed if you tried to compare yuv420p10 against p016 and p010, for example, but it simply hadn't come up before so we never noticed. But now, we actually got a situation in the VAAPI VP9 decoder where it offers both p010 and p012 because Profile 3 could be either depth and ends up picking p012 for 10 bit content due to the ordering of the testing. In addition, in the process of testing the fix, I realised we have the same gap when it comes to chroma subsampling - we do not favour a format that has exactly the same subsampling vs one with less subsampling when all else is equal. To fix this, I'm introducing a small score penalty if the bit depth or subsampling doesn't exactly match the source format. This will break the tie in favour of the format with the exact match, but not offset any of the other scoring penalties we already have. I have added a set of tests around these formats which will fail without this fix.	2 years ago
Rémi Denis-Courmont	6df3ad9687	lavu/riscv: fix off-by-one in bit-magnitude clip	2 years ago
Rémi Denis-Courmont	a90e5335b3	avutil/lfg: fix comment typo	2 years ago
Rémi Denis-Courmont	a5ce44f301	lavu/riscv: fix av_clip_int16 Some serious copy-paste / squash / rebase mismanipulation here. Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
Andreas Rheinhardt	e867a29ec1	avutil/dict: Improve appending values When appending two values (due to AV_DICT_APPEND), the earlier code would first zero-allocate a buffer of the required size and then copy both parts into it via av_strlcat(). This is problematic, as it leads to quadratic performance in case of frequent enlargements. Fix this by using av_realloc() (which is hopefully designed to handle such cases in a better way than simply throwing the buffer we already have away) and by copying the string via memcpy() (after all, we already calculated the strlen of both strings). Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	c15dd31d2a	avutil/dict: Fix memleak when using AV_DICT_APPEND If a key already exists in an AVDictionary and the AV_DICT_APPEND flag is set, the old entry is at first discarded from the dictionary, but a pointer to the value is kept. Later on, enough memory to store the appended string is allocated; should this allocation fail, the old string is not freed and hence leaks. This commit changes this by moving the creation of the combined value to an earlier point in the function, which also ensures that the AVDictionary is unchanged in case of errors. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	f976ed7fcf	avutil/dict: Avoid check whose result is known in advance We know that an AVDictionary is not empty if we have just added an entry to it, so only check for it being empty on the branch that does not do so. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	e402bd65b1	Revert "avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx" This reverts commit `2c8dc7e953`. The loongarch headers have been fixed, so that this wrapper is no longer necessary. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	1234df7501	Revert "avcodec/loongarch: Add wrapper for __lsx_vldx" This reverts commit `6c9a60ada4`. The loongarch headers have been fixed, so that this workaround is no longer necessary. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Rémi Denis-Courmont	c177108ae1	lavu/riscv: add <intmath.h> optimisations This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.	2 years ago
Rémi Denis-Courmont	df2057041b	lavu/riscv: byte-swap operations If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width. If Zbb is not supported, then this patchset does nothing. Support for run-time detection is left for the future. Currently, there are no bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to do this.	2 years ago
Rémi Denis-Courmont	d808070547	lavu/riscv: AV_READ_TIME cycle counter This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessary to detect and fix up the race condition where the bottom half wraps exactly between the two reads.	2 years ago
James Almer	bda3a9faf4	x86/float_dsp: use three operand form for some instructions Fixes compilation with old yasm Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
Paul B Mahol	72acff9f59	avutil/x86/float_dsp: add fma3 for scalarproduct	2 years ago
Andreas Rheinhardt	29c4c0886d	avutil/x86/intreadwrite: Add ability to detect whether MMX code is used It can be used to call emms_c() only when needed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Lynne	f1b35fc8f0	lavu/tx: remove av_cold from table definitions How did this get here?	2 years ago
Lynne	c92edd969a	lavu/tx: rotate 3 & 15-point exptabs This just inverts their signs. Simplifies SIMD.	2 years ago
Lynne	51172223fd	lavu/tx: generalize MDCTs The same code can perform any-length MDCTs with minimal changes.	2 years ago
Lynne	645a1f4422	lavu/tx: add the inplace flag to PFA FFTs They support in-place, because they have to use a temporary buffer.	2 years ago
Lynne	8c283e8fe6	lavu/tx: propagate the codelet flags into the context The field is documented as a combination of both.	2 years ago


5996 Commits (8661b5e8f9e63d2775978f2aa3ee6fae4d515c53)
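
Commits 9aeb6aca3a through a6c10d05fe add RISC-V V versions of several AVFloatDSPContext callbacks. As a rough guide to what each vectorized loop computes, here is a plain C sketch of the element-wise semantics, assumed to match the generic C versions. The `ref_*` names, `grant_vl()` and `VLMAX` are hypothetical, and `size_t` lengths stand in for the `int len` parameters. The outer loop of `ref_fmul_scalar()` models the usual RVV strip-mining pattern, in which each pass processes `vl = min(remaining, VLMAX)` elements as granted by `vsetvli`; it is a model of the technique, not the committed assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardware vector length (in elements);
 * real RVV code learns it from vsetvli at run time. */
#define VLMAX 8

/* Elements handled by the next pass: min(remaining, VLMAX), which is
 * what vsetvli would grant for this element width. */
static size_t grant_vl(size_t remaining)
{
    assert(remaining > 0);
    return remaining < VLMAX ? remaining : VLMAX;
}

/* vector_fmul_scalar semantics: dst[i] = src[i] * mul, strip-mined. */
static void ref_fmul_scalar(float *dst, const float *src, float mul, size_t len)
{
    while (len > 0) {
        size_t vl = grant_vl(len);
        for (size_t i = 0; i < vl; i++) /* one vector multiply */
            dst[i] = src[i] * mul;
        dst += vl;                      /* advance past this strip */
        src += vl;
        len -= vl;
    }
}

/* vector_fmul_add semantics: dst[i] = src0[i] * src1[i] + src2[i] */
static void ref_fmul_add(float *dst, const float *src0, const float *src1,
                         const float *src2, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}

/* vector_fmul_reverse semantics: src1 is read back to front. */
static void ref_fmul_reverse(float *dst, const float *src0,
                             const float *src1, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* butterflies_float semantics: v1[i] becomes the sum, v2[i] the difference. */
static void ref_butterflies(float *v1, float *v2, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        float t = v1[i] - v2[i];
        v1[i] += v2[i];
        v2[i] = t;
    }
}
```

With `len = 10`, the strip-mined loop makes one pass of 8 elements and one of 2; the final `vsetvli` simply grants fewer elements, so no separate scalar tail loop is needed.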
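
The 0c0a3deb18 message argues that four flag bits suffice because the Zve instruction subsets form an inclusion chain, giving 6 possible sets including the empty one. A minimal sketch of that mapping follows; the flag names mirror the message, but their values, the table and the helper functions are hypothetical and are not FFmpeg's actual `AV_CPU_FLAG_*` constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, one per type set named in the commit message. */
enum {
    RVV_I32 = 1 << 0, /* 8-, 16- and 32-bit integers */
    RVV_F32 = 1 << 1, /* single-precision floats */
    RVV_I64 = 1 << 2, /* 64-bit integers */
    RVV_F64 = 1 << 3, /* double-precision floats */
};

/* Flags implied by each instruction subset; index 0 means no vector support. */
static const unsigned zve_flags[] = {
    0,                                     /* none */
    RVV_I32,                               /* Zve32x */
    RVV_I32 | RVV_F32,                     /* Zve32f */
    RVV_I32 | RVV_I64,                     /* Zve64x */
    RVV_I32 | RVV_F32 | RVV_I64,           /* Zve64f */
    RVV_I32 | RVV_F32 | RVV_I64 | RVV_F64, /* Zve64d */
};

#define N_SUBSETS (sizeof(zve_flags) / sizeof(zve_flags[0]))

/* Look up the flags for one table entry. */
static unsigned flags_for_zve(size_t idx)
{
    assert(idx < N_SUBSETS);
    return zve_flags[idx];
}

/* Count the distinct flag masks in the table. */
static size_t count_distinct_sets(void)
{
    size_t distinct = 0;
    for (size_t i = 0; i < N_SUBSETS; i++) {
        size_t j = 0;
        while (j < i && zve_flags[j] != zve_flags[i])
            j++;
        if (j == i) /* no earlier entry had this mask */
            distinct++;
    }
    return distinct;
}
```

Because every subset that provides doubles also provides the other three type sets, code needing 64-bit floats can test the `RVV_F64` bit on its own.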
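
The ed83a3a5bd message describes a tie: p010 and p012 both store samples in 16-bit containers, so their padded bits per pixel are equal and the old comparison kept whichever candidate came first. The toy scorer below illustrates the idea of the fix, a penalty small enough that it only breaks ties. `FmtInfo`, `score_candidate()`, `pick_best()` and every penalty value are hypothetical; `av_find_best_pix_fmt_of_2()` computes its score differently.

```c
#include <assert.h>

/* Hypothetical, much-simplified pixel format description. */
typedef struct FmtInfo {
    const char *name;
    int depth;         /* significant bits per component */
    int padded_bpp;    /* padded bits per pixel */
    int log2_chroma_w; /* horizontal chroma subsampling */
    int log2_chroma_h; /* vertical chroma subsampling */
} FmtInfo;

/* Lower is better. */
static int score_candidate(const FmtInfo *src, const FmtInfo *dst)
{
    /* Base cost: padded size, scaled by 4 so that the two tie-breakers
     * (at most 2 in total) never outweigh a single padded bit. */
    int score = dst->padded_bpp * 4;
    if (dst->depth < src->depth)
        score += 1000; /* placeholder for a precision-loss penalty */
    if (dst->log2_chroma_w > src->log2_chroma_w ||
        dst->log2_chroma_h > src->log2_chroma_h)
        score += 1000; /* placeholder for a chroma-loss penalty */
    if (dst->depth != src->depth)
        score += 1;    /* tie-breaker: prefer the exact bit depth */
    if (dst->log2_chroma_w != src->log2_chroma_w ||
        dst->log2_chroma_h != src->log2_chroma_h)
        score += 1;    /* tie-breaker: prefer the exact subsampling */
    return score;
}

/* On an exact tie the first candidate is kept. */
static const FmtInfo *pick_best(const FmtInfo *src, const FmtInfo *a,
                                const FmtInfo *b)
{
    assert(src && a && b);
    return score_candidate(src, b) < score_candidate(src, a) ? b : a;
}
```

For a yuv420p10 source, p012 scores 97 and p010 scores 96. Removing the two `+ 1` tie-breakers makes both score 96, and the first argument wins regardless of depth, which reproduces the ordering problem.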
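
Commits e867a29ec1 and c15dd31d2a change how `AV_DICT_APPEND` builds the combined value. The existing buffer is grown with `av_realloc()` and the new part copied with `memcpy()` using lengths that are already known. The combined value is also built before the dictionary is modified, so an allocation failure leaves the old value intact. The sketch below shows that pattern with plain `realloc()`; `append_value()` is a hypothetical helper, not part of libavutil.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append add_val to the heap string old_val.
 * On success, returns the combined string; old_val must no longer be
 * used, because realloc may have moved it. On failure, returns NULL and
 * old_val is untouched and still owned by the caller, so nothing leaks
 * and the caller's data is unchanged. */
static char *append_value(char *old_val, const char *add_val)
{
    assert(old_val && add_val);
    size_t old_len = strlen(old_val);
    size_t add_len = strlen(add_val);
    if (add_len >= SIZE_MAX - old_len) /* old_len + add_len + 1 would overflow */
        return NULL;
    /* Grow the existing buffer instead of allocating a fresh one. */
    char *buf = realloc(old_val, old_len + add_len + 1);
    if (!buf)
        return NULL;
    /* Both lengths are known, so copy directly, including the NUL. */
    memcpy(buf + old_len, add_val, add_len + 1);
    return buf;
}
```

A caller should keep its old pointer until the call succeeds, for example `char *tmp = append_value(v, x); if (!tmp) return AVERROR(ENOMEM); v = tmp;`, and only then update the dictionary entry.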
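
Commit c177108ae1 adds RISC-V micro-optimisations for signed integer clipping and bit weight, and a5ce44f301 fixes `av_clip_int16`. The portable C below pins down what such helpers compute, using the unsigned-offset range check often used for these clips; `clip_intp2()`, `clip_int16()` and `popcount32()` are hypothetical names, not the optimised code. The asymmetric bounds of `clip_intp2()`, `-(1 << p)` to `(1 << p) - 1`, are the kind of detail where an off-by-one is easy to introduce.

```c
#include <assert.h>
#include <stdint.h>

/* Clip a to [-(1 << p), (1 << p) - 1], i.e. a signed (p + 1)-bit range. */
static int clip_intp2(int a, int p)
{
    assert(p >= 0 && p <= 30);
    /* Adding 1 << p maps the valid range onto [0, (2 << p) - 1]; any bit
     * set above that range means a was out of bounds. */
    if (((unsigned)a + (1u << p)) & ~((2u << p) - 1))
        return a < 0 ? -(1 << p) : (1 << p) - 1;
    return a;
}

/* Clip a to the int16_t range [-32768, 32767]. */
static int16_t clip_int16(int a)
{
    return (int16_t)clip_intp2(a, 15);
}

/* Number of set bits ("bit weight"). */
static int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With Zbb, a loop like `popcount32()` collapses into a single `cpop` (or `cpopw` on RV64) instruction.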
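
Commit df2057041b explains that `REV8` reverses all XLEN/8 bytes of a register, so a 16- or 32-bit swap leaves the result in the top bytes and needs a right shift. The sketch below models this assuming XLEN = 64; `rev8_64()` is a hypothetical software stand-in for the instruction, and the other function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of REV8 on a 64-bit register: reverse all 8 bytes. */
static uint64_t rev8_64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF); /* move the lowest byte of x into r */
        x >>= 8;
    }
    return r;
}

/* After a full-width reverse, the swapped 16 bits sit in the top of the
 * register, so shift them down to the data width. */
static uint16_t bswap16_via_rev8(uint16_t x)
{
    return (uint16_t)(rev8_64(x) >> 48);
}

/* Same idea for 32-bit data. */
static uint32_t bswap32_via_rev8(uint32_t x)
{
    return (uint32_t)(rev8_64(x) >> 32);
}
```

On RV32, where XLEN is 32, the 16-bit case would shift by 16 instead and the 32-bit case would need no shift.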
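
Commit d808070547 reads the 64-bit cycle counter; on 32-bit RISC-V the value is split across two CSRs (`cycle` and `cycleh`) that cannot be read atomically. The sketch below shows a high/low/high retry loop that handles a wrap between reads. The simulated counter and `read_hi()`/`read_lo()` are hypothetical stand-ins for the CSR reads, and each simulated read advances the counter by one to mimic time passing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated 64-bit counter standing in for cycle/cycleh. */
static uint64_t sim_counter;

/* Each read returns one half of the current value, then time advances. */
static uint32_t read_lo(void) { return (uint32_t)(sim_counter++); }
static uint32_t read_hi(void) { return (uint32_t)((sim_counter++) >> 32); }

/* Combine the two halves into a consistent 64-bit value. If the high half
 * is the same before and after the low read, the low half did not wrap in
 * between; otherwise retry. */
static uint64_t read_time64(void)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_hi();
        lo  = read_lo();
        hi2 = read_hi();
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
```

Starting just below the 32-bit boundary, the first attempt sees the high half change from 0 to 1 and retries. A single low-then-high read at that point could return a value roughly 2^32 cycles away from the truth. On real RV32 hardware the reads would be `rdcycleh` and `rdcycle` in inline assembly.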