This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from
26288 to 21512 bytes.
This gives a small slowdown of a couple tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.
Before:
vp9_inv_dct_dct_16x16_sub4_add_10_neon: 1887.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon: 2801.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon: 9691.4
vp9_inv_dct_dct_32x32_sub32_add_10_neon: 16154.9
After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon: 1899.5
vp9_inv_dct_dct_16x16_sub16_add_10_neon: 2827.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon: 9714.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon: 16175.9
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from
17500 to 14516 bytes.
This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.
Before: Cortex A7 A8 A9 A53
vp9_inv_dct_dct_16x16_sub4_add_10_neon: 4237.4 3561.5 3971.8 2525.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon: 6371.9 5452.0 5779.3 3910.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22068.8 17867.5 19555.2 13871.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37268.9 38684.2 32314.2 23969.0
After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon: 4375.1 3571.9 4283.8 2567.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon: 6415.6 5578.9 5844.6 3948.3
vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22653.7 18079.7 19603.7 13905.3
vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37593.2 38862.2 32235.8 24070.9
Signed-off-by: Martin Storsjö <martin@martin.st>
Keep the idct32 coefficients in narrow form in q6-q7, and idct16
coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering
q0-q3 in the pass1 function, and squeeze the idct16 coefficients
into q0-q1 in the pass2 function to avoid reloading them.
The idct16 coefficients are clobbered and reloaded within idct32_odd
though, since that turns out to be faster than narrowing them and
swapping them into q6-q7.
Before: Cortex A7 A8 A9 A53
vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22653.8 18268.4 19598.0 14079.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37699.0 38665.2 32542.3 24472.2
After:
vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22270.8 18159.3 19531.0 13865.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37523.3 37731.6 32181.7 24071.2
Signed-off-by: Martin Storsjö <martin@martin.st>
Align the second/third operands as they usually are.
Due to the wildly varying sizes of the written-out operands
in aarch64 assembly, the column alignment is usually not as clear
as in arm assembly.
This is cherrypicked from libav commit
7995ebfad1.
Signed-off-by: Martin Storsjö <martin@martin.st>
In the half/quarter cases where we don't use the min_eob array, defer
loading the pointer until we know it will be needed.
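Schematically, this moves the address materialization out of the common
prologue and into the only path that indexes the array (the registers
and surrounding logic here are illustrative, not the actual code):

    @ Before: movrel r4, min_eob sat unconditionally in the prologue.
    @ After: the quarter/half paths have already branched away by this
    @ point, so the pointer is only loaded when it will be dereferenced.
    movrel          r4,  min_eob         @ deferred from the prologue
    ldrh            r12, [r4], #2        @ next per-slice eob threshold
    cmp             r3,  r12             @ is this slice needed at all?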
This is cherrypicked from libav commit
3a0d5e206d.
Signed-off-by: Martin Storsjö <martin@martin.st>
This reduces the number of lines and the amount of duplication.
Also simplify the eob check for the half case.
If we are in the half case, we know we will at least need to do the
first three slices; we only need to check eob for the fourth one,
so we can hardcode the value to check against instead of loading it
from the min_eob array.
Since at most one slice can be skipped in the first pass, we can
completely unroll the loop for filling zeros, as was done for the
quarter case before.
This allows skipping loading the min_eob pointer when using the
quarter/half cases.
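A hypothetical sketch of the resulting shape of the half case (the
threshold value, registers, labels and function name are illustrative
only):

    cmp             r3,  #36             @ hardcoded eob limit for the
    ble             1f                   @ fourth slice; no min_eob load
    bl              idct32_1d_4x32_pass1 @ compute the fourth slice
    b               2f
1:  @ At most one slice can be skipped, so the zero filling is a fully
    @ unrolled straight-line sequence instead of a loop.
    vmov.i16        q14, #0
    vmov.i16        q15, #0
    vst1.16         {d28-d31}, [r0]!     @ repeated as many times as the
    vst1.16         {d28-d31}, [r0]!     @ slice's output needs
2: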
This is cherrypicked from libav commit
98ee855ae0.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '4ab496261b12e20ef293b7adca4fcaef1a67c538':
libvpx: Cast a pointer to const to squelch a warning
This commit is a noop, see 09b3bbe605
Merged-by: James Almer <jamrial@gmail.com>
* commit '802727b538b484e3f9d1345bfcc4ab24cfea8898':
vp8: Update some assembly comments left unchanged in bd66f073fe
Merged-by: James Almer <jamrial@gmail.com>
* commit '6755eb5b212384e0599f7f2c5de42df49fff57de':
mss12: validate display dimensions
This commit is a noop, see ee9151b616
Merged-by: Clément Bœsch <u@pkh.me>
* commit '33f10546ec012ad4e1054b57317885cded7e953e':
vc1: check that slices have a positive height
This commit is a noop, see e985cfd18b
Merged-by: Clément Bœsch <u@pkh.me>
* commit '09b23786b3986502ee88d4907356979127169bdd':
pcx: use the bytestream2 API for reading from input
This commit is a noop, see 8cd1c0febe
Merged-by: Clément Bœsch <u@pkh.me>
* commit '221402c1c88b9d12130c6f5834029b535ee0e0c5':
pcx: check that the packet is large enough before reading the header
See 8cd1c0febe
Merged-by: Clément Bœsch <u@pkh.me>
* commit '15ee419b7abaf17f8c662c145fe93d3dbf43282b':
pcx: properly pad the scanline
This commit is a noop, see d24de4596c
Merged-by: Clément Bœsch <u@pkh.me>
* commit '796dca027be09334d7bbf4f2ac1200e06bb054cb':
alac: do not return success if nothing was decoded
See e11983bda0
Merged-by: Clément Bœsch <u@pkh.me>
* commit 'f5d46d332258dcd8ca623019ece1d5e5bb74142b':
vmnc: check that subrectangles fit into their containing rectangles
See 6ba02602aa
This merge keeps our condition against w-i and h-j instead of bw and bh.
One may be more correct than the other, but I'm keeping our behaviour
here for safety reasons.
The style and formatting are merged.
Merged-by: Clément Bœsch <u@pkh.me>
* commit 'b53d8c3ccfeff77874f5ca7c68136b6d87a0a69c':
mjpegdec: Drop disabled code
The last chunk is replaced with a comment describing the structure.
Merged-by: Clément Bœsch <u@pkh.me>