This patch modifies the encode frame function to
retry encoding the frame when the resulting bit count
is too far off target, but only adjusting lambda
in small, incremental step. It also makes the logic
more conservative - otherwise it will contend with
bit reservoir-related variations in bit allocation,
and result in artifacts when frame have to be truncated
(usually at high bit rates transitioning from low
complexity to high complexity).
Avoid clipping due to quantization noise to produce audible
artifacts, by detecting near-clipping signals and both attenuating
them a little and encoding escape-encoded bands (usually the
loudest) rounding towards zero instead of nearest, which tends to
decrease overall energy and thus clipping.
Currently fate tests measure numerical error so this change makes
tests using asynth (which are near clipping) report higher error
not less, because of window attenuation. Yet, they sound better,
not worse (albeit subtle, other samples aren't subtle at all).
Only measuring psychoacoustically weighted error would make for
a representative test, so that will be left for a future patch.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This commit modifies 02dbed6 to use band->active_lines to better gauge how much information is contained within a single band and thus allow the perceptual noise subsitution to more accurately determine which bands to code as noise. The spread[w+g] used before this patch behaved more like a low-pass filter for PNS band_types, which could mistakingly mark some low frequency bands as noise.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds the energy spread to the struct for each band and removes 2 unused fields.
distortion and perceptual_weight were not referenced in any file nor were they set to any value,
so it was safe to remove them. The energy spread is currently only used in the aac psy model.
It's defined as being proportional to the tonality of each band.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The minimum of the ath(x, ATH_ADD) function depends on ATH_ADD.
This patch uses the first order approximation to determine it.
For ATH_ADD = 4 this results in the value at 3407.06812 (-5.24241638)
not the one at 3410 (-5.24237967).
CC: libav-stabl@libav.org
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The minimum of the ath(x, ATH_ADD) function depends on ATH_ADD.
This patch uses the first order approximation to determine it.
For ATH_ADD = 4 this results in the value at 3407.06812 (-5.24241638)
not the one at 3410 (-5.24237967).
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Approved-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
If band->thr is 0.0f, the division is undefined, making norm_fac not a
number or infinity, which causes psy_band->threshold to become NaN.
This is passed on to other variables until it finally reaches
sce->sf_idx and is converted to an integer (-2147483648).
This causes a segmentation fault when it is used as array index.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is a small change, but it does have a big impact on bit allocation.
all the regressions marked in the report have no audible
difference (I didn't check them all though), but the improvements can
be heard.
This affects mostly high bit rates. It's related to issue #2686.
In the report, A is the patched version, B is unpatched, all
comparisons show deltas in the form (A-B), so a positive pSNR delta
means a better quality in the patched version, and negative a
regression. Regressions are only considered for pSNR deltas below
-1db, they're considered serious below -6db.
All measurements were done with tiny_psnr.
The summary of the report inline for quick reading:
Files: 58
Bitrates: 6
Tests: 347
Serious Regressions: 0 (0%)
Regressions: 10 (2%)
Improvements: 54 (15%)
Big improvements: 26 (7%)
Worst regression - sine_tester.flac - 384k
- StdDev: 1.68 pSNR: -3.05 maxdiff: -178.00
Best improvement - 07 - Bound.flac - 384k
- StdDev: -1700.05 pSNR: 20.64 maxdiff: -29595.00
Average - StdDev: -55.67 pSNR: 1.20 maxdiff: -1593.00
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch changes existing mathematical functions with faster
ones. Speeds up encoding more than 10%. Tested on x86 and
MIPS platforms.
Signed-off-by: Bojan Zivkovic <bojan@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
There is still are still a few sections missing relating to TNS (not present)
and mid/side (contains other bugs).
Overall this improves quality, and vastly improves rate-control.
Signed-off-by: Martin Storsjö <martin@martin.st>
3GPP:
Remove ffac from and move min_snr out of AacPsyBand.
Rearrange AacPsyCoeffs to make it easier to implement energy spreading.
Rename the band[] array to bands[]
Copy energies and thresholds at the end of analysis.
LAME:
Use a loop instead of an if chain in LAME windowing.
The 3GPP spec uses the following calculation for high spreading:
thr'_spr = max(thr_scaled, s_h(n) * thr_scaled(n-1))
where, n is defined as the current band, and s_h() is defined as "[...] the
distance of adjacent bands in Bark and a constant slope that is 15 dB/Bark
[...]". This is a little ambiguous as you would assume you want the Bark
width of the previous band for this calculation. However, this assumption
appears to be incorrect, and you really want the Bark width of the current
band. Coincidentally this is exactly what the spec calls for! =P
This noticeably improves Tom's Diner at low bitrates (I tested at 64kbps,
with mid/side disabled).
Patch by: Nathan Caldwell <saintdev@gmail.com>
Originally committed as revision 25622 to svn://svn.ffmpeg.org/ffmpeg/trunk
This greatly improves bitrate handling. You will now get within a few
kbps of your requested bitrate instead of 20-40kbps higher.
There is absolutely no analog to this line in the 3GPP spec, that I
can find.
patch by Nathan Caldwell saintdev (at) gmail
Originally committed as revision 25589 to svn://svn.ffmpeg.org/ffmpeg/trunk
Removing the modification vastly improves quality (at a slight bitrate
cost) for some samples. castanets.wav is a good example. The closest
equivalent I see to the modification in the 3GPP spec is a similar
modification (over a specific frequency range) when TNS is used.
This also changes the threshold-in-quiet calculation to match the
3GPP spec.
patch by Nathan Caldwell saintdev (at) gmail
Originally committed as revision 25588 to svn://svn.ffmpeg.org/ffmpeg/trunk
According to the 3GPP spec:
"Thus the pre-echo control is inactive for the first short window (but
not all short windows in a short frame) after a start block and for
all frames with a stop window sequence."
Currently, pre-echo control is only run when the current frame is not
a short frame, and the previous frame is not a short frame.
patch by Nathan Caldwell saintdev (at) gmail
Originally committed as revision 25587 to svn://svn.ffmpeg.org/ffmpeg/trunk
I used the same loop counter for the inner and outer initalization loops.
This caused initalization to only run for the first channel. This in turn lead
to any channel other than the first using only short blocks.
Patch by Nathan Caldwell, saintdev at gmail
Originally committed as revision 25566 to svn://svn.ffmpeg.org/ffmpeg/trunk
This performs quite a bit better than the current 3GPP-inspired window decision
on all the samples I have tested. On the castanets.wav sample it performs very
similar to iTunes window selection, and seems to perform better than Nero.
On fatboy.wav, it seems to perform at least as good as iTunes, if not better.
Nero performs horribly on this sample.
Patch by: Nathan Caldwell <saintdev@gmail.com>
Originally committed as revision 24892 to svn://svn.ffmpeg.org/ffmpeg/trunk
This allows cleaner implementation of other psymodels using the existing
structs. It also will make it easier to interchange individual parts of
the psymodel to create hybrid models.
Patch by: Nathan Caldwell <saintdev@gmail.com>
Originally committed as revision 24890 to svn://svn.ffmpeg.org/ffmpeg/trunk