This commit implements intensity stereo coding support
to the native aac encoder. This is a way to increase the efficiency
of the encoder by zeroing the right channel's spectral coefficients
(in a channel pair) and rederiving them in the decoder using information
from the scalefactor indices of special band types. This commit
confomrs to the official ISO 13818-7 specifications, although due to
their ambiguity certain deviations have been taken to ensure maximum
sound quality. This commit has been extensively tested and has shown
to not result in audiable audio artifacts unless in extreme cases.
This commit also adds an option, aac_is, which has the value of
0 by default. Intensity Stereo is part of the scalable aac profile
and is thus non-default.
The way IS coding works is that it rederives the right channel's
spectral coefficients from the left channel via the scalefactor
index values left in the right channel. Since an entire band's
spectral coefficients do not need to be coded, the encoder's
efficiency jumps up and it unzeroes some high frequency values
which it previously did not have enough bits to encode. That way
less information is lost than the information lost by rederiving
the spectral coefficients with some error. This is why the
filesize of files encoded with IS do not decrease significantly.
Users wishing that IS coding should reduce filesize are expected
to reduce their encoding bitrates appropriately.
This is V2 of the commit. The old version did not mark ms_mask as
0 since M/S and IS coding are incompactible, which resulted in
distortions with M/S coding enabled. This version also improves
phase detection by measuring it for every spectral coefficient in
the band and using a simple majority rule to determine whether the
coefficients are in or out of phase. Also, the energy values per
spectral coefficient were changed as to reflect the
official specifications.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit finalizes the PNS implementation previously added to the encoder
by moving it to a seperate function search_for_pns() and thus making it
coder-generic. This new implementation makes use of the spread field of
the psy bands and the lambda quality feedback paremeter. The spread of the
spectrum in a band prevents PNS from being used excessively and thus preserve
more phase information in high frequencies. The lambda parameter allows
the number of PNS-marked bands to vary based on the lambda parameter and the
amount of bits available, making better choices on which bands are to be marked
as noise. Comparisons with the previous PNS implementation can be found
here: https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/
This is V2 of the patch, the changes from the previous version being that this
version uses the new band->spread metric from aacpsy and normalizes the
energy using the group size. These changes were suggested by Claudio Freire
on the mailing list. Another change is the use of lambda to alter the
frequency threshold. This change makes the actual threshold frequencies
vary between +-2Khz of what's specified, depending on frame encoding performance.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit undoes commit c5d4f87e81
and removes PNS band marking from the twoloop coder, which has
been reimplemented in a better way in this series of patches.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit enables the function added with commit 7c10b87 and uses that
new function for setting any special scalefactor indices. This commit does
not change the behaviour of the encoder since no bands are being marked as
either NOISE_BT(due to the previous PNS implementation removed in the
previous commit) or INTENSITY_BT2/INTENSITY_BT.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
There were some mistakes in the code for M/S stereo, this commit fixes them.
The start variable was not being reset for every window and every access to
the coefficients was incorrect as well. This fixes that by properly
addressing the coefficients using both windows and setting the start on every window to zero.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds support for the coding of intensity stereo scalefactor indices.
It does not do any marking of such bands and as such does no functional changes
to the encoder. It removes any old twoloop specific code for PNS and moves it
into a seperate function which handles setting of scalefactor indices for
PNS and IS bands.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds support for both PNS and IS (intensity stereo) codebooks to the
encode_window_bands_info() quantizer, used by the faast, faac and anmr non-default,
native coders. This does not mean that both extensions now work with those coders,
some are simply unsuited and will trigger an assertion in the encoder while
others simply ignore the changed scalefactor indices and band types.
This commit simply adds support for encoding said band types with the alternative
coders. Future commits to the coders will be required to make them suitable.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit extends the trellis quantizer (used by the default twoloop coder)
to accept and correctly encode codebooks needed for intensity stereo and perceptual noise substitution.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit implements the perceptual noise substitution AAC extension. This is a proof of concept
implementation, and as such, is not enabled by default. This is the fourth revision of this patch,
made after some problems were noted out. Any changes made since the previous revisions have been indicated.
In order to extend the encoder to use an additional codebook, the array holding each codebook has been
modified with two additional entries - 13 for the NOISE_BT codebook and 12 which has a placeholder function.
The cost system was modified to skip the 12th entry using an array to map the input and outputs it has. It
also does not accept using the 13th codebook for any band which is not marked as containing noise, thereby
restricting its ability to arbitrarily choose it for bands. The use of arrays allows the system to be easily
extended to allow for intensity stereo encoding, which uses additional codebooks.
The 12th entry in the codebook function array points to a function which stops the execution of the program
by calling an assert with an always 'false' argument. It was pointed out in an email discussion with
Claudio Freire that having a 'NULL' entry can result in unexpected behaviour and could be used as
a security hole. There is no danger of this function being called during encoding due to the codebook maps introduced.
Another change from version 1 of the patch is the addition of an argument to the encoder, '-aac_pns' to
enable and disable the PNS. This currently defaults to disable the PNS, as it is experimental.
The switch will be removed in the future, when the algorithm to select noise bands has been improved.
The current algorithm simply compares the energy to the threshold (multiplied by a constant) to determine
noise, however the FFPsyBand structure contains other useful figures to determine which bands carry noise more accurately.
Some of the sample files provided triggered an assertion when the parameter to tune the threshold was set to
a value of '2.2'. Claudio Freire reported the problem's source could be in the range of the scalefactor
indices for noise and advised to measure the minimal index and clip anything above the maximum allowed
value. This has been implemented and all the files which used to trigger the asserion now encode without error.
The third revision of the problem also removes unneded variabes and comparisons. All of them were
redundant and were of little use for when the PNS implementation would be extended.
The fourth revision moved the clipping of the noise scalefactors outside the second loop of the two-loop
algorithm in order to prevent their redundant calculations. Also, freq_mult has been changed to a float
variable due to the fact that rounding errors can prove to be a problem at low frequencies.
Considerations were taken whether the entire expression could be evaluated inside the expression
, but in the end it was decided that it would be for the best if just the type of the variable were
to change. Claudio Freire reported the two problems. There is no change of functionality
(except for low sampling frequencies) so the spectral demonstrations at the end of this commit's message were not updated.
Finally, the way energy values are converted to scalefactor indices has changed since the first commit,
as per the suggestion of Claudio Freire. This may still have some drawbacks, but unlike the first commit
it works without having redundant offsets and outputs what the decoder expects to have, in terms of the
ranges of the scalefactor indices.
Some spectral comparisons: https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Original.png (original),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_NO.png (encoded without PNS),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS1.2.png (encoded with PNS, const = 1.2),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Difference1.png (spectral difference).
The constant is the value which multiplies the threshold when it gets compared to the energy, larger
values means more noise will be substituded by PNS values. Example when const = 2.2:
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_2.2.png
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch fixes a pointer arithmetic bug in adjust_frame_information that resulted in heavily corrupted audio when using M/S encoding. Also, a backup copy of untransformed coefficients has to be kept around or attempts at re-processing the frame (which happens when hevavily overspending bits during transients) will result in re-encoding of the coefficients and subsequent corruption of the resulting stream.
A/B testing shows the bug as corrected, but still cannot prove that M/S coding is a win at least in numbers. Limited listening tests do show improvement on M/S encoded samples in lower bitrates, but they're hidden among the other artifacts that remain to be corrected in the encoder.
Some of the regressions flagged in the report do show poor stereo image (but not buggy), so M/S encoding is clearly not good enough yet to be defaulted to auto.
In numbers, Patched against Unpatched, stereo_mode auto:
Files: 114
Bitrates: 6
Tests: 683
Serious Regressions: 0 (0%)
Regressions: 0 (0%)
Improvements: 227 (33%)
Big improvements: 92 (13%)
Worst regression - mybloodrusts.wv - 256k
- StdDev: 28.61 pSNR: -0.43 maxdiff: 1372.00
Best improvement - 60.wv - 384k
- StdDev: -369.57 pSNR: 45.02 maxdiff: -13322.00
Average - StdDev: -80.56 pSNR: 2.49 maxdiff: -8858.00
Patched against Unpatched stereo_mode ms_off shows no difference.
Patched stereo_mode auto vs Unpatched stereo_mode ms_off shows a small average improvement, just not too significant:
Serious Regressions: 0 (0%)
Regressions: 10 (1%)
Improvements: 45 (6%)
Big improvements: 2 (0%)
Worst regression - Illinois.wv - 256k
- StdDev: 33.20 pSNR: -2.03 maxdiff: 477.00
Best improvement - song_of_circomstances.flac - 384k
- StdDev: -3.97 pSNR: 7.61 maxdiff: -826.00
Average - StdDev: -10.25 pSNR: 0.20 maxdiff: -281.00
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes a case where multichannel bitrate isn't accurately
targetted by psy model alone, never achieving the target bitrate.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes a case where multichannel bitrate isn't accurately
targetted by psy model alone, never achieving the target bitrate.
Now fixed.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Do not pointlessly call ff_alloc_packet multiple times,
and fix an infinite loop by clamping the maximum
number of bits to target in the algorithm that does
not use lambda.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Do not pointlessly call ff_alloc_packet2 multiple times,
and fix an infinite loop by clamping the maximum
number of bits to target in the algorithm that does
not use lambda.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
On cygwin, math.h needs to be included before float.h because of a bug
in the system headers. Including libavutil/libm.h first works around
this issue.
Longer discussion of the topic:
http://thread.gmane.org/gmane.comp.video.ffmpeg.devel/128582
Reduce scalefactors in non-zero bands that underflow by twice as much as those
in bands that just fail to hit psy targets.
Originally committed as revision 24482 to svn://svn.ffmpeg.org/ffmpeg/trunk