There is no reason why this should copy the audio data in a very
complicated way. Also, strictly write the first plane, instead of
writing the whole buffer. This is more helpful in context of the
example. This way a user can clearly confirm that it works by playing
the written data as raw audio.
This assumes one audio packet is decoded one time. This is not true:
packets can be partially decoded. Then you have to "adjust" the packet
and pass the undecoded part of the packet to the decode function again.