The code needs only a few definitions from cuda.h, so define them
directly when CUDA is not enabled. CUDA is still required for accepting
HW frames as input.
Based on the code by Timo Rothenpieler <timo@rothenpieler.org>.
When there is a non-zero decoding delay due to reordering, the first dts
should be lower than the first pts (since the first packet fed to the
decoder does not produce any output).
Use the same scheme used in mpegvideo_enc (which comes from x264
originally) -- wait for first two timestamps and extrapolate linearly to
the past to produce the first dts value.