Previously, we just used the native sample rate for encoding. However, some encoders like libmp3lame doesn't support it. Therefore, we now use a supported sample rate (preferring the native one if possible).
FFmpeg requires audio data to be sent in a sequence of frames, each containing the same specific number of samples. Previously, we buffered input samples in FFmpegBackend. However, as the source and destination sample rates can now be different, we should buffer resampled data instead. swresample have an internal input buffer, so we now just forward all data to it and 'gradually' receive resampled data, at most one frame_size at a time. When there is not enough resampled data to form a frame, we will record the current offset and request for less data on the next call.
Additionally, this commit also fixes a flaw. When an encoder supports variable frame sizes, its frame size is reported to be 0, which breaks our buffering system. Now we treat variable frame size encoders as having a frame size of 160 (the size of a HLE audio frame).
We previously assumed that the first preferred sample format is planar, but that may not be true for all codecs. Instead we should find a supported sample format that is planar.
While YUV420P is widely used, not all encoders accept it (e.g. Intel QSV only accepts NV12). We should use the codec's preferred pixel format instead as we need to rescale the frame anyway.
This uses the mailbox model to move pixel downloading to its own thread, eliminating Nvidia's warnings and (possibly) making use of GPU copy engine.
To achieve this, we created a new mailbox type that is different from the presentation mailbox in that it never discards a rendered frame.
Also, I tweaked the projection matrix thing so that it can just draw the frame upside down instead of having the CPU flip it.