Anatomy of a Video Codec
Speaker | Timothy Terriberry |
---|---|
Time | 2008-01-30 15:30 |
Conference | LCA2008 |
Color space:
- Y’ - Luma
- greyscale
- Cb - Chroma blue
- Cr - Chroma red
nonlinear
headroom:
- pure black (16,16,16) to (219,240,240)
- room for values to overflow
- values still overflow regardless
Video is 4:2:0, signal bandwidth ratios from analog TV.
- 4 pixels Luma : 1 Cb : 1 Cr
Frame size must be multiple of 16 bytes.
- if it isn’t, make it larger, and then crop it.
Superblocks from one plane, 4x4 pixels. 32x32.
macroplane cuts across all three planes. 16x16.
INTRA frames, no motion compensation, key frames, golden frames, can decode by itself.
INTER frames, uses motion compensation, reduces bandwidth. Need previous frame + INTRA frame to decode. In order to decode previous frame, need the previous frame before that, etc.
INTRA frames, aid seeking to particular point.
Scene changes are typically INTRA frames.
Motion compensation
diff = frame - previous frame
Coding
not coding or skipping a block is very efficient
- frame data is copied directly from previous frame
- RLE - run length encoding
- VLC - variable length codes
Three phase process
- partition superlocks into “partially coded” and “the rest”
- partition “the rest” of the superblocks into “fully coded” and “not coded”
- partition the blocks ???
For block flags, the longest run is length 30.
Motion search
Identify the best motion vector. How good a match vs how good the signal. Rate distortion optimisation.
How do you measure distortion.
- Sum of absolute differences. sum(x[i] - y[i])
- Only on the Luma plane, Chroma ignored.
Half pixel resolution.
Find best 4 pixel value, then refine to half pixel resolution. Linear interpolation for backward compatibility, even though the half way point is where you get the most aliasing.
Not interpolation quality that matters, it is the ability to remove noise that matters.
Macro block modes
Which mode should be used?
Ideal would be to transform and quantise the frame, and compare. This is not practical.
Instead, estimate what the error will be.
Each macro block gets 0-4 MV (Motion Vectors).
Coded with a fixed VLC code
DCT Transform
MC (Motion compensation) removed temporal correlation.
DCT removed spatial correlation from the residual.
8x8 block
- DC block - average pixels in block
- others are AC
Y = G.X.G^T
No function that looks like a hard edge. Hard edge, next to smooth edge, will not encode with optimal results.
DCT closely related to the Fourier transform.
1-D: 16 multiples, 26 additions. 2-D: 256 multiples, 416 additions.
Quantisation
Contrast perception varies according to spatial frequency.
This is the only lossy step in the entire process.
H263 - constant coefficient.
MPEG4 - allow you to switch between extremes.
Theora allows interpolation between the extremes.
DC coefficients look like 1/8th resolution coy of the original image.