Anatomy of a Video Codec

Speaker	Timothy Terriberry
Time	2008-01-30 15:30
Conference	LCA2008

https://www.theora.org/

Color space:

Y’ - Luma
- greyscale
Cb - Chroma blue
Cr - Chroma red

nonlinear

headroom:

pure black (16,16,16) to (219,240,240)
room for values to overflow
values still overflow regardless

Video is 4:2:0, signal bandwidth ratios from analog TV.

4 pixels Luma : 1 Cb : 1 Cr

Frame size must be multiple of 16 bytes.

if it isn’t, make it larger, and then crop it.

Superblocks from one plane, 4x4 pixels. 32x32.

macroplane cuts across all three planes. 16x16.

INTRA frames, no motion compensation, key frames, golden frames, can decode by itself.

INTER frames, uses motion compensation, reduces bandwidth. Need previous frame + INTRA frame to decode. In order to decode previous frame, need the previous frame before that, etc.

INTRA frames, aid seeking to particular point.

Scene changes are typically INTRA frames.

Motion compensation

diff = frame - previous frame

Coding

not coding or skipping a block is very efficient

frame data is copied directly from previous frame
RLE - run length encoding
VLC - variable length codes

Three phase process

partition superlocks into “partially coded” and “the rest”
partition “the rest” of the superblocks into “fully coded” and “not coded”
partition the blocks ???

For block flags, the longest run is length 30.

Motion search

Identify the best motion vector. How good a match vs how good the signal. Rate distortion optimisation.

How do you measure distortion.

Sum of absolute differences. sum(x[i] - y[i])
Only on the Luma plane, Chroma ignored.

Half pixel resolution.

Find best 4 pixel value, then refine to half pixel resolution. Linear interpolation for backward compatibility, even though the half way point is where you get the most aliasing.

Not interpolation quality that matters, it is the ability to remove noise that matters.

Macro block modes

Which mode should be used?

Ideal would be to transform and quantise the frame, and compare. This is not practical.

Instead, estimate what the error will be.

Each macro block gets 0-4 MV (Motion Vectors).

Coded with a fixed VLC code

DCT Transform

MC (Motion compensation) removed temporal correlation.

DCT removed spatial correlation from the residual.

8x8 block

DC block - average pixels in block
others are AC

Y = G.X.G^T

No function that looks like a hard edge. Hard edge, next to smooth edge, will not encode with optimal results.

DCT closely related to the Fourier transform.

1-D: 16 multiples, 26 additions. 2-D: 256 multiples, 416 additions.

Quantisation

Contrast perception varies according to spatial frequency.

This is the only lossy step in the entire process.

H263 - constant coefficient.

MPEG4 - allow you to switch between extremes.

Theora allows interpolation between the extremes.

DC coefficients look like 1/8th resolution coy of the original image.