Speaker Timothy Terriberry
Time 2008-01-30 15:30
Conference LCA2008

https://www.theora.org/

Color space:

  • Y’ - Luma
    • greyscale
  • Cb - Chroma blue
  • Cr - Chroma red

nonlinear

headroom:

  • pure black (16,16,16) to (219,240,240)
  • room for values to overflow
  • values still overflow regardless

Video is 4:2:0, signal bandwidth ratios from analog TV.

  • 4 pixels Luma : 1 Cb : 1 Cr

Frame size must be multiple of 16 bytes.

  • if it isn’t, make it larger, and then crop it.

Superblocks from one plane, 4x4 pixels. 32x32.

macroplane cuts across all three planes. 16x16.

INTRA frames, no motion compensation, key frames, golden frames, can decode by itself.

INTER frames, uses motion compensation, reduces bandwidth. Need previous frame + INTRA frame to decode. In order to decode previous frame, need the previous frame before that, etc.

INTRA frames, aid seeking to particular point.

Scene changes are typically INTRA frames.

Motion compensation

diff = frame - previous frame

Coding

not coding or skipping a block is very efficient

  • frame data is copied directly from previous frame
  • RLE - run length encoding
  • VLC - variable length codes

Three phase process

  1. partition superlocks into “partially coded” and “the rest”
  2. partition “the rest” of the superblocks into “fully coded” and “not coded”
  3. partition the blocks ???

For block flags, the longest run is length 30.

Identify the best motion vector. How good a match vs how good the signal. Rate distortion optimisation.

How do you measure distortion.

  • Sum of absolute differences. sum(x[i] - y[i])
  • Only on the Luma plane, Chroma ignored.

Half pixel resolution.

Find best 4 pixel value, then refine to half pixel resolution. Linear interpolation for backward compatibility, even though the half way point is where you get the most aliasing.

Not interpolation quality that matters, it is the ability to remove noise that matters.

Macro block modes

Which mode should be used?

Ideal would be to transform and quantise the frame, and compare. This is not practical.

Instead, estimate what the error will be.

Each macro block gets 0-4 MV (Motion Vectors).

Coded with a fixed VLC code

DCT Transform

MC (Motion compensation) removed temporal correlation.

DCT removed spatial correlation from the residual.

8x8 block

  • DC block - average pixels in block
  • others are AC

Y = G.X.G^T

No function that looks like a hard edge. Hard edge, next to smooth edge, will not encode with optimal results.

DCT closely related to the Fourier transform.

1-D: 16 multiples, 26 additions. 2-D: 256 multiples, 416 additions.

Quantisation

Contrast perception varies according to spatial frequency.

This is the only lossy step in the entire process.

H263 - constant coefficient.

MPEG4 - allow you to switch between extremes.

Theora allows interpolation between the extremes.

DC coefficients look like 1/8th resolution coy of the original image.