29 October 2014

Video encoding

I was reading about VP8 and H.264 because my application will have to handle video, and I wanted to be familiar with what's going on. As part of that, I considered a method to "encode" a video in a way that is highly parallel, maybe low on space, and with little-to-no loss. Being just a simple man from the South, I'm sure someone a lot smarter than me has already thought of this, but here are my ideas.

Moving pictures

In simplest terms (at least in my limited knowledge of the video industry), raw video is an ordered sequence of pictures (frames), and each frame is a 2-dimensional grid of pixels. From video gaming, 60 frames per second or more is considered roughly optimal for keeping the human eye from perceiving individual frame changes (it just looks like motion). 30 fps and up mostly works too, but an occasional frame change is still observable. At 1080p (1920x1080 resolution) there are 2,073,600 pixels in each frame if you look at it as an individual picture. A pixel's color is represented by a combination of color values. From the web, I know that one way to represent RGB colors is by using 6 hexadecimal digits: 2 for red, 2 for green, and 2 for blue. Each pair of hex digits is one byte, so a pixel takes 3 bytes. That adds up to 6,220,800 bytes per frame, or about 5.9MB (counting 1MB as 1,048,576 bytes), to represent all the pixels. At 60 frames per second, that's about 356MB per second. A 2 hr movie is 7,200 seconds, so it would be roughly 2.4TB in raw form, not counting audio, delimiters, and format info.
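Running the numbers in a short script is a handy way to double-check them. This is only a sketch of the arithmetic: the constant names are mine, and it assumes binary units (1MB = 2**20 bytes) and a flat 3 bytes per pixel with no audio or container overhead.

```python
# Raw (uncompressed) size of 1080p video at 3 bytes per pixel.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3               # one byte each for red, green, and blue
FPS = 60
DURATION_SECONDS = 2 * 60 * 60    # a 2 hour movie is 7,200 seconds

pixels_per_frame = WIDTH * HEIGHT
bytes_per_frame = pixels_per_frame * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FPS
bytes_total = bytes_per_second * DURATION_SECONDS

# Binary units are an assumption of this sketch.
MB, TB = 2**20, 2**40

print(f"pixels per frame: {pixels_per_frame:,}")
print(f"per frame:  {bytes_per_frame / MB:.1f} MB")
print(f"per second: {bytes_per_second / MB:.0f} MB")
print(f"2 hours:    {bytes_total / TB:.2f} TB")
```

The step that is easiest to get wrong is the last one: the per-second figure has to be multiplied by seconds (7,200), not minutes (120).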


If you look at an RGB color as a vector, then each frame is a vector field. Though not quite, because RGB coordinates would always be integers. Strictly it'd be vectors over the integers, which form a ring rather than a field, but treating the values as real numbers is probably sufficient. So anyway, if we extend that further, we could theoretically find a formula to exactly fit the changes of one pixel over time. (A small part of me wishes I had not dropped Vector Analysis in college.)

With the way most movies are actually edited, a single function covering the whole film would probably be pretty hairy, take a long time to generate, and take a lot of calculation to get the value for each pixel. But inside of a single shot, the function would probably look pretty smooth and have a fairly easy formula. Some exceptions are likely (like a night scene with gunshots, causing wide swings of color and intensity). So at one end of the spectrum you have one monolithic formula per pixel, and at the other end is a piece-wise function, with a separate formula for each shot.
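To make the piece-wise end of that spectrum concrete, here is a minimal sketch that fits one straight line per shot to a single color channel of a single pixel. Everything in it is an assumption for illustration: the function names, the choice of a least-squares line as the "formula", the made-up sample values, and the idea that the shot boundaries are already known.

```python
def fit_line(samples):
    """Least-squares fit of value = a*t + b over t = 0, 1, ..., n-1."""
    n = len(samples)
    if n == 1:
        return 0.0, float(samples[0])
    mean_t = (n - 1) / 2
    mean_v = sum(samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(samples))
    var = sum((t - mean_t) ** 2 for t in range(n))
    a = cov / var
    b = mean_v - a * mean_t
    return a, b


def fit_piecewise(samples, shot_starts):
    """Fit one line per shot. shot_starts lists the first frame of each shot."""
    bounds = list(shot_starts) + [len(samples)]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        a, b = fit_line(samples[start:end])
        pieces.append((start, end, a, b))
    return pieces


def evaluate(pieces, frame):
    """Reconstruct the value at a frame, rounded and clamped to 0..255."""
    for start, end, a, b in pieces:
        if start <= frame < end:
            return max(0, min(255, round(a * (frame - start) + b)))
    raise IndexError(frame)


# Hypothetical red-channel values of one pixel: a slow fade up,
# then a cut (at frame 6) to a much brighter shot that slowly dims.
red = [10, 12, 14, 16, 18, 20, 200, 198, 196, 194]
pieces = fit_piecewise(red, shot_starts=[0, 6])
decoded = [evaluate(pieces, f) for f in range(len(red))]

print(pieces)
print("lossless:", decoded == red)
```

A single line fitted across all ten frames would miss the cut badly, which is the "hairy monolithic function" problem in miniature. Real footage would need something richer than a line per shot, and the fit would rarely be exact.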

Formula-fitting each pixel is an inherently parallel process, with each pixel considered separately. Optimization can occur afterwards. Video cards could be used (both encode and decode) for this since their computational power is all about parallelization.
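The independence of the pixels can be sketched in a few lines. This is not GPU code, just an illustration under stated assumptions: the function names are hypothetical, the "formula" is the simplest one imaginable (a constant plus its worst-case error), and a thread pool stands in for the video card. In CPython, threads will not actually speed up this kind of number crunching; the point is only that no pixel's fit depends on any other pixel.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_constant(samples):
    """Simplest possible 'formula': the mean value, plus its worst-case error."""
    mean = sum(samples) / len(samples)
    worst = max(abs(v - mean) for v in samples)
    return mean, worst


def fit_all_pixels(frames, workers=4):
    """frames[f][p] is the value of pixel p in frame f (one channel, flattened)."""
    # Transpose so that each item is one pixel's values over time.
    per_pixel = list(zip(*frames))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each pixel is fitted without looking at any other pixel,
        # and map() returns the results in pixel order.
        return list(pool.map(fit_constant, per_pixel))


# Hypothetical 3-frame clip of a 2x2 greyscale image,
# flattened to 4 pixels per frame.
frames = [
    [10, 50, 90, 200],
    [12, 50, 90, 180],
    [14, 50, 90, 160],
]
formulas = fit_all_pixels(frames)
for pixel, (mean, worst) in enumerate(formulas):
    print(f"pixel {pixel}: value ~ {mean}, worst error {worst}")
```

Any cross-pixel optimization, such as spotting pixels that share a formula, would be a separate pass over these results.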

Considering there are over 2 million pixels at 1080p, would this method actually save any space? A movie like The Matrix, with over 2300 shots, encoded with an accurate piece-wise function could still come to roughly 222GB (my rough guesstimate based on a 50 ASCII character formula per shot per pixel), which is hardly compact. However, there are some "nerd knobs" that can be tweaked to make it theoretically small enough to be consumable. For one, you can always increase the margin of error on the function, which will generally make the functions smaller and simpler while sacrificing accuracy. There are also optimizations you could make, like sharing a particular formula across multiple pixels. Consider a night shot where many of the pixels will be the same shade of grey. Or consider that many pixels for a given scene will share a similar intensity, but will just be color shifted from one another. Or vice versa: same color, but different intensities (black and white film, for instance). By identifying those formulas which are simple transformations of others, there is an opportunity to conserve space. That said, the sheer number of addressable pixels creates a lot of overhead for such optimizations. Handling this volume of data is as much an exercise in organizational efficiency as anything.
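Here is a rough sketch of both halves of that paragraph: the back-of-the-envelope size of one formula per shot per pixel, and the simplest form of formula sharing, where identical formulas are stored once and each pixel keeps only an index. The names, the formula strings, and the binary-GB convention are all assumptions for illustration. It only catches exact duplicates, not formulas that are transformations of one another.

```python
# Naive estimate: one 50-character ASCII formula per shot per pixel.
SHOTS = 2300
PIXELS = 1920 * 1080
BYTES_PER_FORMULA = 50

naive_bytes = SHOTS * PIXELS * BYTES_PER_FORMULA
print(f"one formula per shot per pixel: {naive_bytes / 2**30:.0f} GB")


def share_formulas(per_pixel_formulas):
    """Store each distinct formula once; pixels keep only an index into the table."""
    table = []      # distinct formulas, in order of first appearance
    index_of = {}   # formula -> position in table
    indices = []    # one table index per pixel
    for formula in per_pixel_formulas:
        if formula not in index_of:
            index_of[formula] = len(table)
            table.append(formula)
        indices.append(index_of[formula])
    return table, indices


# Hypothetical night shot: most pixels are the same flat grey,
# a couple brighten slowly over time t.
formulas = ["40", "40", "40 + 2*t", "40", "40 + 2*t", "40"]
table, indices = share_formulas(formulas)

print(table)
print(indices)
```

The catch mentioned above shows up right away: even with a tiny table, the index list still has one entry per pixel per shot, so the bookkeeping itself has to be compressed for this to pay off.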

Anyway, it is an interesting thought experiment. Looking at a video as a series of vector fields is something I hadn't considered before.
