Colour is a powerful thing. It can identify a brand, imply eco-friendliness, gender a toy, raise our blood pressure, calm us down. But what exactly is colour? How and why do we see it? And how do cameras record it? Let’s find out.
The Meaning of “Light”
One of the many weird and wonderful phenomena of our universe is the electromagnetic wave, an electric and magnetic oscillation which travels at 186,000 miles per second in a vacuum. Like all waves, EM radiation has the inversely proportional properties of wavelength and frequency: the higher the frequency, the shorter the wavelength. We humans have devised different names for it based on these properties.
EM waves with a low frequency and therefore a long wavelength are known as radio waves or, slightly higher in frequency, microwaves; we use them to broadcast information and heat ready meals. EM waves with a high frequency and a short wavelength are known as X-rays and gamma rays; we use them to see inside people and treat cancer.
In the middle of the electromagnetic spectrum, sandwiched between infrared and ultraviolet, is a range of frequencies between 430 and 750 terahertz (wavelengths 400-700 nanometres). We call these frequencies “light”, and they are the frequencies which the receptors in our eyes can detect.
If your retinae were instead sensitive to electromagnetic radiation of between 88 and 91 megahertz, you would be able to see BBC Radio 2. I’m not talking about magically seeing into Ken Bruce’s studio, but perceiving the FM radio waves which are encoded with his silky-smooth Scottish brogue. Since radio waves can pass through many solid objects, though, perceiving them would not help you understand your environment much. Light waves, by contrast, are absorbed or reflected by most solid objects and pass through most non-solid ones, which makes them perfect for building a picture of the world around you.
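The figures above all follow from one relationship: wavelength equals the speed of light divided by frequency. As a quick sanity check, here is a minimal Python sketch (the function name is just for illustration) that converts the visible band and Radio 2's FM band into wavelengths.

```python
# Speed of light in a vacuum, in metres per second.
C = 299_792_458

def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in metres of an EM wave with the given frequency (lambda = c / f)."""
    return C / frequency_hz

# The visible band quoted above, in nanometres.
for f_thz in (430, 750):
    print(f"{f_thz} THz -> {wavelength_m(f_thz * 1e12) * 1e9:.0f} nm")

# The FM band quoted above, in metres.
for f_mhz in (88, 91):
    print(f"{f_mhz} MHz -> {wavelength_m(f_mhz * 1e6):.2f} m")
```

It prints roughly 697 nm and 400 nm for the edges of the visible band, and about 3.4 m and 3.3 m for the FM band: radio waves are millions of times longer than light waves.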
Within the range of human vision, we have subdivided and named smaller ranges of frequencies. For example, we describe light of about 590-620nm as “orange”, and below about 450nm as “violet”. This is all colour really is: a small range of wavelengths (or frequencies) of electromagnetic radiation, or a combination of them.
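Naming colours is really just a lookup from wavelength to a band. The sketch below shows the idea for a single wavelength. Only the orange (590-620nm) and violet (below 450nm) limits come from the text; the other boundaries are assumed, commonly quoted approximations, and real colour naming is much fuzzier than hard cut-offs suggest.

```python
from typing import Optional

# Approximate colour bands in nanometres (lower bound inclusive, upper exclusive).
# Orange and violet follow the text; the other limits are assumed conventions.
BANDS = [
    (400, 450, "violet"),
    (450, 495, "blue"),
    (495, 570, "green"),
    (570, 590, "yellow"),
    (590, 620, "orange"),
    (620, 700, "red"),
]

def colour_name(wavelength_nm: float) -> Optional[str]:
    """Name of the band containing a single wavelength, or None outside 400-700nm."""
    for low, high, name in BANDS:
        if low <= wavelength_nm < high:
            return name
    return None

print(colour_name(600))  # orange
print(colour_name(430))  # violet
```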
In the eye of the beholder
The inside rear surfaces of your eyeballs are coated with light-sensitive cells called rods and cones, named for their shapes.
The human eye has about five or six million cones. They come in three types: short, medium and long, referring to the wavelengths to which they are sensitive. Short cones have peak sensitivity at about 420nm, medium at 530nm and long at 560nm, roughly what we call blue, green and red respectively. The ratios of the three cone types vary from person to person, but short (blue) ones are always in the minority.
Rods are far more numerous – about 90 million per eye – and around a hundred times more sensitive than cones. (You can think of your eyes as having dual native ISOs like a Panasonic VariCam, with your rods having an ISO six or seven stops faster than your cones.) The trade-off is that rods have poorer temporal and spatial resolution than cones, making it harder to see detail and fast movement with them. In practice, rods only really come into play in dark conditions. Because there is just one type of rod, we cannot distinguish colours in low light, and because rods are most sensitive to wavelengths of around 500nm, cyan shades appear brightest. That’s why cinematographers have been painting night scenes with everything from steel grey to candy blue light since the advent of colour film.
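The "six or seven stops" figure comes from the fact that each stop is a doubling of sensitivity, so a ratio converts to stops via a base-2 logarithm. A small sketch of that arithmetic:

```python
import math

def ratio_to_stops(ratio: float) -> float:
    """Each stop doubles sensitivity, so a sensitivity ratio r is log2(r) stops."""
    return math.log2(ratio)

def stops_to_ratio(stops: float) -> float:
    """Inverse conversion: n stops is a ratio of 2 to the power n."""
    return 2 ** stops

print(f"{ratio_to_stops(100):.2f} stops")  # 6.64 stops
print(stops_to_ratio(6), stops_to_ratio(7))  # 64 128
```

A hundredfold difference lands at about 6.6 stops, between 64x (six stops) and 128x (seven stops).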
The three types of cone are what allow us – in well-lit conditions – to have colour vision. This trichromatic vision is not universal, however. Many animals have tetrachromatic (four-channel) vision, and researchers have identified some rare humans with it too. On the other hand, some animals, and “colour-blind” humans, are dichromats, having only two types of cone in their retinae. But in most people, perceptions of colour result from combinations of red, green and blue. A combination of red and blue light, for example, appears as magenta. All three of the primaries together make white.
Compared with the hair cells in the cochlea of your ears, which are capable of sensing a continuous spectrum of audio frequencies, trichromacy is quite a crude system, and it can be fooled. If your red and green cones are triggered equally, for example, you have no way of telling whether you are seeing a combination of red and green light, or pure yellow light, which falls between red and green in the spectrum. Both will appear yellow to you, but only one really is. That’s like being unable to hear the difference between, say, the note D and a combination of the notes C and E. (For more info on these colour metamers and how they can cause problems with certain types of lighting, check out Phil Rhodes’ excellent article on RedShark News.)
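To see how a metamer arises, here is a toy model. It is an illustration under heavy assumptions, not real physiology: each cone's sensitivity is modelled as a Gaussian centred on the peaks given earlier (530nm for medium, 560nm for long), the 40nm width is made up, short cones are ignored on the assumption that they barely respond at these wavelengths, and the 540nm "green" and 620nm "red" primaries are hypothetical choices. The code solves for the amounts of red and green that give exactly the same cone responses as pure 580nm yellow.

```python
import math

# Toy cone model (assumed): Gaussian sensitivity curves at the peaks from the text.
PEAKS_NM = {"M": 530.0, "L": 560.0}
WIDTH_NM = 40.0  # assumed standard deviation of each curve

def response(cone: str, wavelength_nm: float) -> float:
    """Relative response of one cone type to one unit of monochromatic light."""
    d = wavelength_nm - PEAKS_NM[cone]
    return math.exp(-d * d / (2 * WIDTH_NM ** 2))

def cone_signal(spectrum: dict[float, float]) -> tuple[float, float]:
    """(L, M) responses to light made of several wavelengths {nm: intensity}."""
    l = sum(i * response("L", w) for w, i in spectrum.items())
    m = sum(i * response("M", w) for w, i in spectrum.items())
    return l, m

def match_with_primaries(target_nm: float, green_nm: float = 540.0,
                         red_nm: float = 620.0) -> dict[float, float]:
    """Intensities of two primaries giving the same (L, M) as one pure wavelength.

    Solves g * green + r * red = target for both cone types (Cramer's rule).
    """
    tl, tm = cone_signal({target_nm: 1.0})
    gl, gm = response("L", green_nm), response("M", green_nm)
    rl, rm = response("L", red_nm), response("M", red_nm)
    det = gl * rm - rl * gm
    g = (tl * rm - rl * tm) / det
    r = (gl * tm - tl * gm) / det
    return {green_nm: g, red_nm: r}

mix = match_with_primaries(580.0)
print(mix)  # roughly 0.32 of 540nm plus 1.85 of 620nm
print(cone_signal({580.0: 1.0}), cone_signal(mix))  # identical (L, M) pairs
```

Both intensities come out positive, so in this model the red/green mix is a physically realisable light that the two cone types cannot tell apart from pure yellow.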
Artificial eye
Mimicking your eyes, video sensors also use a trichromatic system. This is convenient because it means that although cameras and TVs can’t record or display pure yellow light, for example, they can produce a mix of red and green which, as we’ve just established, is indistinguishable from yellow to the human eye.
Rather than using three different types of receptor, each sensitive to different frequencies of light, most electronic sensors rely on separating the different wavelengths of light before they hit the photosites. The most common method is a colour filter array (CFA) placed immediately over the photosites, and the most common type of CFA is the Bayer filter, patented in 1976 by Eastman Kodak employee Dr Bryce Bayer.
The Bayer filter is a colour mosaic which allows only green light through to 50% of the photosites, only red light through to 25%, and only blue to the remaining 25%. The logic is that green is the colour your eyes are most sensitive to overall, so it carries most of the luminance information, and your vision is much more dependent on luminance than on chrominance.
The resulting image must be debayered (or, more generally, demosaiced) by an algorithm to produce a viewable image. If you’re recording a processed video format, whether in log or a standard video gamma, this happens in-camera, whereas if you’re shooting RAW it must be done in post.
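The 50/25/25 split falls straight out of the repeating 2×2 tile. The sketch below assumes an RGGB layout (other orderings such as GRBG exist, with the same proportions) and counts the filters over an 8×8 patch. It also lists the Rec.709 luma weights, which illustrate how much of perceived brightness is carried by green.

```python
from collections import Counter

def bayer_pattern(width: int, height: int) -> list[list[str]]:
    """Filter colour over each photosite of an (assumed) RGGB Bayer mosaic."""
    tile = [["R", "G"], ["G", "B"]]
    return [[tile[y % 2][x % 2] for x in range(width)] for y in range(height)]

pattern = bayer_pattern(8, 8)
counts = Counter(colour for row in pattern for colour in row)
total = sum(counts.values())
for colour in "RGB":
    print(colour, f"{counts[colour] / total:.0%}")  # R 25%, G 50%, B 25%

# Rec.709 luma weights: green contributes by far the most to luminance.
LUMA_709 = {"R": 0.2126, "G": 0.7152, "B": 0.0722}
print(max(LUMA_709, key=LUMA_709.get))  # G
```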
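To make debayering concrete, here is a deliberately naive sketch, not what any particular camera or RAW developer actually does. It simulates an RGGB sensor by keeping one channel per photosite, then rebuilds full-colour pixels: each pixel keeps its measured channel, and the two missing channels are averaged from neighbouring photosites of that colour (a simple bilinear-style interpolation). The function names are hypothetical.

```python
# Channel passed by the filter at each position of an assumed RGGB tile.
RGGB = [[0, 1], [1, 2]]  # 0 = red, 1 = green, 2 = blue

def mosaic(image):
    """Simulate a Bayer sensor: keep only one channel value per photosite.

    image: list of rows of (r, g, b) tuples.
    Returns rows of (channel_index, value) pairs.
    """
    return [
        [(RGGB[y % 2][x % 2], pixel[RGGB[y % 2][x % 2]]) for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]

def demosaic(raw):
    """Naive bilinear-style demosaic (needs at least 2x2 photosites).

    Each pixel keeps its own measured channel; the two missing channels are the
    average of that channel's samples in the surrounding 3x3 neighbourhood,
    clipped at the image edges.
    """
    height, width = len(raw), len(raw[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    channel, value = raw[ny][nx]
                    sums[channel] += value
                    counts[channel] += 1
            pixel = [s / n for s, n in zip(sums, counts)]
            own_channel, own_value = raw[y][x]
            pixel[own_channel] = own_value  # keep the measured sample unchanged
            out_row.append(tuple(pixel))
        result.append(out_row)
    return result

flat_colour = [[(0.2, 0.5, 0.8)] * 4 for _ in range(4)]
rebuilt = demosaic(mosaic(flat_colour))
```

A flat patch of colour survives the round trip (up to floating-point rounding), but fine detail and sharp colour edges do not, which is why real debayering algorithms are considerably more sophisticated.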
This system has implications for resolution. Let’s say your sensor is 2880×1620. You might think that’s the number of pixels, but strictly speaking it isn’t. It’s the number of photosites, and because of the Bayer filter each photosite records only one of the three colour channels, so none of them holds more than a third of the colour information needed to form a pixel of the final image. Calculating that final image – by debayering the RAW data – reduces the effective resolution by roughly 20-33%. That’s why cameras like the Arri Alexa or the Blackmagic Cinema Camera shoot at 2.8K or 2.5K: once the image is debayered, you’re left with 2K (cinema standard) resolution.
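The 20-33% range is easy to check from horizontal pixel counts. In this sketch, 2880 is the sensor width from the example above, 2560 is an assumed nominal width for "2.5K", and 2048 and 1920 are the DCI 2K and HD widths.

```python
def resolution_loss(photosite_width: int, output_width: int) -> float:
    """Fractional reduction in horizontal pixel count from sensor to delivered image."""
    return 1 - output_width / photosite_width

for sensor, output in [(2880, 2048), (2880, 1920), (2560, 2048)]:
    print(f"{sensor} -> {output}: {resolution_loss(sensor, output):.0%} fewer pixels across")
```

That gives 29% for 2.8K down to DCI 2K, 33% for 2.8K down to HD, and 20% for a nominal 2.5K down to DCI 2K.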
Colour compression
Your optic nerve can only transmit about one percent of the information captured by the retina, so a huge amount of data compression is carried out within the eye. Similarly, video data from an electronic sensor is usually compressed, be it within the camera or afterwards. Luminance information is often prioritised over chrominance during compression.
You have probably come across chroma subsampling expressed as, for example, 444 or 422, as in ProRes 4444 (the final 4 being transparency information, only relevant to files generated in postproduction) and ProRes 422. The three digits describe the ratio of colour to luminance information: a file with 444 chroma subsampling has no colour compression; a 422 file retains colour information for only every second pixel horizontally; and a 420 file, such as those on a DVD or Blu-ray, stores one sample of blue-difference info and one of red-difference info (the green being derived from those two and the luminance) for every block of four luma pixels, two across by two down.
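One way to see what the digits mean in practice is to count stored samples. In the common reading of the J:a:b notation, J is the width of a reference block two rows tall, a is the number of chroma samples (each of Cb and Cr) in the first row, and b is the number of extra chroma samples in the second row. The sketch below turns that into average samples per pixel and per 1920×1080 frame; it ignores bit depth and any further compression.

```python
def samples_per_pixel(j: int, a: int, b: int) -> float:
    """Average stored samples per pixel for J:a:b chroma subsampling.

    Reference block: J pixels wide, 2 rows tall.
    a = chroma samples (each of Cb and Cr) in the first row,
    b = additional chroma samples in the second row.
    """
    luma = 2 * j           # every pixel keeps its luma sample
    chroma = 2 * (a + b)   # Cb and Cr each contribute a + b samples
    return (luma + chroma) / (2 * j)

for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    spp = samples_per_pixel(*scheme)
    frame = 1920 * 1080 * spp
    print(":".join(map(str, scheme)), spp, f"{frame:,.0f} samples per HD frame")
```

So 4:2:2 stores two thirds of the samples of 4:4:4, and 4:2:0 stores half.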
Whether every pixel, or only a fraction of them, has colour information, the precision of that colour info can vary. This is known as bit depth or colour depth. The more bits allocated to describing the colour of each pixel (or group of pixels), the more precise the colours of the image will be. DSLRs typically record video in 24-bit colour, more commonly described as 8bpc or 8 bits per (colour) channel. Images of this bit depth fall apart pretty quickly when you try to grade them. Professional cinema cameras record 10 or 12 bits per channel, which is much more flexible in postproduction.
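The numbers behind bit depth are powers of two: 8 bits per channel gives 256 levels, 10 bits gives 1,024 and 12 bits gives 4,096. The sketch below also hints at why 8-bit footage falls apart in the grade. It quantises a gentle gradient covering an arbitrary, assumed 10% of the signal range and counts how many distinct code values survive; if a grade stretches that range, those are the only steps available, so fewer steps means more visible banding.

```python
def levels(bits_per_channel: int) -> int:
    """Number of distinct code values available per colour channel."""
    return 2 ** bits_per_channel

def steps_in_gradient(bits: int, low: float = 0.45, high: float = 0.55,
                      samples: int = 10_000) -> int:
    """Quantise a smooth gradient covering low..high of the signal range and
    count how many distinct code values it ends up with."""
    max_code = levels(bits) - 1
    codes = {
        round((low + (high - low) * i / (samples - 1)) * max_code)
        for i in range(samples)
    }
    return len(codes)

print(f"8bpc: {levels(8)} levels per channel, {levels(8) ** 3:,} colours in total")
for bits in (8, 10, 12):
    print(f"{bits}-bit: {steps_in_gradient(bits)} steps across the gradient")
```

The 8-bit gradient ends up with only a couple of dozen steps, roughly a quarter as many as at 10 bits.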
The third attribute of recorded colour is gamut: the range of colours that can be described. You may have seen a CIE (Commission Internationale de l’Éclairage) chromaticity diagram, which depicts the range of colours perceptible by human vision. Triangles are often superimposed on this diagram to illustrate the gamuts of various colour spaces. The three colour spaces you are most likely to come across are, in ascending order of gamut size: Rec.709, an older standard that is still used by many monitors; P3, used by digital cinema projectors; and Rec.2020. The latter is the standard for ultra-HD, and at the time of writing Netflix are already requiring that some of their shows be delivered in it, even though no monitor can yet display the full Rec.2020 gamut.
Most cinema cameras today can record images in Rec.709 (known as “video” mode on Blackmagic cameras) or a proprietary wide gamut (“film” mode on a Blackmagic, or “log” on others), which allows more flexibility in the grading suite. Note that the two modes also alter the recording of luminance and dynamic range.
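The ascending order of Rec.709, P3 and Rec.2020 can be checked by measuring the triangles on the CIE 1931 xy diagram. The sketch uses the published xy chromaticities of each standard's red, green and blue primaries (DCI-P3 for P3) and the shoelace formula for triangle area. Treat it as a rough comparison only, because equal areas on the xy diagram do not correspond to equal perceptual differences.

```python
# CIE 1931 xy chromaticities of the red, green and blue primaries.
PRIMARIES = {
    "Rec.709": [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)],
    "P3": [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)],
    "Rec.2020": [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)],
}

def triangle_area(points: list[tuple[float, float]]) -> float:
    """Area of a triangle from its three vertices (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = points
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

areas = {name: triangle_area(p) for name, p in PRIMARIES.items()}
for name, area in areas.items():
    print(f"{name}: {area:.3f} ({area / areas['Rec.709']:.2f}x Rec.709)")
```

On this measure P3 covers roughly 1.4 times the area of Rec.709, and Rec.2020 almost 1.9 times.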
To summarise as simply as possible: chroma subsampling is the proportion of pixels which have colour information, bit depth is the accuracy of that information, and gamut is the range of colours that information can describe.
That’s all for today. In future posts I will look at how some of the above science leads to colour theory and how cinematographers can make practical use of it.