Temporal Rate Conversion
This paper is of great relevance to anyone who wants to display video-originated material on desktop CRT displays at a flicker-free refresh rate (such as 75Hz) with good quality.
It is of slightly less importance to people building systems with displays intended for a viewing distance of something like 10 feet for the 60Hz market or for people that plan to use flat-panel desktop displays. It is also of less importance if you are planning only to display film-originated material at high quality and are happy with the video-originated material being juddery.
Temporal rate definition
Temporal rate is the least ambiguous term for the time axis in the video format. Field rate should only be used when referring to an interlaced signal. Frame rate should only be used to describe a progressive signal. Refresh rate is best used to refer to displays.
It is best when quoting a TV standard to use the temporal rate figure because this is the rate that the motion was sampled at and is the figure that has the greatest bearing on how the video is subsequently processed.
It is worth noting that in an existing NTSC video-originated interlaced signal, even though the frame rate is only 30Hz, the temporal rate is actually 60Hz and this is the rate at which the motion in the original scene was sampled. In this case, however, only a half-resolution image is captured on each temporal sample. Video-originated NTSC should be quoted as "Interlaced active lines with a temporal rate of 60 Hz." To help stress the full consequences of interlace, it is common to actually quote the NTSC format as 480i30, where 30 refers to the frame rate, but this is very misleading when trying to assess the temporal rate conversion problem. In this paper, the NTSC format will be referred to as 480i60, where the 60 refers to the temporal sampling frequency, otherwise known as the "Temporal Rate."
480i60 - Interlaced sampling of ball movement. Temporal sampling rate is 60Hz.
480p24 - Film sampling of ball movement. Temporal sampling rate is 24Hz.
Describing film-originated NTSC material is even more confusing and should best be quoted as "480p24 pseudo-interlaced using 3:2 pulldown." The important figure to quote is the rate at which the motion in the original scene was captured, for example, in the film case, 24Hz. Everything else is really just an implementation detail.
480p60 - Progressive 60Hz sampling provides the best of both worlds for sports action.
The origins of the 60Hz (and 50Hz) field rate
The good thing about the existing field rate of 60Hz (and to a lesser extent, the 50Hz European standard) is that it allows the analog transmission bandwidth to be kept to a minimum and yet is not so low that the picture would fail to do a reasonable job of portraying motion or be seen as excessively flickery when viewed on the small screened TVs envisioned back in the 1930s when the TV system was being designed. Ideally it would have been good to have a higher field rate (above 70Hz), but analog transmission bandwidth was (and still is) too expensive to make this practical.
When TV was being designed in the 1930s and electronics was in its infancy, it was difficult to design oscillator and power regulation circuits, so it was necessary to make the TV field rate the same as the power rate. This power frequency had been arrived at back in the Victorian era, because it was an efficient rate to run a power generation turbine and worked well for the transformers necessary for power distribution. From the 1970s onward, with the advent of modern electronics, the requirement to base TV designs on the power frequency went away, but of course the TV standard was well established by then.
60Hz generators, so 60Hz video.
The current video standards are well entrenched
The following is a run-down of the video standards currently used in each geographic area. Note that each area actually uses two distinct standards that at first sight seem the same, but are actually very different. Some TV shows actually consist of a mixture of the two. (The number of lines in the NTSC signal is simplified to 480, even though it typically has a couple extra. The nominal 60Hz rate, where appropriate, is stated as its correct frequency of 59.94Hz, because this has relevance in discussion of temporal issues.)
480i59.94Hz (video-originated, that is, scene sampled at 59.94Hz). 480i59.94Hz (film-originated, that is, scene sampled at 24Hz, changed to 59.94Hz using 3:2 pulldown run fractionally slow). In many ways this signal is best regarded as 480p23.976Hz.
480i60Hz (video-originated, that is, scene sampled at 60Hz). 480i60Hz (film originated, that is, scene sampled at 24Hz, changed to 60 using 3:2 pulldown). In many ways, this signal is best regarded as 480p24Hz.
Europe: 576i50Hz (video-originated, that is, scene sampled at 50Hz). 576i50Hz (film-originated, that is, scene sampled at 24Hz, changed to 50Hz using frame repeat and run 4% fast). In many ways this signal is best regarded as 576p25Hz.
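The effective origination rates behind these formats follow from simple ratios. The sketch below is illustrative only; it assumes the nominal 59.94Hz field rate is exactly 60000/1001 Hz and that 3:2 pulldown spends an average of 2.5 fields on each film frame. The variable names are made up for this example.

```python
from fractions import Fraction

# Assumption: the nominal 59.94Hz field rate is exactly 60000/1001 Hz.
ntsc_field_rate = Fraction(60000, 1001)

# 3:2 pulldown uses 2.5 fields per film frame on average, so the film
# must run slightly slower than 24Hz to land on a 59.94Hz field rate.
ntsc_film_rate = ntsc_field_rate / Fraction(5, 2)

# In 50Hz regions each film frame occupies two fields, so the film runs at 25Hz.
pal_film_rate = Fraction(50, 2)
pal_speedup_percent = (pal_film_rate / 24 - 1) * 100

print(f"Film rate inside a 59.94Hz signal: {float(ntsc_film_rate):.3f}Hz")
print(f"Film rate inside a 50Hz signal: {float(pal_film_rate):.0f}Hz "
      f"({float(pal_speedup_percent):.2f}% fast)")
```

The slow-down in the 59.94Hz case is about 0.1%, which is why it goes unnoticed, whereas the roughly 4% speed-up in the 50Hz case is large enough to shorten the running time of a movie noticeably.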
The issue of display rate needed to avoid flicker is independent of the rate needed to describe the motion in a scene
The temporal rate needed to portray motion is totally independent of the temporal rate needed to stop a display device from flickering. Despite this independence, the answers, by chance, come out fairly similar.
Flicker refers to the eye's perception of flashing light
Whether the light from the display is perceived to flash at you is a function of both the temporal characteristic of the eye and the temporal characteristics of the display technology. It won't be seen as flashing if the light is held stable by the display, as in the case of flat-panel displays that have a sample-and-hold pixel characteristic. Also it won't be seen to flash if the flashing rate of the display is faster than the temporal response of the eye.
CRT displays have an impulse light output characteristic
CRTs have an impulse light output characteristic, that is, light is given out only by the dot from the electron beam and from a short trail of glowing phosphor behind the moving dot. The phosphor light output from each phosphor target decays away in something like 50 microseconds after the beam has passed it. The persistence needed to turn the moving dot into an image is provided by your eye. Effectively, the moving spot does an impulse update of the image that is already on your retina. Because of eye tracking, the image being updated is a stationary image even if the actual object is moving. This is how you are able to see detail even in moving objects.
An ideal TV system would use impulse sampling of the scene, using an electronic camera with a fast shutter, and a display at the other end that also has an impulse response. Because of this, CRTs still produce the sharpest picture quality and dynamic resolution when compared with the newer flat-panel technologies. Obviously, there are other advantages with flat-panel displays that can make up for the lack of dynamic resolution, such as the ability to hang them on the wall. The other thing to note is that the impulse characteristic of a CRT, which produces good dynamic resolution, is also what causes flicker. To avoid this, you need a faster refresh rate than 60Hz, which means you run into the judder picture quality issue associated with trying to process a 60Hz signal into something faster.
Flat-panel displays have a sample-and-hold characteristic
All of the newer display technologies such as LCD, plasma, DLP, and so on, have essentially a sample-and-hold characteristic. When a pixel is addressed, it is loaded with a value and stays at that light output value until it is next addressed. From an image portrayal point of view, this is the wrong thing to do. The sample of the original scene is only valid for an instant in time. After that instant, the objects in the scene will have moved to different places. It is not valid to try to hold the images of the objects at a fixed position until the next sample comes along that portrays the object as having instantly jumped to a completely different place.
Your eye tracking will be trying to smoothly follow the movement of the object of interest and the display will be holding it in a fixed position for the whole frame. The result will inevitably be a blurred image of the moving object.
Sample and hold pixel characteristic causes blur.
The good thing about displays with a sample-and-hold characteristic is that they do not produce any flicker when driven at 60Hz. This is because the sample values from the video signal are held for the entire frame time, rather than being just flashed onto the screen. The fact that they can be driven at the same rate as the video source is very significant because it avoids the need for temporal rate conversion.
Desktop LCD monitor. 42" plasma display.
Leaving aside the temporal rate conversion difficulties, displays with a sample-and-hold characteristic, such as LCD and plasma, would produce better motion portrayal if operated at rates above 60Hz. Flat panels are normally run at 60Hz, because it is perceived that this is all you need to do since there is no flicker problem. The reality is that a faster update rate would be beneficial in order to reduce the blurring effect associated with the sample-and-hold characteristic. Pixels with a sample-and-hold characteristic effectively extend what should have been an instantaneous sample into a constant value that lasts for a whole frame. The result of this is motion smearing. This smearing is reduced if you can update the sample and hold circuits more often with new sample values.
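The benefit of a faster update rate can be estimated with a rough rule of thumb. The sketch below assumes that, while the eye tracks a moving object, the smear width on a sample-and-hold display is approximately the object's speed multiplied by the time each sample is held. That approximation, the function name, and the 600 pixels-per-second example speed are assumptions made for illustration, not figures from this paper.

```python
def hold_smear_pixels(speed_px_per_s: float, update_hz: float) -> float:
    """Approximate smear width, in pixels, for an eye-tracked moving object.

    Assumes each sample is held for the whole update period (1 / update_hz)
    and that every update carries a genuinely new sample of the scene.
    """
    return speed_px_per_s / update_hz


for rate in (60, 75, 120):
    smear = hold_smear_pixels(600, rate)
    print(f"{rate}Hz update: about {smear:.1f} pixels of smear")
```

Under these assumptions the smear falls in direct proportion to the update period, so it only shrinks if the extra updates contain new sample values rather than repeats of the previous frame.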
There are two reasons why LCD and plasma displays almost all currently operate at 60Hz. Least importantly, the drive electronics only goes that fast. More importantly, there is no standardized faster rate, and the drive electronics designs are all designed for single frequency operation. LCD manufacturers would like it if there were a higher rate standard such as 75Hz, but they won't use it, as there is no standard.
The sample-and-hold pixel characteristic is also the reason why you cannot feed a flat-panel display with an interlaced signal. If you did, you would end up with both fields displayed at once, which would be seen as "feathering" (or "mice teeth") on vertical edges that are moving horizontally. If the movement between fields is far enough, then you actually see two separate images of the object.
Interlace "feathering" ("mice teeth"). Interlace "double imaging."
Flicker on CRTs is seen on large areas of uniform bright colors
The flickering on CRTs is particularly evident where the picture contains wide areas of uniform bright color, such as an area of bright sky. The effect is called "large area flicker." The problem gets considerably worse when you magnify a picture to put it on a large screen.
The amount of perceived flicker increases as the light output from the CRT increases
The problem of flicker is getting worse as the light output from modern CRT displays continues to increase.
The amount of perceived flicker on a CRT display increases as screens get larger and wider
The human eye's peripheral vision is particularly sensitive to flicker. If you sit watching a wide screen, the edges of the screen are actually being seen by your peripheral vision. This makes the flicker very visible and annoying, particularly on wide aspect ratio 16:9 screens. Although the fast response time of your peripheral vision is a problem these days, it was very useful back in the caveman days for detecting wild animals leaping out from the side to eat you!
The human eye's sensitivity to flicker is determined by approximately a power of 4 law
It has been determined in various scientific viewer tests that the amount of flicker you see is approximately inversely proportional to the fourth power of the frequency. A 60Hz field rate is not just a bit better than 50Hz, it is about twice as good, since (60/50)^4 = 2.07. A temporal rate of 72Hz is in turn about twice as good as a 60Hz rate, since (72/60)^4 = 2.07. Although there are several aspects of the NTSC standard that are not as good as the PAL system, it is interesting to note that because the temporal rate is the hardest to change, the American-originated system has some advantages for the future.
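The arithmetic behind the power of 4 law is easy to check. The following is a minimal sketch; the function name is invented for this example and the exponent of 4 is the approximate figure quoted above, not an exact constant.

```python
def flicker_improvement(old_hz: float, new_hz: float, exponent: float = 4.0) -> float:
    """Factor by which perceived flicker falls when moving from old_hz to new_hz,
    assuming flicker scales as 1 / frequency**exponent."""
    return (new_hz / old_hz) ** exponent


for old_hz, new_hz in ((50, 60), (60, 72), (60, 85)):
    factor = flicker_improvement(old_hz, new_hz)
    print(f"{old_hz}Hz -> {new_hz}Hz: flicker reduced about {factor:.2f} times")
```

Each 20% increase in rate roughly halves the flicker, so the step from 60Hz to 85Hz comes out at about a four-fold reduction under the same assumption.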
Above 72Hz on a CRT display, whether flicker is seen depends on the particular person, but many still see flicker if the rate is less than about 85Hz
At medium rates, you won't see the flicker when you stare straight at the screen, but you will see it out of the corner of your eye. This continuous distraction from the edges of the screen can become very annoying. When viewing a screen, there is also the concept of subconscious flicker, that is, you are not directly aware of the flicker, but you come away with a headache.
If you sit close to a low refresh rate CRT screen, as when working on a desktop PC, you will see considerable flicker because of the high subtended angle and the high light output reaching your eye
When sitting at a PC, your subtended angle of view is quite large since you are typically less than two picture widths away. It is often your peripheral vision that is seeing the screen edges, so you see the flicker more. There is also a lot of the light output from the screen entering your eyes.
PC graphics often have large areas of bright white and this causes considerable flicker on a CRT display at 60Hz
Another reason why 60Hz on a PC looks worse than 60Hz on a TV is that on a PC it is common to use black letters on a predominantly white and therefore bright background. TV broadcasters deliberately try to avoid using bright backgrounds in order to minimize flicker. This is particularly true in 50Hz Europe.
Nobody would buy a PC with a 60Hz CRT display these days
Most computer buyers know that a PC that uses a 60Hz refresh rate flickers badly. In practice these days, nobody tries to sell PCs with a refresh rate of less than 72Hz, because they know that the flickering would be so bad that nobody would buy them.
It is acceptable to use 60Hz for CRT displays intended for 10-foot viewing
In a family living room environment when viewing a CRT screen from 10 feet away, it is acceptable to use 60Hz. In the future, when displaying computer graphics, the flicker will get worse. It will also get worse as the light output from displays increases, but even so it is acceptable. Anything less than about 60Hz, even for 10-foot viewing distances, is not acceptable. 50Hz, for example, produces twice as much flicker as 60Hz and is not acceptable. For applications where the screen will be viewed close up, a rate higher than about 70Hz is essential.
Temporal rate needed to describe the motion
The temporal rate needed to carry the motion information is a subjective issue and is very dependent on the type of motion being portrayed
The best way to think about it is as if it were a cartoon. The high-paid lead cartoonist needs to draw just enough pictures to describe to the low-paid "inbetweener" artists, the motion of the cartoon character. If the motion is linear, such as a train on a track, then the number of drawings needed to fully describe the motion is small. If a cartoon character is being randomly flung about the screen by another cartoon character, then the number of drawings needed to describe the motion is high. A motion vector-steered temporal upconverter is able to do the job of an "inbetweener" artist and fill in the additional frames, but as with the "inbetweener" artist, it is only able to do this for steady motion between the temporal reference points.
What it comes down to is that the steadier the motion, the less the temporal sample rate needed to portray that motion. Fast motion that keeps changing direction, such as the drummer's drum sticks in a heavy metal rock band, needs a very fast temporal sampling rate if it is to be fully portrayed. Motion like this needs a temporal sampling rate of many hundreds of Hertz. Luckily most motion is slower and more linear than this and so can be represented by a slower temporal sampling rate.
Whatever temporal sampling rate you choose, it's unlikely to be fast enough
There is no practical frame rate high enough to properly portray all the motion typically encountered. It is necessary to pick a sensible rate that is slow enough to allow the video signal to be stored, routed around, and of course broadcast.
The sampling theorem specifies that you need to sample at a minimum of twice the maximum frequency present in the signal
The sampling theorem is jointly attributed to the work of Nyquist and Shannon. Failure to sample at least this fast will result in aliasing. Aliasing refers to the fact that frequencies in the signal being sampled that are higher than half the sampling frequency will fold back and overlap with the valid required frequencies. They will be transposed and will appear as rogue lower frequencies mixed in with the proper signal. The frequency components due to aliasing become indistinguishable from the valid frequency components.
Correct sampling is when the Baseband does not extend past half the sampling frequency.
The theorem applies to all things being sampled including spatial frequencies (that is, detail in the scenes) and temporal frequencies (that is, the rate at which the objects in the scene are moving). There is no fundamental reason why you must avoid aliasing, but it is important to understand its consequences and artifacts.
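The fold-back behavior can be illustrated numerically. The sketch below computes the frequency at which a pure sinusoidal component appears after sampling; the function name and the example frequencies are chosen purely for illustration.

```python
def alias_frequency(signal_hz: float, sample_hz: float) -> float:
    """Apparent frequency of a sinusoid after sampling at sample_hz.

    Components at or below half the sampling frequency pass through unchanged;
    anything higher folds back into the range 0 .. sample_hz / 2.
    """
    folded = signal_hz % sample_hz
    return min(folded, sample_hz - folded)


for signal_hz in (10, 30, 35, 50, 70):
    apparent = alias_frequency(signal_hz, 60)
    print(f"{signal_hz}Hz sampled at 60Hz appears as {apparent}Hz")
```

With a 60Hz sampling rate, a 35Hz component is indistinguishable from a 25Hz one and a 70Hz component is indistinguishable from a 10Hz one, which is exactly the kind of rogue lower frequency described above.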
The temporal sampling rate used by video is not fast enough to avoid temporal aliasing
If the temporal sampling rate used by a video signal was more than twice the frequency of the fastest motion present in the scene, then the sample points transmitted would fully represent the motion information and so you could accurately pick additional information points on the curve. The reality, however, with any video signal that contains motion is that the temporal sampling rate is not sufficient and so video signals usually have temporal aliasing.
Quality of existing TV is OK because of eye tracking
Temporal aliasing does not badly detract from your TV viewing pleasure at home on your existing TV because your eyes are able to track the motion of objects of interest. Because of the persistence of the eye, your eyes are able to accurately interpolate along the movement axis. Unfortunately, any processing of the signal also needs to track the motion if it is not going to be confused by the temporal aliasing.
The eye tracks the object of interest. In this case the object is moving up the screen.
If the temporal rate is excessively low, then the result will be "Temporal Sampling Judder"
Temporal Sampling Judder is due to the sample rate being too slow to describe the motion in the scene. The effect is like shining a stroboscope light on the scene. Moving objects are seen to jump from one position to another, rather than being seen to move smoothly.
This type of judder is commonly seen in film-originated material because a 24Hz sampling rate is far too slow for much of the motion in many scenes. It is often referred to as "film judder." It is often seen in backgrounds when the film camera pans horizontally. It is also responsible for the amusing artifact in Westerns, where wagon wheels are seen to go backwards.
Professional film cameramen and other cinematographers try hard to avoid motion in the scenes that would cause Temporal Sampling Judder. Typically, the camera is accurately panned to follow the moving object of interest, thereby making it stationary relative to the camera picture. Also, a small depth of focus is used to avoid judder in the background as the camera pans past. Another factor that helps is the temporal characteristic of the film camera that keeps each film frame exposed for about half of each frame period, thus introducing some temporal smear. This is very different from CCD video cameras that have a very fast electronic shutter.
This Temporal Sampling Judder is one of the components of "the film look" and most people have become used to it, so it's not particularly annoying. Typically (given a constant bandwidth transmission medium) the loss in temporal resolution is made up for by a corresponding increase in spatial resolution.
Another form of judder is "3:2 pulldown judder"
By far the most common technique for converting 24Hz film material to 60Hz is called 3:2 pulldown. In this method, a film frame is alternately repeated 2 times or 3 times. This alternating between 2 times and 3 times produces the necessary 2.5 multiple of 24Hz, thus producing 60Hz.
Although 3:2 pulldown judder is found annoying by many Europeans (who are used to seeing movies converted to 50Hz by consistently repeating each film frame twice), people in 60Hz countries are used to it and don't find it particularly annoying. Effectively a 3:2 judder filter has been learned from childhood.
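The 3:2 cadence can be sketched in a few lines. This is a simplified illustration, not a description of any particular telecine: it ignores the alternation between odd and even fields, assumes the cadence starts with a 2-field frame, and uses an invented function name.

```python
def pulldown_32(film_frames):
    """Expand film frames into video fields, holding frames alternately
    for 2 fields and 3 fields (starting with 2 is an assumption)."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 2 if index % 2 == 0 else 3
        fields.extend([frame] * repeats)
    return fields


print("".join(pulldown_32(["A", "B", "C", "D"])))  # AABBBCCDDD
print(len(pulldown_32(range(24))))                 # 60 fields from 24 frames
```

Every group of 4 film frames becomes 10 fields, which gives the 2.5 multiple. The uneven hold times (2 fields, then 3) are what the eye perceives as 3:2 pulldown judder.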
Temporal rate conversion
Why temporal rate conversion judder occurs
It's all due to temporal aliasing
The origins of the problem lie in the fact that the temporal sampling rate used by video is not fast enough to avoid temporal aliasing. This is not a big problem on your current TV at home because your eyes are able to track the motion. Because of the persistence of the eye, your eyes are able to accurately interpolate along the movement axis. The involuntary eye tracking makes the object of interest stationary on your retina.
Any processing of the video signal also needs to track the motion if it is not going to be confused by the temporal aliasing and therefore cause annoying judder.
Judder is the brain's way of saying: "What was that?"
The reason you see judder is that your eyes and brain are trying to track the smooth motion of the object. If the video electronics presents the object in the wrong place (because it is just repeating the previous frame) then your brain gets confused and does a double take. That is the perceived judder.
Judder is most noticeable on camera pans
Judder can be seen on any medium-speed motion, but it is particularly noticeable when you are looking at background objects such as advertising hoardings as the camera pans past following the action at sporting events. It is also very noticeable with scrolling credit titles and ticker displays.
Small difference frequencies cause the most judder
The amount of standards conversion judder you get decreases as the difference between the input frequency and the output frequency increases. The eye is most sensitive to judder at about 8Hz. When the difference frequency between the input rate and the output rate is small, that is, close to 8Hz, the judder is very bad. You need a difference frequency of about 30-35Hz before the judder has reduced to an acceptable level. Effectively, however, all you are really doing is substituting blur for judder.
Film-originated material is OK
When calculating the difference frequency, it is important to note that the relevant input rate is the rate at which the material was originated. Film material is normally originated at 24Hz. It is typically changed into 60Hz using the 3:2 pulldown technique.
This film-originated 60Hz material can be upconverted to, say, 75Hz without incurring standards conversion judder, because the material is really just 24Hz material that has been over-sampled at 60Hz. The actual difference frequency is 51Hz, not 15Hz.
Don't be fooled by demos that use film-originated material to "prove" to you that their system has no temporal rate conversion judder.
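The difference-frequency reasoning can be written down as a small check. In this sketch the 30Hz threshold is simply the lower end of the 30-35Hz figure quoted earlier, and the function names are hypothetical; it is a rule of thumb, not a perceptual model.

```python
# Assumption: lower end of the 30-35Hz "acceptable" range quoted in the text.
ACCEPTABLE_DIFFERENCE_HZ = 30


def difference_frequency(origination_hz: float, output_hz: float) -> float:
    """Difference between the rate the motion was captured at and the output rate."""
    return abs(output_hz - origination_hz)


def judder_likely(origination_hz: float, output_hz: float) -> bool:
    """True when a linear conversion is likely to show objectionable judder."""
    difference = difference_frequency(origination_hz, output_hz)
    if difference == 0:
        return False  # same rate in and out, so no conversion is needed
    return difference < ACCEPTABLE_DIFFERENCE_HZ


print(difference_frequency(60, 75), judder_likely(60, 75))  # 15 True
print(difference_frequency(24, 75), judder_likely(24, 75))  # 51 False
```

The key design point is that the first argument is the origination rate, not the rate of the signal arriving at the converter: film carried in a 60Hz signal must be entered as 24Hz.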
Receiver factors affect the amount of perceived judder
Besides the fact that most demos are done with film material, there are other reasons why the PC world isn't more conscious of the judder problem.
Noise in the picture helps mask the problem. Obviously this is not the solution to the judder problem, particularly as we move to clean digital signals.
Poor video circuitry, such as currently used in PCs, causes soft pictures, helping to mask the problem. The video being fed to the PC's graphics subsystem for field rate conversion has very little high frequency detail, that is, it tends to be blurred. The aim now is to provide TV quality on a PC that is considerably better than a consumer TV, and therefore it is essential that the temporal rate conversion problem is fully understood and a solution found.
The characteristics of the display also affect the amount of perceived judder
The amount of motion judder that you get on a CRT display when changing the temporal rate increases as the amount of light output from the display increases.
The amount of motion judder that you get on a CRT display when changing the temporal rate also increases as the size of the display increases. If the picture is large (and you stay the same distance away), then the judder will be more noticeable because the distance (angle of view) over which it judders will be greater.
When feeding a high refresh rate signal such as 75Hz that has been linearly converted from a 60Hz source into a flat-panel display, you get less judder than when feeding that same signal into a CRT display. The flat-panel display acts as a temporal post filter and is able to reduce the judder. Of course, all that is really happening is that blur and smear are being substituted in place of the judder, but even so, the results can be quite acceptable. A good way to test this is to use an LCD light-valve projector, because it is not restricted to single-frequency operation. It has a thin film of LCD material onto which is written an image using a CRT projection tube.
Hughes/JVC LCD light-valve projector.
One way of reducing judder in CRTs would be to increase the persistence of the phosphors from the current time of about 1 line period to closer to a field or frame time. Currently a typical phosphor will decay to about a third of its peak value in about 50 microseconds. The problem of course in increasing its persistence is that this temporal lag would cause smearing, thereby destroying the good thing that CRTs have going for them.
The amount of motion judder that you get when changing the temporal rate is dependent on how much motion blur was introduced in the video capture process
If a camera samples the motion in the scene for a high percentage of the time between fields (or frames), then the moving objects will be slightly blurred. This is the typical characteristic of tube-based cameras that were still used a few years ago. These cameras sampled the image for most of the time between fields. If you use a modern CCD camera with its integral shutter, the image is only sampled for typically less than a hundredth of the time between fields. Instead of blurred moving objects, you get individual crisp images of the moving objects at their positions at each sample time. Fast shuttering on CCD cameras is used to reduce motion blur and to decrease the amount of light going into the camera. The effect is best seen with video material of sporting events shot outside on sunny days, such as at a motor race.
CCD camera chip
When watching fast-shuttered material in its native form on a display that does not introduce motion blur, such as on a CRT display, your eye tracks the motion of the object of interest and sees a crisp image of the object as it moves. The eye is able to do this because the object is stationary on the retina due to the eye moving and tracking as the object moves.
A linear temporal rate converter cannot track motion. If you don't track the motion and you leave the moving objects sharp, then the result will be judder. The only way a linear converter can reduce the judder is by introducing some of the blur that you would have got from a tube-based camera. The resulting video will have slightly reduced judder, but it will have lost the detail that was present in the original captured image.
Unfortunately, as we fix the other video quality problems on the PC, the judder will jump up and bite us
This is why we need to start thinking of solutions now.
Currently used linear temporal rate conversion methods
The least expensive way to change the refresh rate is the "pulldown" method
In the simplest case, you can produce 72Hz from a 24Hz signal, for example, by just repeating a 24Hz film frame three times. This is done by writing into a frame buffer at a rate of 24Hz and then clocking it out at the faster rate of 72Hz.
The same basic method can be used to convert between rates that are not integer multiples. You can write into a frame store at any input rate and clock it out at any other rate. The implementation complexity does, however, increase slightly since it is necessary to double buffer to stop the operation of writing into the frame store being visible on the screen.
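The frame-store behavior described above can be modeled as a simple index calculation. The sketch below assumes an idealized double-buffered frame store in which each output frame shows the most recent complete input frame, with both clocks starting together; the function name is invented for this example.

```python
def repeat_schedule(input_hz: int, output_hz: int, output_frames: int):
    """Index of the input frame shown on each output frame when a frame store
    is written at input_hz and read out at output_hz."""
    return [(n * input_hz) // output_hz for n in range(output_frames)]


print(repeat_schedule(24, 72, 9))   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(repeat_schedule(60, 75, 10))  # [0, 0, 1, 2, 3, 4, 4, 5, 6, 7]
```

For 24Hz to 72Hz every frame is shown exactly three times, so the cadence is even. For 60Hz to 75Hz one input frame in every four is shown twice, and that irregular repeat recurs at the 15Hz difference frequency.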
Because of the relative simplicity of the pulldown method, it is possible to make it operate faster than the temporal interpolation method (described in the following section). This is important for the fast display scan rates used in today's PCs.
A slightly more expensive way to change the refresh rate is to use linear temporal interpolation
The more complex way to convert from one temporal rate to another is to do linear temporal interpolation, whereby each output pixel is individually built as a function of input pixels at different points in time. The function used to provide the temporal interpolation is typically a windowed sin(x)/x function.
The temporal interpolation method can produce marginally better results than the pulldown method because it allows you to substitute some blur-and-smear in place of some of the judder. The theory is that blur-and-smear is less objectionable than judder, but blur-and-smear needs to be used sparingly. A temporal interpolation converter could blur-and-smear away the judder, but the picture would be unwatchable. In practice, it is only possible to blur-and-smear out about 30% of the judder. The temporal interpolation method does not cure the judder problem; it just tries to make it marginally less objectionable. The problem with it is that because it is more complex to implement, it is difficult (expensive) to get it to operate at the pixel rate required for modern PCs.
Interpolation involves low pass filtering and is an averaging method. To create a required output frame, it averages together various percentages of the surrounding input frames. It is not hard, when thinking about it in these terms, to see why the technique has problems. Consider a video of a man moving his arm down. The first frame shows his arm up high, by the next frame it is half way down, and on the third frame it is down at his side.
Suppose you want to double the frame rate by creating 2 additional frames. The result will be as shown in the picture below: the man will look like he has grown extra arms. Your eye seeing this will no longer be able to properly track the motion of the arm.
Changing the temporal rate using either the pulldown method or the linear temporal interpolation method will produce "temporal rate conversion judder"
If an object is moving, it will be in a different place on each successive field. As shown above, interpolating (averaging) between 3 fields gives 3 images of the object on the output field. The position of the dominant image will not move smoothly; it will be seen as judder.
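The multiple-image effect is easy to reproduce on a toy signal. The sketch below is a deliberately simplified stand-in for a real converter: it uses a plain two-frame weighted average rather than a windowed sin(x)/x filter over several fields, and it represents each frame as a single scanline containing one bright pixel. All names are invented for this example.

```python
def render(position: int, width: int = 12):
    """One scanline with a single bright pixel at `position`."""
    return [1.0 if x == position else 0.0 for x in range(width)]


def blend(frame_a, frame_b, weight_b: float):
    """Linear temporal interpolation: a per-pixel weighted average of two frames."""
    return [(1 - weight_b) * a + weight_b * b for a, b in zip(frame_a, frame_b)]


frame_0 = render(2)   # object at x=2 in the first input frame
frame_1 = render(8)   # object at x=8 in the next input frame
midpoint = blend(frame_0, frame_1, 0.5)

print(midpoint)
```

A motion-tracking converter would place one full-brightness object at x=5. The averaged frame instead contains two half-brightness copies, at x=2 and x=8, and nothing at x=5, which is the "extra arms" effect in miniature.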
Relating this to the sampling theorem, judder results from the fact that the temporal sampling rate is less than twice the rate of the fastest motion in the scene. As explained by Shannon and Nyquist, this results in temporal aliasing. If the field rate is 60Hz, then by the sampling theorem, the maximum movement frequency allowable in the signal being sampled is 30Hz.
Unfortunately, objects move a lot faster than this, so temporal aliasing nearly always occurs. As stated earlier, this is not too much of a problem when a human views the material on a TV using the native video standard, because of the eye's ability to track moving objects. When the eye tracks the motion of the object of interest, the moving object is stationary relative to the eye's retina, so it's as if it were not moving. This means that the temporal aliases are not seen. Unfortunately, when the video signal passes through a linear temporal rate converter, the aliasing causes interpolation theory to break down. The converter cannot tell the aliasing from genuine signals and resamples both to produce the output fields. These multiple alias images are the cause of the perceived judder.
A linear temporal rate converter is faced with a dilemma of whether to keep the annoying judder or to apply considerable low pass filtering to change the object into a low resolution blur as it moves.
Stock ticker should have smooth motion. Often it's seen to judder, moving in disjointed jumps.