Closed captioning, it’s not just for accessibility! TextTrack in HTML5 video

When I hit the gym, I try to get a Stairmaster in front of a TV that’s showing closed captioning. Since I listen to music, I don’t want to listen to the TV. With closed captioning, I can enjoy my music, and keep an eye on world events.

Closed captioning was originally conceived for hard-of-hearing viewers and as a way to offer programming in other languages without re-dubbing. Captioning can be a word-for-word transcription or translation of screen content, or it can carry additional content such as titles, locations, or commentary.

The track element

To add closed captioning, add a track element and point to a track file in your HTML5 video code. This example simply plays a video with a track file.

<!DOCTYPE html>
<title>Simple track example</title>
<video id="video1" controls autoplay>
<source src="">
<track id="track1" label="English captions" src="entrack.vtt" kind="subtitles" srclang="en" default>
</video>


Pretty cool, huh? There’s not a lot of extra code: just add the track element and you’ve got closed captioning.


The built-in controls that appear when you use the controls attribute on the video element let you turn closed captioning on or off. If you’ve got more than one track, you can select between them as well.


Here’s an example of adding three subtitle tracks in different languages:

<video id="video1" controls autoplay loop>
<source src="video.mp4" type="video/mp4">
<track id="enTrack" src="entrack.vtt" label="English" kind="subtitles" srclang="en" default>
<track id="esTrack" src="estrack.vtt" label="Spanish" kind="subtitles" srclang="es">
<track id="deTrack" src="detrack.vtt" label="German" kind="subtitles" srclang="de">
HTML5 video not supported
</video>

Inside the video element is a source element that points to the video file, followed by three track elements that point to track files. These tracks are written in WebVTT, a format for closed captioning. Last comes a fallback message that displays if HTML5 video isn’t supported.
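The built-in menu handles switching for you, but you can also pick a track from script. Here is a minimal sketch, assuming you want exactly one language showing at a time: it walks the video element’s `textTracks` list and sets each subtitle or caption track’s `mode` to `"showing"` or `"disabled"` based on its `language` property, which reflects the track element’s `srclang` attribute. The helper name `showOnlyLanguage` is hypothetical, not part of any API.

```javascript
// Hypothetical helper: show the subtitle/caption track whose language
// matches `lang` and disable the others. Works with a real TextTrackList
// (array-like: `length` plus index access) or with any array of objects
// that have `kind`, `language`, and `mode` properties.
function showOnlyLanguage(textTracks, lang) {
  let shown = null;
  for (let i = 0; i < textTracks.length; i++) {
    const track = textTracks[i];
    // Leave metadata (and other) tracks alone; only toggle visible text tracks.
    if (track.kind !== "subtitles" && track.kind !== "captions") continue;
    if (shown === null && track.language === lang) {
      track.mode = "showing"; // the browser renders this track's cues
      shown = track;
    } else {
      track.mode = "disabled"; // cues are neither rendered nor loaded
    }
  }
  return shown; // the track now showing, or null if no language matched
}

// In the page above: showOnlyLanguage(document.getElementById("video1").textTracks, "es");
```

Passing a language with no matching track disables all of them, which is the same as turning captions off in the built-in menu.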

Internet Explorer supports two formats for closed captioning: WebVTT and TTML.

WebVTT is a reasonably simple format: for each caption, it gives the in and out timing points and the text to insert.


WEBVTT

00:00:01.878 --> 00:00:05.334
Good day everyone, my name is John Smith

00:00:08.608 --> 00:00:15.296
This video will teach you how to
build a sand castle on any beach
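To make the structure of a cue concrete, here is a minimal parsing sketch, assuming well-formed input like the sample above. It handles the `WEBVTT` header, optional cue identifiers, and optional hours in timestamps, but deliberately ignores cue settings, NOTE blocks, and styling. The function names are hypothetical; in a browser you would normally let the track element do this parsing for you.

```javascript
// Convert a WebVTT timestamp ("hh:mm:ss.ttt" or "mm:ss.ttt") to seconds.
function vttTimeToSeconds(stamp) {
  const parts = stamp.trim().split(":").map(Number);
  // Hours are optional in WebVTT, so pad the front to get [h, m, s].
  while (parts.length < 3) parts.unshift(0);
  const [h, m, s] = parts;
  return h * 3600 + m * 60 + s;
}

// Minimal sketch: return cues as {startTime, endTime, text} objects.
function parseVtt(source) {
  // Normalize line endings, then split into blocks on blank lines.
  const blocks = source.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  if (!blocks[0].startsWith("WEBVTT")) throw new Error("missing WEBVTT header");
  const cues = [];
  for (const block of blocks.slice(1)) {
    const lines = block.split("\n").filter(line => line.length > 0);
    // The timing line may be preceded by an optional cue identifier.
    const timingIndex = lines.findIndex(line => line.includes("-->"));
    if (timingIndex === -1) continue; // not a cue block
    const [start, rest] = lines[timingIndex].split("-->");
    const end = rest.trim().split(/\s+/)[0]; // drop any cue settings after the end time
    cues.push({
      startTime: vttTimeToSeconds(start),
      endTime: vttTimeToSeconds(end),
      text: lines.slice(timingIndex + 1).join("\n"), // keep multi-line captions
    });
  }
  return cues;
}
```

Run against the sample file, this yields two cues; the second one’s text keeps its line break between “how to” and “build”.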

WebVTT has a few more features as outlined in the W3C community group spec, but not all are supported in IE.

The other format, TTML, is XML-based. A simple TTML file looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<tt xmlns='' xml:lang='en'>
<body>
<div>
<p begin="00:00:01.878" end="00:00:05.334" >Good day everyone, my name is John Smith</p>
<p begin="00:00:08.608" end="00:00:15.296" >This video will teach you how to<br/>build a sand castle on any beach</p>
</div>
</body>
</tt>
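TTML carries the same information as the WebVTT example (a start time, an end time, and the text), so going from one to the other is mostly reformatting. Here is a sketch, assuming a cue is a plain object with `startTime` and `endTime` in seconds and a `text` string (the same property names a browser cue object exposes). It formats the times as `hh:mm:ss.mmm` clock values, escapes XML special characters, and turns line breaks into `<br/>`. All three function names are hypothetical.

```javascript
// Format seconds as a clock time "hh:mm:ss.mmm", as used in the TTML above.
function secondsToClock(seconds) {
  const totalMs = Math.round(seconds * 1000); // round away floating-point noise
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build one TTML <p> element from a cue; each line break becomes <br/>.
function cueToTtmlParagraph(cue) {
  const body = cue.text.split("\n").map(escapeXml).join("<br/>");
  return `<p begin="${secondsToClock(cue.startTime)}" end="${secondsToClock(cue.endTime)}">${body}</p>`;
}
```

For the second caption above (8.608 to 15.296 seconds, two lines of text), this produces the same `<p>` element as the sample file, minus the stray space before `>`. Escaping `&` first matters: doing it last would double-escape the `&lt;` and `&gt;` entities.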


This is the original format supported in Internet Explorer 10. IE11 added support for Simple Delivery Profile (SDP) caption styling. SDP gives you some pretty cool abilities: you can place text within the video frame, change the text and background colors, and control the outline (stroke) color and width. The full spec is here:

TTML Simple Delivery Profile for Closed Captions (US)

The TTML gets a little more complicated, but it can really add power to your captioning.

<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en-us" xmlns="">
<head>
<p:profile use=""/>
<styling>
<!-- define styles for text color and position -->
<style xml:id="bottomMidStyle" s:textAlign="center" s:textOutline="red 1px" s:backgroundColor="#ff000044"
s:color="#ffffffff" s:origin='20% 78%' s:extent='30% 10%'/>
<style xml:id="topMidStyle" s:textAlign="center" s:textOutline="black 1px" s:backgroundColor="#00ff0088"
s:color="#ff11ffff" s:origin='20% 40%' s:extent='60% 18%'/>
<style xml:id="topLeftStyle" s:textAlign="left" s:textOutline="blue 1px" s:backgroundColor="transparent"
s:color="#ff11ffff" s:origin='10% 10%' s:extent='30% 10%'/>
<style xml:id="bottomRightStyle" s:textAlign="right" s:textOutline="black 1px" s:backgroundColor="white"
s:color="green" s:origin='70% 70%' s:extent='30% 10%'/>
</styling>
<layout>
<!-- define regions for locating text -->
<region xml:id="bottomMid" style='bottomMidStyle' />
<region xml:id="topMid" style='topMidStyle' />
<region xml:id="topLeft" style='topLeftStyle' />
<region xml:id="bottomRight" style='bottomRightStyle' />
</layout>
</head>
<body>
<div style='defaultFont'>
<p region="bottomMid" begin='00:00:00.101' end='00:00:05.000'> This is a Pop-up caption</p>
<p region="topMid" begin='00:00:05.000' end='00:00:10.000'> This is another Pop-up caption</p>
<p region="topLeft" begin='00:00:10.000' end='00:00:15.000'> Hello from up top</p>
<p region="bottomRight" begin='00:00:15.000' end='00:00:20.000'> And back down</p>
</div>
</body>
</tt>

Using SDP is no different from plain captioning:

<!DOCTYPE html>
<title>SDP Test</title>
<video src="video.mp4" controls muted autoplay width="800">
<track src="SDPTest.ttml" label="SDP Examples" kind="subtitles" srclang="en" default/>
</video>

That’s it.

Adding the code to support a track file isn’t hard. Creating the .vtt or .ttml file takes a little more work, though. To make that easier, check out HTML5 Video Caption Maker. How to use Caption Maker, along with more on text tracks, is described in Make your videos accessible with Timed Text Tracks.


Caption Maker lets you load a video file and step through it, typing or pasting in captions or comments. Try it first with the Load Sample button. This loads up an MP4 file to play with. Once the video is loaded, you get the option to load a sample VTT file. You can load or save WebVTT or TTML format files. 

Caption Maker was written in the IE10 timeframe, so it doesn’t support SDP, but it gives you a good starting point. It accepts only MP4 files, not Ogg or WebM. If you run Caption Maker in IE10 or IE11 and press F12 to open the developer tools, you can download the various files to experiment with yourself.

Using track for other cool stuff

I teased it in the title: you can get pretty creative by using JavaScript to fetch cues (captions) from your tracks. The article in the series mentioned above shows some of the things you can do with script. One is to create your own styled text. Since you can overlay a text element on a video element, you can place text anywhere you want and style it with CSS. To make sure the text lines up correctly, you’ll need to use the CSS position property. Once you have your layout, that’s not hard.

This example uses “metadata” as the track kind. When you specify the track in HTML, you set the kind attribute. Typically you set it to captions or subtitles, so the video element takes care of displaying the track text. By changing it to “metadata”, you can still get the cues in JavaScript, but they don’t display on the screen. Here’s some code that takes captions and puts them in color (with the much-hated Comic Sans font) at the top of the video element.

<!DOCTYPE html>
<title>Styled text example</title>
<!-- only force Internet Explorer 10 standards for testing on local machine -->
<meta http-equiv="X-UA-Compatible" content="IE=10">
<style>
  #display {
    position: absolute; /* placed over the video by script */
    color: yellow; /* example text color; any color works */
    text-align: center;
    font-family: "Comic Sans MS", cursive;
    text-shadow: 0.1em 0.1em 0.15em #333;
    z-index: 100; /* set z-index to be sure div is on top */
  }
</style>
<div id="player" style="width:640px;"> <!-- container div that is sized by script -->
  <video id="video1" controls style="width:100%;">
    <source src="">
    <!-- by using "metadata" as the kind, it will suppress the video player's own caption display -->
    <track id="track1" label="English captions" src="entrack.vtt" kind="metadata" srclang="en" default>
  </video>
  <div id="display"></div>
</div>
<script>
  // get objects associated with the video, track, and div elements
  var video = document.getElementById("video1");
  var disp = document.getElementById("display");
  var track = document.getElementById("track1");

  video.addEventListener("loadedmetadata", function () {
    document.getElementById("player").style.width = video.videoWidth + "px"; // make enclosure div width == video width
    disp.style.top = (video.offsetTop + (video.videoHeight * 0.05)) + "px"; // set the text to appear at 5% from the top of the video
    disp.style.left = video.offsetLeft + "px"; // set the text to appear relative to the left edge of the video
    disp.style.width = video.videoWidth + "px"; // set text box to the width of the video
  }, false);

  track.addEventListener("cuechange", function () {
    var myTrack = this.track; // "this" is the track element; its .track property is the TextTrack
    var myCues = myTrack.activeCues; // activeCues is a list of the cues active right now
    if (myCues.length > 0) {
      disp.innerText = myCues[0].text; // write the text
    } else {
      disp.innerText = ""; // clear the text when no cue is active
    }
  }, false);
</script>
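The browser keeps `activeCues` up to date for you, which is what makes the `cuechange` handler so short. To make the idea concrete (and to test cue data outside a browser), here is a rough sketch of the selection it performs, treating a cue as active while `startTime <= time < endTime`. The function name `findActiveCues` is hypothetical, and the sketch skips the finer points of the real algorithm, such as how seeking and pausing trigger updates.

```javascript
// Hypothetical helper: return the cues active at `time` (in seconds),
// earliest start first. A cue counts as active while
// startTime <= time < endTime.
function findActiveCues(cues, time) {
  return cues
    .filter(cue => cue.startTime <= time && time < cue.endTime)
    .sort((a, b) => a.startTime - b.startTime); // filter() made a copy, so the input is untouched
}
```

With the cues from the WebVTT sample, a time of 3 seconds returns only “Good day everyone, my name is John Smith”, and 6 seconds falls in the gap between captions, so the list is empty. That empty case is why the handler above clears the display.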



In this example, the metadata cues are a series of URLs synced to the video. When the guy shows the IE Testdrive site, you see the live site on the screen. It’s a little contrived, but you get the idea.
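Here is a sketch of the URL-sync idea, assuming each metadata cue’s text is a single absolute URL. The helper (hypothetical name `urlFromCueText`) validates the cue text with the standard `URL` constructor and accepts only http and https, so a malformed cue is ignored rather than loaded. In the `cuechange` handler above, you would pass `myCues[0].text` to it and, if it returns a value, assign it to something like an iframe’s `src`. The iframe is an assumption, not part of the original example.

```javascript
// Hypothetical helper: return the URL carried by a metadata cue's text,
// or null if the text isn't an absolute http(s) URL.
function urlFromCueText(text) {
  let url;
  try {
    url = new URL(text.trim()); // throws on relative or malformed input
  } catch (e) {
    return null;
  }
  // Reject other schemes (for example javascript:) so cue data can't run script.
  return url.protocol === "http:" || url.protocol === "https:" ? url.href : null;
}
```

Checking the scheme is a small safety step: track files are just text, and treating their contents as trusted navigation targets without validation is an easy way to load something unexpected.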

So that’s all there is to adding accessibility and other cool features to your online videos. Give it a try.