Timed text tracks

Internet Explorer 10 and Windows apps using JavaScript introduce support for the track element as described in Section 4.8.9 of the World Wide Web Consortium (W3C)'s HTML5 standard. The track element enables you to add timed text tracks, such as closed captioning, translations, or text commentary, to HTML5 video elements.

  • The track element
  • Track file formats
    • WebVTT
    • TTML
  • Multiple track files
  • Scripting the track element
    • The textTrack and textTrackList objects
    • The textTrackCueList and textTrackCue objects
  • Working with cues
    • Get the Current cue
    • Get all the cues
  • API Reference
  • Samples and tutorials
  • Internet Explorer Test Drive demos
  • IEBlog posts
  • Specification
  • Related topics

The track element

The syntax for the element is as follows.

<video id="mainvideo" controls autoplay loop>
  <track src="en_track.vtt" srclang="en" label="English" kind="captions" default>
</video>

The <track> element represents a timed text file to provide users with multiple languages or commentary for videos. You can use multiple tracks, and set one as default to be used when the video starts.

The text is displayed in the lower portion of the video player. At this time the position and color can't be controlled, but you can retrieve text through script and display it in your own way.

Track file formats

Text tracks use a simplified version of the Web Video Text Track (WebVTT) or Timed Text Markup Language (TTML) timed text file formats. Internet Explorer 10 and Windows apps using JavaScript currently support only timing cues and text captions.


WebVTT

WebVTT files are 8-bit Unicode Transformation Format (UTF-8) text files that look like the following.

WEBVTT

00:00:01.878 --> 00:00:05.334
Good day everyone, my name is John Smith

00:00:08.608 --> 00:00:15.296
This video will teach you how to 
build a sand castle on any beach

The file starts with the tag WEBVTT on the first line, followed by a line feed. The timing cues are in the format HH:MM:SS.sss. The Start and End cues are separated by a space, two hyphens and a greater-than sign ( --> ), and another space. The timing cues are on a line by themselves with a line feed. Immediately following the cue is the caption text. Text captions can be one or more lines. The only restriction is that there must be no blank lines between lines of text. The MIME type for WebVTT files is "text/vtt".
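The format rules above can be sketched as a small parser. This is a minimal illustration, not the parser the browser uses: the function names parseTimestamp and parseVtt are hypothetical, and the code assumes the simplified format described here (a WEBVTT header, timing lines, and caption text with no blank lines inside a cue).

```javascript
// Convert a WebVTT timestamp (HH:MM:SS.sss) to seconds.
function parseTimestamp(stamp) {
  var match = /^(\d{2,}):(\d{2}):(\d{2})\.(\d{3})$/.exec(stamp);
  if (!match) {
    throw new Error("Invalid timestamp: " + stamp);
  }
  var totalMs = Number(match[1]) * 3600000 +
                Number(match[2]) * 60000 +
                Number(match[3]) * 1000 +
                Number(match[4]);
  return totalMs / 1000;
}

// Parse the text of a simple WebVTT file into an array of
// { startTime, endTime, text } objects (times in seconds).
function parseVtt(fileText) {
  var lines = fileText.replace(/\r\n?/g, "\n").split("\n");
  if (lines[0].trim() !== "WEBVTT") {
    throw new Error("File must start with WEBVTT");
  }
  var cues = [];
  var i = 1;
  while (i < lines.length) {
    var parts = lines[i].split(" --> ");
    if (parts.length !== 2) {
      i++;  // skip blank lines and anything that is not a timing line
      continue;
    }
    // Keep only the first token on each side of the arrow.
    var start = parseTimestamp(parts[0].trim().split(/\s+/)[0]);
    var end = parseTimestamp(parts[1].trim().split(/\s+/)[0]);
    var textLines = [];
    i++;
    // Caption text runs until the next blank line.
    while (i < lines.length && lines[i].trim() !== "") {
      textLines.push(lines[i]);
      i++;
    }
    cues.push({ startTime: start, endTime: end, text: textLines.join("\n") });
  }
  return cues;
}
```

Calling parseVtt with the contents of the sample file shown earlier would return two objects, one for each cue, with the second cue's two caption lines joined by a line feed.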


TTML

Internet Explorer 10 and Windows apps using JavaScript support a subset of the Timed Text Markup Language (TTML) file format, which is defined in the TTML specification. The following structure is supported.

<?xml version='1.0' encoding='UTF-8'?>
<tt xmlns='http://www.w3.org/ns/ttml' xml:lang='en' >
<body>
<div>
<p begin="00:00:01.878" end="00:00:05.334" >Good day everyone, my name is John Smith</p>
<p begin="00:00:08.608" end="00:00:15.296" >This video will teach you how to<br/>build a sand castle on any beach</p>
</div>
</body>
</tt>


The TTML file uses a namespace declaration and the language attribute in the root element (tt). This is followed by the body element and a div element. Within the div element are the timing cues. The start and end times are set as attributes (begin, end) of the opening paragraph tag (<p>), and the caption text runs up to the closing </p> tag. Blank lines and white space are ignored. If a caption has multiple lines, they are separated by <br/> tags.

The MIME type for TTML files is "application/ttml+xml", or "text/xml". See Section 5.2 of the TTML specification for more information.
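As an illustration of the TTML structure described above, the following sketch builds a TTML document from a list of cues. The function names formatTimestamp, escapeXml, and cuesToTtml are hypothetical, and the cue objects ({ startTime, endTime, text } with times in seconds) are an assumption made for the example rather than part of any browser API.

```javascript
// Format a time in seconds as HH:MM:SS.sss.
function formatTimestamp(seconds) {
  function pad(n, width) {
    var s = String(n);
    while (s.length < width) {
      s = "0" + s;
    }
    return s;
  }
  var totalMs = Math.round(seconds * 1000);
  var ms = totalMs % 1000;
  var totalSec = (totalMs - ms) / 1000;
  var s = totalSec % 60;
  var m = Math.floor(totalSec / 60) % 60;
  var h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "." + pad(ms, 3);
}

// Escape characters that are not allowed in XML text content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build a TTML document: tt > body > div > one <p> per cue.
// lang is assumed to be a plain language code such as "en".
function cuesToTtml(cues, lang) {
  var out = [
    "<?xml version='1.0' encoding='UTF-8'?>",
    "<tt xmlns='http://www.w3.org/ns/ttml' xml:lang='" + lang + "'>",
    "<body>",
    "<div>"
  ];
  for (var i = 0; i < cues.length; i++) {
    // Multiple caption lines are separated by <br/> tags.
    var body = escapeXml(cues[i].text).split("\n").join("<br/>");
    out.push('<p begin="' + formatTimestamp(cues[i].startTime) +
             '" end="' + formatTimestamp(cues[i].endTime) + '">' + body + "</p>");
  }
  out.push("</div>", "</body>", "</tt>");
  return out.join("\n");
}
```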

Multiple track files

More than one timed text file can be used—for instance, to provide your users with multiple languages or alternate commentary. If you're using multiple tracks, you set one as default to be used if your page doesn't specify or the user hasn't picked a language. Within the video player, the user can choose alternate tracks through a built-in user interface.

The following example shows a video element with three track elements.

<video id="mainvideo" controls autoplay loop>
  <source src="video.mp4" type="video/mp4">
  <track id="enTrack" src="engtrack.vtt" label="English" kind="subtitles" srclang="en" default>
  <track id="esTrack" src="spntrack.vtt" label="Spanish" kind="subtitles" srclang="es">
  <track id="deTrack" src="grmtrack.vtt" label="German" kind="subtitles" srclang="de">
</video>

In this example, the source element is used to define the video file, and the track elements each specify a text translation. The track elements are children of the video element. The track element accepts the following attributes.

Attribute  Description

kind       Defines the type of text content. Possible values are: subtitles, captions, descriptions, chapters, metadata.

src        URL of the timed text file.

srclang    The language of the timed text file. For information purposes; not used in the player.

label      Provides a label that can be used to identify the timed text. Each track must have a unique label.

default    Specifies the default track element. If not specified, no track is displayed.


Scripting the track element

Like most elements, the track element can be manipulated through scripting. The following objects, methods, and properties are available to manage track content and cues. A track is a collection of cues that provides times and text content related to a video.

The textTrack and textTrackList objects

Object         Description

textTrack      Represents the timed text track of a track element. The track consists of a collection of cues.

               var texttrack = document.getElementById("trackElement").track;

textTrackList  Represents the list of tracks associated with a specific video element.

               var texttracklist = document.getElementById("videoelement").textTracks;


The textTrackList is an object associated with the video element that contains a list of the textTrack objects. To get a list of tracks used with a certain video (if there are any), the video object provides the textTracks property. The textTracks property is an object of type textTrackList, and is an array of the textTrack objects associated with the video.

  var oVideo = document.getElementById("videoElement");
  var oTTlist = oVideo.textTracks;   // get the textTrackList
  var oTrack = oTTlist[0];          // get the first text track on the video object
  var oElementTrack = document.getElementById("trackElement").track;  // get a track element's own text track

Property  Description

length    Returns the number of textTrack objects associated with a video element.

item      Returns the nth textTrack in the video element's list of tracks.

track     Returns the track element's text track.


The following example shows how to get a textTrackList object, which provides a list of all tracks of an element. The textTracks property on the video object returns the textTrackList object. The textTrackList object is an array of textTrack objects.

<!DOCTYPE html >

<html >
  <head>
    <title>Track list example</title>
    <script type="text/javascript">
      function getTracks() {
          // get list of tracks
          var allTracks = document.getElementById("video1").textTracks;
          // append each track label to the display element
          for (var i = 0; i < allTracks.length; i++) {
              document.getElementById("display").innerHTML += (allTracks[i].label + "<br/>");
          }
      }
    </script>
  </head>
  <body>
    <video id="video1" >
      <source src="video.mp4">
      <track id="entrack" label="English subtitles" kind="captions" src="example.vtt" srclang="en">
      <track id="sptrack" label="Spanish subtitles" kind="captions" src="examplesp.vtt" srclang="es">
      <track id="detrack" label="German subtitles" kind="captions" src="examplede.vtt" srclang="de">
    </video>
    <button onclick="getTracks();">click</button>
    <div id="display"></div>
  </body>
</html>
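Because the textTrackList can be indexed like an array, the same loop can be used to look up a track by language. The following is a minimal sketch: findTrackByLanguage is a hypothetical helper, and it assumes each textTrack exposes a language property that reflects the srclang attribute of its track element.

```javascript
// Return the first track in the list whose language matches lang,
// or null if no track matches. trackList can be a textTrackList
// or any array-like collection of objects with a language property.
function findTrackByLanguage(trackList, lang) {
  for (var i = 0; i < trackList.length; i++) {
    if (trackList[i].language === lang) {
      return trackList[i];
    }
  }
  return null;
}
```

In a page such as the one above, it could be called as findTrackByLanguage(document.getElementById("video1").textTracks, "es") to retrieve the Spanish track.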

The textTrackCueList and textTrackCue objects

The textTrack.cues property returns an array of textTrackCue objects. A textTrackCue object, or cue, includes an identifier, a start and end time, and other data.

Object            Description

textTrackCueList  An array object that represents all the cues for a specific track.

                  var texttrackcuelist = trackelement.track.cues;

textTrackCue      Represents a cue in a track.

                  var texttrackcue = texttrackcuelist[i];  // where i == an index into the track cue list array.


These objects expose the following properties.

Property     Description

item         Returns the text track cue that corresponds to a given index.

length       Returns the number of text track cues in the list.

cues         Returns a textTrackCueList object.

activeCues   Returns the cues from the text track list of cues that are currently active, as a textTrackCueList object.

startTime    Returns the starting time of a timed text cue.

endTime      Returns the ending time of a timed text cue.

id           Returns a unique identifier for an individual cue.

pauseOnExit  Indicates whether the video should stop when it reaches the endTime specified.

text         Returns the text value of a textTrackCue.

track        Returns the text track object to which the textTrackCue belongs, or null otherwise.


These objects expose the following methods.

Method        Description

getCueById    Gets a cue from the textTrackCueList by the ID.

getCueAsHTML  Returns the text track cue text as a document fragment that consists of HTML elements and other Document Object Model (DOM) nodes.


The textTrackCue object exposes the following events.

Event    Description

onexit   Fires when a cue is done.

onenter  Fires when a cue is active.
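To show how the startTime and endTime properties relate to a playback position, the following sketch scans a cue list for the cues that cover a given time. The helper name findCuesAtTime is hypothetical, and treating a cue as covering a time when startTime <= time < endTime is an assumption made for the example; it approximates, but is not guaranteed to match, what the activeCues property reports.

```javascript
// Return the cues whose time range contains the given time (in seconds).
// cues can be a textTrackCueList or any array-like collection of
// objects with startTime and endTime properties.
function findCuesAtTime(cues, time) {
  var found = [];
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].startTime <= time && time < cues[i].endTime) {
      found.push(cues[i]);
    }
  }
  return found;
}
```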


Working with cues

Using the cues property of a track element's textTrack object (available through the element's track property), you can get a list of all the cues on that track. The textTrack.cues property returns an array of textTrackCue objects. Each textTrackCue object, or cue, includes an ID, a start and end time, and text.

Get the Current cue

In contrast to the cues property, which gets all cues associated with a track, the activeCues property gets you just the ones that are currently being displayed. The following example displays the startTime and endTime of the subtitle being displayed.

<!DOCTYPE html >

<html >
  <head>
    <title>Current cue example</title>
    <script type="text/javascript">
      function eID(elm) {
          return document.getElementById(elm);  // create shortcut to getElementById()
      }

      // after elements are loaded, hook the cuechange event on the track element
      window.addEventListener("load", function () {
          eID("track").addEventListener("cuechange", function (e) {
              var myTrack = e.target.track;  // the target property is the track element
              var myCues = myTrack.activeCues;   // activeCues is an array of current cues.
              // display the start and end times; the list is empty between captions
              if (myCues.length > 0) {
                  eID("display").innerHTML = myCues[0].startTime + " --> " + myCues[0].endTime;
              }
          }, false);
      }, false);
    </script>
  </head>
  <body>
    <video id="video1"  controls autoplay>
      <source src="movie.mp4"  >
      <track id='track' label='English captions' src='captions.vtt' kind='subtitles' srclang='en' default  >
    </video>
    <div id="display"></div>
  </body>
</html>
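More than one cue can be active at a time, and the active list can be empty between captions. The following sketch formats every active cue rather than only the first. The helper name formatActiveCues is hypothetical; it relies only on the startTime, endTime, and text properties described earlier.

```javascript
// Build a display string with one line per active cue.
// Returns an empty string when no cue is active.
function formatActiveCues(activeCues) {
  if (!activeCues || activeCues.length === 0) {
    return "";
  }
  var lines = [];
  for (var i = 0; i < activeCues.length; i++) {
    var cue = activeCues[i];
    lines.push(cue.startTime + " --> " + cue.endTime + ": " + cue.text);
  }
  return lines.join("\n");
}
```

Inside a cuechange handler like the one above, the result could be assigned to the textContent of the display element.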

Get all the cues

The following example gets all the cues associated with a track, and displays them in a select box. When you click an item in the box, that cue is played on the video. Use the search to filter the results to a specific keyword.

<!DOCTYPE html>
<html>
<head>
<title>All cues example</title>
<script type="text/javascript">
function eID(elm) {
    return document.getElementById(elm);  // create shortcut to getElementById()
}

function loadCaptions(track) {
    // retrieve cues for track element
    var cues = track.track.cues;
    var list = document.getElementById('results');
    for (var i = 0; i < cues.length; i++) {
        var x = cues[i].getCueAsHTML(); // get the text of the cue
        var option = document.createElement("option"); // create an option in the select list
        option.text = x.textContent;        // assign the text to the option
        option.setAttribute('data-time', cues[i].startTime);  // assign an attribute called data-time to option
        list.add(option);       // add the new option
    }
}

function playCaption(control) {
    var o = control.options[control.options.selectedIndex];  // get the option the user clicked
    var t = o.getAttribute('data-time');        // get the start time of that option
    var video = document.getElementById('video');   // get video element.
    video.currentTime = t - 0.1;                // move the video to start at the time we want (subtracting a fudge factor)
}

function search(text) {
    var cues = eID('track').track.cues;     // retrieve a list of cues from current track
    var list = eID('results');              // get the select box object
    list.innerHTML = '';                    // clear the select box
    for (var i = 0; i < cues.length; i++) { // scan through list of cues
        var cuetext = cues[i].getCueAsHTML().textContent;  // get the text content of the current element in cue lists
        if (cuetext.toLowerCase().indexOf(text.toLowerCase()) != -1)  // does the cue contain the key we're looking for?
        {                                                             // if a match, create a new option to add to the select box
            var option = document.createElement("option");
            option.text = cuetext;
            option.setAttribute('data-time', cues[i].startTime);
            list.add(option);                                       // add the matching option
        }
    }
}
</script>
</head>
<body>
   <video id='video' controls autoplay loop>
     <source src='movie.mp4'>
     <track id='track' label='English captions' src='captions.vtt' kind='subtitles' srclang='en' default onload='loadCaptions(this)'>
   </video>
   <div>Search captions <input onkeyup='search(this.value)' /> </div>
   <div>Click a caption to jump to that time in the video</div>
   <div><select size='10' id='results' onchange='playCaption(this)'></select></div>
</body>
</html>
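The keyword matching in the search function can also be separated from the page so that it is easier to reuse. The following is a minimal sketch under stated assumptions: filterCues is a hypothetical helper, and it reads the cue's text property instead of calling getCueAsHTML, so any markup inside a cue would be matched as plain text.

```javascript
// Return { text, startTime } entries for the cues whose text contains
// the keyword, ignoring case. An empty keyword matches every cue.
function filterCues(cues, keyword) {
  var needle = keyword.toLowerCase();
  var matches = [];
  for (var i = 0; i < cues.length; i++) {
    var text = cues[i].text;
    if (text.toLowerCase().indexOf(needle) !== -1) {
      matches.push({ text: text, startTime: cues[i].startTime });
    }
  }
  return matches;
}
```

Each returned entry carries the values that the example stores on an option element: the caption text and the start time used for seeking.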


API Reference

HTML5 Audio and Video

Samples and tutorials

HTML5 Timed Text Track sample

Make your videos accessible with Timed Text Tracks

Internet Explorer Test Drive demos

HTML5 Video Caption Maker

IE10 Video Captioning

IEBlog posts

HTML5 Video Captioning


HTML5: Section