Working with Binary Data using Typed Arrays
With HTML5 come many APIs that push the envelope on user experiences involving media and real-time communications. These features often rely on binary file formats, like MP3 audio, PNG images, or MP4 video. Binary formats matter to these features because they reduce bandwidth requirements, deliver expected performance, and interoperate with existing file formats. But until recently, Web developers haven’t had direct access to the contents of these binary files or any other custom binary files.
This post explores how Web developers can break through the binary barrier using the JavaScript Typed Arrays API, and explores its use in the Binary File Inspector Test Drive demo.
Typed Arrays, available in IE10 Platform Preview 4, enable Web applications to use a broad range of binary file formats and directly manipulate the binary contents of files already supported by the browser. Support for Typed Arrays has been added throughout IE10: in JavaScript, in XMLHttpRequest, in the File API, and in the Stream API.
Binary File Inspector
The Binary File Inspector test drive demo highlights some of the new capabilities offered by this combination of new features. You can see the ID3 headers for music files, get a sense of the raw bytes in video files, and also see how additional file formats, like the PCX image file format, can be supported in the browser with the use of JavaScript and Canvas.
In the example above, an .mp4 video is rendered using a <video> element on the left, and the binary contents of the file are displayed on the right, both in HEX form, and as corresponding ASCII characters. In this example, you can see some characteristic elements of the MPEG file format, such as the “ftyp” of “mp4.”
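To give a feel for how a hex-plus-ASCII view like this can be produced, here is a minimal sketch. It is not the demo's actual source; the function name formatHexDump, the bytesPerRow parameter, and the sample bytes (made up to resemble the start of an MP4 file) are all assumptions for illustration.

```javascript
// Format an ArrayBuffer as rows of hex bytes followed by printable ASCII.
// formatHexDump and bytesPerRow are illustrative names, not the demo's.
function formatHexDump(buffer, bytesPerRow) {
    bytesPerRow = bytesPerRow || 16;
    var bytes = new Uint8Array(buffer);
    var rows = [];
    for (var offset = 0; offset < bytes.length; offset += bytesPerRow) {
        var hex = [];
        var ascii = "";
        var end = Math.min(offset + bytesPerRow, bytes.length);
        for (var i = offset; i < end; i++) {
            var b = bytes[i];
            // Two hex digits per byte, zero-padded
            hex.push((b < 16 ? "0" : "") + b.toString(16).toUpperCase());
            // Show printable ASCII (0x20-0x7E) as-is, everything else as "."
            ascii += (b >= 0x20 && b <= 0x7e) ? String.fromCharCode(b) : ".";
        }
        rows.push(hex.join(" ") + "  " + ascii);
    }
    return rows.join("\n");
}

// Hypothetical sample: a 4-byte box size, then "ftyp", then the brand "mp42"
var sample = new Uint8Array([0, 0, 0, 24, 0x66, 0x74, 0x79, 0x70, 0x6d, 0x70, 0x34, 0x32]);
console.log(formatHexDump(sample.buffer, 12));
// prints "00 00 00 18 66 74 79 70 6D 70 34 32  ....ftypmp42"
```

The "ftyp" marker shows up in the ASCII column because those four bytes happen to fall in the printable range, which is how the inspector makes such signatures easy to spot.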
Typed Arrays and ArrayBuffers
Typed Arrays provide a means to look at raw binary contents of data through a particular typed view. For example, if we want to look at our raw binary data a byte at a time, we can use a Uint8Array (Uint8 describes an 8-bit unsigned integer value, commonly known as a byte). If we want to read the raw data as an array of floating point numbers, we can use a Float32Array (Float32 describes a 32-bit IEEE754 floating point value, commonly known as a single-precision floating point number). The following types are supported:
Array Type | Element size and description |
---|---|
Int8Array | 8-bit signed integer |
Uint8Array | 8-bit unsigned integer |
Int16Array | 16-bit signed integer |
Uint16Array | 16-bit unsigned integer |
Int32Array | 32-bit signed integer |
Uint32Array | 32-bit unsigned integer |
Float32Array | 32-bit IEEE754 floating point number |
Float64Array | 64-bit IEEE754 floating point number |
Each array type is a view over an ArrayBuffer. The ArrayBuffer is a reference to the raw binary data, but it does not provide any direct way to interact with the data. Creating a TypedArray view of the ArrayBuffer provides access to read from and write to the binary contents.
The example below creates a new ArrayBuffer from scratch and interprets its contents in a few different ways:
// Create an 8 byte buffer
var buffer = new ArrayBuffer(8);
// View as an array of Uint8s and put 0x05 in each byte
var uint8s = new Uint8Array(buffer);
for (var i = 0; i < 8; i++) {
    uint8s[i] = 5; // fill each byte with 0x05
}
// Inspect the resulting array
uint8s[0] === 5; // true - each byte has value 5
uint8s.length === 8; // true - there are 8 Uint8s
// View the same buffer as an array of Uint32s
var uint32s = new Uint32Array(buffer);
// The same raw bytes are now interpreted differently
uint32s[0] === 84215045; // true - 0x05050505 == 84215045
In this way, Typed Arrays can be used for tasks such as creating floating point values from their byte-level components or for building data structures that require a very specific layout of data for efficiency or interoperation.
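As a rough illustration of both ideas, the sketch below assembles a 32-bit float from its sign, exponent, and fraction bits, and then lays out a small fixed-size record over a single buffer. The helper name float32FromParts and the 16-byte record layout are invented for this example.

```javascript
// Two views over the same 4 bytes: one as raw bits, one as a float.
// Both views are 32 bits wide, so the result does not depend on byte order.
var scratch = new ArrayBuffer(4);
var bits = new Uint32Array(scratch);
var floats = new Float32Array(scratch);

// IEEE 754 single precision: sign (1 bit) | exponent (8 bits) | fraction (23 bits)
function float32FromParts(sign, exponent, fraction) {
    bits[0] = ((sign << 31) | (exponent << 23) | fraction) >>> 0;
    return floats[0];
}

float32FromParts(0, 127, 0x400000); // 1.5  (exponent 127 means 2^0, fraction is 0.5)
float32FromParts(1, 128, 0);        // -2   (sign bit set, exponent 128 means 2^1)

// A hypothetical 16-byte record: two Uint16 ids, then three Float32 coordinates.
var record = new ArrayBuffer(16);
var ids = new Uint16Array(record, 0, 2);     // bytes 0-3
var coords = new Float32Array(record, 4, 3); // bytes 4-15 (offset must be a multiple of 4)
ids[0] = 7;
coords[2] = 0.25;
```

Views created with a byte offset and length share the underlying buffer, so writing through ids or coords updates the same 16 bytes that could later be sent or saved as one unit.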
Typed Arrays for Reading Binary File Formats
An important new scenario enabled by Typed Arrays is to read and render the contents of custom binary file formats that are not natively supported by the browser. As well as the various array types introduced above, Typed Arrays also provide a DataView object that can be used to read and write the contents of an ArrayBuffer in an unstructured way. This is well suited to reading new file formats, which are typically made up of heterogeneous mixes of data.
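One practical difference is byte order. The short sketch below reads the same two bytes through a DataView with each byte order, and then through a Uint16Array, which always uses the platform's native order. The variable names are illustrative only.

```javascript
// Two bytes in a known order: 0x12 then 0x34
var twoBytes = new ArrayBuffer(2);
var view = new DataView(twoBytes);
view.setUint8(0, 0x12);
view.setUint8(1, 0x34);

// DataView lets the caller choose the byte order per read
var bigEndian = view.getUint16(0, false);   // 0x1234 (big-endian is the default)
var littleEndian = view.getUint16(0, true); // 0x3412

// A typed array view always uses the platform's byte order,
// so this value differs between little- and big-endian machines
var nativeValue = new Uint16Array(twoBytes)[0];
var platformIsLittleEndian = (nativeValue === 0x3412);
```

Because a file format fixes its own byte order, passing the order explicitly to DataView keeps parsing code behaving the same on every platform.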
The Binary File Inspector demo uses DataView to read the PCX file format and render it using a <canvas> element. Here’s a slightly simplified version of what the demo does to read the file header, which includes information like the width, height, DPI, and bits-per-pixel of color depth.
var buffer = getPCXFileContents();
var reader = new DataView(buffer);
// Read the header of the PCX file
var header = {};
// The first section is single bytes
header.manufacturer = reader.getUint8(0);
header.version = reader.getUint8(1);
header.encoding = reader.getUint8(2);
header.bitsPerPixel = reader.getUint8(3);
// The next section is Int16 values, each in little-endian
header.xmin = reader.getInt16(4, true);
header.ymin = reader.getInt16(6, true);
header.xmax = reader.getInt16(8, true);
header.ymax = reader.getInt16(10, true);
header.hdpi = reader.getInt16(12, true);
header.vdpi = reader.getInt16(14, true);
Code similar to the above can be used to add support for rendering a broad range of new data formats in the browser including examples like custom image formats, additional video file formats or domain-specific map data formats.
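As a sketch of the next steps after the header, the code below derives the image size from the header fields and expands PCX's run-length encoding. It follows the commonly documented PCX layout rather than the demo's source, and the function names pcxDimensions and decodePcxRle are made up for this example.

```javascript
// Width and height follow from the inclusive window bounds in the header.
function pcxDimensions(header) {
    return {
        width: header.xmax - header.xmin + 1,
        height: header.ymax - header.ymin + 1
    };
}

// Expand PCX run-length encoding into a byte array of known length.
// A byte with its top two bits set is a run marker: the low six bits
// give the repeat count and the following byte is the value to repeat.
// Any other byte is a single literal value.
function decodePcxRle(bytes, start, outputLength) {
    var out = new Uint8Array(outputLength);
    var inPos = start;
    var outPos = 0;
    while (outPos < outputLength && inPos < bytes.length) {
        var value = bytes[inPos++];
        var count = 1;
        if ((value & 0xc0) === 0xc0) {
            count = value & 0x3f;
            if (inPos >= bytes.length) {
                break; // truncated input: run marker without a value
            }
            value = bytes[inPos++];
        }
        while (count-- > 0 && outPos < outputLength) {
            out[outPos++] = value;
        }
    }
    return out;
}

// Example: a run of three 7s, a literal 1, then a "run" of one 0xFF
var decoded = decodePcxRle(new Uint8Array([0xc3, 0x07, 0x01, 0xc1, 0xff]), 0, 5);
// decoded holds 7, 7, 7, 1, 255
```

In a real file the encoded scanlines would start after the fixed-size header, and the decoded bytes would then be mapped through the palette into canvas pixel data.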
Getting Binary Data with XHR and File API
Before we can use the Typed Arrays APIs to work with the contents of files, we need to use browser APIs to get access to the raw data. For accessing files from the server, the XMLHttpRequest API has been extended with support for various “responseType”s. The “arraybuffer” responseType provides the contents of the requested server resource to JavaScript as an ArrayBuffer object. Also supported are the “blob,” “text” and “document” response types.
function getServerFileToArrayBuffer(url, successCallback) {
    // Create an XHR object
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        if (xhr.readyState == xhr.DONE) {
            if (xhr.status == 200 && xhr.response) {
                // The 'response' property returns an ArrayBuffer
                successCallback(xhr.response);
            } else {
                alert("Failed to download: " + xhr.status + " " + xhr.statusText);
            }
        }
    };
    // Open the request for the provided url
    xhr.open("GET", url, true);
    // Set the responseType to 'arraybuffer' for ArrayBuffer response
    xhr.responseType = "arraybuffer";
    xhr.send();
}
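Once the ArrayBuffer arrives, a success callback can start inspecting it right away. The sketch below is one possible callback body: it guesses the file type from well-known leading signatures. The names startsWith and sniffFileType are hypothetical, and the checks are deliberately simplistic (for instance, an MP3 without an ID3v2 tag would not be recognized).

```javascript
// Return true if `signature` appears in `bytes` starting at `offset`.
function startsWith(bytes, offset, signature) {
    if (bytes.length < offset + signature.length) {
        return false;
    }
    for (var i = 0; i < signature.length; i++) {
        if (bytes[offset + i] !== signature[i]) {
            return false;
        }
    }
    return true;
}

// Guess a file type from its first few bytes.
function sniffFileType(buffer) {
    var bytes = new Uint8Array(buffer);
    // PNG files begin with a fixed 8-byte signature
    if (startsWith(bytes, 0, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) {
        return "png";
    }
    // An ID3v2 tag at the start of an MP3 begins with the ASCII text "ID3"
    if (startsWith(bytes, 0, [0x49, 0x44, 0x33])) {
        return "mp3";
    }
    // MP4 files usually open with a box whose type, at offset 4, is "ftyp"
    if (startsWith(bytes, 4, [0x66, 0x74, 0x79, 0x70])) {
        return "mp4";
    }
    return "unknown";
}
```

A function like this could be passed as the successCallback, or called from inside it, to decide which parser to run on the downloaded bytes.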
In many cases files are provided by the user, for example as an attachment to an email in a Web mail application. The File API offers Web developers tools to read the contents of files provided via an <input> element, drag-and-drop or any other source that provides Blobs or Files. The FileReader object is used to read the contents of a file into an ArrayBuffer and, like the XHR object, is asynchronous to ensure that reading from the disk does not prevent the user interface from responding.
function readFileToArrayBuffer(file, successCallback) {
    // Create a FileReader
    var reader = new FileReader();
    // Register for 'load' and 'error' events
    reader.onload = function () {
        // The 'result' property returns an ArrayBuffer for readAsArrayBuffer
        var buffer = reader.result;
        successCallback(buffer);
    };
    reader.onerror = function (evt) {
        // The error code indicates the reason for failure
        if (evt.target.error.code == evt.target.error.NOT_READABLE_ERR) {
            alert("Failed to read file: " + file.name);
        }
    };
    // Begin a read of the file contents into an ArrayBuffer
    reader.readAsArrayBuffer(file);
}
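To connect a reader function like this to an actual file picker, something along the following lines might be used. This is only a sketch: wireFileInput is a made-up helper, and the element id "picker" in the usage comment is an assumption about the page.

```javascript
// Call `readFile` for every file the user selects in `input`, and hand
// each resulting ArrayBuffer to `onBuffer` together with its File.
// `readFile` is expected to have the shape (file, successCallback).
function wireFileInput(input, readFile, onBuffer) {
    input.addEventListener("change", function () {
        // input.files is an array-like FileList, so borrow Array's forEach
        Array.prototype.forEach.call(input.files, function (file) {
            readFile(file, function (buffer) {
                onBuffer(file, buffer);
            });
        });
    });
}

// Possible usage in a page containing <input type="file" id="picker">:
// wireFileInput(document.getElementById("picker"), readFileToArrayBuffer,
//     function (file, buffer) {
//         console.log(file.name + ": " + buffer.byteLength + " bytes");
//     });
```

Passing the reader in as a parameter keeps the wiring independent of FileReader, which also makes it straightforward to exercise with stand-in objects outside a browser.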
Conclusion
Binary data is heavily used by Web browsers. With support for Typed Arrays, XHR2 and the File API in IE10, Web applications can now work directly with binary data, to manipulate byte-level data, to render additional binary data formats, and to extract data from existing media file formats. Try out the Binary File Inspector test drive demo, and take Typed Arrays for a spin in IE10.
—Luke Hoban, Program Manager, JavaScript
Comments
Anonymous
December 01, 2011
Will it be possible to get access to TypedArray's bytes from native plugins, like now I can get access to pixels through the ICanvasPixelArrayData COM interface? It will be a very helpful and performant way to exchange large binary data between JS and native plugins.
Anonymous
December 01, 2011
I've got a question about Typed Arrays. They are said to replace string manipulation using charCodeAt. I'm fine with that, but I've got a problem. There's no (easy) way at this time to read or write a string from an ArrayBuffer. This seems a problem to me because (1) old code will continue to rely on String manipulation and (2) data exchange between WebWorkers and WebPages (postMessage) is limited to Strings at this time. Is there a plan to provide an efficient readString / writeString / fromString / toString implementation, or will we have to implement that ourselves, causing an important performance hit? I'm also eager to learn more about Typed Array performance in IE10, and especially how you handle compilation. Do you use native types? Best regards, François
Anonymous
December 01, 2011
Binary arrays seem pretty awesome, although I do worry that endianness differences between platforms are going to result in compatibility problems for web apps.
Anonymous
December 01, 2011
I can understand the need for typed arrays but the idealistic me is not happy about it being "mainstream" now because JS is supposed to be a dynamic and untyped language.
Anonymous
December 01, 2011
@Prebio JavaScript has never been untyped, and it is still dynamically typed even with Typed Arrays, which have nothing to do with the typing discipline of the programming language.
Anonymous
December 01, 2011
Any chance of combining this with the well-proven file command from Unix to determine the file type?
Anonymous
December 04, 2011
Pardon me if I'm wrong, but wasn't the main reason for introducing typed arrays because of WebGL?
Anonymous
December 04, 2011
The comment has been removed
Anonymous
December 05, 2011
@Jason 5 Dec 2011 7:12 AM: Don't know how valid your points are since you appear to be behind times about three years... (You just noticed that?)
Anonymous
December 05, 2011
The comment has been removed
Anonymous
December 06, 2011
Would be useful to view in Developer Tools all the blobs acquired via URL.createObjectURL.
Anonymous
December 06, 2011
The comment has been removed
Anonymous
December 11, 2011
I'm really excited about the potential new scenarios this enables, like the h.264 decoding, e.g. mbebenita.github.com/.../broadway.html