Source Maps under the hood – VLQ, Base64 and Yoda
Source Maps can be extremely useful whenever there is a preprocessor that modifies an input into its corresponding output. In such cases, a source map enables one to figure out the original location (line, column) that corresponds to a given location of the output (line, column), and vice-versa.
This article discusses source maps, the problem they try to solve, how they work and some of the why it is the way it is. If I am successful in my attempt to explain things, by the end of the article you will have all the tools you need to understand or create your own source maps – even by hand if you want to!
Note: This article won’t be very useful for the vast majority of people who simply want to use source maps as opposed to understand source maps inside and out. If you don’t care about the details of how it works, then this article is probably not for you.
Why Yoda in the title of the article, you ask? Well, for the purposes of this article we will treat Yoda as a Jedi text preprocessor that takes words as inputs and puts them in semi-random order as output. TypeScript and CoffeScript are more complex preprocessors (although not necessarily Jedi), but the idea remains:
Input ⇒ Preprocessor ⇒ Output
Disclaimer: I am NOT an authoritative source on Source Maps and everything in this article is based on other articles I read online and on my own experimentation. If something looks wrong, it might very well be. Please leave a comment below and I’ll try to address it. My hope is that this will help explain some of the gaps not thoroughly covered in other places.
Who uses Source Maps?
Notable cases where source maps are commonly used include JavaScript compilers / transpilers such as CoffeeScript and TypeScript, or CSS preprocessors such as Less. All modern browsers support source maps to ease debugging: Chrome, Firefox, Safari, Edge and Internet Explorer 11 (since Windows 8.1 Update).
Here’s an excerpt describing the benefits, from here
When the compiled or compressed JavaScript contains a pointer to a source map, all files are present, and source map functionality is enabled, the Debugger tool shows the original source and maps breakpoints and errors to it.
With Visual Studio 2013 and up you can even debug TypeScript right within the IDE (when you hit F5 and choose IE as the browser). The ability to set breakpoint and control execution flow in the original TypeScript source is a very powerful feature, and the magic behind that lies in Source Maps. You can achieve most of the same awesomeness on other browsers as well by following a few steps to get source maps generated for you, but IE still provides by far the best debugging experience for TypeScript (I suppose IE doesn’t get that kind of praise too often, which is unfortunate). See Typescript debugging in Visual Studio with IE, Chrome and Firefox using source maps.
The most basic source map
One way to create a source map would be to simply store the source location (line, column) that corresponds to each output character’s location.
“feel the force” ⇒ Yoda ⇒ “the force feel”
Output location | Input | Input location | Character |
---|---|---|---|
Line 1, Column 0 | Yoda_input.txt | Line 1, Column 5 | t |
Line 1, Column 1 | Yoda_input.txt | Line 1, Column 6 | h |
Line 1, Column 2 | Yoda_input.txt | Line 1, Column 7 | e |
Line 1, Column 4 | Yoda_input.txt | Line 1, Column 9 | f |
Line 1, Column 5 | Yoda_input.txt | Line 1, Column 10 | o |
Line 1, Column 6 | Yoda_input.txt | Line 1, Column 11 | r |
Line 1, Column 7 | Yoda_input.txt | Line 1, Column 12 | c |
Line 1, Column 8 | Yoda_input.txt | Line 1, Column 13 | e |
Line 1, Column 10 | Yoda_input.txt | Line 1, Column 0 | f |
Line 1, Column 11 | Yoda_input.txt | Line 1, Column 1 | e |
Line 1, Column 12 | Yoda_input.txt | Line 1, Column 2 | e |
Line 1, Column 13 | Yoda_input.txt | Line 1, Column 3 | l |
This table could then be encoded like this (the colors below are used simply to make it easier to distinguish one mapping from the next):
Mappings (283 chars length): 1|0|Yoda_input.txt|1|5, 1|1|Yoda_input.txt|1|6, 1|2|Yoda_input.txt|1|7, 1|4|Yoda_input.txt|1|9, 1|5|Yoda_input.txt|1|10, 1|6|Yoda_input.txt|1|11, 1|7|Yoda_input.txt|1|12, 1|8|Yoda_input.txt|1|13, 1|10|Yoda_input.txt|1|0, 1|11|Yoda_input.txt|1|1, 1|12|Yoda_input.txt|1|2, 1|13|Yoda_input.txt|1|3
While this would work, the source map grows quickly and takes up a lot of space. Let’s see how we can improve it.
1. Improvement: Don’t write output line number each time. Use ‘;’ to indicate a line change
Because in many cases the output is condensed into fewer lines than the input (minified JavaScript for example), not having to write the output line location for each entry results in great space savings. The same table we had before can now be encoded as follows:
Mappings (245 chars length): 0|Yoda_input.txt|1|5, 1|Yoda_input.txt|1|6, 2|Yoda_input.txt|1|7, 4|Yoda_input.txt|1|9, 5|Yoda_input.txt|1|10, 6|Yoda_input.txt|1|11, 7|Yoda_input.txt|1|12, 8|Yoda_input.txt|1|13, 10|Yoda_input.txt|1|0, 11|Yoda_input.txt|1|1, 12|Yoda_input.txt|1|2, 13|Yoda_input.txt|1|3;
2. Improvement: Table of names and inputs, referencing by indexes
Index | Input |
---|---|
0 | Yoda_input.txt |
Index | Name |
---|---|
0 | the |
1 | force |
2 | feel |
Output location | Input index | Input location | Name index |
---|---|---|---|
Line 1, Column 0 | 0 | Line 1, Column 5 | 0 |
Line 1, Column 4 | 0 | Line 1, Column 9 | 1 |
Line 1, Column 10 | 0 | Line 1, Column 0 | 2 |
Now this can be encoded as:
Inputs: Yoda_input.txt
Names: the,force,feel
Mappings (31 chars length): 0|0|1|5|0, 4|0|1|9|1, 10|0|1|0|2;
3. Improvement: Use relative (instead of absolute) offsets
This idea will only be useful when leveraged by improvement 4 (VLQ encoding - Variable length Quantities). The goal here is to try to make the numbers that we encode smaller, because we can encode smaller numbers more efficiently than larger numbers thanks to VLQ.
The way we try to make most of the numbers smaller is by using relative offsets. Looking at the first column of the previous table, here’s what relative offsets mean:
Output location(improvement 2) | Output location(improvement 3) |
---|---|
Line 1, Column 0 | Line 1, Column 0 |
Line 1, Column 4 | Line 1, Column (last + 4 = 4) |
Line 1, Column 10 | Line 1, Column (last + 6 = 10) |
Applying that to the entire table, we end up with:
Output location | Input index | Input location | Name index |
---|---|---|---|
Line 1, Column 0 | 0 | Line 1, Column 5 | 0 |
Line 1, Column +4 | +0 | Line 1, Column +4 | +1 |
Line 1, Column +6 | +0 | Line 1, Column -9 | +1 |
And this can now be encoded as:
Inputs: Yoda_input.txt
Names: the,force,feel
Mappings (31 chars length): 0|0|1|5|0, 4|0|1|4|1, 6|0|1|-9|1;
4. Improvement: VLQ (Variable Length Quantities)
4.1. VLQ in the decimal system
The idea behind VLQ is rather simple. Imagine you wanted to represent 4 decimal number in order. One option is to use a special character to separate those numbers:
1|2|3|4
What if you knew ahead of time that the numbers are all single digits? In that case you don’t need the separator characters, and the following would still be perfectly understandable, but much shorter (half the number of characters in this case):
1234
But what if each number could be more than a single digit? Well, with VLQ all you have to do is add an indicator to digits to indicate that the number still has more digits. Let’s encode the following numbers using VLQ:
1|23|456|7
Using an underline as our indicator that a number has more digits (the continuation indicator):
1234567
Here’s how to read this:
- 1 does not have an underscore, so that’s it → The first number is 1
- 2 has an underscore, so we continue; 3 does not, so that’s it → The second number is 23
- 4 has an underscore, so we continue; 5 also has an underscore, so we continue; 6 does not, so that’s it → The third number is 456
- 7 does not have an underscore, so that’s it → The fourth number is 7
4.2. VLQ in binary form
I cheated a little bit on the example above with the decimal system. I introduced the underscores as the continuation indicator, but underscores are, of course, not part of the decimal system. Introducing underscores is effectively the same as using a base-20 system instead (there are now 20 possible values for a digit: 0 – 9 not underlined and 0 – 9 underlined).
In binary, however, we don’t have to cheat. How nice! Instead, we will use binary groups made of 6 bits (for a total of 64 possible values), and we will use one of those bits as the continuation indicator (‘C’ below). While at it, let’s also reserve another bit as a sign indicator (‘S’ below). ‘0’ will indicate positive; ‘1’ will indicate negative.
First binary group (4 bits for value):
B5 | B4 | B3 | B2 | B1 | B0 |
---|---|---|---|---|---|
C | Value | S |
Here’s what each of those binary ‘groups’ can represent:
Binary group | Meaning |
---|---|
000000 | 0 |
000001 * | -0 |
000010 | 1 |
000011 | -1 |
000100 | 2 |
000101 | -2 |
… | … |
011110 | 15 |
011111 | -15 |
100000 | 0 with continuation |
100001 | -0 with continuation |
100010 | 1 with continuation |
100011 | -1 with continuation |
… | … |
111110 | 15 with continuation |
111111 | -15 with continuation |
* Not required, but technically possible
There is one additional improvement we can make: when a binary group has a continuation, the next binary groups don’t need to specify the sign bit again. We already know the sign from the first group, so we can use up to 5 bits for the value. It still needs the continuation bit, though, because the numbers VLQ encodes could be arbitrarily large and might require more than 2 binary groups to be represented.
Continuation binary group (5 bits for value):
B5 | B4 | B3 | B2 | B1 | B0 |
---|---|---|---|---|---|
C | Value |
Let’s now encode the same numbers we had before, but in binary: 1|23|456|7.
Helper table for decimal to binary conversions:
Decimal | Binary |
---|---|
1 | 1 |
23 | 10111 |
456 | 111001000 |
7 | 111 |
Encoding decimal 1:
The corresponding binary value takes up a single bit (1). This will easily fit inside the first binary group (it has space for 4 bits). So this is what we get:
B5(C) | B4 | B3 | B2 | B1 | B0(S) |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 1 | 0 |
Encoding decimal 23:
The corresponding binary value takes up 5 bits (10111), so it doesn’t fit in the first binary group and therefore requires a continuation. We will put the 4 least-significant bits in the first group (0111) and leave the rest (1) for the continuation group.
B5(C) | B4 | B3 | B2 | B1 | B0(S) | B5(C) | B4 | B3 | B2 | B1 | B0 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Encoding decimal 456:
The corresponding binary value takes up 9 bits (111001000), so it doesn’t fit in the first binary group and therefore requires a continuation. We will put the 4 least-significant bits in the first group (1000) and leave the rest (11100) for the continuation group.
B5(C) | B4 | B3 | B2 | B1 | B0(S) | B5(C) | B4 | B3 | B2 | B1 | B0 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
Encoding decimal 7:
The corresponding binary value takes up 3 bits (111). This will easily fit inside the first binary group (it has space for 4 bits). So this is what we get:
B5(C) | B4 | B3 | B2 | B1 | B0(S) |
---|---|---|---|---|---|
0 | 0 | 1 | 1 | 1 | 0 |
The final binary result
Now all we have to do is concatenate the results we have so far:
000010 101110 000001 110000 011100 001110
Base64 encoding
As it turns out, traditional base64 encoding defines an alphabet of 64 characters. Values from 0 – 63 can be encoded as a single character. Well, the initial choice to use 6 bits per binary group was not arbitrary. By using 6 bits per group we can now encode each group as a single character as per the base 64 alphabet. Here’s what we get for the bits above:
CuBwcO
4.3. Applying VLQ + base64 to the Yoda example
Following the same steps as outlined above, we can now encode what we had at the end of Improvement 3 by using VLQ + base64. This is what we get:
Inputs: Yoda_input.txt
Names: the,force,feel
Mappings (18 chars length): AACKA, IACIC, MACTC;
5. Improvement: Omit unnecessary fields
A mapping can omit some of the fields if those fields don’t provide any useful information. This means that a mapping entry can be any of the following options:
- 5 fields: Output column, input file name index, input line, input column, name index
- 4 fields: Output column, input file name index, input line, input column
- 1 field: Output column
Sadly, the simplistic Yoda example in this article wouldn’t benefit from this.
Wrapping up
Here’s a sample source map for the Yoda example, which should no longer be scary to look at.
{ "version": 3, "file": "Yoda_output.txt", "sources": ["Yoda_input.txt"], "names": ["the", "force", "feel"], "mappings": "AACKA,IACIC,MACTC;" }
Pretty cool, right? Next time you see Yoda, remember to ask him for a Source Map. You can also use an online tool by Alexander Pavlov available here to play with VLQ + base64 and see it working in real time.
Let me know what you think in the comments and I will see you next time.
Useful links
- Source Map Revision 3 Proposal (official)
- Online tool to play with VLQ + base 64 encodings
- Another article that tries to explain Source Maps and VLQ encoding.
- Library to generate and consume source maps on GitHub
Comments
- Anonymous
July 26, 2016
That is one of the most helpful explanation on source mapping mechanism! Thanks!- Anonymous
December 05, 2017
I learnt a lot from this article!
- Anonymous
- Anonymous
December 29, 2016
Thank you for this excellent article. I'm left with one question - if Yoda was transforming a file that already had a sourcemap from a previous transformation would things be any different? Does it need to offset it's sourcemap against the existing map somehow?- Anonymous
December 29, 2016
Thanks Darrell, glad you liked it. If I understand you correctly, your scenario is that you applied 2 (or more) consecutive transformations to a file, each of which generated its corresponding source map, and now you want a final source map that represents the complete transformation from initial to final state. Looks like others have already done that, see if these links help:https://www.npmjs.com/package/source-map-merger (Github: https://github.com/jakobwesthoff/source-map-merger)https://www.npmjs.com/package/merge-source-map (Github: https://github.com/keik/merge-source-map)
- Anonymous