Performance Optimization of Arrays - Part I

Hello Friends,

Have you observed better performance of Array operations in IE8 than IE7? If not, try the following code in both IE7 & IE8 and I can bet you would not leave this page without reading till the last word.

 var arrObj = new Array();
 var count = 10000;
 var before, after;
 for (var i = 0; i < count; i++)
     arrObj.push(i);
 before = new Date();
 for (var i = 0; i < count; i++)
     arrObj.pop();
 after = new Date();
 alert(after - before);

On my machine, IE7 took 7640 ms while IE8 took just 18 ms. As you can see, the example above does nothing but populate an array with 10000 entries and then pop them off one by one, and just the pop loop is what takes so much time in IE7.

This was just one example. Pick any array operation, and you would notice a huge difference between IE7 and IE8 performance.

So, won’t it be interesting to know why array operations in IE7 were taking so much time? Won’t you like to know how we dealt with all those issues? I am sure the answer is ‘yes’ and in the next few paragraphs that is what I have tried to explain. So keep reading.

Why were Array operations slow?

There were two main reasons that made Array operations so slow. I will explain one of them in this post and leave the other for the next post; otherwise this would be so long to read that I would surely lose the bet.

1. JScript didn’t treat Arrays as Arrays – What does this mean? Arrays, in the general sense, are considered contiguous memory storage, which enables fast random access to indexes. In the JScript world, Arrays can be sparse; that means an Array can have a length of 1 million but may not have storage committed for all those 1 million indexes. But in real-world scenarios, Arrays are hardly ever used as sparse arrays. They are mostly dense.

Unfortunately, the JScript runtime was not handling arrays for real-world usage. It always handled them as sparse arrays, and therefore never committed any contiguous space for them, resulting in slower access to indexes.

Instead, it used to insert indexed entries into a property bag, which is nothing but a classic hash table whose keys are strings.

So if you are doing something like the following in your code…

 var arrObj = new Array();
 for (var i = 0; i < 100; i++)
     arrObj[i] = i;

… and expecting that JScript runtime internally would allocate a C-Style contiguous array, then sorry to say, it doesn’t meet your expectation at all. Instead what it does internally is…

a) Allocates a generic object, which is nothing but the HashTable.

b) Since the generic object (HashTable) has to be treated like an array object, associate a special attribute “length” with it and set its value to 0.

c) In the for loop, for each value of i

a. Convert ‘i’ to string (e.g. for i = 10, it would be “10”)

b. Add a <key, value> pair of <string equivalent of i, i> (e.g. <”10”, 10>) to the hash table.

c. Set the “length” attribute’s value properly.

So every time you access an indexed entry of an array, it first converts the index to a string, computes a hash value for the string, and then looks up the hash table.
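To make that cost concrete, here is a simplified model of the old behavior in plain JavaScript. The function names and structure are purely illustrative; this is a sketch of the idea, not the actual engine code:

```javascript
// Simplified model of the old JScript array: just a string-keyed hash
// table (property bag) plus a manually maintained "length" attribute.
function makeLegacyArray() {
    return { bag: {}, length: 0 };
}

function legacySet(arr, index, value) {
    var key = String(index);        // every write converts the index to a string
    arr.bag[key] = value;           // and inserts into the hash table
    if (index >= arr.length) {
        arr.length = index + 1;     // keep the "length" attribute in sync
    }
}

function legacyGet(arr, index) {
    return arr.bag[String(index)];  // every read pays the conversion + hash lookup
}
```

Every single read or write goes through a string conversion plus a hash lookup, which is why even a simple loop over 10000 entries was so expensive.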

How we fixed it

No rewards if you already guessed it.

Now the JScript runtime treats an Array object as a special object, different from other JScript objects. Basically, it maintains two storage areas for arrays. One is the old HashTable, which is used for storing named entries. The other is a special one, used exclusively for indexed entries, which resembles a C-style array.
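Conceptually, the new layout can be sketched like this. Again, the names and structure are illustrative and not the real engine internals:

```javascript
// Simplified model of the IE8-style array object: indexed entries go to a
// dense, pre-allocated slot area; named entries still go to the hash table.
function makeFastArray(size) {
    return {
        slots: new Array(size || 0),  // C-style dense storage for indexes
        bag: {},                      // old HashTable, now only for named entries
        length: size || 0
    };
}

function fastSet(arr, key, value) {
    if (typeof key === "number") {
        arr.slots[key] = value;       // direct indexed access, no hashing
        if (key >= arr.length) arr.length = key + 1;
    } else {
        arr.bag[key] = value;         // named entries keep the old path
    }
}
```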

So if you have some code like following…

 arrObj = new Array(20);
 for (var i = 0; i < 20; i++)
     arrObj[i] = i;
 arrObj.Name = "IE8 Array";

… then the for loop adds its entries to the new indexed storage, while “Name” is added to the old HashTable.

Ok. So it looks like all indexed entries always go to the new storage. No, that is not the case. There is a condition that must be met before we put an indexed entry into the new storage: the new entry must not make the array sparse. To decide whether a particular indexed entry would make the array sparse, we have certain heuristics. For example…

 arrObj = new Array();
 arrObj[10] = 20;
 arrObj[50000] = 500000;

In the above snippet, the indexed entry 10 satisfies our heuristics and is added to the new storage. But the indexed entry 50000 does not meet them and is added to the old HashTable as a normal named entry.
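The exact heuristics are internal to the engine, but a hypothetical density check along these lines captures the idea. The function name and the threshold here are made up for illustration and are not the real JScript rule:

```javascript
// Hypothetical density heuristic: allow an index into the fast storage only
// if it stays reasonably close to the currently committed range. The
// GROWTH_LIMIT value is illustrative, not the actual JScript threshold.
function fitsFastStorage(committedLength, index) {
    var GROWTH_LIMIT = 1000;
    return index < committedLength + GROWTH_LIMIT;
}
```

Under this sketch, `fitsFastStorage(0, 10)` is true while `fitsFastStorage(0, 50000)` is false, matching the snippet above.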

Cool. It looks fine if you always populate the array such that it meets the heuristics, e.g. in an incrementing for loop starting from 0. But what if you want to populate it in a decrementing for loop starting from the max length…

 arrObj = new Array();
 for (var i = 2000; i >= 0; i--)
     arrObj[i] = i;

or you want to populate the array from both ends …

 arrObj = new Array();
 var length = 2000;
 for (var i = 0; i < length / 2; i++) {
     arrObj[i] = i;
     arrObj[length - i - 1] = length - i - 1;
 }

In such scenarios, not all of your indexed entries will go to the new storage, and the performance of further operations on this array will not be as good as it could be.

For such scenarios to work in the most performant way, what you have to do is pass the actual array size to the constructor when you create the object. If you pass the size, the JScript runtime assumes that you are sure your array will be dense and don’t want our heuristics to second-guess that. We will not exercise those heuristics for any indexed entry added within that size range, in whatever order the entries are added. So if you write something like the following…

 arrObj = new Array(50000);
 arrObj[10] = 20;
 arrObj[49999] = 500000;

… both will go to the new storage even though the last indexed entry doesn’t satisfy the heuristics. But do remember that anything at index 50000 or beyond will have to meet the heuristics, else it will go to the HashTable.
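Applying the same trick to the earlier loops: pre-sizing keeps the both-ends fill entirely in the fast storage, since every index falls inside the declared range no matter the write order.

```javascript
// Pre-size the array so all indexes in [0, size) count as dense,
// regardless of the order in which they are written.
var size = 2000;
var arrObj = new Array(size);

// Fill from both ends toward the middle.
for (var i = 0; i < size / 2; i++) {
    arrObj[i] = i;
    arrObj[size - i - 1] = size - i - 1;
}
```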

Time out. Hope you enjoyed reading it and would like to read second part as well. See you soon.