Compartir a través de


Why is the NULL bitmap in a record an optimization?

I've had some questions sent in comments and I wanted to reply to some of them using a post so others who aren't subscribed to the comment sections can see the answers.

 

Wesley Backelant said:

I was wondering what exactly you mean by "this allows an optimization when reading columns that are NULL". Could you elaborate on that?

Firstly, having a null bitmap removes the need for storing special 'NULL' values for fixed-length datatypes. Without the null bitmap, how can you tell whether a column is NULL?   That's easy for a variable-length column - just check the length. If it's zero, then the column is NULL.   [Edit] Ryan Stonecipher (the dev responsible for DBCC) pointed out that I'd forgotten the case of empty strings - thanks Ryan. In this case, an empty string also has a zero length so for varchar columns you'd need to use the fixed-length solution described below.   It's not so easy for fixed-length columns, which, as their name suggests, have a fixed-length so that trick doesn't work. The only solution is to define a special 'NULL' value, which limits the effective range of the datatype being stored.   Secondly, it saves CPU cycles. If there was no NULL bitmap, then there are extra instructions executed for fixed- and variable-length columns.   For fixed-length:
  1. read in the stored column value (possibly taking a cpu data cache miss)
  2. load the pre-defined NULL value for that datatype (possibly taking a cpu data cache miss, but only for the first read in the case of a multiple row select)
  3. do a comparison between the two values

For variable-length:

  1. calculate the offset of the variable length array
  2. read the number of variable length columns (possibly taking a cpu data cache miss)
  3. calculate the position in the variable length offset array to read
  4. read the column offset from it (possibly taking a cpu data cache miss)
  5. read the next one too (possibly taking another cpu data cache miss, if the offset in step 4 was on the boundary of a cache line size)
  6. compare them to see if they're the same

But with a NULL bitmap, all you have to do is:

  1. read the NULL bitmap offset (possibly taking a cpu data cache miss)
  2. calculate the additional offset of the NULL bit you want to read
  3. read it (possibly taking a cpu data cache miss)

So, its about even for a lookup of a single fixed-length column, but for variable-length columns, and for multiple row selects, there's a clear advantage to having the NULL bitmap.

Hope this makes my comment a bit clearer.

Comments