Introduction to Indices

Article
09/10/2012

ESE is an ISAM engine -- Indexed Sequential Access Method. Wikipedia has an article on ISAMs: https://en.wikipedia.org/wiki/ISAM.

ESE allows you to insert data in a structured manner, with data types such as integer or strings. Then you need to be able to find the data, which is where indexing comes in. As one co-worker put it '80% of ESE's coolness is in its indexing'.

ESE stores records in a B+tree (https://en.wikipedia.org/wiki/B%2B_tree). This tree is ordered by the Primary Key. You can either use the default order (an internally-incrementing number, so any new records inserted just end up at the end), or you can explicitly define a primary index (sometimes called 'Clustered Index') by specifying JET_bitIndexPrimary to the index definition. This is very beneficial when you have a unique property that you know will be accessed sequentially. (For example sorting by Employee ID or by Social Security Number [US] or Social Insurance Number [Canada] could be useful. But a primary index on employee name wouldn't be good because two people may have the same name.)

It's usually more interesting to be able to sort on properties other than the Primary Index. For example, sorting by employee names can be useful. This can't be a primary index, since full names aren't unique. You may want to create an index over "+LastName\0+FirstName\0\0". And maybe an index over "+Salary\0\0". These would be secondary indices.

But how do secondary indices work? They are also B+trees. Take the following data:

EmployeeId	LastName	FirstName	Salary
3	Pertwee	Jon	1000
5	Agyeman	Freema	2200
101	Gillan	Karen	2200
202	Smith	Matt	2000

(Obviously 'Salary' is not unique, but 'EmployeeId' is.)

The Primary Index over 'EmployeeId' will have the primary keys of {3, 5, 101, 202}. The contents in that tree will be the records themselves.

The LastName/FirstName index would be: { [ "AGYEMAN\0FREEMA\0", 5 ], [ "GILLAN\0KAREN\0", 101 ], [ "PERTWEE\0JON\0", 3 ], [ "SMITH\0MATT\0", 202 ] }.

The Salary index would be { [ 1000, 3], [2000, 202], [2200, 5], [2200, 101] }.

Some things to note are:
-The contents of the secondary indices are the primary keys.
-By default, ESE converts ANSI text to uppercase for sorting. (Unicode text is treated differently.)
-The Salary index has a couple of duplicate entries. it's not a Unique index, so this is OK. The tie-breaker isthe value of the Primary Key.

If one was to insert a new record (e.g. { 55, "Baker", "Tom", 3000 } ), then a few things happen:
-By inserting in the middle, if there is no space left on the database page, then it may be necessary to split the page in two. Doing this page split may result in some discontiguous data. So specifying an index density of less than 100 when creating the index can avoid splits. (Data contiguity in general should be another blog topic...)
-It also requires updating the two secondary indices (including uniqueness checks). The more secondary indices you have, the more expensive this will be.

The uniqueness check is done at JetUpdate() time, not JetSetColumn/JetSetColumns.

Conclusion:

That just scratches the surface of indexing. Secondary indices are useful, but can also expensive. Use them wisely!

Partager via

Introduction to Indices

Ressources supplémentaires