Understanding IDENTITY in SQL Server
This article is aimed at covering all the aspects of identity, its various features and concepts along with a couple of interesting scenarios where there is a bit different behavior of Identity.
Introduction
Often we find ourselves in a situation where we would be needing a column to contain a unique set of values (without any NULLs) that can possibly act as the primary key of the table. Since this column is to preserve the uniqueness of the rows and contains not much significance (in some cases, not at all) to the business with respect to the values that it may hold, entering values for every record can be a pain.
For this purpose, we have Identity property in the SQL Server. Upon defining a column with this property, one may have values automatically being generated for every row thereafter. Since we have been seeing so many threads on the various aspects of IDENTITY in the MSDN Forums, this article would help cover all the nuances and queries available that are needed to work with IDENTITY in SQL Server.
IDENTITY Property
We can assign this identity property to a column during the table definition itself or during the addition of a column as well. It is given using the IDENTITY keyword, along with a start_value and an increment_value. For example 1,1 indicates that the identity generated would be from 1 onward with an increment of 1 for every row like : 1,2,3,4... etc. Thereafter, during insertion of values, one need not provide values for the identity column; since we would be providing values for every other column only this would require us to have an insert statement with embedded column names as shown below:
--Creation of a sample table with an identity column
create table #tab
(
id int identity(1,1),
name varchar(100),
constraint pk_temp primary key(id)
)
--insertion of values to the table
insert #tab(name) select 'jk'
--insertion of default values (meaning for the identity column alone)
insert #tab default values
Here the values inserted into the table are as below:
id | name |
1 | jk |
2 | NULL |
If you noticed correctly, the second insert statement was different from the first one by the fact that it didn't have any values for name column but instead had the keywords 'default values'. This keyword instructs the compiler to insert just the 'default' values for the record - which includes the identity and any default values assigned via a default constraint.
*(Note: The above 'default values' feature would throw an error if there are any columns with a NOT NULL upon them)
*
*We can use the IDENT_SEED, IDENT_INCR functions to retrieve the identity seed and increment values:
*
SELECT
IDENT_SEED('#tab') AS Seed,
IDENT_INCR('#tab') AS Increment
IDENTITY_INSERT option
Not always would we resort to the practice of sticking with the auto-generated value. We MAY want to go in for a user-defined value, which when given explicitly throws the explicit value error. In such cases, enabling the IDENTITY_INSERT option allows us to provide explicit values for the column. But it is to be noted that in case we give a pre-inserted value or an ambiguous one, it shall throw the duplicate value error. The below code would explain the process:
--insertion of values explicitly for all columns (including identity column)
insert #tab(id,name) select 30,'jk'
/*
**ERROR**
Msg 544, Level 16, State 1, Line 1
Cannot insert explicit value for IDENTITY column in table '#tab________________________________________________________________________________________________________________00000000D7A6' when IDENTITY_INSERT is set to OFF.
*/
--Insertion of explicit values for IDENTITY column after setting IDENTITY_INSERT property
SET IDENTITY_INSERT #tab ON
--insertion of values explicitly for all columns (including identity column)
insert #tab(id,name) select 30,'jk'
SET IDENTITY_INSERT #tab OFF
insert #tab(name) select 'Madhu' --value is 3 now
Getting the last inserted Identity value
In a pragmatic view of a database solution involving the identity column we would need to know the last inserted identity value for various purposes. The immediate thought for this would be to use the maximum value of the table as below:
select max(id)
from #tab
But, it is to be noted that this would give only the max id value among the records currently present in the table, that is if there was a deletion of the last record which was inserted, the same wouldn't have been taken into account!! So to avoid this, we have custom solutions available as given below. While the CHECKIDENT is a DBCC command to show the identity property values as a message, ident_current is a scalar function to return the last identity value of the table. Both of them take the respective table_name as the input.
--Viewing the latest identity value inserted
DBCC CHECKIDENT('#tab')
select ident_current('#tab')
Adding to this, there are also another two methods available to obtain the latest identity value, except that they are not specific to any table but to a session. So for example, if the #tab table shown above, has a last identity value of 3 and when we create a new table say, #newtable with identity property and insert one row then the below methods would yield 1 and not 3.
--After creation of another table in the same scope
create table #newtab(id int identity)
insert #newtab default values
--The following gives the latest identity value in the whole scope (1)
--(Irrespective of table)
select @@IDENTITY
select scope_identity()
--cleanup
drop table #newtab
Identity() Function
We have the INTO clause in SQL Server to push the results of a particular SELECT query into a new table. However when we do so, we may or may not have a primary key and in such cases, we may use this identity function to generate a sequential vale into the new table. The function takes an input of the datatype in addition to the usual set of seed (start_value) and the increment values. The following example would help to demonstrate this:
--IDENTITY() Function to be used with into clause
declare @tab table(name varchar(10))
insert @tab select 'jay'
insert @tab select 'jk'
select identity(int,1,1) as id,name
into #temp
from @tab
select * from #temp
--cleanup
drop table #temp
It is to be noted that, this identity function can be used only with the INTO clause and not generally as a column in a SELECT query. (For that case, we may always go for a ROW_NUMBER() with the OVER() clause)
Identity RESEED
In many a case, we would want to reseed the identity values being generated - for example at the end of a given year I may want the values to start from 2014001 instead of 2013xxx. So in such cases, we may reseed the identity value of the table using CHECKIDENT.
--IDENTITY Reseed to new value
select * From #tab -- last inserted is 3
DBCC CHECKIDENT('#tab',RESEED,100)
/*
Checking identity information: current identity value '3', current column value '100'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
*/
insert #tab default values
select * from #tab --101
So, the CHECKIDENT actually resets the last identity value to the one mentioned in the command (here, 100) so that future records carry the value as 'value+1'.
It is also to be noted that, if the table is not empty, setting the identity value to a number less than the maximum value in the identity column can result in one of the following conditions:
If a PRIMARY KEY or UNIQUE constraint exists on the identity column, error message 2627 will be generated on later insert operations into the table because the generated identity value will conflict with existing values.
If a PRIMARY KEY or UNIQUE constraint does not exist, later insert operations will result in duplicate identity values
Removal of the Identity Property
One may remove the IDENTITY property from a column not using any ALTER DROP CONSTRAINT or property but by just dropping the column itself. The simple approach would be is to create a new column, transfer values to the new column and drop the old column as given below:
--Removing identity property
alter table #tab add id_new int
update #tab set id_new=id
alter table #tab drop column id
exec sp_rename '#tab.id_new','id','column
select * from #tab
Getting IDENTITY info
We can get information of the identity column using simple queries as below:
--to get the identity column of a table
select name,*
from sys.columns
where object_id=object_id('tempdb..#tab') and is_identity=1
We can check if a column is having identity property or not using the COLUMNPROPERTY function
--to check if a column is identity
select COLUMNPROPERTY(object_id('tempdb..#tab'),'id_new','isidentity')
select columnproperty(object_id('tempdb..#tab'),'id','isidentity')
We can retrieve the identity current value of specific table using the function IDENT_CURRENT
SELECT IDENT_CURRENT('#tab') AS CurrentIdentity
Interesting Scenarios
Let us look at a couple of interesting scenarios with respect to the identity property of SQL Server which one may come across during regular SQL development.
Effect of TRUNCATE on Identity
Let us see what happens upon the truncation of a table having an identity column:
--last identity value of table is 3
truncate table #tab
insert #tab select 'Vidhya' --1
So, as seen above, the newly inserted value after truncate is 1 which is nothing but the seed value of the column defined as a part of the table setup. In the absence of a seed value, 1 is used as default. Hence, it can be seen that there is an automatic reseed of the identity of the table after a TRUNCATE.
To make this interesting, lets see if the same happens with DELETE now. After all delete also removes (deletes) the records of the table.
--last identity value of table is 1
delete from #tab
insert #tab select 'Sathya' --2
--Value not reseeded
Though some of us may expect the new value to be 1 as before, it is NOT SO. Delete being a DML operation, doesn't play with the fundamental properties of the table like the TRUNCATE (which is a DDL).
RESEED with TRUNCATE and DELETE
Lets take this to the next level, by combining reseed with truncate and delete. As we saw before, truncate by itself reseeds the table identity while delete does not. Also, as covered in the RESEED topic, it is to be noted that the new value after a reseed shall take up a value of 'n+1' where n is the reseed value given in the DBCC command.
Execute the below query and analyse the results:
--Reseed with respect to DELETE and TRUNCATE
--First cleaning up all tables created
drop table #temp,#tab
--Query
create table #tab (id int identity(1,1))
insert #tab default values--1
insert #tab default values--2
/**INITIAL**/
select id as initial_data from #tab
/**DELETE**/
delete from #tab --deleting all records
dbcc checkident('#tab',reseed,100) --reseeding to 100
insert #tab default values --101 as expected
select id as data_after_delete_reseed_100 from #tab
/**TRUNCATE**/
truncate table #tab --TRUNCATING the table
dbcc checkident('#tab',reseed,100) --reseeding to 100 same as before
insert #tab default values --Expected value is 101 but actually inserted value -> 100 !!
select id as data_after_truncate_reseed_100 from #tab
As we can see from above, the value inserted after the reseed is different from our expected result in case of the truncate. This is an unique behavior of reseed because of the absence of rows caused by the TRUNCATE command. This is explained by the MSDN documentation as below:
Current identity value is set to the new_reseed_value. If no rows have been inserted into the table since the table was created, or if all rows have been removed by using the TRUNCATE TABLE statement, the first row inserted after you run DBCC CHECKIDENT uses new_reseed_value as the identity. Otherwise, the next row inserted uses new_reseed_value + the current increment value.
**
**I myself came across this behavior of reseed with truncate pretty recently thanks to a post in the forums. Also thanks to Erland for rightly pointing out the issue to me :). I had created a post regarding this as well here.
Limited Scope of SCOPE_IDENTITY and @@IDENTITY
As we have seen before in this article in the Getting last identity value section, @@IDENTITY and SCOPE_IDENTITY() gets us the last inserted identity value in the entire session and not for any particular table. However, it is to be noted that, this behavior is for a limited scope only, i.e., it is for a specific session involving the user's direct commands only. The following script should illustrate this in a succinct manner:
First lets create a sample function which would involve insertion of values using identity property.
--lets create a function
create function fn_test()
returns int
as
begin
declare @tab table(id int identity)
insert @tab default values
return @@identity
end
Now lets have our usual #tab sample table and try and get values using SCOPE_IDENTITY() / @@IDENTITY before and after calling this function:
--cleanup object if existing
if object_id( 'tempdb..#tab' ) is not null
drop table #tab
--create table with identity property
create table #tab(id int identity,name varchar(100))
insert #tab(name) select 'Jk'--1
insert #tab(name) select 'Madhu'--2
insert #tab(name) select 'Vidhya'--3
select scope_identity() --returns 3
select dbo.fn_test() --involves creation of identity of value 1
/**
--Both return the value 3 (of #tab) and not 1
--Hence, the scope is restricted to explicit commands only
--and not the scope within triggers, functions or other objects.
**/
select @@identity --3
select scope_identity()--3
Hence, it is to be known that, thought SCOPE_IDENTITY and @@IDENTITY are for the entire session, their scope doesn't extend to triggers or functions being called out, but just explicit commands issued in the session.
Gaps and Islands problem with IDENTITY
When a table with IDENTITY is being inserted and an error happens, in that case the particular identity pops out and the next value inserted will be a different one as shown below.
--cleanup
drop table #tab
CREATE table #tab (id int identity(1,1),col1 int)
INSERT #tab (col1) SELECT 101 --1
INSERT #tab SELECT 102--2
INSERT #tab SELECT 103--3
SELECT * FROM #tab
/*
id col1
----------- -----------
1 101
2 102
3 103
*/
INSERT #tab SELECT 'A'
/*
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'A' to data type int.
*/
INSERT #tab SELECT 200 --??!!
SELECT * FROM #tab
/*
id col1
----------- -----------
1 101
2 102
3 103
5 200
(4 row(s) affected)
*/
In this way, due to error during insertion, a gap in the identity is being generated. Since this qualifies as a perfect gaps and islands problem, the same can be handled using any of the gaps and island problem approaches available here on Technet Wiki.
SEQUENCE in SQL Server (Denali) 2012
We have a new object SEQUENCE from the Denali version which has its behavior pretty similar to the identity property we have seen so far. It also has a seed value, incrementing value and in addition the max value as well.
The SEQUENCE object can be created as a generic one which can then be used across tables. This particular aspect can be useful especially in cases wherein we might want to have a unique id across tables. (whether this is a scalable and normalized table design and whether this would really be used in reality - that's a different debate altogether :) )
SEQUENCE allows us to get successive values using the NEXT FROM clause as shown in the below example:
CREATE SEQUENCE [Sample_Seq]
AS [int]
START WITH 1
INCREMENT BY 1
MAXVALUE 20000
GO
--Getting just the value
SELECT NEXT VALUE FOR Sample_Seq as value1 --1
--Using with the table data
SELECT NEXT VALUE FOR Sample_Seq, name --2,3,4 ..
FROM #tab
When the max value of 20000 has been reached, automatic error would be thrown which can be countered by resetting the sequence as below:
ALTER SEQUENCE Sample_Seq RESTART WITH 1
This sequence object, though partially different from the IDENTITY, has its own set of uses and applications.
See Also
- Transact-SQL Portal
- http://msdn.microsoft.com/en-us/library/ms186775.aspx
- http://technet.microsoft.com/en-us/library/ff878091.aspx
- T-SQL: Gaps and Islands Problem
Conclusion
We have seen the different concepts of IDENTITY in SQL Server and the ways of handling the special scenarios associated with them as well.