Separating Name to First Name Last Name and Middle Name

Debilon 431 Reputation points

Separating Name to First Name Last Name and Middle Name
Have a Name Column holding a name with a format that varies

case only two words exist in Name,
I need to separate name into firstName and LastName and delete comma if exist

case three words exist
i need to separate name into firstName LastName and MI and delete comma if exist

The results i am getting are a bit confusing as one can see in the attached picture

 -- DDL and sample data population, start  
 DECLARE @tbl AS TABLE (id int identity(1,1) PRIMARY KEY, string varchar(100));  
 ('FirstName LastName.'),  
 ('FirstName , LastName'),  
 ('FirstName , LastName M'),  
 ('FirstName LastName MiddleName');  
 select string,  
   parsename(replace(string, ' ', '.'), 1) as FirstName,  
   parsename(replace(string, ' ', '.'), 2) as FirstName,  
   parsename(replace(string, ' ', '.'), 3) as MiddleInitial   
 from @tbl as t  


A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,563 questions
{count} vote

Accepted answer
  1. Ronen Ariely 15,096 Reputation points


    What if the order of names is different?

    Can you confirm 100% that First name comes always before Last name and these two always comes before the Middle name?!?

    And what if the first name includes two words?!?

    You should have redesign you database better so that the data will be inserted from the start to three explicit columns in this case. Any solution we will provide will never give you the guarantee that the result is as expected if you cannot control the input in this case!

    Just for the sake of the discussion with a huge recommendation NOT TO USE IT IN PRODUCTION, but instead redesign your system

    This cover your current sample data:

    use tempdb  
    CREATE TABLE tbl (id int identity(1,1) PRIMARY KEY, string varchar(100));  
    ('FirstName LastName.'),  
    ('FirstName , LastName'),  
    ('FirstName , LastName M'),  
    ('FirstName LastName MiddleName');  
    ;With MyCTE AS (  
    	select id, string, EditedString =   
    		REPLACE(replace(replace(replace(REPLACE(REPLACE(string,'.',''), ',',' '),' ','<>'),'><',''),'<>',' '),' ','.')  
    		, Amount = LEN(REPLACE(replace(replace(replace(REPLACE(REPLACE(string,'.',''), ',',' '),' ','<>'),'><',''),'<>',' '),' ','.')) - LEN(REPLACE(REPLACE(replace(replace(replace(REPLACE(REPLACE(string,'.',''), ',',' '),' ','<>'),'><',''),'<>',' '),' ','.'),'.','')) + 1  
    	from tbl as t  
    select string,  
    	parsename(EditedString, Amount) as FirstName,  
    	parsename(EditedString, Amount-1) as LastName,  
    	parsename(EditedString, Amount-2) as MiddleInitial  
    from MyCTE  


    0 comments No comments

3 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 102.3K Reputation points

    I had reason to look into this problem yesterday. I thought I would just do a quick thing before I called it off before the weekend, but I was soon reminded that this is a non-trivial problem. Let's just take two examples:

    Per Anders Fogelström
    Elsa Nilsson Tell

    How should these be handled? Now, these are Swedish name, so the penny may not drop directly for you, but if you think that Anders and Nilsson should fall into the same bin, you are wrong. The first person is a man whose first name is Per Anders. These double names are common in Sweden. Often they are written with a hyphen, but not always. The second name is a woman, presumably married who have retained her maiden name, but also carry the name of her husband. Actually, I don't know whether her last name officially is Nilsson Tell, or if Nilsson is considered to be a middle name.

    And this is just the start of it. To this comes formatting issues. you mention commas. But a comma often indicates a reverse order:

    Sommarskog, Erland

    If you have data like this that you need to cleanse, I would suggest that you are better off doing this outside SQL Server, in a language with better support for string operations. Regular expressions may help. If you have a large dataset, you may also want to have tables with names, so you can check whether a name part is a first or last name. But even then it can be difficult. Consider:

    Fittipaldi, Emerson
    Keith Emerson

    As for what I going to do with the names I was looking at, I don't know yet. But these are only names of users in a database which only has a single name column, and which I'm migrating to a database that has separate columns. Since they are only a couple of hundreds, we could clean they up manually.

    1 person found this answer helpful.

  2. Tom Phillips 17,716 Reputation points

    The short answer is, you can't.

    There is no logic you can create which will 100% split a string into Firstname, LastName and Middle name. Depending on your data set you may be able to get >80%. But without knowing your data, it is impossible to guess.

    The problem you have, in addition to the other posts is things like:

    James Earl Jones II
    Jones Sr, Earl Jones
    Kiefer William Frederick Dempsey George Rufus Sutherland
    María de la Paz Elizabeth Sofía Adriana de la Huerta

    1 person found this answer helpful.
    0 comments No comments

  3. Erland Sommarskog 102.3K Reputation points

    i have close to 500K Names. so i have to come with something, so i dissected the name field into logic structures.

    Currently on the name field they store corporations, individuals and other legal structures (partnerships, trusts etc.)

    so yes in this particular case FirstName is always the first, last name second and the third is the middle name.

    So you have looked through all 500 000 names?

    I don't understand how you can make the assumption that names are always in a certain order, least of all when you also have name of juridical persons.

    I don't know what the ultimate plan is, but it sounds like you have to write something which makes a first sieve. Some manual inspection on that to improve and add more rules etc. And as I said, T-SQL may not be the best choice. Not the least, as I understand from your other posts, you are relatively new to T-SQL.

    1 person found this answer helpful.