Parse address field to extract City name using t-sql

SQL9 246 Reputation points
2021-05-30T22:24:33.51+00:00

Hi All,

I have a table(@Address table) with address field. I would like to extract the city name , state and zip code from it.

I have used STRING_AGG() and STRING_SPLIT() to get the City name, state, zip. In these 3 fields City name is very important to show.

Below is the sample code I am using to get the city, state, & zip but it works fine for city names with space in between them but doesn't work for single word states(top 4 works for states with space but not for states with single word, it pulls other street info).
Also I tried with a table to filter the data based on city names from the table but don't know how to use it correctly with string_split()

Any kind of help is greatly appreciate. I am using 2017 version.

declare @Output as varchar(100) = '100 NORTH MAIN MASON CITY IL 626641104'
SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 4 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@Output, SPACE(1))
order by rn desc) AS cn (rn,cityname);

SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 4 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@output, SPACE(1))
order by rn asc) AS Cn (rn,cityname);

SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 4 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@output, SPACE(1))
/* WHERE value IN (SELECT cityname FROM @City ) */ order by rn desc ) AS Cn (rn,cityname);
select @output

Below is the sample city table data I want to use it to filter the STRING_Split() data

declare @City table (CityName varchar(50))
insert into @City (CityName)
Select 'PLAIN CITY' union all
Select 'MASON CITY' union all
select 'BLOOMINGDALE' union all
select 'CHICAGO'

--select * from @City

declare @Address table (FullAddress varchar(150))
insert into @Address (FullAddress)
Select '5500 North AVENUE, APT. 116 plain city OH 60053' union all
Select '12536 S INDIANA AVE CHICAGO IL 60625' union all
Select 'W MAIN ST PO BOX 111 Addison VA 24293' union all
select '179 ARMY TRAIL RD BLOOMINGDALE IL 60108' union all
select '100 NORTH MAIN MASON CITY IL 626641104'

--select * from @Address

Thanks in advance
RH

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,705 questions
{count} votes

Accepted answer
  1. MelissaMa-MSFT 24,211 Reputation points
    2021-06-02T04:45:16.987+00:00

    Hi @SQL9 ,

    Thanks for your update.

    If you could list all cityname with "City", "Ville" or anything, you could refer below:

    ;with cte as (  
    select  ROW_NUMBER() Over (order by (select null)) addressid,FullAddress from @Address)  
    ,cte1 as (  
    SELECT ROW_NUMBER() Over (partition by addressid order by (select null) ) id, value,addressid FROM cte  
    cross apply STRING_SPLIT(FullAddress, SPACE(1)))  
    ,cte2 as (  
    select ROW_NUMBER() Over (partition by addressid order by id desc) rn ,value ,addressid  
    from cte1)  
    ,cte3 as (  
    select a.*,b.value1   
    from cte2 a  
    left join (select addressid,value value1 from cte2 where rn=3) b on a.addressid=b.addressid)  
    select STRING_AGG(value,SPACE(1)) WITHIN GROUP (ORDER BY rn DESC) Result  
    from cte3  
    where value1 in ('city','Ville') and rn<=4  
    group by addressid  
    union   
    select STRING_AGG(value,SPACE(1)) WITHIN GROUP (ORDER BY rn DESC) Result  
    from cte3  
    where value1 not in ('city','Ville') and rn<4  
    group by addressid  
    

    Output:

    Result  
    ADDISON VA 24293  
    BLOOMINGDALE IL 60108  
    CHICAGO IL 60625  
    EDISON NJ 34561  
    MASON CITY IL 626641104  
    PLAIN CITY OH 60053  
    

    Or you could refer another method using user-defined function.

    Firstly create one function as below:

    CREATE FUNCTION dbo.GetSplitString  
    (  
       @List       VARCHAR(MAX),  
       @Delimiter  VARCHAR(255),  
       @ElementNumber int  
    )  
    RETURNS VARCHAR(4000)  
    AS  
    BEGIN  
       DECLARE @result varchar(4000)      
       DECLARE @Items TABLE ( position int IDENTITY PRIMARY KEY,  
                              Item VARCHAR(4000)  
                             )    
      
       DECLARE @ll INT = LEN(@List) + 1, @ld INT = LEN(@Delimiter);    
      
       WITH a AS  
       (  
           SELECT  
               [start] = 1,  
               [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter,   
                           @List, @ld), 0), @ll),  
               [value] = SUBSTRING(@List, 1,   
                         COALESCE(NULLIF(CHARINDEX(@Delimiter,   
                           @List, @ld), 0), @ll) - 1)  
           UNION ALL  
           SELECT  
               [start] = CONVERT(INT, [end]) + @ld,  
               [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter,   
                           @List, [end] + @ld), 0), @ll),  
               [value] = SUBSTRING(@List, [end] + @ld,   
                         COALESCE(NULLIF(CHARINDEX(@Delimiter,   
                           @List, [end] + @ld), 0), @ll)-[end]-@ld)  
           FROM a  
           WHERE [end] < @ll  
       )  
       INSERT @Items SELECT [value]  
       FROM a  
       WHERE LEN([value]) > 0  
       OPTION (MAXRECURSION 0);  
      
       SELECT @result=Item  
       FROM @Items  
       WHERE position=@ElementNumber  
         
       RETURN @result;  
    END  
    GO  
    

    Then call this function as below:

    SELECT case when REVERSE(dbo.GetSplitString(REVERSE(replace(FullAddress,' ',',')),',',3)) in ('city','Ville')  
    then right(FullAddress,charindex(' ', REVERSE(FullAddress),charindex(' ', REVERSE(FullAddress),charindex(' ', REVERSE(FullAddress), (charindex(' ', REVERSE(FullAddress), 1))+1)+1)+1))  
    else right(FullAddress,charindex(' ', REVERSE(FullAddress),charindex(' ', REVERSE(FullAddress), (charindex(' ', REVERSE(FullAddress), 1))+1)+1))  
    end result  
    from @Address  
    

    Best regards,
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


4 additional answers

Sort by: Most helpful
  1. Ronen Ariely 15,201 Reputation points
    2021-05-31T01:35:36.503+00:00

    Good day,

    Your system appears to have a fundamental flaw, and now you try to deal with the result of that design instead of dealing with the real issue - your system design.

    Using free text which include multiple values does not fit the relational model !
    This design leads you to try to parse a string of text which can have any format and therefore any solution that you will find and will cover all your values today will not cover a different format which someone will insert the column.

    You cannot count on any solution which based on parsing by format of the text, since your text can have endless options to format the data.

    The only solution for parsing such free format text is using Artificial intelligence (AI) and even then it might not give you 100% correct solution for any input.

    So, in your case you are using a single column to store address as free text which is the basic mistake.

    Option 1 : best solution in most cases when using Tabular database like SQL Server

    A much better design is to store the data in separate columns (as more detailed columns better) from the start. For example using the columns: building number, apartment number, street, town/city, state, zip code...

    You should parse the information before you insert it to the database

    If you are developing your own application then you can get free input of address from the user and then use Google map API or Bing map API in order to check the address and parse it.

    Option 2:

    Store the address in a single column using well formatted JSON value, after you parsed the data that the user wanted to insert using Google map API or Bing map

    Option 3:

    If must store the data in free text, then you should parse the data on-the-fly in your application.

    Select the text as it is, and in your application side parse the text of the address using Google map API or Bing map API.

    Option 4

    If must store the data in free text and ONLY OF YOU MUST parse the data on-the-fly using in your query (and there should not be aby reason for this except bad design in most cases), then you should parse the data on-the-fly using using HttpWebRequest from your server to Google map API or Bing map API. This can be done using many ways in SQL Server on-premises including SQLCLR function, python or R solutions. Create function that send HttpWebRequest and returns the data as well formatted JSON.

    Option 5 - bad solution but can work with the risk of inconsistent information and mistakes/errors

    Make sure that the input (and the stored value) is well formatted in specific format using separator. Make sure using regular expression in your application side that the information fit the exact format(s) before you store it in the database.

    Use simple T-SQL query to parse the data as you tried to do, but remember that the use might insert the data in different t format and you cannot cover the endless option of input which the user might use for the address!

    THIS IS BAD IDEA

    To provide a query that solve this case - text which specific format(s), please provide a list of exact formats of addressess which you might have and please provide the exact expected output for these formats

    0 comments No comments

  2. MelissaMa-MSFT 24,211 Reputation points
    2021-05-31T02:55:07.26+00:00

    Hi @SQL9 ,

    Welcome to Microsoft Q&A!

    According to your situation, you could consider to add one condition of value<>'city'. Then you could only choose the top 3 values.

    Please refer below:

    declare @Output as varchar(100) = '100 NORTH MAIN MASON CITY IL 626641104'  
    SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 3 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@Output, SPACE(1))  
    where value<>'city'  
    order by rn desc) AS cn (rn,cityname);  
    select @output  
    --626641104 IL MASON  
      
    declare @Output1 as varchar(100) = '5500 North AVENUE, APT. 116 plain city OH 60053'  
    SELECT @output1 = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 3 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@Output1, SPACE(1))  
    where value<>'city'  
    order by rn desc) AS cn (rn,cityname);  
    select @output1  
    --60053 OH plain  
    

    You could also refer below query:

    ;with cte as (  
    select  ROW_NUMBER() Over (order by (select null)) addressid,FullAddress from @Address)  
    ,cte1 as (  
    SELECT ROW_NUMBER() Over (partition by addressid order by (select null) ) id, value,addressid FROM cte  
    cross apply STRING_SPLIT(FullAddress, SPACE(1))   
    where value<>'city')  
    ,cte2 as (  
    select ROW_NUMBER() Over (partition by addressid order by id desc) rn ,value ,addressid  
    from cte1)  
    select  STRING_AGG(value,SPACE(1)) AS Result  
    from cte2  
    where rn<4   
    group by addressid  
    

    Output:

    Result  
    60053 OH plain  
    60625 IL CHICAGO  
    24293 VA Addison  
    60108 IL BLOOMINGDALE  
    626641104 IL MASON  
    

    Best regards,
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  3. Tom Phillips 17,766 Reputation points
    2021-06-01T13:47:32.117+00:00

    You will be extremely unhappy attempting to parse a full string address into its individual parts. You will be much better off using the USPS or 3rd party to parse them correctly and give you the exact parts.

    I have used SmartyStreets for this purpose. They have an API to send the string and get the parts. https://www.smartystreets.com/

    0 comments No comments

  4. Guoxiong 8,206 Reputation points
    2021-06-01T17:02:17.093+00:00

    Suppose the city name in the Address table can be found in the City table. You can try this:

    ;WITH CTE AS (
        SELECT a.FullAddress, CHARINDEX(c.CityName, a.FullAddress) AS CityStartPosition 
        FROM @Address AS a, @City AS c
        WHERE CHARINDEX(c.CityName, a.FullAddress) > 0
    )
    
    SELECT FullAddress, SUBSTRING(FullAddress, CityStartPosition, LEN(FullAddress) - CityStartPosition + 1) AS Result
    FROM CTE;
    

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.