Extracting street type from street name

Debilon 431 Reputation points
2022-04-30T19:50:13.633+00:00

I wrote some code to extract the street type from the street name
The code should finds the last word and if it's 'DR','ST','AVE','AV','RD','LN','TR' extracts it as StPOS
however in some cases where the string itself has any of the abbreviations it extracts it as well

  DECLARE @StreetName Table (ID INT IDENTITY PRIMARY KEY, StreetName VARCHAR(50));  
                  
      
          
  INSERT INTO @StreetName (StreetName) VALUES  
  ('S Chestnut St '),  
  ('W 1st St'),  
  ('W 21st St'),  
  ('31st St'),  
  ('Happyjack Rd'),  
  ('Murray Rd'),  
  ('N Mccue St'),  
  ('Linda Vista Dr'),  
  ('Reynolds St'),  
  ('17 Mile Rd'),  
  ('Center St'),  
  ('N Mccue St');  
  
WITH CTE as(     
 Select StreetName    
      ,LastWord = right(rtrim(StreetName),charindex(' ',reverse(rtrim(StreetName))+' ')-1)  
  
 From  @StreetName  
    
 )  
Select StreetName, lastword, Substring(StreetName, 1,Charindex(LastWord, StreetName)-1) as NewAddress,  
Substring(StreetName, Charindex(LastWord, StreetName), LEN(StreetName)-1) as StPos    
from CTE  
where LEN(lastword) =2  and LastWord in ('DR','ST','AVE','AV','RD','LN','TR')  

Current Results
197942-image.png

Desired Output
197971-image.png

I'm Using SQL 2019.

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,665 questions
0 comments No comments
{count} votes

Accepted answer
  1. Yitzhak Khabinsky 26,201 Reputation points
    2022-05-01T04:38:32.957+00:00

    Hi @Debilon ,

    Please try the following solution.

    Notable points:

    • 1st CROSS APPLY is tokenizing address via XML.
    • 2nd CROSS APPLY is retrieving last token via ...r[last()]...
    • 3rd CROSS APPLY is checking if the last token is on the predefined list of street types.
    • TRIM() function is used to trim a possible trailing dot.

    SQL

    -- DDL and sample data population, start  
    DECLARE @StreetName Table (ID INT IDENTITY PRIMARY KEY, StreetName VARCHAR(50));  
    INSERT INTO @StreetName (StreetName) VALUES  
    ('S Chestnut St'),  
    ('W 1st St'),  
    ('W 21st St'),  
    ('31st St'),  
    ('Happyjack Rd'),  
    ('Murray Rd'),  
    ('N Mccue St'),  
    ('Linda Vista Dr'),  
    ('Reynolds St'),  
    ('17 Mile Rd'),  
    ('Center St'),  
    ('N Mccue St'),  
    ('Denny Way');  
    -- DDL and sample data population, end  
      
    DECLARE @separator CHAR(1) = SPACE(1);  
      
    SELECT t.*  
     , NewAddress = c.query('data(/root/r[position() le (last()-sql:column("y"))])').value('.', 'VARCHAR(50)')  
     , StPOS = IIF(y=1, x, NULL)  
    FROM @StreetName AS t  
    CROSS APPLY (SELECT TRY_CAST('<root><r>' +   
          REPLACE(StreetName, @separator, '</r><r>') +   
          '</r></root>' AS XML)) AS t1(c)  
    CROSS APPLY (SELECT c.value('(/root/r[last()]/text())[1]', 'VARCHAR(5)')) AS t2(x)  
    CROSS APPLY (SELECT IIF(TRIM('.' FROM x) IN ('DR', 'ST', 'AVE', 'AV', 'RD', 'LN', 'TR'),1,0)) AS t3(y);  
    

    Output

    +----+----------------+-------------+-------+  
    | ID |   StreetName   | NewAddress  | StPOS |  
    +----+----------------+-------------+-------+  
    |  1 | S Chestnut St  | S Chestnut  | St    |  
    |  2 | W 1st St       | W 1st       | St    |  
    |  3 | W 21st St      | W 21st      | St    |  
    |  4 | 31st St        | 31st        | St    |  
    |  5 | Happyjack Rd   | Happyjack   | Rd    |  
    |  6 | Murray Rd      | Murray      | Rd    |  
    |  7 | N Mccue St     | N Mccue     | St    |  
    |  8 | Linda Vista Dr | Linda Vista | Dr    |  
    |  9 | Reynolds St    | Reynolds    | St    |  
    | 10 | 17 Mile Rd     | 17 Mile     | Rd    |  
    | 11 | Center St      | Center      | St    |  
    | 12 | N Mccue St     | N Mccue     | St    |  
    | 13 | Denny Way      | Denny Way   | NULL  |  
    +----+----------------+-------------+-------+  
    

    CData section

    Here is a defense mechanism against XML special characters.

    198188-cdata-section.jpg

    1 person found this answer helpful.

3 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 113.6K Reputation points MVP
    2022-04-30T20:46:30.3+00:00

    I think this problem is not too dissimilar from your last one, but this solution may not be an exact copy. But I believe that there are some common principles.

    I added one more entry to your sample data, to test that case.

    DECLARE @StreetName Table (ID INT IDENTITY PRIMARY KEY, StreetName VARCHAR(50));
    
    DECLARE @StreetSuffixes TABLE (Suffix varchar(10) NOT NULL PRIMARY KEY)
    INSERT @StreetSuffixes(Suffix)
      VALUES('DR'), ('ST'), ('AVE'), ('AV'), ('RD'), ('LN'), ('TR') 
    
    INSERT INTO @StreetName (StreetName) VALUES
      ('S Chestnut St '),
      ('W 1st St'),
      ('W 21st St'),
      ('31st St'),
      ('Happyjack Rd'),
      ('Murray Rd'),
      ('N Mccue St'),
      ('Linda Vista Dr'),
      ('Reynolds St'),
      ('17 Mile Rd'),
      ('Center St'),
      ('N Mccue St'),
      ('Denny Way');
    
    SELECT N.StreetName,
           substring(trim(N.StreetName), 1, len(N.StreetName) - isnull(len(S.Suffix) + 1, 0)) AS NewAddress,
           CASE WHEN S.Suffix IS NOT NULL THEN right(trim(N.StreetName), len(S.Suffix)) END AS StPos
    FROM   @StreetName N
    LEFT   JOIN @StreetSuffixes S ON N.StreetName LIKE '% ' + S.Suffix
    

  2. Bert Zhou-msft 3,431 Reputation points
    2022-05-02T07:36:00.49+00:00

    Hi,@Debilon

    Welcome to Microsoft T-SQL Q&A Forum!

    Please check this:

    DECLARE @StreetName Table (ID INT IDENTITY PRIMARY KEY, StreetName VARCHAR(50));  
    INSERT INTO @StreetName (StreetName) VALUES  
       ('S Chestnut St '),  
       ('W 1st St'),  
       ('W 21st St'),  
       ('31st St'),  
       ('Happyjack Rd'),  
       ('Murray Rd'),  
       ('N Mccue St'),  
       ('Linda Vista Dr'),  
       ('Reynolds St'),  
       ('17 Mile Rd'),  
       ('Center St'),  
       ('N Mccue St');  
          
     WITH CTE as(     
      Select StreetName    
           ,LastWord = right(rtrim(StreetName),charindex(' ',reverse(rtrim(StreetName))+' ')-1)  
        --,len(StreetName) as lenth  
      From  @StreetName  
            
      )  
      --select * from cte  
     Select StreetName, lastword,  substring(rtrim(StreetName), 1, len(StreetName) - isnull(len(LastWord) + 1, 0)) AS NewAddress,  
    StPos  = Substring(StreetName,  LEN(StreetName)-len(lastword)+1,len(lastword))    
     from CTE  
     where LEN(lastword) =2  and LastWord in ('DR','ST','AVE','AV','RD','LN','TR')  
    

    198155-image.png

    Best regards,
    Bert Zhou


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  3. Tom Phillips 17,741 Reputation points
    2022-05-02T12:48:17.433+00:00

    This is extremely difficult to parse and get the correct data 100% of the time.

    You are better off sending your data through an address cleaning service and getting back the valid data.

    I have used Smarty Streets for this before. I would suggest you look at:
    https://www.smarty.com/pricing


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.