Split string based on preexisting phrases

Debilon 431 Reputation points
2022-04-22T21:58:39.607+00:00

Updated the Question and added DDL, and a picture of the desired output

-- DDL and sample data population, start  
  
DECLARE @OwnerName Table (ID INT IDENTITY PRIMARY KEY, TrustName VARCHAR(100));  
Declare @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Identifier VARCHAR(100));  
  
INSERT INTO @OwnerName (TrustName) VALUES  
('DNDERSON, ROM & LITA K. FAMILY TRUST'),  
('CAKER FAMILY LIVING TRUST '),  
('CAKER, MAX E. & NANCY A. FAMILY TRUST'),  
('AALDWIN SALLY L LIVING TRUST'),  
('GRINGOLF JOHN S LIVING TRUST'),  
('DROWN FAMILY TRUST'),  
('CURCH FAMILY LIVING TRUST'),  
('AURDICK FAMILY REVOCABLE TRUST'),  
('CHURCHILL 1993 TRUST');  
  
  
  
INSERT INTO @tbl (Identifier) VALUES  
-- Add rows as many as needed  
('FAMILY TRUST'),  
('FAMILY TRUST'),  
('LIVING TRUST'),  
('REVOCABLE TRUST'),  
('TRUST'),  
('TR');  
-- DDL and sample data population, end  
  
  
  
;WITH MyCTE AS (  
  Select Distinct  
  TrustName,  
  TheValue = REPLACE(TrustName,Identifier,''),  
  Identifier  
  , CI = CASE WHEN charindex(Identifier,TrustName,1)>1 THEN 1 ELSE 0 END  
  , RN = ROW_NUMBER() OVER (  
  PARTITION BY CASE WHEN charindex(Identifier,TrustName,1)>1 THEN 1 ELSE 0 END   
  ORDER BY CASE WHEN charindex(Identifier,TrustName,1)>1 THEN 1 ELSE 0 END DESC,LEN(Identifier) DESC)  
  FROM @OwnerName CROSS APPLY @tbl Identifier  
  )  
  Select * from MyCTE  

Desired Output
196344-image.png

Using Microsoft SQL Server 2019 (RTM-CU11) (KB5003249) - 15.0.4138.2 (X64) May 27 2021 17:34:14 Copyright (C) 2019 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: )

Developer technologies Transact-SQL
{count} votes

Accepted answer
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2022-04-25T21:37:47.88+00:00

    Here is a query for your updated sample data. It does not include the CI and RN columns, because I don't know what they are.

    The idea is simply to match the end of a string to a single identifier. If there are multiple matches, take the longest. The match must start with something else than a letter.

    SELECT TrustName,
           CASE WHEN I.Identifier IS NULL THEN O.TrustName 
                ELSE substring(O.TrustName, 1, len(O.TrustName) - len(I.Identifier) - 1)
           END  AS TheValue, I.Identifier
    FROM   @OwnerName O
    OUTER  APPLY (SELECT TOP (1) I.Identifier
                  FROM   @tbl I 
                  WHERE  O.TrustName LIKE '%[^A-Z]' + I.Identifier
                  ORDER BY len(I.Identifier) DESC) AS I
    
    1 person found this answer helpful.
    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2022-04-23T09:06:48.707+00:00

    Working from the rule that I can infer from your single example:

    DECLARE @TName nvarchar(200) = 'Ron, Lamor M. The Great'
    SELECT substring(@TName, 1, charindex('.', @TName) - 1) AS ObjectName,
           ltrim(substring(@TName, charindex('.', @TName) + 1, len(@TName))) AS Identifier
    

  2. Ronen Ariely 15,206 Reputation points
    2022-04-23T15:03:06.707+00:00

    Check this query - Note that it will not cover all future cases probably but do answer your current data in the thread at this time

    -- (1) This solution based on the assumption that the identifier is at the end of the string - not sure what you want to do in other cases 
    -- (2) To fit the new request "from the longest to the smallest" I check the length
    -----------------------------------------
    DECLARE @TName nvarchar(200)
    --SET @TName = 'Ron, Lamor M. The Great' -- Test 01
    SET @TName = 'fg dfs gfdg. Not The Great' -- Test 02
    ;With MyCTE AS (
     select
     --@TName, 
     TheValue = REPLACE(@TName,Identifier,''),
     Identifier
     , CI = CASE WHEN charindex(Identifier,@TName,1)>1 THEN 1 ELSE 0 END
     , RN = ROW_NUMBER() OVER (
     PARTITION BY CASE WHEN charindex(Identifier,@TName,1)>1 THEN 1 ELSE 0 END 
     ORDER BY CASE WHEN charindex(Identifier,@TName,1)>1 THEN 1 ELSE 0 END DESC,LEN(Identifier) DESC)
     FROM (VALUES ('The Great'),('Not The Great'),('Small'),('Big')) T(Identifier) -- add all your preexisting phrases or use a table with all preexisting phrases
    )
    SELECT TOP 1 TheValue,Identifier--, CI,RN 
    FROM MyCTE
    ORDER BY CI DESC,RN
    GO
    

    UPDATE 2022-04-23 19:17 Israel time

    Following Erland comment, the next solution cover a case that there is no identifier at the end of the string - in this case it will return '' in the second column and the entire string in the first column of the result SET

     DECLARE @TName nvarchar(200)
     --SET @TName = 'Ron, Lamor M. The Great' -- Test 01
     SET @TName = 'Big Swifty and Associates, Trendmongers' -- Test 02
     ;With MyCTE AS (
      select
      --@TName, 
      TheValue = REPLACE(@TName,Identifier,''),
      Identifier
      , CI = CASE WHEN charindex(Identifier,@TName + ' ',1)>1 THEN 1 ELSE 0 END
      , RN = ROW_NUMBER() OVER (
      PARTITION BY CASE WHEN charindex(Identifier,@TName,1)>1 THEN 1 ELSE 0 END 
      ORDER BY CASE WHEN charindex(Identifier,@TName,1)>1 THEN 1 ELSE 0 END DESC,LEN(Identifier) DESC)
      FROM (VALUES (' '),('The Great'),('Not The Great'),('Small'),('Big')) T(Identifier) -- add all your preexisting phrases or use a table with all preexisting phrases
     )
     SELECT TOP 1 
     TheValue,Identifier--, CI,RN 
     FROM MyCTE
     ORDER BY CI DESC,RN
     GO
    

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.