Using LIKE in joins efficiently

Eshwar 216 Reputation points
2021-04-26T16:17:37.253+00:00

HI,
I am trying to use LIKE join on big tables, can you please provide some inputs on efficient way of doing it?

Query:

SELECT 
    C.cnsmr_idntfr_lgcy_txt,
    C.cnsmr_brth_dt,
    CASE WHEN STG1.cnsmr_id IS NULL THEN C.cnsmr_nm_lst_txt ELSE STG1.cnsmr_nm_lst_txt END,
    ssn_txt,
    cnsmr_nm_mddl_txt,,
    STG.accountnumber
FROM @cnsmr C
INNER JOIN @TEST_0103_STG STG ON C.cnsmr_idntfr_lgcy_txt LIKE '%' + STG.accountnumber + '%'
LEFT JOIN @TEST_CNSMR_UPDATES STG1 ON C.cnsmr_id = STG1.cnsmr_id
WHERE C.cnsmr_idntfr_lgcy_txt LIKE '%C%'
    AND STG.is_valid <> 'N'

Appreciate your inputs.

Regards,
Eshwar.

Developer technologies Transact-SQL
{count} votes

5 answers

Sort by: Most helpful
  1. Guoxiong 8,206 Reputation points
    2021-04-26T17:26:41.703+00:00

    How about this:

    INNER JOIN @TEST_0103_STG STG ON CHARINDEX(STG.accountnumber, C.cnsmr_idntfr_lgcy_txt) > 0
    
    1 person found this answer helpful.
    0 comments No comments

  2. EchoLiu-MSFT 14,621 Reputation points
    2021-04-27T02:25:19.78+00:00

    Hi @Eshwar ,

    When using like for query, like'abc%' uses index, but like'%abc' does not use index, such a query will always perform a full table scan (or clustered index scan). When querying like'%abc%', the query time is proportional to the total length of the field value, so try to avoid using the CHAR type and NVARCHAR type, and choose VARCHAR instead. For data with long field values, it is best to create a full-text index.

    If you can more clearly determine the search mode, for example,'%C%', if you can use C%' or'XC%' instead(X represents any one or more certain characters), you can improve the performance of the query. Normally, if you use% at the beginning, SQL server will perform a table scan.

    Guoxiong mentioned using charindex instead of the like clause. But the efficiency of the function is relatively low, because the function has to be called as many times as there are data.

    Regards
    Echo


    If the answer is helpful, please click ''Accept Answer'' and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    1 person found this answer helpful.

  3. Tom Phillips 17,771 Reputation points
    2021-04-26T17:27:10.143+00:00

    You can't, if you are going to use a leading %.

    See "Wildcard String Searches":

    https://www.sqlshack.com/query-optimization-techniques-in-sql-server-tips-and-tricks/

    0 comments No comments

  4. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2021-04-26T21:43:12.853+00:00

    There is not much hope if you have leading wildcards. It is not going to be very efficient.

    However, if you have an nvarchar column or a Windows collation, you may win some speed by casting to a binary collation:

    INNER JOIN @TEST_0103_STG STG ON C.cnsmr_idntfr_lgcy_txt COLLATE Latin1_General_BIN2 LIKE '%' + STG.accountnumber + '%'
    

    If you want the comparison to be case-insensitive, wrap the columns in UPPER().

    If you have varchar and an SQL collation, you are not going to win a whole lot with this, as varchar in SQL collation is relatively fast already.

    And to make it clear: what you are saving speed on here is the character comparison, which are a quite expensive with full Unicode applied and there are many of them. Which there will be in this case.

    0 comments No comments

  5. SQLZealots 276 Reputation points
    2021-04-27T04:14:41.18+00:00

    As other said, indexes are nto helping you as you have a wild condition %xxx% check.
    I would like to suggest to split your queries into multiple, that would help you sometimes.

    Could not test it, but the below can be a starting point and see the performance impact.

    SELECT 
         C.cnsmr_idntfr_lgcy_txt,
         C.cnsmr_brth_dt,cnsmr_idntfr_lgcy_txt, cnsmr_id into #Temp
         --CASE WHEN STG1.cnsmr_id IS NULL THEN C.cnsmr_nm_lst_txt ELSE STG1.cnsmr_nm_lst_txt END,
         --ssn_txt, --Not sure which table these from
         --cnsmr_nm_mddl_txt,--Not sure which table these from
         --STG.accountnumber
     FROM @cnsmr C
     WHERE C.cnsmr_idntfr_lgcy_txt LIKE '%C%'
    
    
     Select C.*,STG.accountnumber Into #Temp1 From #Temp C
     INNER JOIN @TEST_0103_STG STG ON C.cnsmr_idntfr_lgcy_txt LIKE '%' + STG.accountnumber + '%'
         where STG.is_valid <> 'N'
    
    Select C.*,CASE WHEN STG1.cnsmr_id IS NULL THEN C.cnsmr_nm_lst_txt ELSE STG1.cnsmr_nm_lst_txt END,
    From #Temp1 C
    LEFT JOIN @TEST_CNSMR_UPDATES STG1 ON C.cnsmr_id = STG1.cnsmr_id
    
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.