Modifying Fields to Accomplish Successful Joins

Carlton Patterson 741 Reputation points
2022-10-08T12:19:20.333+00:00

I have tables and I would like to join on two fields. Unfortunately, in one table a field for website address appears as http://www.unilever.com, whereas the website address in the other table appears as http://www.unilever.com. Therefore, naturally if I attempt to do a Inner Join on those fields there won't be a match.

The problem is the website address for both tables can be a mix of including https, http, or simply without http all together. However, the only common match will be the actual website address e.g. http://www.unilever.com. Therefore, I was wondering what would be the best approach when doing a join in these situations? Is there someway to just to a join on the website address http://www.unilever.com?

Sample data below
CREATE TABLE accweburl (
websiteurl varchar(100))

INSERT accweburl VALUES  
('https://www.unilever.co.uk/'),  
('www.unilever.com')  
  
SELECT * FROM accweburl  
  
CREATE TABLE dueurl (  
    primaryWebsite varchar(50))  
  
INSERT dueurl VALUES  
('http://www.charles-stanley.co.uk'),  
('http://www.unilever.com'),  
('http://www.londonsecurity.org')  
  
SELECT * FROM dueurl  
  
  
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,559 questions
0 comments No comments
{count} votes

Accepted answer
  1. Tom Cooper 8,466 Reputation points
    2022-10-08T15:52:32.7+00:00

    One way

    Create table #Foo(Id int identity, WebAddress varchar(50));  
    Insert #Foo(WebAddress) Values  
    ('www.unilever.com'),  
    ('https://www.unilever.com'),  
    ('http://www.unilever.com');  
      
    Select Id, Webaddress,  
      SubString(WebAddress,   
        Case When CharIndex('://', WebAddress) = 0 Then 1 Else CharIndex('://', WebAddress)+ 3 End, Len(WebAddress)) As AdjustedWebAddress  
    From #Foo  
      
    go  
    Drop Table #Foo;  
    

    Tom


1 additional answer

Sort by: Most helpful
  1. Carlton Patterson 741 Reputation points
    2022-10-08T15:27:41.8+00:00

    Having thought about this question a little more closely, I have come to realise the problem can't be solved with modifying the join criteria.

    The problem is a a data issue.

    Therefore, the question then becomes how to replace or remove http://, so that http://www.unilever.com becomes www.unilever.com?

    0 comments No comments