Remove Data with HTML or <>

San 161 Reputation points
2021-07-19T14:53:25.493+00:00

Hi Experts,

Still unable to resolve my problem. I need the result as given below in expected result. Is it possible clean the data with t-Sql and SSIS derived Column.

115900-image.png

Thanks Much. Its much needed

SQL Server Integration Services
SQL Server Integration Services
A Microsoft platform for building enterprise-level data integration and data transformations solutions.
2,578 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,639 questions
{count} votes

Accepted answer
  1. EchoLiu-MSFT 14,581 Reputation points
    2021-07-20T06:02:30.403+00:00

    Please also check:

    CREATE FUNCTION [dbo].Func(@string varchar(40)) returns varchar(max)  
    AS  
    BEGIN  
       DECLARE @result varchar(max)  
       SET @result=@string  
       DECLARE @num int  
       SET @num=1  
       
       WHILE @num>0  
       BEGIN   
       SET @result=REPLACE(@result,SUBSTRING(@result,CHARINDEX('<',@result)  
       ,CHARINDEX('>',@result)-CHARINDEX('<',@result)+1),'')  
          
       SET @num=CHARINDEX('<',@result)  
          
       END  
       RETURN @result  
    END    
      
    SELECT *,dbo.Func(Qst) Qst,dbo.Func(Ans) Ans  
    FROM yourtable  
      
    DROP function dbo.Func  
    

    Output:
    116119-image.png

    If you have any question, please feel free to let me know.

    Regards
    Echo


    If the answer is helpful, please click "Accept Answer" and upvote it.


3 additional answers

Sort by: Most helpful
  1. Yitzhak Khabinsky 25,861 Reputation points
    2021-07-19T16:08:30.177+00:00

    Hi @San ,

    You never provided a minimal reproducible example.
    So the DDL and sample data population are a wild guess.
    Commas treatment is not clear.

    Also, it is not clear why the Viorel's answer is not working for you.

    Please try the following approach.
    It is based on XML and XQuery.

    SQL

    -- DDL and sample data population, start  
    DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));  
    INSERT INTO @tbl (tokens) VALUES  
    ('PQWS, <Get>,was it,<Mas>'),  
    ('Which one, </u>, is like ,<new one> ,you got'),  
    ('<Yes>, It was');  
    -- DDL and sample data population, end  
      
    DECLARE @separator CHAR(1) = '>'  
     , @lessthan CHAR(2) = '<';  
      
    SELECT ID, tokens  
     , REPLACE(c.query('  
     for $x in /root/r[substring((./text())[1],1,1) ne "="]/text()  
     return $x  
     ').value('text()[1]', 'VARCHAR(1024)'), ' , ', '') AS Result  
    FROM @tbl  
     CROSS APPLY (SELECT CAST('<root><r>' +   
     REPLACE(REPLACE(REPLACE(REPLACE(tokens,'>,','>'),',<','<'), '<', '>='), @separator, '</r><r>') +   
     '</r></root>' AS XML)) AS t1(c);  
    

    Output

    +----+----------------------------------------------+-------------------------------+  
    | ID |                    tokens                    |            Result             |  
    +----+----------------------------------------------+-------------------------------+  
    |  1 | PQWS, <Get>,was it,<Mas>                     | PQWS, was it                  |  
    |  2 | Which one, </u>, is like ,<new one> ,you got | Which one,  is like  ,you got |  
    |  3 | <Yes>, It was                                |  It was                       |  
    +----+----------------------------------------------+-------------------------------+  
    

  2. Erland Sommarskog 111.5K Reputation points MVP
    2021-07-19T16:52:53.72+00:00

    Here is another solution which requires SQL 2017 or later and that you have installed the Python support as part of the Machine Learning Extensions. It is based on the power of regular expressions.

    Note that the CREATE TABLE + INSERT is not part of the solution. They are only a setup for a demo. Ideally, you should have provided your sample data in this form, rather than using a screenshot.

    CREATE TABLE #tbl (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
     INSERT INTO #tbl (tokens) VALUES
     ('PQWS, <Get>,was it,<Mas>'),
     ('Which one, </u>, is like ,<new one> ,you got'),
     ('<Yes>, It was');
    
    
    EXEC sp_execute_external_script @language = N'Python', 
         @input_data_1 = N'SELECT ID, tokens FROM #tbl', 
         @input_data_1_name = N'Data',
         @output_data_1_name = N'Data',
         @script = N'
    import re, pandas
    Data["tokens"] = pandas.Series(
        [re.sub(r"<[^>]*>", "", str) for str in Data["tokens"]])
    ' 
    WITH RESULT SETS ((ID     int,
                       tokens varchar(1024)))
    
    0 comments No comments

  3. San 161 Reputation points
    2021-07-20T06:34:40.203+00:00

    @Yitzhak Khabinsky , I have include the table and col in the statement, but getting an error and in cross apply statement I want include qnt and ans both cols

    Msg 102, Level 15, State 1, Line 5
    Incorrect syntax near '('.

    Completion time: 2021-07-20T07:31:21.9439274+01:00

    DECLARE mytable (qnt VARCHAR(1024), ans VARCHAR(1024));
    select qnt , ans from [BI_DW_IAPTus].[STAGING].[Result_Answers] ;

    DECLARE @separator CHAR(1) = '>'
    , @lessthan CHAR(2) = '<';

    SELECT qnt, ans
    , REPLACE(c.query('
    for $x in /root/r[substring((./text())[1],1,1) ne "="]/text()
    return $x
    ').value('text()[1]', 'VARCHAR(1024)'), ' , ', '') AS Result
    FROM mytable
    CROSS APPLY (SELECT CAST('<root><r>' +
    REPLACE(REPLACE(REPLACE(REPLACE(qnt,'>,','>'),',<','<'), '<', '>='), @separator, '</r><r>') +
    '</r></root>' AS XML)) AS t1(c);


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.