Text.Remove function performs differently with PowerQuery Source for SSIS

John Nickell 36 Reputation points
2024-03-20T19:25:42.36+00:00

I have a XML source that occasionally has characters like this in a field. "™"
This will show up in Power BI PQ editor a "TM" symbol.

I'm able to removed these (and other special characters) in Power BI PowerQuery using the following step.

Table.AddColumn(#"Added Custom", "_CleanSampleName", each Text.Remove([Name],{"™","®",",","â","„","¢","Â","®"}))

When I use this same step in SSIS the output of the Power Query source still includes the problematic characters.

Has anyone else encountered this and found a work-around?

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,583 questions
0 comments No comments
{count} votes

Accepted answer
  1. Azar 18,045 Reputation points
    2024-03-21T18:49:32.62+00:00

    Hey there John Nickell

    thanks for getting back on this and letting us know

    So let me post your answer here and you can accept it as you cant accept your own answers,

    this helps community members who face the similar issue

    I "think" that the answer turned out to be deleting and re-adding the PowerQuery source component in the package. After doing that the output of the PowerQuery step is properly removing the special characters.

    I say think because I'm second guessing myself now, but at this point the Text.Remove step is performing the same in Power BI and SSIS (Visual Studio 2022).

    Thanks for your contribution.

    1 person found this answer helpful.

3 additional answers

Sort by: Most helpful
  1. Azar 18,045 Reputation points
    2024-03-20T19:42:10.1833333+00:00

    Hey there John Nickell

    Thanks for using QandA platform, lemme try to help you here.

    One possible reason for this could be the encoding used by the XML source in SSIS compared to Power BI. It's possible that the special characters are being interpreted differently in each environment, making this issue.

    • Check the character encoding of the XML source in SSIS and make sure it matches the encoding used in Power BI.
    • Verify that the Text.Remove function syntax is correct and compatible with both environments.
    • If this helps kindly accept the answer thanks much.

  2. Yitzhak Khabinsky 24,836 Reputation points
    2024-03-20T20:31:40.2833333+00:00

    @John Nickell ,

    You can use a simple XSLT in SSIS to remove not needed characters in an XML file.

    SSIS has XML Task, operation XSLT transformation for that.

    Please see below how to do it in XSLT via translate() function. You can add as many as needed characters to remove for your needs.

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
       <xsl:output method="xml" encoding="utf-8" indent="yes"/>
       <xsl:strip-space elements="*"/>
       <xsl:template match="@*|node()">
          <xsl:copy>
             <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
       </xsl:template>
       <xsl:template match="text()">
             <xsl:value-of select="translate(., '™®,™®', '')"/>
       </xsl:template>
    </xsl:stylesheet>
    

  3. John Nickell 36 Reputation points
    2024-03-21T18:40:40.6+00:00

    I "think" that the answer turned out to be deleting and re-adding the PowerQuery source component in the package. After doing that the output of the PowerQuery step is properly removing the special characters.

    I say think because I'm second guessing myself now, but at this point the Text.Remove step is performing the same in Power BI and SSIS (Visual Studio 2022).

    0 comments No comments