Validate the data before loading

CzarR 296 Reputation points
2022-05-14T02:07:52.453+00:00

Hi, I have an SSIS task that requires me to load a csv into a table. But while loading I need to perform validation on the data to make sure it is in the expected format. Screenshot has a sample requirement. I need to validate each column's data and either make modification to the data or skip the record entirely or record a warning. Towards the end an email must be sent with the list of skipped, Warning and modified records. Need help with ideas on how to accomplish this. Thanks in advance.

201799-image.png

SQL Server Integration Services
SQL Server Integration Services
A Microsoft platform for building enterprise-level data integration and data transformations solutions.
2,451 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
0 comments No comments
{count} votes

3 answers

Sort by: Most helpful
  1. Yitzhak Khabinsky 24,941 Reputation points
    2022-05-15T01:25:52.157+00:00

    Hi @CzarR ,

    *.csv/txt flat files always have host of problems where column delimiters, column separators, invisible characters like null terminators \0, special characters based on encoding, and line breaks are in the middle of the actual data.

    The most reliable format for data feeds is XML enforced by XSD. An XSD plays a role of a data contract between sender and receiver. It will guarantee proper data format/shape, and enforce data quality.

    XML supports very well optional vs. mandatory data elements via cardinality in XSDs.

    • SSIS has XML Source adapter to ingest XML files. XML Source is using XSD to validate XML files.
    • SSIS has XML Task, operation Validation.

    You can use XML Task, operation Validation to validate XML before loading.
    It will produce a detailed report in the following format:

    <root xmlns:ns="https://schemas.microsoft.com/xmltools/2002/xmlvalidation">  
    	<metadata>  
    		<result>false</result>  
    		<errors>2</errors>  
    		<warnings>0</warnings>  
    		<startTime>2015-05-28T10:45:09.538</startTime>  
    		<endTime>2015-05-28T10:45:09.558</endTime>  
    		<xmlFile>C:\Temp\TestData.xml</xmlFile>  
    		<xsdFile>C:\Temp\TestSchema.xsd</xsdFile>  
    	</metadata>  
    	<messages>  
    		<error line="5" position="26">The 'ApplicantRole' element is invalid - The value 'wer3' is invalid    
        according to its datatype 'ApplicantRoleType' - The Enumeration constraint failed.</error>  
    		<error line="16" position="28">The 'Phone' element is invalid - The value 'we3056666666' is invalid    
         according to its datatype 'phone' - The Pattern constraint failed.</error>  
    	</messages>  
    </root>  
    

    Useful links:
    validate-xml-with-the-xml-task
    importing-xml-documents-using-sql-server-integration-services

    0 comments No comments

  2. Tom Phillips 17,716 Reputation points
    2022-05-16T15:31:53.953+00:00

    You will be much happier if you always import files into a "staging" table with all varchar data types for the columns. Then you can do any data validation/cleaning on the staging table then "merge" the data into your target using TSQL commands.

    0 comments No comments

  3. Camilo Cuevas 1 Reputation point
    2022-05-30T04:42:47.013+00:00

    You can use conditional split tasks to define rule by rule.
    when a row doesn't match the rule, you can define a flow that sends it to a derived column to correct it,
    with the change applied you can send it back to the main flow with a union all task.

    0 comments No comments