question

CzarR-3851 avatar image
0 Votes"
CzarR-3851 asked CamiloCuevas-0569 answered

Validate the data before loading

Hi, I have an SSIS task that requires me to load a csv into a table. But while loading I need to perform validation on the data to make sure it is in the expected format. Screenshot has a sample requirement. I need to validate each column's data and either make modification to the data or skip the record entirely or record a warning. Towards the end an email must be sent with the list of skipped, Warning and modified records. Need help with ideas on how to accomplish this. Thanks in advance.



201799-image.png


sql-server-transact-sqlsql-server-integration-services
image.png (185.6 KiB)
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

YitzhakKhabinsky-0887 avatar image
0 Votes"
YitzhakKhabinsky-0887 answered YitzhakKhabinsky-0887 edited

Hi @CzarR-3851,

*.csv/txt flat files always have host of problems where column delimiters, column separators, invisible characters like null terminators \0, special characters based on encoding, and line breaks are in the middle of the actual data.

The most reliable format for data feeds is XML enforced by XSD. An XSD plays a role of a data contract between sender and receiver. It will guarantee proper data format/shape, and enforce data quality.

XML supports very well optional vs. mandatory data elements via cardinality in XSDs.

  • SSIS has XML Source adapter to ingest XML files. XML Source is using XSD to validate XML files.

  • SSIS has XML Task, operation Validation.

You can use XML Task, operation Validation to validate XML before loading.
It will produce a detailed report in the following format:

 <root xmlns:ns="https://schemas.microsoft.com/xmltools/2002/xmlvalidation">
     <metadata>
         <result>false</result>
         <errors>2</errors>
         <warnings>0</warnings>
         <startTime>2015-05-28T10:45:09.538</startTime>
         <endTime>2015-05-28T10:45:09.558</endTime>
         <xmlFile>C:\Temp\TestData.xml</xmlFile>
         <xsdFile>C:\Temp\TestSchema.xsd</xsdFile>
     </metadata>
     <messages>
         <error line="5" position="26">The 'ApplicantRole' element is invalid - The value 'wer3' is invalid  
     according to its datatype 'ApplicantRoleType' - The Enumeration constraint failed.</error>
         <error line="16" position="28">The 'Phone' element is invalid - The value 'we3056666666' is invalid  
      according to its datatype 'phone' - The Pattern constraint failed.</error>
     </messages>
 </root>

Useful links:
validate-xml-with-the-xml-task
importing-xml-documents-using-sql-server-integration-services

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

TomPhillips-1744 avatar image
0 Votes"
TomPhillips-1744 answered TomPhillips-1744 edited

You will be much happier if you always import files into a "staging" table with all varchar data types for the columns. Then you can do any data validation/cleaning on the staging table then "merge" the data into your target using TSQL commands.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

CamiloCuevas-0569 avatar image
0 Votes"
CamiloCuevas-0569 answered

You can use conditional split tasks to define rule by rule.
when a row doesn't match the rule, you can define a flow that sends it to a derived column to correct it,
with the change applied you can send it back to the main flow with a union all task.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.