Load fact data from .csv files to a table and perform the following validations

vaishnavi tatrashi 1 Reputation point
2021-10-19T06:00:53.023+00:00
  1. Load only .csv files.
  2. Create a reject file containing the records rejected due to truncation or data type mismatch errors.
  3. Check for any leading and trailing spaces and remove them. If the measure column coming from the source is null, then convert it to 0.
  4. Capture the details of the files loaded.
  5. Convert the business key to a surrogate key.
  6. Check if a record already exists on the basis of the dimension data keys. If a record already exists and a changed record comes in the file, then overwrite the data on the basis of the dimension data key. If a new record comes, then insert the data.
  7. In case of package or task failure, capture the failure details in another control table, specifying the failed package name, the failed task name and the error details.
SQL Server Integration Services
A Microsoft platform for building enterprise-level data integration and data transformations solutions.

1 answer

  1. Amira Bedhiafi 14,881 Reputation points
    2023-08-05T14:07:46.4833333+00:00

    I will try to answer point by point:

    1. Loading only .csv files:
      • Use the 'Foreach Loop Container' to iterate through the files in the designated folder. In the 'Collection' page of the 'Foreach Loop Container', set the 'Enumerator' to 'Foreach File Enumerator' and set the 'Files' mask to *.csv so that only .csv files are picked up.
    2. Creating a reject file for truncated or datatype mismatch records:
      • Inside the data flow task, use a 'Flat File Source' to load the CSV file. You can then add a 'Data Conversion' transformation to convert the data to the desired types. Both the 'Flat File Source' and the 'Data Conversion' transformation expose an error output; configure it to redirect rows on truncation or conversion errors and send that path to a 'Flat File Destination' to create your reject file.
    3. Trimming leading/trailing spaces and handling nulls:
      • You can use the 'Derived Column' transformation to create new columns based on existing ones. For instance, the expression TRIM([ColumnName]) removes leading and trailing spaces, and the expression ISNULL([MeasureColumnName]) ? 0 : [MeasureColumnName] replaces nulls with 0.
    4. Capturing the details of files loaded:
      • To log the details of the files loaded, you can use an 'Execute SQL Task' after your data flow task to insert a record into your log table. The user variable (for example User::FileName) that the 'Foreach Loop Container' maps the current file name to is available for that insert; a sample logging statement is sketched after this list.
    5. Converting business keys to surrogate keys:
      • This is typically done with a 'Lookup' transformation in your data flow. The 'Lookup' joins the incoming data to your dimension table on the business key and returns the surrogate key; an example reference query is shown after this list.
    6. Checking for existing records and performing updates or inserts (upsert operation):
      • This is also typically handled by a 'Lookup' transformation. If the lookup finds a match, the record is directed to the 'Match Output', where you can perform an update. If no match is found, the record goes to the 'No Match Output', where you can perform an insert. A set-based sketch of the update/insert is given after this list.
    7. Capturing package failure details:
      • SSIS has system variables that capture the current executing package and task names (System::PackageName and System::TaskName respectively). You can use 'Event Handlers' at the package level for the 'OnError' event type, and use an 'Execute SQL Task' within this event handler to insert the failure details (System::ErrorDescription) into your control table; a sketch of that insert is given after this list.
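
For point 4, here is a minimal sketch of what the 'Execute SQL Task' could run after each file. The etl.FileLoadLog table, its columns and the User::RowCount variable are assumptions for illustration, not part of SSIS itself; with an OLE DB connection the parameters appear as ? markers and are mapped by ordinal to User::FileName and User::RowCount.

```sql
-- Assumed logging table for point 4 (created once, outside the package).
CREATE TABLE etl.FileLoadLog (
    LoadId      INT IDENTITY(1,1) PRIMARY KEY,
    FileName    NVARCHAR(260) NOT NULL,      -- mapped from User::FileName
    RowsLoaded  INT           NULL,          -- mapped from a User::RowCount variable
                                             -- populated by a 'Row Count' transformation
    LoadedAtUtc DATETIME2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
);

-- SQLStatement of the 'Execute SQL Task' (OLE DB connection, parameters 0 and 1).
INSERT INTO etl.FileLoadLog (FileName, RowsLoaded)
VALUES (?, ?);
```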
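
For point 5, the 'Lookup' reference can be a query rather than the whole table, so that only the business key and the surrogate key are cached. DimProduct, ProductBK and ProductSK below are placeholder names.

```sql
-- Lookup reference query: return only the business key used for the join
-- and the surrogate key to add to the data flow.
SELECT ProductSK,   -- surrogate key returned to the fact row
       ProductBK    -- business key matched against the incoming column
FROM   dbo.DimProduct;
```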
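
For point 6, the 'No Match Output' can go straight to an OLE DB Destination for the inserts, and the 'Match Output' can feed an 'OLE DB Command' that runs a row-by-row UPDATE. For larger files a common alternative is to land the file in a staging table and run one set-based statement afterwards; the sketch below assumes hypothetical dbo.FactSales and stg.FactSales tables keyed on the dimension surrogate keys.

```sql
-- Set-based upsert run from an 'Execute SQL Task' after the data flow
-- has loaded the file into stg.FactSales (table and column names assumed).
UPDATE f
SET    f.Quantity = s.Quantity,
       f.Amount   = s.Amount
FROM   dbo.FactSales AS f
JOIN   stg.FactSales AS s
       ON s.ProductSK = f.ProductSK
      AND s.DateSK    = f.DateSK;          -- dimension keys identify the record

INSERT INTO dbo.FactSales (ProductSK, DateSK, Quantity, Amount)
SELECT s.ProductSK, s.DateSK, s.Quantity, s.Amount
FROM   stg.FactSales AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.FactSales AS f
                   WHERE  f.ProductSK = s.ProductSK
                     AND  f.DateSK    = s.DateSK);
```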
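
For point 7, the 'Execute SQL Task' inside the OnError event handler can run an insert like the one below. The etl.PackageErrorLog table is an assumption; with an OLE DB connection the three ? markers would be mapped to System::PackageName, System::TaskName (or System::SourceName, which inside the event handler holds the name of the task that raised the error) and System::ErrorDescription.

```sql
-- Assumed error control table for point 7 (created once, outside the package).
CREATE TABLE etl.PackageErrorLog (
    ErrorId          INT IDENTITY(1,1) PRIMARY KEY,
    PackageName      NVARCHAR(200) NOT NULL,  -- System::PackageName
    TaskName         NVARCHAR(200) NULL,      -- System::TaskName / System::SourceName
    ErrorDescription NVARCHAR(MAX) NULL,      -- System::ErrorDescription
    LoggedAtUtc      DATETIME2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
);

-- SQLStatement of the OnError 'Execute SQL Task' (parameters 0, 1 and 2).
INSERT INTO etl.PackageErrorLog (PackageName, TaskName, ErrorDescription)
VALUES (?, ?, ?);
```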