I will try to answer point by point :
- Loading only .csv files:
- Use the 'Foreach Loop Container' to iterate through files in the designated folder. In the 'Collection' setting of 'Foreach Loop Container', set 'Enumerator' to 'Foreach File Enumerator' and select '.csv' as 'Files' extension.
- Creating a reject file for truncated or datatype mismatch records:
- Inside the data flow task, use a 'Flat File Source' to load the CSV file. You can then add 'Data Conversion' tasks to attempt to convert the data to your desired formats. The output of this task will include an 'Error Output' which you can redirect to a 'Flat File Destination' to create your rejection file.
- Trimming leading/trailing spaces and handling nulls:
- You can use the 'Derived Column' task to create new columns based on existing ones. For instance, the expression
TRIM([ColumnName])
will remove leading and trailing spaces. Similarly, the expression ISNULL([MeasureColumnName]) ? 0 : [MeasureColumnName]
will replace nulls with 0.
- Capturing the details of files loaded:
- To log the details of the files loaded, you can use an 'Execute SQL Task' after your data flow task to insert a record into your log table. The system variable
User::FileName
will contain the current file name inside the 'Foreach Loop Container'.
- Converting business keys to surrogate keys:
- This is typically done by using a 'Lookup' task in your data flow. The 'Lookup' can join the incoming data with your dimension table based on the business key, and return the surrogate key.
- Checking for existing records and performing updates or inserts (upsert operation):
- This is also typically handled by a 'Lookup' task. If the lookup finds a match, the record is directed to the 'Match Output', where you can perform an update. If no match is found, the record goes to the 'No Match Output', where you can perform an insert.
- Capturing package failure details: SSIS has system variables that capture the current executing package and task names (
System::PackageName
and System::TaskName
respectively). You can use 'Event Handlers' at the package level for 'OnError' event type, and use an 'Execute SQL Task' within this event handler to insert the failure details (System::ErrorDescription
) into your control table.