Hi @Charlie, welcome to the Microsoft Q&A platform and thanks for posting your question here.
As I understand it, you're looking for help setting up an Azure Data Factory pipeline that dynamically locates files matching a pattern in a dataset, extracts the filename, and appends it to the daily sales file as a new column. I'll be happy to assist you with that.
- To answer your first question, you can dynamically locate files that match a pattern by using a wildcard in the dataset's file path. A wildcard is a character that stands in for one or more characters in a file or folder name. In your case, you can use the asterisk (*) wildcard to represent the variable parts of the file name, such as the date and time.
Example of how to set up the dataset with a wildcard file name:
```json
{
  "name": "SalesDataset",
  "properties": {
    "linkedServiceName": "AzureBlobStorageLinkedService",
    "folderPath": "Sales/Data_Source",
    "fileName": {
      "value": "Sales_*_*_*",
      "type": "WildcardFileName"
    },
    "format": {
      "type": "TextFormat",
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```
In this example, the file name pattern is set to "Sales_*_*_*", which matches any file whose name starts with "Sales_" and has three variable segments separated by underscores. Those wildcard segments stand in for the changing parts of the file name, such as the date and time.
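If your factory is Data Factory v2, you can also supply the wildcard on the Copy activity source instead of in the dataset. Here is a minimal sketch of such a source, assuming the files are delimited text (CSV) in Azure Blob Storage; the folder path and file pattern are placeholders to replace with your own:

```json
{
  "type": "DelimitedTextSource",
  "storeSettings": {
    "type": "AzureBlobStorageReadSettings",
    "recursive": true,
    "wildcardFolderPath": "Sales/Data_Source",
    "wildcardFileName": "Sales_*_*_*.csv"
  },
  "formatSettings": {
    "type": "DelimitedTextReadSettings"
  }
}
```

Every file matching the pattern is picked up at run time, so you never have to hard-code the exact name.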
- To answer your second question, you can extract the filename and append it to the daily sales file as a new column. One thing to note: the Copy activity itself does not support derived column transformations; those belong to mapping data flows. The simplest way to add the filename inside a Copy activity is the additional columns feature on the source, which can use the reserved variable $$FILEPATH to capture the relative path of the file each row was read from. If you only need part of the name (for example, without the date and time segments), a mapping data flow with a derived column transformation is the better tool; see the reference below and the short sketch after the example.
Example of a pipeline whose Copy activity appends the source file path as a new column (this sketch assumes the input and output are DelimitedText datasets on Azure Blob Storage):
```json
{
  "name": "SalesPipeline",
  "properties": {
    "activities": [
      {
        "name": "SalesCopyActivity",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "SalesDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "SalesOutputDataset",
            "type": "DatasetReference"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "DelimitedTextSource",
            "additionalColumns": [
              {
                "name": "Filename",
                "value": "$$FILEPATH"
              }
            ]
          },
          "sink": {
            "type": "DelimitedTextSink"
          }
        }
      }
    ]
  }
}
```
In this example, the pipeline contains a single Copy activity that copies the data from SalesDataset to SalesOutputDataset. The additionalColumns setting on the source appends a column named "Filename" to every row and fills it with the relative path of the source file that row came from ($$FILEPATH), so the output carries the originating file name. Which files are read is still governed by the wildcard you configured in the first step.
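If you need only a piece of the name (for example, just the date segment) rather than the full path, a mapping data flow is a better fit: enable the "Column to store file name" option on the data flow source so each row carries its source file name, then add a derived column transformation that parses it. As a rough sketch in the data flow expression language, assuming the captured column is called fileName and the names follow a pattern like Sales_20240101_120000_01.csv (an assumed pattern; adjust the index to the segment you need):

```
split(fileName, '_')[2]
```

split breaks the name on underscores, and the 1-based index [2] returns the second segment (the date part in the assumed pattern), which the derived column writes out as a new column alongside the sales data.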
- To answer your third question, yes, a single pipeline can achieve both goals. The wildcard takes care of locating the day's file (question 1) and the same Copy activity appends the Filename column (question 2), so the example above covers both in one run.
Reference: Derived column transformation in mapping data flow
I hope this information helps you set up your Azure pipeline. Let me know if you have any further questions or need additional assistance.