Disable dataset sampling for production in Data Factory

Jona 455 Reputation points
2023-11-14T15:11:40.9033333+00:00

Hi everyone.

As you know, Data Factory has an option to sample a dataset/source so that we can work with a small portion of the data. This saves money and time compared with working directly with the entire dataset.

However, as Data Factory says, that is perfectly fine for dev and debugging purposes. For production purposes, this option needs to be disabled.

How can I enable or disable it depending on the environment? I don't see a way to insert a dynamic expression that detects the environment I'm working in.

Regards

Jonathan

Azure Synapse Analytics
Azure Data Factory

Accepted answer
  1. ShaikMaheer-MSFT 38,301 Reputation points Microsoft Employee
    2023-11-16T10:04:31.7333333+00:00

    Hi Jona,

    Thank you for posting your query on the Microsoft Q&A platform.

    ADF only considers the Sampling field value when you run a data flow preview with a debug session enabled. In all other cases, the Sampling field value has no impact when the data flow runs. Hence, keeping Sampling enabled or disabled will not have any impact in the PROD environment. If, for some reason, you still want Sampling disabled in PROD, consider disabling it in the DEV environment too. When you do a data preview in the DEV environment, you can control the number of rows for the sample preview under debug settings, which covers the DEV case.
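
    The rule described above can be summarized as a small decision function. This is only a Python sketch of the behavior as stated in this answer, not ADF code; the function and parameter names are hypothetical, and the idea that the debug-settings row limit takes precedence over the source Sampling value is an assumption drawn from the documentation excerpt quoted in the other answer.

```python
from typing import Optional


def effective_row_limit(
    is_debug_preview: bool,
    sampling_enabled: bool,
    sampling_rows: int,
    debug_row_limit: Optional[int] = None,
) -> Optional[int]:
    """Return the row cap applied to the source, or None for no cap.

    Hypothetical model of the behavior described in this answer,
    not an ADF API.
    """
    # Outside a debug data preview, Sampling is described as having no effect.
    if not is_debug_preview:
        return None
    # Assumption: a row limit set under debug settings wins during preview.
    if debug_row_limit is not None:
        return debug_row_limit
    # Otherwise the source Sampling value applies, if it is enabled.
    return sampling_rows if sampling_enabled else None
```

    Under this model, a run outside debug preview gets no cap regardless of the Sampling value, which is why the setting would not need to differ between DEV and PROD.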

    Hope this helps. Please let me know if you have any further queries. Thank you.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.

1 additional answer
  1. Amira Bedhiafi 17,791 Reputation points
    2023-11-15T16:04:52.3433333+00:00

    By definition:

    Enable Sampling to limit the number of rows from your source. Use this setting when you test or sample data from your source for debugging purposes. This is very useful when executing data flows in debug mode from a pipeline. When debug mode is turned on, the row limit configuration in debug settings will overwrite the sampling setting in the source during data preview.

    In your Data Factory, define a global parameter that specifies the environment. For instance, you can create a parameter named Environment with possible values like Dev or Prod.

    Next, create separate linked services for your development and production environments. You can pass the Environment parameter to these linked services to differentiate between them.

    Finally, use an if expression to check the value of the Environment parameter: if it is set to Dev, enable sampling; if it is set to Prod, disable it.
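
    As a rough illustration of that check, here is a minimal Python sketch of the Dev/Prod decision. It is not ADF code: the function name, the returned keys, and the default row limit of 1000 are all hypothetical. In ADF's expression language the test itself would look something like `@if(equals(pipeline().globalParameters.Environment, 'Dev'), ..., ...)`, but where such an expression can be attached to the source's Sampling setting is not shown in this thread.

```python
from typing import Any, Dict


def sampling_settings(environment: str, dev_row_limit: int = 1000) -> Dict[str, Any]:
    """Pick sampling settings from a hypothetical Environment parameter value."""
    env = environment.strip().lower()
    if env == "dev":
        # Dev: sample a limited number of rows to save time and cost.
        return {"sampling_enabled": True, "rows_limit": dev_row_limit}
    if env == "prod":
        # Prod: process the full dataset, so sampling stays off.
        return {"sampling_enabled": False, "rows_limit": None}
    # Fail loudly on unexpected values instead of silently sampling.
    raise ValueError(f"Unknown environment: {environment!r}")
```

    Raising on an unknown value is a deliberate choice in this sketch, so that a mistyped environment name cannot quietly leave sampling enabled.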