Hello Haans
Thanks for reaching out to us. For the Clean Missing Data component, please refer to this document: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/clean-missing-data
Use this component to remove, replace, or infer missing values.
Data scientists often check data for missing values and then perform various operations to fix the data or insert new values. The goal of such cleaning operations is to prevent problems caused by missing data that can arise when training a model.
This component supports multiple types of operations for "cleaning" missing values, including:
- Replacing missing values with a placeholder, mean, or other value
- Completely removing rows and columns that have missing values
- Inferring values based on statistical methods
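If it helps to see what the replace and remove operations do to the data, here is a minimal pandas sketch. This is not the component itself, only an analogy: the sample table and variable names are made up for illustration, and the statistical inference option is not shown.

```python
import numpy as np
import pandas as pd

# Made-up sample data; every column except "id" has one missing value.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [25.0, np.nan, 35.0, 40.0],
    "income": [50000.0, 60000.0, np.nan, 80000.0],
    "city": ["Oslo", "Bergen", None, "Oslo"],
})

# Replace missing values with a fixed placeholder per column.
placeholder_filled = df.fillna({"age": 0.0, "income": 0.0, "city": "unknown"})

# Replace missing numeric values with the mean of each column.
column_means = df[["age", "income"]].mean()
mean_filled = df.fillna(column_means)

# Remove every row that contains at least one missing value.
rows_dropped = df.dropna(axis=0)

# Remove every column that contains at least one missing value.
cols_dropped = df.dropna(axis=1)
```

Each call returns a new DataFrame and leaves `df` untouched.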
Using this component does not change your source dataset. Instead, it creates a new dataset in your workspace that you can use in the subsequent workflow. You can also save the new, cleaned dataset for reuse.
This component also outputs a definition of the transformation used to clean the missing values. You can re-use this transformation on other datasets that have the same schema, by using the Apply Transformation component.
The component returns two outputs:
- Cleaned dataset: A dataset comprised of the selected columns, with missing values handled as specified, along with an indicator column, if you selected that option. Columns not selected for cleaning are also "passed through".
- Cleaning transformation: A data transformation used for cleaning, that can be saved in your workspace and applied to new data later.
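To illustrate the idea of the two outputs, here is a rough pandas sketch of learning the fill values once and reusing them on another dataset with the same schema. The function names `fit_mean_cleaner` and `apply_cleaner` and the `_IsMissing` column suffix are hypothetical names chosen for this example; they are not part of the Azure Machine Learning API.

```python
import numpy as np
import pandas as pd

def fit_mean_cleaner(df, columns):
    """Learn one fill value (the column mean) for each selected column."""
    return {col: df[col].mean() for col in columns}

def apply_cleaner(df, fill_values, add_indicator=False):
    """Return a cleaned copy of df; columns not in fill_values pass through."""
    cleaned = df.copy()
    for col, value in fill_values.items():
        if add_indicator:
            # Hypothetical indicator column marking which values were missing.
            cleaned[f"{col}_IsMissing"] = df[col].isna()
        cleaned[col] = df[col].fillna(value)
    return cleaned

# Made-up data: "age" is selected for cleaning, "name" is passed through.
train = pd.DataFrame({"age": [20.0, np.nan, 40.0], "name": ["a", "b", "c"]})
new_data = pd.DataFrame({"age": [np.nan, 50.0], "name": ["d", "e"]})

# Plays the role of the "cleaning transformation" output.
transform = fit_mean_cleaner(train, ["age"])

# Plays the role of the "cleaned dataset" output.
cleaned_train = apply_cleaner(train, transform, add_indicator=True)

# Reuse the same transformation on new data with the same schema.
cleaned_new = apply_cleaner(new_data, transform, add_indicator=True)
```

Note that the new data is filled with the mean learned from `train`, not with its own mean, which is the point of saving the transformation.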
If you don't want the missing data to affect the result much, you can try replacing the missing values with the column mean.
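As a small illustration of why the mean is a gentle choice, the pandas sketch below (with made-up numbers) shows that filling with the mean leaves the column average unchanged. It does reduce the spread of the column, so it is worth keeping that trade-off in mind.

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value.
values = pd.Series([10.0, 20.0, np.nan, 30.0])

# Fill the gap with the mean of the observed values.
filled = values.fillna(values.mean())

mean_before = values.mean()   # pandas skips missing values by default
mean_after = filled.mean()
std_before = values.std()
std_after = filled.std()

print("mean before/after:", mean_before, mean_after)
print("std before/after:", std_before, std_after)
```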
I hope this helps!
Regards,
Yutong
-Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.