What is the fundamental idea behind cleaning missing values?

Haans 40 Reputation points
2023-01-30T17:14:42.12+00:00

Hello, my project needs some data from user feedback. There will be some missing values, but when I tried it, I found the results were not reliable. What is the fundamental idea? What is the best setting? I just want the results to be consistent.

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. YutongTie-MSFT 48,586 Reputation points
    2023-01-30T17:57:57.6266667+00:00

    Hello Haans

    Thanks for reaching out to us. For the Clean Missing Data component, please refer to this document: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/clean-missing-data

    Use this component to remove, replace, or infer missing values.

    Data scientists often check data for missing values and then perform various operations to fix the data or insert new values. The goal of such cleaning operations is to prevent problems caused by missing data that can arise when training a model.

    This component supports multiple types of operations for "cleaning" missing values, including:

    • Replacing missing values with a placeholder, mean, or other value
    • Completely removing rows and columns that have missing values
    • Inferring values based on statistical methods
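
    As a rough illustration of the first two kinds of operation, here is a minimal plain-Python sketch. It is not how the component is implemented: the sample records and the helper names (replace_missing, column_mean, drop_incomplete_rows, drop_incomplete_columns) are made up for this example, and None stands in for a missing value. Statistical inference of values is not shown.

```python
from statistics import mean

# Hypothetical feedback records; None marks a missing value.
rows = [
    {"id": 1, "rating": 4.0, "age": 31},
    {"id": 2, "rating": None, "age": 25},
    {"id": 3, "rating": 2.0, "age": None},
    {"id": 4, "rating": 3.0, "age": 40},
]


def replace_missing(rows, column, value):
    """Return a copy of rows with missing entries in `column` set to `value`."""
    return [{**r, column: value if r[column] is None else r[column]} for r in rows]


def column_mean(rows, column):
    """Mean of the observed (non-missing) entries in `column`."""
    return mean(r[column] for r in rows if r[column] is not None)


def drop_incomplete_rows(rows):
    """Keep only rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_incomplete_columns(rows):
    """Keep only columns that have no missing values in any row."""
    complete = [c for c in rows[0] if all(r[c] is not None for r in rows)]
    return [{c: r[c] for c in complete} for r in rows]


# Replace with a fixed placeholder, or with the mean of the observed values.
with_placeholder = replace_missing(rows, "rating", 0.0)
with_mean = replace_missing(rows, "rating", column_mean(rows, "rating"))

# Remove whole rows or whole columns that contain a missing value.
complete_rows = drop_incomplete_rows(rows)
complete_columns = drop_incomplete_columns(rows)
```

    Each helper returns a new list and leaves rows as it was, so the different options can be compared side by side on the same input.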

    Using this component does not change your source dataset. Instead, it creates a new dataset in your workspace that you can use in the subsequent workflow. You can also save the new, cleaned dataset for reuse.

    This component also outputs a definition of the transformation used to clean the missing values. You can re-use this transformation on other datasets that have the same schema, by using the Apply Transformation component.

    The component returns two outputs:

    • Cleaned dataset: A dataset comprised of the selected columns, with missing values handled as specified, along with an indicator column, if you selected that option. Columns not selected for cleaning are also "passed through".
    • Cleaning transformation: A data transformation used for cleaning, that can be saved in your workspace and applied to new data later.
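
    The idea of a cleaned dataset plus a reusable transformation can be sketched in plain Python as follows. This is only an analogy under stated assumptions, not the component's code: fit_mean_cleaner, apply_cleaner, the "_IsMissing" suffix used here for the indicator column, and the sample records are all hypothetical.

```python
from statistics import mean


def fit_mean_cleaner(rows, columns):
    """Learn one replacement value per selected column from the observed entries."""
    return {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}


def apply_cleaner(rows, fill_values, add_indicator=True):
    """Return a new cleaned dataset; the input rows are left unchanged."""
    cleaned = []
    for r in rows:
        out = dict(r)  # columns not selected for cleaning pass through as-is
        for c, fill in fill_values.items():
            missing = r[c] is None
            if missing:
                out[c] = fill
            if add_indicator:
                # Hypothetical indicator column name for this sketch.
                out[c + "_IsMissing"] = missing
        cleaned.append(out)
    return cleaned


# Hypothetical data; None marks a missing value.
training = [
    {"rating": 4.0, "comment": "good"},
    {"rating": None, "comment": "slow"},
    {"rating": 2.0, "comment": None},
]
new_data = [{"rating": None, "comment": "ok"}]

# The "transformation" is just the learned fill values, which can be stored and reused.
cleaner = fit_mean_cleaner(training, ["rating"])
cleaned_training = apply_cleaner(training, cleaner)
cleaned_new = apply_cleaner(new_data, cleaner)
```

    Because the replacement values are learned once and then reused, new data with the same schema is filled with the same values as the original data, which is one way to keep results consistent from run to run.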

    If you don't want the missing data to affect the result too much, you can try the mean as the replacement option.
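
    To see why the mean is a gentle choice, here is a small worked example in plain Python with made-up ratings (None marks a missing value). Filling with the mean of the observed values leaves the column average where it was, while a fixed placeholder such as 0 pulls it away.

```python
from statistics import mean

# Hypothetical ratings; None marks a missing value.
ratings = [5.0, 4.0, None, 3.0, None]

observed = [x for x in ratings if x is not None]
observed_mean = mean(observed)

# Fill the gaps with the observed mean, or with a fixed placeholder of 0.
filled_with_mean = [observed_mean if x is None else x for x in ratings]
filled_with_zero = [0.0 if x is None else x for x in ratings]

mean_after_mean_fill = mean(filled_with_mean)
mean_after_zero_fill = mean(filled_with_zero)

print(observed_mean, mean_after_mean_fill, mean_after_zero_fill)
```

    Keep in mind that mean replacement preserves the average but reduces the spread of the column, so it works best when only a small share of the values is missing.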

    I hope this helps!

    Regards,

    Yutong

    - Please accept the answer if you found it helpful, to support the community. Thanks a lot.

