Azure Machine Learning data schema blueprint support?

delonder carter 20 Reputation points
2023-04-29T23:20:30.54+00:00

I am looking for a way to import my data according to a data blueprint. My data schema changes depending on the scenario. The data will ultimately be loaded as a Pandas DataFrame, but I want to define how it is loaded. Is this possible?

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. YutongTie-MSFT 50,831 Reputation points
    2023-04-30T20:32:13.4866667+00:00

    Hello,

    Thanks for reaching out. One option you may want to consider is MLTable: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-mltable?view=azureml-api-2&tabs=cli

    Azure Machine Learning supports a Table type (mltable). This allows for the creation of a blueprint that defines how to load data files into memory as a Pandas or Spark data frame.

    This is very similar to the scenario you described.

    Azure Machine Learning Tables (mltable) allow you to define how you want to load your data files into memory, as a Pandas and/or Spark data frame. Tables have two key features:

    1. An MLTable file. A YAML-based file that defines the data loading blueprint. In the MLTable file, you can specify:
    • The storage location(s) of the data - local, in the cloud, or on a public http(s) server.
    • Globbing patterns over cloud storage. These locations can specify sets of filenames, with wildcard characters (*).
    • Read transformations - for example, the file format type (delimited text, Parquet, Delta, JSON), delimiters, headers, etc.
    • Column type conversions (enforce schema).
    • New column creation, using folder structure information - for example, creation of a year and month column, using the {year}/{month} folder structure in the path.
    • Subsets of data to load - for example, filter rows, keep/drop columns, take random samples.
    2. A fast and efficient engine to load the data into a Pandas or Spark dataframe, according to the blueprint defined in the MLTable file. The engine relies on Rust for high speed and memory efficiency.
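
As a rough illustration of the two pieces above, here is a minimal sketch that writes a blueprint file and defines a loader for it. The glob pattern, the column names (`id`, `price`, `note`), the folder name, and the helper functions `write_blueprint` and `load_as_pandas` are all hypothetical placeholders, and the YAML keys follow the MLTable schema as I understand it, so please check them against the linked documentation. The loader assumes the `mltable` package is installed and is only defined here, not called.

```python
import tempfile
from pathlib import Path

# Hypothetical blueprint: the glob pattern, column names, and types below are
# placeholders for illustration, not taken from the question.
BLUEPRINT = """\
$schema: https://azuremlschemas.azureedge.net/latest/MLTable.schema.json
type: mltable

paths:
  - pattern: ./*.csv

transformations:
  - read_delimited:
      delimiter: ','
      encoding: utf8
      header: all_files_same_headers
  - convert_column_types:
      - columns: price
        column_type: float
  - keep_columns: [id, price]
"""


def write_blueprint(folder: Path) -> Path:
    """Write the blueprint to a file named 'MLTable' inside the data folder."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "MLTable"
    target.write_text(BLUEPRINT, encoding="utf-8")
    return target


def load_as_pandas(folder: Path):
    """Load the folder through its MLTable file (assumes `pip install mltable`)."""
    import mltable  # imported lazily so the blueprint can be written without it

    return mltable.load(str(folder)).to_pandas_dataframe()


# Demo: create a throwaway data folder holding one CSV file and its blueprint.
data_dir = Path(tempfile.mkdtemp()) / "sales"
blueprint_path = write_blueprint(data_dir)
(data_dir / "part-0.csv").write_text(
    "id,price,note\n1,9.5,a\n2,3.0,b\n", encoding="utf-8"
)
print(blueprint_path.name)  # MLTable
```

With `mltable` installed, `load_as_pandas(data_dir)` should return a DataFrame shaped by the blueprint. If the schema changes per scenario, one possible approach is to keep a separate folder with its own MLTable file for each scenario and load whichever one applies.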

    Please take a look at the above and let me know whether this is what you are looking for. Thanks.

    Regards,

    Yutong

    Please accept the answer and vote yes if you found it helpful, to support the community. Thanks a lot.
