Thanks for troubleshooting this.
It looks like the error you're encountering occurs because the MLTable object does not have a to_spark_dataframe method. Instead, you can convert the MLTable to a pandas DataFrame and then convert that to a Spark DataFrame. Here's an example of how you can do this:
from mltable import load
from pyspark.sql import SparkSession
# Initialize (or reuse) a Spark session
spark = SparkSession.builder.appName("MySparkApp").getOrCreate()
# Load the MLTable from the folder that contains its MLTable YAML file
path = "./mltable-test/"
tbl = load(path)
# Convert the MLTable to a pandas DataFrame (this loads all rows into local memory)
pandas_df = tbl.to_pandas_dataframe()
# Convert the pandas DataFrame to a Spark DataFrame
spark_df = spark.createDataFrame(pandas_df)
spark_df.show()
Regarding the installation issue, you're right: you should install the mltable package rather than azureml-mltable. The correct command is:
pip install mltable pyspark
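To confirm which of these packages is actually present in the active environment after reinstalling, a small check with the standard library's importlib.metadata may help. This is a minimal sketch: the helper names installed_version and report are illustrative only (they are not part of mltable or pyspark), and the check only inspects the Python environment it runs in.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages: Iterable[str] = ("mltable", "azureml-mltable", "pyspark")) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None means not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Expect mltable and pyspark to show a version, and azureml-mltable to be missing.
    for name, ver in report().items():
        print(f"{name}: {ver or 'not installed'}")
```

If mltable still shows as not installed, the pip command was likely run against a different interpreter or virtual environment than the one running your script.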
Hopefully this resolves the version error you encountered.
If you have any further questions or need additional assistance, feel free to ask!
Thanks.