Hi @Tú Nguyễn
Here’s how you can connect to and query a MySQL database from Python scripts with Azure Machine Learning SDK V2:
- Register the MySQL Database as a Datastore: Although direct support for MySQL in the Dataset.Tabular.from_sql_query() method is limited, you can use a custom approach. First, set up your MySQL connection details and register them as a datastore if required.
- Create a Python Script for Data Retrieval: You can use Python libraries such as mysql-connector-python or pymysql to connect to your MySQL database and run queries (a pymysql variant is sketched after the example below). Write a Python script that connects to your MySQL database and retrieves the data.
Example of a Python script:
import mysql.connector
import pandas as pd

# MySQL connection details (prefer environment variables or Azure Key Vault
# over hardcoding credentials)
db_config = {
    'host': 'your-mysql-host',
    'user': 'your-username',
    'password': 'your-password',
    'database': 'your-database'
}

# Connect, execute the query, and load the results into a pandas DataFrame
connection = mysql.connector.connect(**db_config)
try:
    query = "SELECT * FROM your_table"
    df = pd.read_sql(query, con=connection)
finally:
    connection.close()

# Save the DataFrame to a CSV file (or any other format)
df.to_csv('output_data.csv', index=False)
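If you prefer pymysql, pandas works most cleanly through an SQLAlchemy engine (pandas officially supports SQLAlchemy connectables). A minimal sketch, assuming pymysql and SQLAlchemy are installed and reusing the same placeholder connection details:
import pandas as pd
from sqlalchemy import create_engine

# SQLAlchemy engine backed by the pymysql driver
engine = create_engine(
    "mysql+pymysql://your-username:your-password@your-mysql-host/your-database"
)

# pandas reads directly from the engine
df = pd.read_sql("SELECT * FROM your_table", con=engine)
df.to_csv('output_data.csv', index=False)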
Submit a Job to AML: Use the command job type in AML SDK V2 to run your Python script. This approach lets you execute your custom Python script within an AML environment; note that the environment must include the MySQL driver your script imports (a custom environment sketch follows the example).
Example of submitting a job using AML SDK V2:
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

# Enter details of your AML workspace
subscription_id = "your-subscription-id"
resource_group = "your-resource-group"
workspace = "your-workspace"

# Create MLClient
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# Define your job (the environment must provide mysql-connector-python
# and pandas; see the custom environment sketch below)
job = command(
    code="./src",  # Local path where the code is stored
    command="python query_mysql.py",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    compute="cpu-cluster",
)

# Submit the job
returned_job = ml_client.jobs.create_or_update(job)
print(f"Job submitted: {returned_job.studio_url}")
Handle Data Storage: Once your job completes, handle the output data as needed; it can be uploaded to a datastore or processed further within AML. A convenient pattern is to declare an output on the command job so AML collects the results into a datastore for you, as sketched below.
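Here is a minimal sketch of that pattern, assuming query_mysql.py accepts an --output folder argument and writes output_data.csv into it (the output name data_out is also an assumption; ml_client is the client created earlier):
from azure.ai.ml import command, Output
from azure.ai.ml.constants import AssetTypes

# Declare a folder output; AML mounts it for the job and uploads its
# contents to the workspace's default datastore when the job finishes
job = command(
    code="./src",
    command="python query_mysql.py --output ${{outputs.data_out}}",
    outputs={"data_out": Output(type=AssetTypes.URI_FOLDER, mode="rw_mount")},
    environment="mysql-query-env@latest",
    compute="cpu-cluster",
)
returned_job = ml_client.jobs.create_or_update(job)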