Execute Python Script component

Article
09/01/2024

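This article describes the Execute Python Script component in Azure Machine Learning designer.

Use this component to run Python code. For more information about the architecture and design principles of Python, see how to run Python code in Azure Machine Learning designer.

With Python, you can perform tasks that existing components don't support, such as:

- Visualize data by using matplotlib.
- Use Python libraries to enumerate datasets and models in your workspace.
- Read, load, and manipulate data from sources that the Import Data component doesn't support.
- Run your own deep learning code.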

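Supported Python packages

Azure Machine Learning uses the Anaconda distribution of Python, which includes many common utilities for data processing. The Anaconda version is updated automatically. The current version is:

Anaconda 4.5+ distribution for Python 3.6

For a complete list, see the Preinstalled Python packages section.

To install packages that aren't in the preinstalled list (for example, scikit-misc), add the following code to your script: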

import os
# Install the package at run time because it isn't in the preinstalled list.
os.system("pip install scikit-misc")

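Use the following code to install packages only when they're missing, which gives better performance, especially for inference: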

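import importlib.util
import os

# The import name (skmisc) differs from the pip package name (scikit-misc).
module_name = 'skmisc'
package_name = 'scikit-misc'
# Install only when the module isn't already importable.
if importlib.util.find_spec(module_name) is None:
    os.system(f"pip install {package_name}")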

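Note

If your pipeline contains multiple Execute Python Script components that need packages that aren't in the preinstalled list, install the packages in each component.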
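
Because every component has to repeat the install step, it can help to wrap the check-then-install pattern in a small helper. The following is a minimal sketch, not part of the component itself: the `ensure_package` name and its `installer` parameter are hypothetical, and it assumes that `pip` is available to the interpreter that runs the script.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_name=None, installer=None):
    """Install pip_name only if module_name can't be imported.

    Returns True if an install was attempted, False if the module was
    already available. `installer` is a hypothetical hook that makes the
    helper testable without touching the network.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # Already importable, nothing to do.
    if installer is None:
        # Run pip through the current interpreter so that the package
        # lands in the same environment as the script.
        def installer(name):
            subprocess.check_call([sys.executable, "-m", "pip", "install", name])
    installer(pip_name or module_name)
    # Let the import system notice the newly installed files.
    importlib.invalidate_caches()
    return True
```

Inside a script, a call such as `ensure_package("skmisc", "scikit-misc")` would then precede the import, assuming `skmisc` is the module name that the scikit-misc package provides.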

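Warning

The Execute Python Script component doesn't support installing packages that depend on extra native libraries installed with commands like "apt-get", such as Java, PyODBC, and so on. This is because the component runs in a simple environment that has only Python preinstalled and that has non-admin permissions.

Access the current workspace and registered datasets

You can refer to the following sample code to access the registered datasets in your workspace: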

def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here
    print(f'Input pandas.DataFrame #1: {dataframe1}')
    from azureml.core import Run
    run = Run.get_context(allow_offline=True)
    #access to current workspace
    ws = run.experiment.workspace

    #access to registered dataset of current workspace
    from azureml.core import Dataset
    dataset = Dataset.get_by_name(ws, name='test-register-tabular-in-designer')
    dataframe1 = dataset.to_pandas_dataframe()
     
    # If a zip file is connected to the third input port,
    # it is unzipped under "./Script Bundle". This directory is added
    # to sys.path. Therefore, if your zip file contains a Python file
    # mymodule.py you can import it using:
    # import mymodule

    # Return value must be of a sequence of pandas.DataFrame
    # E.g.
    #   -  Single return value: return dataframe1,
    #   -  Two return values: return dataframe1, dataframe2
    return dataframe1,

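Upload files

The Execute Python Script component supports uploading files by using the Azure Machine Learning Python SDK.

The following example shows how to upload an image file in the Execute Python Script component: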


# The script MUST contain a function named azureml_main,
# which is the entry point for this component.

# Imports placed at module level are available inside azureml_main.
import pandas as pd

# The entry point function must have two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here
    print(f'Input pandas.DataFrame #1: {dataframe1}')

    from matplotlib import pyplot as plt
    plt.plot([1, 2, 3, 4])
    plt.ylabel('some numbers')
    img_file = "line.png"
    plt.savefig(img_file)

    from azureml.core import Run
    run = Run.get_context(allow_offline=True)
    run.upload_file(f"graphics/{img_file}", img_file)

    # Return value must be of a sequence of pandas.DataFrame
    # For example:
    #   -  Single return value: return dataframe1,
    #   -  Two return values: return dataframe1, dataframe2
    return dataframe1,

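After the pipeline run finishes, you can preview the image in the right panel of the component.

Figure: Preview of the uploaded image.

You can also upload files to any datastore by using the following code. You can only preview the files in your storage account.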

import pandas as pd

# The entry point function MUST have two input arguments.
# If the input port is not connected, the corresponding
# dataframe argument will be None.
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here
    print(f'Input pandas.DataFrame #1: {dataframe1}')

    from matplotlib import pyplot as plt
    import os

    plt.plot([1, 2, 3, 4])
    plt.ylabel('some numbers')
    img_file = "line.png"

    # Set path
    path = "./img_folder"
    # Create the folder if needed; don't fail when it already exists.
    os.makedirs(path, exist_ok=True)
    plt.savefig(os.path.join(path,img_file))

    # Get current workspace
    from azureml.core import Run
    run = Run.get_context(allow_offline=True)
    ws = run.experiment.workspace
    
    # Get a named datastore from the current workspace and upload to specified path
    from azureml.core import Datastore 
    datastore = Datastore.get(ws, datastore_name='workspacefilestore')
    datastore.upload(path)

    return dataframe1,

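How to configure Execute Python Script

The Execute Python Script component contains sample Python code that you can use as a starting point. To configure the component, provide a set of inputs and the Python code to run in the Python script text box.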

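Add the Execute Python Script component to your pipeline.
From the designer, add any dataset that you want to use as input and connect it to Dataset1. Reference this dataset in your Python script as DataFrame1.

Using a dataset is optional. You can omit it if you generate data by using Python, or if you use Python code to import the data directly into the component.

This component supports adding a second dataset on Dataset2. Reference the second dataset in your Python script as DataFrame2.

Datasets stored in Azure Machine Learning are automatically converted to pandas data frames when they're loaded with this component.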
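To include new Python packages or code, connect the zip file that contains these custom resources to the Script Bundle port. Or, if your script is larger than 16 KB, use the Script Bundle port to avoid errors like "CommandLine exceeds the limit of 16597 characters".
1. Bundle the script and other custom resources into a zip file.
2. Upload the zip file as a File Dataset to the studio.
3. Drag the dataset component from the Datasets list in the left component pane of the designer authoring page.
4. Connect the dataset component to the Script Bundle port of the Execute Python Script component.
Any file contained in the uploaded zip archive can be used during pipeline execution. If the archive includes a directory structure, the structure is preserved.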
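
The bundling step can be done with any zip tool. As one option, here is a minimal sketch that uses the Python standard library and keeps the directory structure relative to the source folder. The `build_script_bundle` name is hypothetical, and the sketch assumes that the zip file is written outside the folder being archived.

```python
import os
import zipfile


def build_script_bundle(source_dir, zip_path):
    """Zip every file under source_dir, keeping the relative directory structure.

    Returns the sorted list of archive member names for a quick sanity check.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to source_dir so that the archive
                # contains no absolute paths.
                arcname = os.path.relpath(full_path, source_dir)
                bundle.write(full_path, arcname)
    with zipfile.ZipFile(zip_path) as bundle:
        return sorted(bundle.namelist())
```

Checking the returned names before uploading is a cheap way to confirm that a file such as my_script.py sits at the top level of the archive, where it can be imported directly.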

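Important

Use unique and meaningful names for files in the script bundle, because some common words (like test, app, and so on) are reserved for built-in services.

Following is a script bundle example, which contains a Python script file and a txt file:

Following is the content of my_script.py: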
```
def my_func(dataframe1):
    return dataframe1
```
Following is sample code that shows how to use the files in the script bundle:
```
import pandas as pd
from my_script import my_func

def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here
    print(f'Input pandas.DataFrame #1: {dataframe1}')

    # Test the custom defined Python function
    dataframe1 = my_func(dataframe1)

    # Test to read custom uploaded files by relative path
    with open('./Script Bundle/my_sample.txt', 'r') as text_file:
        sample = text_file.read()

    return dataframe1, pd.DataFrame(columns=["Sample"], data=[[sample]])
```
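In the Python script text box, type or paste a valid Python script.
Note

Be careful when writing your script. Make sure that it has no errors, such as syntax errors or references to undeclared variables or unimported modules or functions. Pay extra attention to the preinstalled module list. To import modules that aren't listed, install the corresponding packages in your script, for example: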
```
import os
os.system("pip install scikit-misc")
```
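The Python script text box is prepopulated with some instructions in comments, and with sample code for data access and output. You must edit or replace this code. Follow Python conventions for indentation and casing:
- The script must contain a function named azureml_main as the entry point for this component.
- The entry point function must have two input arguments, Param<dataframe1> and Param<dataframe2>, even when these arguments aren't used in your script.
- Zipped files connected to the third input port are unzipped and stored in the directory ./Script Bundle, which is also added to the Python sys.path.
If your .zip file contains mymodule.py, import it by using import mymodule.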
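
The first two rules can be checked locally before you paste a script into the designer. The following is a minimal sketch that uses the standard `ast` module. The `check_entry_point` name is hypothetical, and this is not the validation that the designer itself performs. It only checks the syntax and the entry point signature, so it won't catch undeclared variables or missing packages.

```python
import ast


def check_entry_point(source):
    """Return a list of problems found in a script; an empty list means none found."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]
    # Only top-level function definitions can serve as the entry point.
    funcs = [node for node in tree.body
             if isinstance(node, ast.FunctionDef) and node.name == "azureml_main"]
    if not funcs:
        return ["no top-level function named azureml_main"]
    # If the function is defined more than once, the last definition wins.
    arg_names = [arg.arg for arg in funcs[-1].args.args]
    if len(arg_names) != 2:
        return [f"azureml_main takes {len(arg_names)} positional arguments, expected 2"]
    return []
```

An empty list from `check_entry_point(open("my_script.py").read())` means only that these two rules hold. The script can still fail at run time.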

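Two datasets can be returned to the designer, and they must be returned as a sequence of pandas.DataFrame objects. You can create other outputs in your Python code and write them directly to Azure storage.

Warning

Connecting to a database or other external storage from the Execute Python Script component isn't recommended. Use the Import Data and Export Data components instead.
Submit the pipeline.

If the component completes, check that the output is as expected.

If the component fails, you need to do some troubleshooting. Select the component, and open Outputs+logs in the right pane. Open 70_driver_log.txt and search for "in azureml_main" to find the line that caused the error. For example, "File "/tmp/tmp01_ID/user_script.py", line 17, in azureml_main" indicates that the error occurred on line 17 of your Python script.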
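
If you download the log, the same search can be scripted. The following is a minimal sketch that assumes the log contains standard Python traceback frames in the format quoted above. The `find_script_frames` name and the `FRAME_RE` pattern are illustrative, not part of any Azure tooling.

```python
import re

# Matches standard Python traceback frames such as:
#   File "/tmp/tmp01_ID/user_script.py", line 17, in azureml_main
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\w+)')


def find_script_frames(log_text, func_name="azureml_main"):
    """Return (path, line_number) for each traceback frame inside func_name."""
    return [
        (match.group("path"), int(match.group("line")))
        for match in FRAME_RE.finditer(log_text)
        if match.group("func") == func_name
    ]
```

Each returned line number refers to the script as it was pasted into the Python script text box, so it can be looked up there directly.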

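Results

The results of any computations by the embedded Python code must be provided as pandas.DataFrame, which is automatically converted to the Azure Machine Learning dataset format. You can then use the results with other components in the pipeline.

The component returns two datasets:

- Results Dataset 1, defined by the first returned pandas data frame in a Python script.
- Results Dataset 2, defined by the second returned pandas data frame in a Python script.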
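
To make the two outputs concrete, here is a minimal sketch of an entry point that fills both result datasets. It also falls back to generated data when the first input port isn't connected, in which case the argument arrives as None. The column names `x`, `x_squared`, `rows`, and `x_sum` are made up for illustration.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # An unconnected input port arrives as None, so fall back to generated data.
    if dataframe1 is None:
        dataframe1 = pd.DataFrame({"x": [1, 2, 3, 4]})

    # First output: the input data with one derived column.
    result = dataframe1.copy()
    result["x_squared"] = result["x"] ** 2

    # Second output: a one-row summary of the first output.
    summary = pd.DataFrame({"rows": [len(result)], "x_sum": [int(result["x"].sum())]})

    # The first element becomes Results Dataset 1, the second Results Dataset 2.
    return result, summary
```

Returning a tuple keeps the order explicit. A script with a single output would use `return result,` with the trailing comma, as in the earlier samples.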

Preinstalled Python packages

The preinstalled packages are:

adal==1.2.2
applicationinsights==0.11.9
attrs==19.3.0
azure-common==1.1.25
azure-core==1.3.0
azure-graphrbac==0.61.1
azure-identity==1.3.0
azure-mgmt-authorization==0.60.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.2.0
azure-mgmt-resource==8.0.1
azure-mgmt-storage==8.0.0
azure-storage-blob==1.5.0
azure-storage-common==1.4.2
azureml-core==1.1.5.5
azureml-dataprep-native==14.1.0
azureml-dataprep==1.3.5
azureml-defaults==1.1.5.1
azureml-designer-classic-modules==0.0.118
azureml-designer-core==0.0.31
azureml-designer-internal==0.0.18
azureml-model-management-sdk==1.0.1b6.post1
azureml-pipeline-core==1.1.5
azureml-telemetry==1.1.5.3
backports.tempfile==1.0
backports.weakref==1.0.post1
boto3==1.12.29
botocore==1.15.29
cachetools==4.0.0
certifi==2019.11.28
cffi==1.12.3
chardet==3.0.4
click==7.1.1
cloudpickle==1.3.0
configparser==3.7.4
contextlib2==0.6.0.post1
cryptography==2.8
cycler==0.10.0
dill==0.3.1.1
distro==1.4.0
docker==4.2.0
docutils==0.15.2
dotnetcore2==2.1.13
flask==1.0.3
fusepy==3.0.1
gensim==3.8.1
google-api-core==1.16.0
google-auth==1.12.0
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
gunicorn==19.9.0
idna==2.9
imbalanced-learn==0.4.3
isodate==0.6.0
itsdangerous==1.1.0
jeepney==0.4.3
jinja2==2.11.1
jmespath==0.9.5
joblib==0.14.0
json-logging-py==0.2
jsonpickle==1.3
jsonschema==3.0.1
kiwisolver==1.1.0
liac-arff==2.4.0
lightgbm==2.2.3
markupsafe==1.1.1
matplotlib==3.1.3
more-itertools==6.0.0
msal-extensions==0.1.3
msal==1.1.0
msrest==0.6.11
msrestazure==0.6.3
ndg-httpsclient==0.5.1
nimbusml==1.6.1
numpy==1.18.2
oauthlib==3.1.0
pandas==0.25.3
pathspec==0.7.0
pip==20.0.2
portalocker==1.6.0
protobuf==3.11.3
pyarrow==0.16.0
pyasn1-modules==0.2.8
pyasn1==0.4.8
pycparser==2.20
pycryptodomex==3.7.3
pyjwt==1.7.1
pyopenssl==19.1.0
pyparsing==2.4.6
pyrsistent==0.16.0
python-dateutil==2.8.1
pytz==2019.3
requests-oauthlib==1.3.0
requests==2.23.0
rsa==4.0
ruamel.yaml==0.15.89
s3transfer==0.3.3
scikit-learn==0.22.2
scipy==1.4.1
secretstorage==3.1.2
setuptools==46.1.1.post20200323
six==1.14.0
smart-open==1.10.0
urllib3==1.25.8
websocket-client==0.57.0
werkzeug==0.16.1
wheel==0.34.2

Next steps

See the set of components available to Azure Machine Learning.
