如何用 Microsoft Fabric REST API 建立並更新 Spark 工作定義

Microsoft Fabric REST API 提供一個服務端點，用於 Fabric 項目的 CRUD 操作。在本教學課程中，我們會逐步解說如何建立和更新Spark作業定義成品的端對端案例。涉及三個高階步驟：

建立一個包含初始狀態的 Spark 工作定義項目。
上傳主要的定義檔和其他庫檔。
更新 Spark 工作定義項目，包含主定義檔案的 OneLake URL 及其他 lib 檔案。

必要條件

存取 Fabric REST API 需要 Microsoft Entra 憑證。建議使用 MSAL 連結庫來取得令牌。如需詳細資訊，請參閱 MSAL 中的驗證流程支援。
需要儲存體令牌才能存取 OneLake API。如需詳細資訊，請參閱 MSAL for Python。

建立具有初始狀態的Spark作業定義項目

Microsoft Fabric REST API 定義了 Fabric 項目 CRUD 操作的統一端點。端點為 https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items。

項目細節會在請求文中指定。以下是建立Spark作業定義項目的要求本文範例：

{
    "displayName": "SJDHelloWorld",
    "type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload": "<REDACTED>",
                "payloadType": "InlineBase64"
            }
        ]
    }
}

在此範例中，Spark 工作定義項目命名為SJDHelloWorld。欄位 payload 是詳細設定中以 base64 編碼的內容。解碼後，內容為：

{
    "executableFile":null,
    "defaultLakehouseArtifactId":"",
    "mainClass":"",
    "additionalLakehouseIds":[],
    "retryPolicy":null,
    "commandLineArguments":"",
    "additionalLibraryUris":[],
    "language":"",
    "environmentArtifactId":null
}

以下是用來編碼和譯碼詳細設定的兩個協助程式函式：

import base64

def json_to_base64(json_data):
    # Serialize the JSON data to a string
    json_string = json.dumps(json_data)
    
    # Encode the JSON string as bytes
    json_bytes = json_string.encode('utf-8')
    
    # Encode the bytes as Base64
    base64_encoded = base64.b64encode(json_bytes).decode('utf-8')
    
    return base64_encoded

def base64_to_json(base64_data):
    # Decode the Base64-encoded string to bytes
    base64_bytes = base64_data.encode('utf-8')
    
    # Decode the bytes to a JSON string
    json_string = base64.b64decode(base64_bytes).decode('utf-8')
    
    # Deserialize the JSON string to a Python dictionary
    json_data = json.loads(json_string)
    
    return json_data

以下是建立 Spark 作業定義項目的代碼段：

import requests

bearerToken = "<REDACTED>"  # Replace this token with the real AAD token

headers = {
    "Authorization": f"Bearer {bearerToken}", 
    "Content-Type": "application/json"  # Set the content type based on your request
}

payload = "<REDACTED>"

# Define the payload data for the POST request
payload_data = {
    "displayName": "SJDHelloWorld",
    "Type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload": payload,
                "payloadType": "InlineBase64"
            }
        ]
    }
}

# Make the POST request with Bearer authentication
sjdCreateUrl = f"https://api.fabric.microsoft.com//v1/workspaces/{workspaceId}/items"
response = requests.post(sjdCreateUrl, json=payload_data, headers=headers)

上傳主要定義檔和其他 lib 檔案

需要儲存體令牌，才能將檔案上傳至 OneLake。以下是取得儲存體令牌的協助程式函式：

import msal

def getOnelakeStorageToken():
    app = msal.PublicClientApplication(
        "<REDACTED>",  # This field should be the client ID 
        authority="https://login.microsoftonline.com/microsoft.com")

    result = app.acquire_token_interactive(scopes=["https://storage.azure.com/.default"])

    print(f"Successfully acquired AAD token with storage audience:{result['access_token']}")

    return result['access_token']

現在我們建立了一個 Spark 工作定義項目。為了讓它可執行，我們需要設定主定義檔和所需的屬性。上傳此 SJD 項目的檔案端點為https://onelake.dfs.fabric.microsoft.com/{workspaceId}/{sjdartifactid}。應該使用前一步的「workspaceId」。可以在前一步驟的回應正文中找到「sjdartifactid」的值。以下是設定主要定義檔的代碼段：

import requests

# Three steps are required: create file, append file, flush file

onelakeEndPoint = "https://onelake.dfs.fabric.microsoft.com/workspaceId/sjdartifactid"  # Replace the ID of workspace and artifact with the right one
mainExecutableFile = "main.py"  # The name of the main executable file
mainSubFolder = "Main"  # The sub folder name of the main executable file. Don't change this value


onelakeRequestMainFileCreateUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?resource=file"  # The URL for creating the main executable file via the 'file' resource type
onelakePutRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}",  # The storage token can be achieved from the helper function above
}

onelakeCreateMainFileResponse = requests.put(onelakeRequestMainFileCreateUrl, headers=onelakePutRequestHeaders)
if onelakeCreateMainFileResponse.status_code == 201:
    # Request was successful
    print(f"Main File '{mainExecutableFile}' was successfully created in OneLake.")

# With the previous step, the main executable file is created in OneLake. Now we need to append the content of the main executable file

appendPosition = 0
appendAction = "append"

### Main File Append.
mainExecutableFileSizeInBytes = 83  # The size of the main executable file in bytes
onelakeRequestMainFileAppendUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={appendPosition}&action={appendAction}"
mainFileContents = "<REDACTED>"  # The content of the main executable file, please replace this with the real content of the main executable file
mainExecutableFileSizeInBytes = 83  # The size of the main executable file in bytes, this value should match the size of the mainFileContents

onelakePatchRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}",
    "Content-Type": "text/plain"
}

onelakeAppendMainFileResponse = requests.patch(onelakeRequestMainFileAppendUrl, data = mainFileContents, headers=onelakePatchRequestHeaders)
if onelakeAppendMainFileResponse.status_code == 202:
    # Request was successful
    print(f"Successfully accepted main file '{mainExecutableFile}' append data.")

# With the previous step, the content of the main executable file is appended to the file in OneLake. Now we need to flush the file

flushAction = "flush"

### Main File flush
onelakeRequestMainFileFlushUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={mainExecutableFileSizeInBytes}&action={flushAction}"
print(onelakeRequestMainFileFlushUrl)
onelakeFlushMainFileResponse = requests.patch(onelakeRequestMainFileFlushUrl, headers=onelakePatchRequestHeaders)
if onelakeFlushMainFileResponse.status_code == 200:
    print(f"Successfully flushed main file '{mainExecutableFile}' contents.")
else:
    print(onelakeFlushMainFileResponse.json())

請遵循相同的程序，視需要上傳其他 lib 檔案。

使用主要定義檔和其他 lib 檔案的 OneLake URL 更新 Spark 作業定義項目

直到現在，我們都建立了一個帶有初始狀態的 Spark 工作定義項目，並上傳了主定義檔和其他 lib 檔。最後一步是更新 Spark Job 定義項目，設定主定義檔及其他 lib 檔案的 URL 屬性。用於更新 Spark 作業定義項目的端點為https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{sjdartifactid}。應該使用先前步驟中的相同「workspaceId」和「sjdartifactid」。。以下是更新 Spark 作業定義項目的代碼段：

mainAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Main/{mainExecutableFile}"  # The workspaceId and sjdartifactid are the same as previous steps, the mainExecutableFile is the name of the main executable file
libsAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Libs/{libsFile}"  # The workspaceId and sjdartifactid are the same as previous steps, the libsFile is the name of the libs file
defaultLakehouseId = '<REDACTED>'  # Replace this with the real default lakehouse ID

updateRequestBodyJson = {
    "executableFile": mainAbfssPath,
    "defaultLakehouseArtifactId": defaultLakehouseId,
    "mainClass": "",
    "additionalLakehouseIds": [],
    "retryPolicy": None,
    "commandLineArguments": "",
    "additionalLibraryUris": [libsAbfssPath],
    "language": "Python",
    "environmentArtifactId": None}

# Encode the bytes as a Base64-encoded string
base64EncodedUpdateSJDPayload = json_to_base64(updateRequestBodyJson)

# Print the Base64-encoded string
print("Base64-encoded JSON payload for SJD Update:")
print(base64EncodedUpdateSJDPayload)

# Define the API URL
updateSjdUrl = f"https://api.fabric.microsoft.com//v1/workspaces/{workspaceId}/items/{sjdartifactid}/updateDefinition"

updatePayload = base64EncodedUpdateSJDPayload
payloadType = "InlineBase64"
path = "SparkJobDefinitionV1.json"
format = "SparkJobDefinitionV1"
Type = "SparkJobDefinition"

# Define the headers with Bearer authentication
bearerToken = "<REDACTED>"  # Replace this token with the real AAD token

headers = {
    "Authorization": f"Bearer {bearerToken}", 
    "Content-Type": "application/json"  # Set the content type based on your request
}

# Define the payload data for the POST request
payload_data = {
    "displayName": "sjdCreateTest11",
    "Type": Type,
    "definition": {
        "format": format,
        "parts": [
            {
                "path": path,
                "payload": updatePayload,
                "payloadType": payloadType
            }
        ]
    }
}


# Make the POST request with Bearer authentication
response = requests.post(updateSjdUrl, json=payload_data, headers=headers)
if response.status_code == 200:
    print("Successfully updated SJD.")
else:
    print(response.json())
    print(response.status_code)

若要回顧整個程式，需要網狀架構 REST API 和 OneLake API 來建立和更新 Spark 作業定義項目。 Fabric REST API 用於建立和更新 Spark 工作定義項目。 OneLake API 用於上傳主定義檔及其他函式庫檔案。主要定義檔和其他 lib 檔案會先上傳至 OneLake。然後，主要定義檔和其他 lib 檔案的 URL 屬性會設定在 Spark 作業定義項目中。

排程並執行 Apache Spark 作業定義

意見反應

此頁面對您有幫助嗎？

Last updated on 2025-12-05