Quickstart: Use Microsoft Planetary Computer Pro GeoCatalog in Azure Batch

In this quickstart, you learn how to use a Microsoft Planetary Computer Pro GeoCatalog resource in Azure Batch to process geospatial data at scale.

Azure Batch is a cloud-based job scheduling service that enables you to run large-scale parallel and high-performance computing (HPC) workloads. By combining Azure Batch with Microsoft Planetary Computer Pro, you can:

- Process large volumes of geospatial data in parallel across multiple compute nodes
- Authenticate securely to GeoCatalog APIs using managed identities
- Scale processing power up or down based on workload demands
- Automate geospatial data pipelines without managing infrastructure

This quickstart demonstrates how to set up a Batch pool with a user-assigned managed identity, configure permissions to access your GeoCatalog, and run jobs that query the STAC API.

Tip

For an overview of application development options with Microsoft Planetary Computer Pro, see Connect and build applications with your data.

Prerequisites

Before you begin, make sure you have the following:

- An Azure account with an active subscription. If you don't have one, create an account for free.
- A Microsoft Planetary Computer Pro GeoCatalog resource.
- A Linux machine with the following tools installed:
  - Azure CLI
  - perl

Create a Batch account

Create a resource group:

az group create \
    --name spatiobatchdemo \
    --location uksouth

Create a storage account:

az storage account create \
    --resource-group spatiobatchdemo \
    --name spatiobatchstorage \
    --location uksouth \
    --sku Standard_LRS

Assign the Storage Blob Data Contributor role on the storage account to the current user:

az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee $(az account show --query user.name -o tsv) \
    --scope $(az storage account show --name spatiobatchstorage --resource-group spatiobatchdemo --query id -o tsv)

Create a Batch account:

az batch account create \
    --name spatiobatch \
    --storage-account spatiobatchstorage \
    --resource-group spatiobatchdemo \
    --location uksouth

Important

Ensure you have enough quota to create a pool of compute nodes. If you don't have enough quota, you can request an increase by following the instructions in the Azure Batch quotas and limits documentation.

Sign in to the Batch account using shared key authentication:

az batch account login \
    --name spatiobatch \
    --resource-group spatiobatchdemo \
    --shared-key-auth

Once you authenticate your account with Batch, subsequent az batch commands in this session use the Batch account you created.

Create a user-assigned managed identity:

az identity create \
    --name spatiobatchidentity \
    --resource-group spatiobatchdemo

Create a pool of compute nodes using the Azure portal:

In the Azure portal, navigate to your Batch account and select Pools.
Select + Add to create a new pool, and select User-assigned as the pool's identity.
Select the user-assigned managed identity you created earlier.
Select your preferred operating system and VM size. In this demo, we use Ubuntu Server 20.04 LTS.
Enable Start Task, set Elevation level to Pool autouser, Admin, and set Command line to the following:

bash -c "apt-get update && apt-get install jq python3-pip -y && curl -sL https://aka.ms/InstallAzureCLIDeb | bash"

Select OK to create the pool.
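
The start task settings from the portal steps can also be written down as data, which helps if the pool is later created from a script instead of the portal. The following is a minimal sketch, not part of the original quickstart. It builds a dictionary in the shape of the Batch service's start task definition (commandLine, userIdentity, waitForSuccess). The helper name build_start_task is hypothetical, and mapping the portal option "Pool autouser, Admin" to scope "pool" and elevationLevel "admin" is an assumption.

```python
import json

# Command line copied from the start task step in this quickstart.
START_TASK_COMMAND = (
    'bash -c "apt-get update && apt-get install jq python3-pip -y '
    '&& curl -sL https://aka.ms/InstallAzureCLIDeb | bash"'
)


def build_start_task(command_line: str) -> dict:
    """Return a start task definition as a plain dictionary (hypothetical helper).

    Assumption: the portal option "Pool autouser, Admin" corresponds to an
    auto user with pool scope and admin elevation.
    """
    return {
        "commandLine": command_line,
        "userIdentity": {
            "autoUser": {"scope": "pool", "elevationLevel": "admin"}
        },
        # Assumption: nodes should wait for the setup to succeed
        # before they accept tasks.
        "waitForSuccess": True,
    }


start_task = build_start_task(START_TASK_COMMAND)
print(json.dumps(start_task, indent=2))
```

Keeping the definition as data makes it easy to review the exact command that runs with admin rights on every node before the pool is created.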

Assign permissions to the managed identity

Grant the managed identity access to the GeoCatalog. Go to your GeoCatalog resource, select Access control (IAM), and then select Add role assignment.

Select the appropriate role for your needs, either GeoCatalog Administrator or GeoCatalog Reader, and then select Next.

Select the managed identity you created, and then select Review + assign.

Prepare the Batch job

Create a container in the storage account:

az storage container create \
    --name scripts \
    --account-name spatiobatchstorage

Upload the Python script (src/task.py) to the container:

az storage blob upload \
    --container-name scripts \
    --file src/task.py \
    --name task.py \
    --account-name spatiobatchstorage

Run the Batch jobs

There are two examples in this quickstart: a Python script, and a Bash script. You can use either of them to create a job.

Python script job

To run the Python script job, run the following commands:

geocatalog_url="<geocatalog url>"
token_expiration=$(date -u -d "30 minutes" "+%Y-%m-%dT%H:%M:%SZ")
python_task_url=$(az storage blob generate-sas --account-name spatiobatchstorage --container-name scripts --name task.py --permissions r --expiry $token_expiration --auth-mode login --as-user --full-uri -o tsv)

cat src/pythonjob.json | perl -pe "s,##PYTHON_TASK_URL##,$python_task_url,g" | perl -pe "s,##GEOCATALOG_URL##,$geocatalog_url,g" | az batch job create --json-file /dev/stdin
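
The commands above do two things: compute a UTC expiry 30 minutes ahead for the SAS token, and replace the ##PYTHON_TASK_URL## and ##GEOCATALOG_URL## placeholders in the job template. As a rough sketch of the same two steps in Python (an alternative, not what the quickstart itself runs), the helpers below use only the standard library. The names sas_expiry and render_job_template are hypothetical, and the template shown is an invented stand-in because the contents of src/pythonjob.json are not shown here.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def sas_expiry(minutes: int = 30, now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp in the same format as the `date` command above."""
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(minutes=minutes)).strftime("%Y-%m-%dT%H:%M:%SZ")


def render_job_template(template: str, values: dict) -> str:
    """Replace each ##NAME## placeholder with its JSON-escaped value."""
    for name, value in values.items():
        # json.dumps adds surrounding quotes; strip them and keep the escaping
        # so the value stays valid inside a JSON string.
        escaped = json.dumps(value)[1:-1]
        template = template.replace(f"##{name}##", escaped)
    return template


# Hypothetical template and values, for illustration only.
template = '{"taskUrl": "##PYTHON_TASK_URL##", "geocatalogUrl": "##GEOCATALOG_URL##"}'
task_url = "https://example.invalid/scripts/task.py?sp=r&sig=abc%3D"
rendered = render_job_template(
    template,
    {"PYTHON_TASK_URL": task_url, "GEOCATALOG_URL": "https://example.invalid"},
)
job = json.loads(rendered)  # fails loudly if the rendered text is not valid JSON
print(sas_expiry())
```

Because str.replace substitutes literally, characters such as & and , in a SAS URL need no special handling, and parsing the result with json.loads catches a broken template before it is submitted.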

The Python job executes the following Python script:

import json
from os import environ
import requests
from azure.identity import DefaultAzureCredential

MPCPRO_APP_ID = "https://geocatalog.spatio.azure.com"
credential = DefaultAzureCredential()
access_token = credential.get_token(f"{MPCPRO_APP_ID}/.default")

geocatalog_url = environ["GEOCATALOG_URL"]

response = requests.get(
    f"{geocatalog_url}/stac/collections",
    headers={"Authorization": "Bearer " + access_token.token},
    params={"api-version": "2025-04-30-preview"},
)
print(json.dumps(response.json(), indent=2))

The script uses DefaultAzureCredential to authenticate with the managed identity, and then retrieves the collections from the GeoCatalog. To get the results of the job, run the following command:

az batch task file download \
    --job-id pythonjob1 \
    --task-id task1 \
    --file-path "stdout.txt" \
    --destination /dev/stdout
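
The downloaded output is the JSON document printed by the script. A STAC API collections response is an object with a "collections" array, where each entry carries an "id". The sketch below shows one way to reduce that output to a list of collection IDs. It assumes stdout.txt contains only the JSON document; the helper name collection_ids and the sample IDs are hypothetical.

```python
import json


def collection_ids(stdout_text: str) -> list:
    """Return the ID of every collection in a STAC collections response."""
    payload = json.loads(stdout_text)
    return [collection["id"] for collection in payload.get("collections", [])]


# Hypothetical sample standing in for the contents of stdout.txt.
sample_stdout = json.dumps(
    {
        "collections": [{"id": "sentinel-2-demo"}, {"id": "landsat-demo"}],
        "links": [],
    }
)
ids = collection_ids(sample_stdout)
print(ids)
```

An empty list means the request succeeded but the GeoCatalog has no collections visible to the identity's role.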

Bash job

To execute the Bash script job, run the following commands:

geocatalog_url="<geocatalog url>"

cat src/bashjob.json | perl -pe "s,##GEOCATALOG_URL##,$geocatalog_url,g" | az batch job create --json-file /dev/stdin

The Bash job executes the following Bash script:

az login --identity --allow-no-subscriptions > /dev/null
token=$(az account get-access-token --resource https://geocatalog.spatio.azure.com --query accessToken --output tsv)
curl --header "Authorization: Bearer $token" "$GEOCATALOG_URL/stac/collections" | jq

The script uses az login --identity to authenticate with the managed identity, and then retrieves the collections from the GeoCatalog. To get the results of the job, run the following command:

az batch task file download \
    --job-id bashjob1 \
    --task-id task1 \
    --file-path "stdout.txt" \
    --destination /dev/stdout
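
The curl call in the Bash script reduces to a single GET request with a bearer token header. As an illustrative sketch, the same request can be assembled with the Python standard library, which is handy on nodes where neither curl nor the requests package is installed. The helper name build_collections_request is hypothetical. The optional api-version query parameter mirrors the Python script earlier in this quickstart; whether the endpoint requires it is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_collections_request(
    geocatalog_url: str, token: str, api_version: Optional[str] = None
) -> Request:
    """Build (but do not send) a GET request for the STAC collections."""
    # Drop a trailing slash so the path is not doubled up.
    url = geocatalog_url.rstrip("/") + "/stac/collections"
    if api_version:
        url += "?" + urlencode({"api-version": api_version})
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder URL and token, for illustration only.
request = build_collections_request(
    "https://example.invalid/", "<token>", "2025-04-30-preview"
)
print(request.full_url)
```

To send the request, pass it to urllib.request.urlopen and read the response body as JSON.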

Last updated on 2026-01-09