Hello Kefan Li,
Welcome to Microsoft Q&A. Thank you for reaching out and providing the details.
I understand that your Azure Language Service (Text Analytics) call works as expected in a notebook but hangs or runs indefinitely when executed through an Azure Machine Learning (AML) job. That is frustrating, especially when there is no clear error message. Let's go through some possible causes and troubleshooting steps.
Understanding the Issue
When a Language Service call works fine in a notebook but fails or hangs inside an AML job, it usually indicates a network configuration, authentication, or environment-related issue within the AML execution environment rather than a problem with the Language Service itself. This can happen if the AML job doesn’t have outbound internet access, if API keys aren’t correctly passed into the environment, or if there are connectivity issues between AML and Cognitive Services.
1. Verify Network Configuration
If your AML workspace is deployed in a Virtual Network (VNet), ensure that outbound access to the Language Service endpoint is allowed.
Check whether your AML job is running in a Managed VNet or behind Private Endpoints.
If outbound access is restricted, add the endpoint pattern to the allowed outbound rules:
*.cognitiveservices.azure.com
You can also try running a test request inside the AML environment to verify connectivity.
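As a minimal sketch of such a test request, the snippet below only checks that a TCP connection to the endpoint can be opened from inside the job. It uses the Python standard library; the function name can_reach and the 5-second default are illustrative choices, not part of any Azure SDK.

```python
import socket
from urllib.parse import urlparse


def can_reach(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    host = parsed.hostname
    if not host:
        raise ValueError(f"No host found in endpoint: {endpoint!r}")
    # Use the explicit port if present, otherwise 443 for https and 80 for http.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        # create_connection resolves DNS and opens a socket, bounded by timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # DNS failures, refused connections and timeouts are all OSError subclasses.
        print(f"Cannot reach {host}:{port}: {exc}")
        return False
```

Calling can_reach with your Language resource endpoint at the start of the job script shows quickly whether the compute can reach the service at all. A False result points to networking rather than to your code.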
2. Check API Key and Endpoint Configuration
Make sure your API key and endpoint are correctly passed into the AML job environment. Environment variables used in your notebook are not automatically inherited by AML jobs. Example:
import os
endpoint = os.environ.get("LANGUAGE_ENDPOINT")  # None if not set in the job
key = os.environ.get("LANGUAGE_KEY")  # None if not set in the job
Verify that these are properly set in your AML job configuration or retrieved securely from Azure Key Vault.
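One way to make a missing setting visible is to fail fast at the start of the job instead of passing None to the client. The sketch below assumes the two variable names used in the example above; load_language_config is a hypothetical helper name.

```python
import os


def load_language_config(names=("LANGUAGE_ENDPOINT", "LANGUAGE_KEY")):
    """Read required settings from the environment, failing fast if any is missing."""
    values = {name: os.environ.get(name) for name in names}
    # Treat unset and empty values the same way.
    missing = [name for name, value in values.items() if not value]
    if missing:
        # Raising here turns a silent misconfiguration into an immediate job failure.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

If you submit the job with the AML Python SDK v2, the command() function accepts an environment_variables dictionary, which is one way to supply these values to the job.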
3. Check Resource Limits and Scaling
Sometimes, if your Azure Language Service is under heavy usage, the service may throttle or delay requests.
Check the quota and usage metrics for your resource in the Azure Portal.
If usage is near the limit, consider scaling up your service tier or distributing requests over time.
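To illustrate distributing requests over time, here is a small retry helper with exponential backoff. The Azure SDK clients already apply their own retry policy by default, so treat this as a sketch for your own request loop; call_with_backoff and its parameters are hypothetical names.

```python
import random
import time


def call_with_backoff(fn, is_throttled, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); when is_throttled(exc) is true, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on non-throttling errors or when attempts are exhausted.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # The delay doubles on each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With the Azure SDK, is_throttled could, for example, check whether the exception is an HttpResponseError whose status_code is 429.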
4. Review AML Job Logs
Check the stdout and stderr logs in your AML job run details. If the logs stop right after a call to the Language Service API, the request is most likely hanging while waiting for a response, which points to a network or timeout issue.
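The logs are easier to read if each step announces when it starts and ends. A minimal sketch using only the standard library is shown below; the logger name language_job and the helper names are illustrative assumptions.

```python
import logging
import sys
import time
from contextlib import contextmanager

logger = logging.getLogger("language_job")


def configure_logging(stream=sys.stdout):
    """Send timestamped log lines to the given stream (stdout by default)."""
    handler = logging.StreamHandler(stream)  # StreamHandler flushes after each record
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)


@contextmanager
def log_step(name):
    """Log the start and outcome of a step; a hang shows up as START with no END."""
    logger.info("START %s", name)
    started = time.monotonic()
    try:
        yield
    except Exception:
        logger.exception("FAILED %s after %.1fs", name, time.monotonic() - started)
        raise
    else:
        logger.info("END %s after %.1fs", name, time.monotonic() - started)
```

Wrapping the Language Service call in a block such as with log_step("analyze_sentiment"): makes it clear from the job logs whether that call is the one that never returns.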
5. Configure Timeout Settings
If you’re using the Azure SDK (like azure-ai-textanalytics), set an explicit timeout value to prevent indefinite hangs:
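from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import ServiceRequestError, ServiceResponseError

# endpoint and key are read from the job environment (see step 2)
client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
try:
    # timeout is in seconds
    result = client.analyze_sentiment(["This is a test."], timeout=30)
except (ServiceRequestError, ServiceResponseError) as e:
    # Connection failures and timeouts typically surface as one of these two types
    print("Request failed or timed out:", e)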
You can adjust the timeout duration based on your workload.
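As an extra safeguard, you could also put a wall-clock deadline around the whole call so that the job fails instead of hanging. The sketch below is a generic standard-library pattern, not an Azure SDK feature; run_with_deadline is a hypothetical helper name.

```python
import threading


def run_with_deadline(fn, seconds):
    """Run fn() in a daemon thread and raise TimeoutError if it does not finish in time."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except Exception as exc:
            # Stored here and re-raised in the calling thread below.
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        # The worker is abandoned, not cancelled; as a daemon it will not block process exit.
        raise TimeoutError(f"call did not finish within {seconds} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Note that this does not stop the underlying request. It only lets the main thread move on and report the failure, so it complements the SDK timeout rather than replacing it.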
6. Network and Connectivity Checks
Confirm that your AML environment has internet access if you’re not using private endpoints. Network restrictions often prevent requests from reaching Cognitive Services, leading to hanging jobs. If you’re using private networking, ensure proper DNS resolution and firewall rules are in place.
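To check DNS resolution from inside the job, you can print which addresses the endpoint hostname resolves to. With a private endpoint, the hostname is generally expected to resolve to a private IP address in your VNet, so a public address may indicate a DNS configuration gap. This is a minimal sketch using the standard library; the function names are illustrative.

```python
import ipaddress
import socket


def resolve_addresses(host):
    """Return the sorted, de-duplicated IP addresses that host resolves to."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def describe(address):
    """Label an IP address as 'private' or 'public'."""
    return "private" if ipaddress.ip_address(address).is_private else "public"
```

For example, looping over resolve_addresses(host) and printing describe(address) for each entry, where host is the hostname of your Language resource endpoint, shows which kind of address the job actually sees.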
7. Test with Different Inputs
Try executing the AML job with simpler or smaller inputs to see if the problem is tied to certain data payloads. If smaller test cases succeed, the issue might relate to request size limits or input-specific timeouts.
8. Scale the Service if Needed
If your workload involves a large number of concurrent requests, consider scaling up your Language Service or using batching to handle inputs efficiently. This can prevent overloading the endpoint and causing response delays.
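A simple way to batch inputs is to split the document list into fixed-size chunks and send one chunk per request. The helper below is a sketch; the right batch size depends on the documented per-request limit for the feature you call, so it is left as a parameter.

```python
from itertools import islice


def batched(documents, batch_size):
    """Yield successive lists of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        # islice takes the next batch_size items without copying the whole input.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each batch can then be passed to a single client call, for example client.analyze_sentiment(batch), optionally with a short pause between batches.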
To summarize, please try these steps:
Confirm that outbound connectivity to Cognitive Services is enabled in your AML environment.
Check your API key, endpoint, and authentication configurations.
Review job logs and test with shorter timeouts or smaller input batches.
If the issue persists even after these checks, please share your AML job run ID and workspace details so we can review the environment configuration further.
I hope this helps. Do let me know if you have any further questions.
Thank you!