We have had similar issues and have followed the support route. Long story short, we were told that the Azure Automation platform itself is broken and probably won't be fixed for at least a quarter. We have been advised to use hybrid workers in the meantime. We are escalating the issue.
Help with Sporadic Internal Errors in Azure Automation Runbooks
Hello everyone,
I’m hoping someone in the community has faced a similar issue and can share their insights. Since August, we’ve been experiencing problems with our Azure Automation Runbooks, originally running on the PowerShell 5.1 runtime. The main issue is that jobs sporadically fail with an "Internal Error," as indicated in the exception block of the affected jobs.
This makes troubleshooting quite challenging:
- No clear pattern: The errors occur irregularly, whether at the start or near the end of the job execution.
- Lack of detailed error messages: We only see the "Job was suspended due to an internal error. Please retry after sometime." message without additional context to pinpoint the root cause.
Here’s what we’ve tried so far:
- Avoiding parallel executions: We adjusted the scheduling of nightly jobs to eliminate parallel runs, but this hasn’t resolved the issue.
- Throttling as a suspect: Based on Azure diagnostics, we suspected a throttling issue and added Sleep commands in several parts of the Runbooks, but this also didn’t help.
- Upgrading the runtime: We migrated the Runbooks from PowerShell 5.1 to PowerShell 7.4, but the errors persist even in the updated environment.
- Analysis with Azure metrics: We tried investigating the throttling hypothesis using Azure’s built-in metrics but couldn’t find any relevant data to confirm this.
Has anyone else encountered similar problems, or does anyone have suggestions on how to proceed? Specifically:
- Is there a way to retrieve more detailed error messages for these jobs?
- Are there tools or best practices for better analyzing potential throttling issues?
- Could there be an alternative approach to improving the stability of our Runbooks?
Thank you in advance for your support! I’m happy to provide more details if needed.
Best regards
Azure Automation
2 answers
Pranay Reddy Madireddy 6,180 Reputation points Microsoft External Staff Moderator
2024-12-05T21:20:28.0066667+00:00
Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.
Run the script on your local machine first to check for issues like missing modules, syntax errors, or logic mistakes before deploying it to Azure.
Check that every module your runbook uses is imported into your Automation account, properly installed, and up to date, to avoid unexpected errors.
Add more output statements to your runbook to track its execution flow. This will help you determine what occurs just before the runbook is suspended or fails.
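As a rough illustration of the tracing idea above, here is a minimal sketch in Python. It is only a pattern, not something taken from the affected runbooks (which are PowerShell): the helper names `log_step` and `run_step` are hypothetical, and the same approach would carry over to a PowerShell runbook using its own output statements. Each step prints a timestamped start line and an end or failure line, so the last line in the job output shows roughly where a suspended job stopped.

```python
import time
from datetime import datetime, timezone


def log_step(message):
    """Print a UTC-timestamped progress line and return it."""
    line = f"{datetime.now(timezone.utc).isoformat()} {message}"
    # flush=True so the line is emitted immediately, not buffered.
    print(line, flush=True)
    return line


def run_step(name, func, *args, **kwargs):
    """Run one step, logging its start, its duration, and any exception."""
    log_step(f"START {name}")
    started = time.monotonic()
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        elapsed = time.monotonic() - started
        log_step(f"FAIL  {name} after {elapsed:.2f}s: {exc!r}")
        raise  # re-raise so the job still fails visibly
    elapsed = time.monotonic() - started
    log_step(f"END   {name} after {elapsed:.2f}s")
    return result


if __name__ == "__main__":
    # Hypothetical step standing in for real runbook work.
    total = run_step("sum-values", sum, [1, 2, 3])
    log_step(f"result={total}")
```

If a job is suspended by the platform rather than by an exception in the script, the `FAIL` line will never appear; in that case the last `START` line without a matching `END` is the useful clue.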
Since you've upgraded to PowerShell 7.4, make sure all scripts and modules are compatible with this version. The newer runtime may offer better stability than previous versions, but only if everything the runbook depends on supports it.
If relevant, deploying Hybrid Runbook Workers can help resolve issues by running jobs closer to the resources they manage.
For long-running tasks, using checkpoints can help control the execution flow and recover from failures without losing progress.
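To make the checkpoint idea concrete, here is a minimal, generic sketch in Python. It does not use Azure Automation's built-in checkpoint feature, which to my knowledge applies to PowerShell Workflow runbooks; instead it shows the underlying pattern of recording finished steps so that a retried job skips them. The function names and the JSON file are assumptions made for illustration, and the sketch assumes the file lives somewhere durable. A local file in a cloud sandbox would likely not survive between jobs, so in practice the store would need to be swapped for durable storage.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of step names recorded as finished, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_done(path, done):
    """Write the finished-step list atomically: temp file, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def run_with_checkpoints(steps, path):
    """Run (name, callable) pairs in order, skipping names already recorded.

    Returns the names executed in this run. If a step raises, earlier
    steps stay recorded, so the next run resumes at the failed step.
    """
    done = load_done(path)
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in a previous run
        func()
        done.add(name)
        save_done(path, done)  # checkpoint after every completed step
        executed.append(name)
    return executed
```

This only helps if each step is safe to run again after a partial failure, since a step that was interrupted midway is not recorded and will be repeated in full on the next run.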
For reference, please review this documentation:
https://learn.microsoft.com/en-us/azure/automation/troubleshoot/runbooks
https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/automation/troubleshoot/runbooks.md?plain=1
https://learn.microsoft.com/en-us/azure/automation/troubleshoot/extension-based-hybrid-runbook-worker
https://github.com/uglide/azure-content/blob/master/articles/automation/automation-troubleshooting-automation-errors.md
If you have any further queries, do let us know.
If the answer is helpful, please click "Accept Answer" and "Upvote it".