Azure Metadata Service: Scheduled Events for Linux VMs
Applies to: ✔️ Linux VMs ✔️ Flexible scale sets ✔️ Uniform scale sets
Scheduled Events is an Azure Metadata Service that gives your application time to prepare for virtual machine (VM) maintenance. It provides information about upcoming maintenance events (for example, reboot) so that your application can prepare for them and limit disruption. It's available for all Azure Virtual Machines types, including PaaS and IaaS on both Windows and Linux.
For information about Scheduled Events on Windows, see Scheduled Events for Windows VMs.
Note
Scheduled Events is generally available in all Azure Regions. See Version and Region Availability for the latest release information.
Why use Scheduled Events?
Many applications can benefit from time to prepare for VM maintenance. The time can be used to perform application-specific tasks that improve availability, reliability, and serviceability, including:
- Checkpoint and restore.
- Connection draining.
- Primary replica failover.
- Removal from a load balancer pool.
- Event logging.
- Graceful shutdown.
With Scheduled Events, your application can discover when maintenance will occur and trigger tasks to limit its impact.
Scheduled Events provides events in the following use cases:
- Platform-initiated maintenance (for example, VM reboot, live migration, or memory-preserving updates for the host).
- Virtual machine is running on degraded host hardware that is predicted to fail soon.
- Virtual machine was running on a host that suffered a hardware failure.
- User-initiated maintenance (for example, a user restarts or redeploys a VM).
- Spot VM and Spot scale set instance evictions.
The Basics
Metadata Service exposes information about running VMs by using a REST endpoint that's accessible from within the VM. The information is available via a nonroutable IP so that it's not exposed outside the VM.
Scope
Scheduled events are delivered to and can be acknowledged by:
- Standalone Virtual Machines.
- All the VMs in an Azure cloud service (classic).
- All the VMs in an availability set.
- All the VMs in a scale set placement group.
Note
Scheduled Events for all virtual machines (VMs) in a Fabric Controller (FC) tenant are delivered to all VMs in that FC tenant. An FC tenant equates to a standalone VM, an entire Cloud Service, an entire Availability Set, or a Placement Group for a VM Scale Set (VMSS), regardless of Availability Zone usage. For example, if you have 100 VMs in an availability set and there's an update to one of them, the scheduled event goes to all 100, whereas if there are 100 single VMs in a zone, the event goes only to the VM that's affected.
As a result, check the Resources field in the event to identify which VMs are affected.
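Because every VM in an FC tenant can see the same events, a handler usually filters on Resources before acting. The following is a minimal sketch, assuming the response has already been parsed from JSON into a Python dict; the function name events_for_vm and the vm_name parameter (this VM's name as it appears in Resources) are hypothetical, and the sample payload is illustrative.

```python
from typing import Any, Dict, List


def events_for_vm(payload: Dict[str, Any], vm_name: str) -> List[Dict[str, Any]]:
    """Return only the events whose Resources list includes vm_name."""
    return [
        event
        for event in payload.get("Events", [])
        if vm_name in event.get("Resources", [])
    ]


# Illustrative payload: an event delivered to a shared availability set
# that affects only one of its VMs.
sample = {
    "DocumentIncarnation": 2,
    "Events": [
        {
            "EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
            "EventType": "Freeze",
            "Resources": ["WestNO_0"],
        }
    ],
}
print(len(events_for_vm(sample, "WestNO_0")))  # 1
print(len(events_for_vm(sample, "WestNO_1")))  # 0
```

Events that don't list this VM can still matter (for example, for a leader that approves on behalf of others), so filtering is a per-application choice rather than a requirement.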
Endpoint discovery
For VNET-enabled VMs, Metadata Service is available from a static nonroutable IP, 169.254.169.254. The full endpoint for the latest version of Scheduled Events is:
http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01
If the VM isn't created within a virtual network (the default case for cloud services and classic VMs), extra logic is required to discover the IP address to use. To learn how to discover the host endpoint, see this sample.
Version and Region Availability
The Scheduled Events service is versioned. Versions are mandatory; the current version is 2020-07-01.
Version | Release Type | Regions | Release Notes |
---|---|---|---|
2020-07-01 | General Availability | All | |
2019-08-01 | General Availability | All | |
2019-04-01 | General Availability | All | |
2019-01-01 | General Availability | All | |
2017-11-01 | General Availability | All | |
2017-08-01 | General Availability | All | |
2017-03-01 | Preview | All | |
Note
Previous preview releases of Scheduled Events supported {latest} as the api-version. This format is no longer supported.
Enabling and Disabling Scheduled Events
Scheduled Events are enabled for your service the first time you make a request for events. Expect the response to your first call to be delayed by up to two minutes. Scheduled Events are disabled for your service if it doesn't make a request for 24 hours.
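One practical consequence of this enable/disable behavior is that the HTTP timeout for the first request (or any request after 24 idle hours) should allow for the two-minute delay. The sketch below encodes that rule; the function name request_timeout and the 5-second steady-state timeout are assumptions for illustration, not values defined by the service.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The first request can take up to two minutes while the service is enabled.
FIRST_CALL_TIMEOUT_S = 120
# Assumption: a short timeout once the service is already enabled.
STEADY_TIMEOUT_S = 5
# The service is disabled after 24 hours without a request.
IDLE_DISABLE_AFTER = timedelta(hours=24)


def request_timeout(last_request: Optional[datetime], now: datetime) -> int:
    """Pick an HTTP timeout (seconds) for the next Scheduled Events request."""
    if last_request is None or now - last_request >= IDLE_DISABLE_AFTER:
        # The service may be (re)enabling, so allow for the delayed first response.
        return FIRST_CALL_TIMEOUT_S
    return STEADY_TIMEOUT_S


now = datetime.now(timezone.utc)
print(request_timeout(None, now))  # 120
```

Polling regularly (as recommended under Polling frequency) keeps the service enabled, so in practice only the very first call should hit the long timeout.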
Scheduled events are disabled by default for VMSS Guest OS upgrades or reimages. To enable scheduled events for these operations, enable them by using OSImageNotificationProfile.
User-initiated Maintenance
User-initiated VM maintenance via the Azure portal, API, CLI, or PowerShell results in a scheduled event. You can then test the maintenance preparation logic in your application, and your application can prepare for user-initiated maintenance.
If you restart a VM, an event with the type Reboot is scheduled. If you redeploy a VM, an event with the type Redeploy is scheduled. Typically, events with a User event source can be approved immediately to avoid delaying user-initiated actions. We advise having a primary and a secondary VM that communicate with each other and approve user-generated scheduled events in case the primary VM becomes unresponsive. This arrangement prevents delays in recovering your application to a good state.
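Selecting the user-initiated events that are eligible for immediate approval is a simple filter on EventSource and EventStatus. This is a minimal sketch over an already-parsed response dict; the function name user_events_to_approve and the event IDs in the sample are hypothetical. The returned IDs would then be sent in a POST, as described under Start an event.

```python
from typing import Any, Dict, List


def user_events_to_approve(payload: Dict[str, Any]) -> List[str]:
    """Return EventIds of user-initiated events that haven't started yet."""
    return [
        event["EventId"]
        for event in payload.get("Events", [])
        if event.get("EventSource") == "User"
        and event.get("EventStatus") == "Scheduled"
    ]


# Hypothetical document: one user-initiated reboot, one platform freeze.
sample = {
    "DocumentIncarnation": 7,
    "Events": [
        {"EventId": "hypothetical-user-reboot", "EventSource": "User",
         "EventStatus": "Scheduled", "EventType": "Reboot"},
        {"EventId": "hypothetical-platform-freeze", "EventSource": "Platform",
         "EventStatus": "Scheduled", "EventType": "Freeze"},
    ],
}
print(user_events_to_approve(sample))  # ['hypothetical-user-reboot']
```

Running the same filter on both a primary and a secondary VM is harmless, because approving an already-approved event still returns success.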
Use the API
Headers
When you query Metadata Service, you must provide the header Metadata:true to ensure the request wasn't unintentionally redirected. The Metadata:true header is required for all scheduled events requests. Failure to include the header in the request results in a "Bad Request" response from Metadata Service.
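If you'd rather not depend on the third-party requests package, the same GET can be built with the standard library. The sketch below only constructs the request so you can check that the Metadata header and api-version are present; the function name build_events_request is hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

METADATA_URL = "http://169.254.169.254/metadata/scheduledevents"
API_VERSION = "2020-07-01"


def build_events_request(api_version: str = API_VERSION) -> urllib.request.Request:
    """Build a GET request that carries the mandatory Metadata:true header."""
    url = METADATA_URL + "?" + urlencode({"api-version": api_version})
    return urllib.request.Request(url, headers={"Metadata": "true"}, method="GET")


req = build_events_request()
print(req.full_url)  # http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01
print(req.get_header("Metadata"))  # true
```

On a VM, you would send it with something like `urllib.request.urlopen(req, timeout=120)` and decode the body with `json.load`; this step only works from inside Azure, because the address is nonroutable.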
Query for events
You can query for scheduled events by making the following call:
Bash sample
curl -H Metadata:true "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
Python sample
import json
import requests

metadata_url = "http://169.254.169.254/metadata/scheduledevents"
header = {'Metadata': 'true'}
query_params = {'api-version': '2020-07-01'}

def get_scheduled_events():
    resp = requests.get(metadata_url, headers=header, params=query_params)
    data = resp.json()
    return data
The response contains an array of scheduled events. An empty array means that no events are currently scheduled. When events are scheduled, the response has the following structure:
{
    "DocumentIncarnation": {IncarnationID},
    "Events": [
        {
            "EventId": {eventID},
            "EventType": "Reboot" | "Redeploy" | "Freeze" | "Preempt" | "Terminate",
            "ResourceType": "VirtualMachine",
            "Resources": [{resourceName}],
            "EventStatus": "Scheduled" | "Started",
            "NotBefore": {timeInUTC},
            "Description": {eventDescription},
            "EventSource": "Platform" | "User",
            "DurationInSeconds": {timeInSeconds}
        }
    ]
}
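To work with this structure in typed code, you can map each document onto small dataclasses. This is a sketch, not an official client model: the class and function names are hypothetical, and defaulting a missing DurationInSeconds to -1 (unknown) or missing strings to empty is an assumption. The sample is one of the example responses shown later in this article.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScheduledEvent:
    event_id: str
    event_type: str
    resource_type: str
    resources: List[str]
    event_status: str
    not_before: str
    description: str
    event_source: str
    duration_in_seconds: int


@dataclass
class EventsDocument:
    document_incarnation: int
    events: List[ScheduledEvent] = field(default_factory=list)


def parse_events_document(text: str) -> EventsDocument:
    """Parse a Scheduled Events JSON response into dataclasses."""
    raw = json.loads(text)
    events = [
        ScheduledEvent(
            event_id=e["EventId"],
            event_type=e["EventType"],
            resource_type=e["ResourceType"],
            resources=list(e["Resources"]),
            event_status=e["EventStatus"],
            not_before=e.get("NotBefore", ""),
            description=e.get("Description", ""),
            event_source=e.get("EventSource", ""),
            # Assumption: treat a missing duration as -1 (unknown).
            duration_in_seconds=int(e.get("DurationInSeconds", -1)),
        )
        for e in raw.get("Events", [])
    ]
    return EventsDocument(raw["DocumentIncarnation"], events)


SAMPLE = """{
  "DocumentIncarnation": 2,
  "Events": [{
    "EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
    "EventStatus": "Scheduled",
    "EventType": "Freeze",
    "ResourceType": "VirtualMachine",
    "Resources": ["WestNO_0", "WestNO_1"],
    "NotBefore": "Mon, 11 Apr 2022 22:26:58 GMT",
    "Description": "Virtual machine is being paused because of a memory-preserving Live Migration operation.",
    "EventSource": "Platform",
    "DurationInSeconds": 5
  }]
}"""
doc = parse_events_document(SAMPLE)
print(doc.document_incarnation, doc.events[0].event_type)  # 2 Freeze
```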
Event Properties
Property | Description |
---|---|
DocumentIncarnation | Integer that increases when the events array changes. Documents with the same incarnation contain the same event information, and the incarnation is incremented when an event changes. |
EventId | Globally unique identifier for this event. Example: C7061BAC-AFDC-4513-B24B-AA5F13A16123 |
EventType | Impact this event causes. Values: Reboot, Redeploy, Freeze, Preempt, Terminate |
ResourceType | Type of resource this event affects. Values: VirtualMachine |
Resources | List of resources this event affects. Example: ["WestNO_0", "WestNO_1"] |
EventStatus | Status of this event. Values: Scheduled (the event is scheduled to start after the time in the NotBefore property), Started (the event has started). No Completed or similar status is ever provided; the event is no longer returned when it's finished. |
NotBefore | Time after which this event can start. The event is guaranteed not to start before this time. Blank if the event has already started. Example: Mon, 11 Apr 2022 22:26:58 GMT |
Description | Description of this event. Example: Virtual machine is being paused because of a memory-preserving Live Migration operation. |
EventSource | Initiator of the event. Values: Platform, User |
DurationInSeconds | The expected duration of the interruption caused by the event. Example: 5. A value of -1 means the platform doesn't know how long the operation will take. |
Event Scheduling
Each event is scheduled a minimum amount of time in the future based on the event type. This time is reflected in an event's NotBefore property.
EventType | Minimum notice |
---|---|
Freeze | 15 minutes |
Reboot | 15 minutes |
Redeploy | 10 minutes |
Terminate | User Configurable: 5 to 15 minutes |
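To decide how much preparation time is left, convert NotBefore into seconds remaining. This sketch assumes NotBefore uses the RFC 1123-style date format seen in the example responses (for example, "Mon, 11 Apr 2022 22:26:58 GMT"), which Python's email.utils.parsedate_to_datetime can parse; the function name seconds_until_not_before is hypothetical. A blank NotBefore means the event has already started.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def seconds_until_not_before(not_before: str, now: datetime) -> Optional[float]:
    """Seconds until an event may start, or None if NotBefore is blank."""
    if not not_before:
        # Blank NotBefore: the event has already started.
        return None
    start = parsedate_to_datetime(not_before)  # timezone-aware (GMT = UTC)
    return (start - now).total_seconds()


now = datetime(2022, 4, 11, 22, 11, 58, tzinfo=timezone.utc)
print(seconds_until_not_before("Mon, 11 Apr 2022 22:26:58 GMT", now))  # 900.0
```

Pass a timezone-aware `now` (for example, `datetime.now(timezone.utc)`); subtracting a naive datetime from the aware parsed value raises a TypeError.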
Once an event is scheduled, it moves into the Started state after it's either approved or its NotBefore time passes. In rare cases, however, Azure cancels the operation before it starts. In that case, the event is removed from the Events array and the impact doesn't occur as previously scheduled.
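Because there's no Completed or Cancelled status, a handler can only infer what happened from which events disappeared between two documents. The sketch below compares consecutive documents; the function name is hypothetical, and the classification is a heuristic: an event that vanished while still Scheduled was probably cancelled, but with polling it might also have started and finished between two requests.

```python
from typing import Any, Dict, Set, Tuple


def classify_removed_events(
    previous: Dict[str, Any], current: Dict[str, Any]
) -> Tuple[Set[str], Set[str]]:
    """Return (vanished_while_scheduled, vanished_after_started) EventIds."""
    current_ids = {e["EventId"] for e in current.get("Events", [])}
    vanished_scheduled: Set[str] = set()
    vanished_started: Set[str] = set()
    for event in previous.get("Events", []):
        if event["EventId"] in current_ids:
            continue  # Still present; nothing to infer yet.
        if event.get("EventStatus") == "Started":
            vanished_started.add(event["EventId"])  # Most likely finished.
        else:
            vanished_scheduled.add(event["EventId"])  # Possibly cancelled.
    return vanished_scheduled, vanished_started


EVENT_ID = "C7061BAC-AFDC-4513-B24B-AA5F13A16123"
scheduled_doc = {"DocumentIncarnation": 2,
                 "Events": [{"EventId": EVENT_ID, "EventStatus": "Scheduled"}]}
started_doc = {"DocumentIncarnation": 3,
               "Events": [{"EventId": EVENT_ID, "EventStatus": "Started"}]}
empty_doc = {"DocumentIncarnation": 4, "Events": []}
print(classify_removed_events(started_doc, empty_doc))
```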
Note
In some cases, Azure can predict host failure due to degraded hardware and attempts to mitigate disruption to your service by scheduling a migration. Affected virtual machines receive a scheduled event with a NotBefore that is typically a few days in the future. The actual time varies depending on the predicted failure risk assessment. Azure tries to give 7 days' advance notice when possible, but the actual time varies and might be shorter if the prediction is that there's a high chance of the hardware failing imminently. To minimize risk to your service in case the hardware fails before the system-initiated migration, we recommend that you self-redeploy your virtual machine as soon as possible.
Note
If the host node experiences a hardware failure, Azure bypasses the minimum notice period and immediately begins the recovery process for affected virtual machines. This reduces recovery time when the affected VMs are unable to respond. During the recovery process, an event is created for all impacted VMs with EventType = Reboot and EventStatus = Started.
Polling frequency
You can poll the endpoint for updates as frequently or infrequently as you like. However, the longer the time between requests, the more time you potentially lose to react to an upcoming event. Most events have 5 to 15 minutes of advance notice, although in some cases advance notice might be as little as 30 seconds. To ensure that you have as much time as possible to take mitigating actions, we recommend that you poll the service once per second.
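A once-per-second poller only needs to act when DocumentIncarnation changes. The sketch below is a generator that yields each new document; the names are hypothetical, and the fetch and sleep functions are injected (fetch could wrap the get_scheduled_events sample in this article) so the loop can be exercised without a VM.

```python
import time
from typing import Any, Callable, Dict, Iterator, Optional


def poll_for_new_documents(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, Any]]:
    """Yield each events document whose DocumentIncarnation is new."""
    last_incarnation = None
    polls = 0
    while max_polls is None or polls < max_polls:
        payload = fetch()
        polls += 1
        if payload["DocumentIncarnation"] != last_incarnation:
            last_incarnation = payload["DocumentIncarnation"]
            yield payload
        sleep(interval_s)


# Simulated responses instead of real HTTP calls.
docs = iter([{"DocumentIncarnation": n, "Events": []} for n in (1, 1, 2, 2, 3)])
seen = [d["DocumentIncarnation"]
        for d in poll_for_new_documents(lambda: next(docs), max_polls=5,
                                        sleep=lambda s: None)]
print(seen)  # [1, 2, 3]
```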
Start an event
After you learn of an upcoming event and finish your logic for graceful shutdown, you can approve the outstanding event by making a POST call to Metadata Service with its EventId. This call indicates to Azure that it can shorten the minimum notification time (when possible). The event might not start immediately upon approval; in some cases, Azure requires approval from all the VMs hosted on the node before proceeding with the event.
The POST request body should match the following JSON sample. The request contains a list of StartRequests, and each StartRequest contains the EventId for the event you want to expedite:
{
    "StartRequests": [
        {
            "EventId": {EventId}
        }
    ]
}
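Because StartRequests is a list, one POST can approve several events at once. This sketch only serializes the body; the function name start_requests_body is hypothetical, and the event ID is the one used in the Bash sample.

```python
import json
from typing import Iterable


def start_requests_body(event_ids: Iterable[str]) -> str:
    """Serialize a StartRequests body that approves one or more events."""
    return json.dumps({"StartRequests": [{"EventId": eid} for eid in event_ids]})


body = start_requests_body(["f020ba2e-3bc0-4c40-a10b-86575a9eabd5"])
print(body)  # {"StartRequests": [{"EventId": "f020ba2e-3bc0-4c40-a10b-86575a9eabd5"}]}
```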
The service will always return a 200 success code for a valid event ID, even if it was already approved by a different VM. A 400 error code indicates that the request header or payload was malformed.
Bash sample
curl -H Metadata:true -X POST -d '{"StartRequests": [{"EventId": "f020ba2e-3bc0-4c40-a10b-86575a9eabd5"}]}' "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
Python sample
import json
import requests

def confirm_scheduled_event(event_id):
    # This payload confirms a single event with id event_id
    payload = json.dumps({"StartRequests": [{"EventId": event_id}]})
    response = requests.post("http://169.254.169.254/metadata/scheduledevents",
                             headers={'Metadata': 'true'},
                             params={'api-version': '2020-07-01'},
                             data=payload)
    return response.status_code
Note
Acknowledging an event allows the event to proceed for all Resources in the event, not just the VM that acknowledges it. Therefore, you can elect a leader to coordinate the acknowledgement, which might be as simple as choosing the first machine in the Resources field.
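The "first machine in Resources" election can be expressed as a one-line check that each VM runs on its own. This sketch assumes each VM knows its own name as it appears in Resources; the function name is hypothetical. If the designated VM is unresponsive, nothing approves the event, and it simply starts once its NotBefore time passes.

```python
from typing import Any, Dict


def is_designated_approver(event: Dict[str, Any], vm_name: str) -> bool:
    """True if vm_name is the first entry in the event's Resources list."""
    resources = event.get("Resources", [])
    return len(resources) > 0 and resources[0] == vm_name


event = {"EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
         "Resources": ["WestNO_0", "WestNO_1"]}
print(is_designated_approver(event, "WestNO_0"))  # True
print(is_designated_approver(event, "WestNO_1"))  # False
```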
Example Responses
The following responses show a series of events seen by two VMs that were live migrated to another node. The DocumentIncarnation changes every time there's new information in Events. An approval of the event would allow the freeze to proceed for both WestNO_0 and WestNO_1. Here, DurationInSeconds is 5, the expected length of the pause; a DurationInSeconds of -1 would indicate that the platform doesn't know how long the operation will take.
{
    "DocumentIncarnation": 1,
    "Events": [
    ]
}
{
    "DocumentIncarnation": 2,
    "Events": [
        {
            "EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
            "EventStatus": "Scheduled",
            "EventType": "Freeze",
            "ResourceType": "VirtualMachine",
            "Resources": [
                "WestNO_0",
                "WestNO_1"
            ],
            "NotBefore": "Mon, 11 Apr 2022 22:26:58 GMT",
            "Description": "Virtual machine is being paused because of a memory-preserving Live Migration operation.",
            "EventSource": "Platform",
            "DurationInSeconds": 5
        }
    ]
}
{
    "DocumentIncarnation": 3,
    "Events": [
        {
            "EventId": "C7061BAC-AFDC-4513-B24B-AA5F13A16123",
            "EventStatus": "Started",
            "EventType": "Freeze",
            "ResourceType": "VirtualMachine",
            "Resources": [
                "WestNO_0",
                "WestNO_1"
            ],
            "NotBefore": "",
            "Description": "Virtual machine is being paused because of a memory-preserving Live Migration operation.",
            "EventSource": "Platform",
            "DurationInSeconds": 5
        }
    ]
}
{
    "DocumentIncarnation": 4,
    "Events": [
    ]
}
Python Sample
The following sample queries Metadata Service for scheduled events and approves each outstanding event:
#!/usr/bin/env python3
import json
import requests
from time import sleep

# The URL to access the metadata service
metadata_url = "http://169.254.169.254/metadata/scheduledevents"
# This header must be sent; otherwise the request is rejected with "Bad Request"
header = {'Metadata': 'true'}
# Current version of the API
query_params = {'api-version': '2020-07-01'}

def get_scheduled_events():
    resp = requests.get(metadata_url, headers=header, params=query_params)
    data = resp.json()
    return data

def confirm_scheduled_event(event_id):
    # This payload confirms a single event with id event_id
    # You can confirm multiple events in a single request if needed
    payload = json.dumps({"StartRequests": [{"EventId": event_id}]})
    response = requests.post(metadata_url,
                             headers=header,
                             params=query_params,
                             data=payload)
    return response.status_code

def log(event):
    # This is an optional placeholder for logging events to your system
    print(event["Description"])
    return

def advanced_sample(last_document_incarnation):
    # Poll every second to see if there are new scheduled events to process
    # Since some events may have necessarily short warning periods, it is
    # recommended to poll frequently
    found_document_incarnation = last_document_incarnation
    while last_document_incarnation == found_document_incarnation:
        sleep(1)
        payload = get_scheduled_events()
        found_document_incarnation = payload["DocumentIncarnation"]

    # We recommend processing all events in a document together,
    # even if you won't be actioning on them right away
    for event in payload["Events"]:
        # Events that have already started, logged for tracking
        if event["EventStatus"] == "Started":
            log(event)
        # Approve all user initiated events. These are typically created by an
        # administrator and approving them immediately can help to avoid delays
        # in admin actions
        elif event["EventSource"] == "User":
            confirm_scheduled_event(event["EventId"])
        # For this application, freeze events of less than 9 seconds are considered
        # no impact. This will immediately approve them
        elif (event["EventType"] == "Freeze" and
              int(event["DurationInSeconds"]) >= 0 and
              int(event["DurationInSeconds"]) < 9):
            confirm_scheduled_event(event["EventId"])
        # Events that may be impactful (for example, Reboot or Redeploy) may need custom
        # handling for your application
        else:
            # TODO: Custom handling for impactful events
            log(event)
    print("Processed events from document: " + str(found_document_incarnation))
    return found_document_incarnation

def main():
    # This tracks the last DocumentIncarnation seen (-1 means none yet)
    last_document_incarnation = -1
    input_text = "\
Press 1 to poll for new events \n\
Press 2 to exit \n "
    program_exit = False
    while not program_exit:
        user_input = input(input_text)
        if user_input == "1":
            last_document_incarnation = advanced_sample(last_document_incarnation)
        elif user_input == "2":
            program_exit = True

if __name__ == '__main__':
    main()
Next steps
- Review the Scheduled Events code samples in the Azure Instance Metadata Scheduled Events GitHub repository.
- Review the Node.js Scheduled Events code samples in Azure Samples GitHub repository.
- Read more about the APIs that are available in the Instance Metadata Service.
- Learn about planned maintenance for Linux virtual machines in Azure.
- Learn how to log scheduled events by using Azure Event Hubs in the Azure Samples GitHub repository.