Public Holidays

Article
08/28/2024

Worldwide public holiday data sourced from PyPI holidays package and Wikipedia, covering 38 countries or regions from 1970 to 2099.

Each row indicates the holiday info for a specific date, country or region, and whether most people have paid time off.

Note

Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms that Microsoft received source data. The dataset may include data sourced from Microsoft.

Volume and retention

This dataset is stored in Parquet format. It's a snapshot with holiday information from January 1, 1970 to January 1, 2099. The data size is about 500KB.

Storage location

This dataset is stored in the East US Azure region. We recommend locating compute resources in East US for affinity.

Additional information

This dataset combines data sourced from Wikipedia (WikiMedia Foundation Inc) and PyPI holidays package.

Wikipedia: original source, original license
PyPI holidays: original source, original license

The combined dataset is provided under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

Email aod@microsoft.com if you have any questions about the data source.

Columns

Name	Data type	Unique	Values (sample)	Description
countryOrRegion	string	38	Sweden Norway	Country or region full name.
countryRegionCode	string	35	SE NO	Country or region code following the format here.
date	timestamp	20,665	2074-01-01 00:00:00 2025-12-25 00:00:00	Date of the holiday.
holidayName	string	483	Søndag Söndag	Full name of the holiday.
isPaidTimeOff	boolean	3	True	Indicate whether most people have paid time off on this date (only available for US, GB, and India now). If it is NULL, it means unknown.
normalizeHolidayName	string	438	Søndag Söndag	Normalized name of the holiday.

Preview

countryOrRegion	holidayName	normalizeHolidayName	countryRegionCode	date
Norway	Søndag	Søndag	NO	12/28/2098 12:00:00 AM
Sweden	Söndag	Söndag	SE	12/28/2098 12:00:00 AM
Australia	Boxing Day	Boxing Day	AU	12/26/2098 12:00:00 AM
Hungary	Karácsony másnapja	Karácsony másnapja	HU	12/26/2098 12:00:00 AM
Austria	Stefanitag	Stefanitag	AT	12/26/2098 12:00:00 AM
Canada	Boxing Day	Boxing Day	CA	12/26/2098 12:00:00 AM
Croatia	Sveti Stjepan	Sveti Stjepan	HR	12/26/2098 12:00:00 AM
Czech	2. svátek vánoční	2. svátek vánoční	CZ	12/26/2098 12:00:00 AM

Data access

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import PublicHolidays

from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_pandas_dataframe()

hol_df.info()

# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "holidaydatacontainer"
folder_name = "Processed"

from azure.storage.blob import BlockBlobServicefrom azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)

# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)

# you can add your filter at below
print('Loaded as a Pandas data frame: ')
df

Azure Databricks

# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://learn.microsoft.com/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import PublicHolidays

from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_spark_dataframe()

display(hol_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import PublicHolidays

from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_spark_dataframe()

# Display top 5 rows
display(hol_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Next steps

View the rest of the datasets in the Open Datasets catalog.

Additional resources

Documentation

OJ Sales Simulated - Azure Open Datasets

Learn how to use the OJ Sales Simulated dataset in Azure Open Datasets.
What are open datasets? Curated public datasets - Azure Open Datasets

Learn about Azure Open Datasets, curated datasets from the public domain such as weather, census, holidays, and location to enrich predictive solutions.

Training

Learning path

Run high-performance computing (HPC) applications on Azure - Training

Azure HPC is a purpose-built cloud capability for HPC & AI workload, using leading-edge processors and HPC-class InfiniBand interconnect, to deliver the best application performance, scalability, and value. Azure HPC enables users to unlock innovation, productivity, and business agility, through a highly available range of HPC & AI technologies that can be dynamically allocated as your business and technical needs change. This learning path is a series of modules that help you get started on Azure HPC - you

AI Skills Fest

Share via