Public Holidays

2024-09-03

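Worldwide public holiday data sourced from the PyPI holidays package and Wikipedia, covering 38 countries or regions from 1970 to 2099.

Each row describes one holiday: its date, its country or region, and whether most people have paid time off on that day.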

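Note

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees, or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.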

Volume and retention

This dataset is stored in Parquet format. This snapshot contains holiday information from January 1, 1970 to January 1, 2099. The data size is about 500 KB.
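
Because the snapshot covers a fixed date range, a requested query window may need trimming before it is used. The following is a minimal sketch of that check using only the standard library. The helper name `clamp_to_snapshot` and the two constants are assumptions made for illustration and are not part of any Azure SDK.

```python
from datetime import date
from typing import Tuple

# Snapshot coverage as stated in this section: 1970-01-01 through 2099-01-01.
SNAPSHOT_START = date(1970, 1, 1)
SNAPSHOT_END = date(2099, 1, 1)


def clamp_to_snapshot(start: date, end: date) -> Tuple[date, date]:
    """Clamp a requested [start, end] window to the snapshot's coverage.

    Hypothetical helper for illustration only. Raises ValueError if the
    window is inverted or lies entirely outside the covered range.
    """
    if start > end:
        raise ValueError("start must not be after end")
    if end < SNAPSHOT_START or start > SNAPSHOT_END:
        raise ValueError("requested window is outside the snapshot range")
    return max(start, SNAPSHOT_START), min(end, SNAPSHOT_END)


# A window that begins before the snapshot is trimmed to its first day.
window = clamp_to_snapshot(date(1965, 6, 1), date(1971, 1, 1))
print(window)
```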

Storage location

This dataset is stored in the East US Azure region. We recommend allocating compute resources in East US for affinity.

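Additional information

This dataset combines data sourced from Wikipedia (WikiMedia Foundation Inc) and the PyPI holidays package.

Wikipedia: original source, original license
PyPI holidays: original source, original license

The combined dataset is provided under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

If you have any questions about the data source, email aod@microsoft.com.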

Columns

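Name	Data type	Unique	Values (sample)	Description
countryOrRegion	string	38	Sweden Norway	Country or region full name.
countryRegionCode	string	35	SE NO	Country or region code following the format here.
date	timestamp	20,665	2074-01-01 00:00:00 2025-12-25 00:00:00	Date of the holiday.
holidayName	string	483	Søndag Söndag	Full name of the holiday.
isPaidTimeOff	boolean	3	True	Indicates whether most people have paid time off on this date (currently available only for the US, GB, and India). Null means unknown.
normalizeHolidayName	string	438	Søndag Söndag	Normalized name of the holiday.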
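
The column list can be mirrored as a small typed record, which makes the three-state nature of `isPaidTimeOff` explicit (true, false, or Null for unknown). This is a sketch only: the class `HolidayRow` and the function `paid_time_off_label` are hypothetical names introduced for illustration, and the sample values are taken from the preview rows.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class HolidayRow:
    """One row of the dataset; field names follow the column table."""
    countryOrRegion: str
    countryRegionCode: str
    date: datetime
    holidayName: str
    normalizeHolidayName: str
    isPaidTimeOff: Optional[bool] = None  # None stands in for Null (unknown)


def paid_time_off_label(row: HolidayRow) -> str:
    """Map the three-state isPaidTimeOff value to a readable label."""
    if row.isPaidTimeOff is None:
        return "unknown"
    return "paid" if row.isPaidTimeOff else "unpaid"


# isPaidTimeOff is left unset here, so the label falls back to "unknown".
row = HolidayRow("Sweden", "SE", datetime(2098, 12, 28), "Söndag", "Söndag")
print(paid_time_off_label(row))
```

Checking `is None` before the truth test matters here: treating Null as false would silently report unknown holidays as unpaid.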

Preview

countryOrRegion	holidayName	normalizeHolidayName	countryRegionCode	date
Norway	Søndag	Søndag	NO	12/28/2098 12:00:00 AM
Sweden	Söndag	Söndag	SE	12/28/2098 12:00:00 AM
Australia	Boxing Day	Boxing Day	AU	12/26/2098 12:00:00 AM
Hungary	Karácsony másnapja	Karácsony másnapja	HU	12/26/2098 12:00:00 AM
Austria	Stefanitag	Stefanitag	AT	12/26/2098 12:00:00 AM
Canada	Boxing Day	Boxing Day	CA	12/26/2098 12:00:00 AM
Croatia	Sveti Stjepan	Sveti Stjepan	HR	12/26/2098 12:00:00 AM
Czech	2. svátek vánoční	2. svátek vánoční	CZ	12/26/2098 12:00:00 AM
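
The preview renders the date column in a 12-hour US-style format. If dates are copied from a rendering like this rather than read from the Parquet file, they can be parsed with the standard library. The sketch below assumes an English or C locale, since `%p` (AM/PM) is locale-dependent, and `PREVIEW_FORMAT` is a name chosen here for illustration.

```python
from datetime import datetime

# Format of the date column as rendered in the preview,
# for example "12/28/2098 12:00:00 AM".
PREVIEW_FORMAT = "%m/%d/%Y %I:%M:%S %p"

parsed = datetime.strptime("12/28/2098 12:00:00 AM", PREVIEW_FORMAT)

print(parsed.isoformat())  # 2098-12-28T00:00:00
print(parsed.weekday())    # 6, which is Sunday (Monday is 0)
```

The weekday result agrees with the holiday names in those rows: Søndag and Söndag are the Norwegian and Swedish words for Sunday.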

Data access

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import PublicHolidays

from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_pandas_dataframe()

hol_df.info()
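
Once the data is in a pandas DataFrame, rows can be narrowed with ordinary boolean masks. The sketch below uses a small stand-in DataFrame built from three preview rows so that it runs without network access; `sample_df`, `cutoff`, and `boxing_day` are illustrative names, and real data would come from `to_pandas_dataframe()`.

```python
import pandas as pd

# Stand-in for hol_df: three rows copied from the preview table.
sample_df = pd.DataFrame(
    {
        "countryOrRegion": ["Sweden", "Australia", "Canada"],
        "holidayName": ["Söndag", "Boxing Day", "Boxing Day"],
        "normalizeHolidayName": ["Söndag", "Boxing Day", "Boxing Day"],
        "countryRegionCode": ["SE", "AU", "CA"],
        "date": pd.to_datetime(["2098-12-28", "2098-12-26", "2098-12-26"]),
    }
)

# Keep only Boxing Day rows on or after a cutoff date.
cutoff = pd.Timestamp("2098-12-26")
mask = (sample_df["normalizeHolidayName"] == "Boxing Day") & (sample_df["date"] >= cutoff)
boxing_day = sample_df.loc[mask, ["countryRegionCode", "date"]]

print(sorted(boxing_day["countryRegionCode"]))  # ['AU', 'CA']
```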

# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "holidaydatacontainer"
folder_name = "Processed"

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    # readinto() streams the blob into the open file handle
    blob_client.download_blob().readinto(local_file)

# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)

# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
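
The blob-selection loop in the snippet above sorts names in descending order, so it picks the lexicographically last `.parquet` file under the folder rather than the first one listed. That step can be isolated as a pure function and tried without any storage access. This is a sketch: `pick_latest_parquet` is a hypothetical helper, and the blob names are invented for illustration.

```python
from typing import Iterable, Optional


def pick_latest_parquet(blob_names: Iterable[str], folder_name: str) -> Optional[str]:
    """Return the lexicographically last .parquet blob under folder_name.

    Names are sorted in descending order and the first match wins.
    Returns None when nothing matches.
    """
    for name in sorted(blob_names, reverse=True):
        if name.startswith(folder_name) and name.endswith(".parquet"):
            return name
    return None


# Hypothetical blob names, invented for illustration only.
names = [
    "Processed/_SUCCESS",
    "Processed/part-00000.parquet",
    "Processed/part-00001.parquet",
    "Raw/part-00009.parquet",
]
print(pick_latest_parquet(names, "Processed"))  # Processed/part-00001.parquet
```

Returning `None` instead of an empty string lets the caller distinguish "no Parquet file found" from a valid name before attempting a download.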

Azure Databricks

# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://learn.microsoft.com/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import PublicHolidays

from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_spark_dataframe()

display(hol_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
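
The `wasbs://` path and the SAS configuration key in the snippet above are plain string templates over the account and container names. The sketch below reproduces just that formatting so the resulting values can be inspected outside a cluster. `wasbs_settings` is a hypothetical helper and does not call Spark or Azure.

```python
from typing import Tuple


def wasbs_settings(account: str, container: str, relative_path: str) -> Tuple[str, str]:
    """Return (wasbs_path, sas_conf_key) for a blob container.

    Hypothetical helper; it only formats the same strings as the snippet above.
    """
    host = f"{account}.blob.core.windows.net"
    wasbs_path = f"wasbs://{container}@{host}/{relative_path}"
    sas_conf_key = f"fs.azure.sas.{container}.{host}"
    return wasbs_path, sas_conf_key


path, conf_key = wasbs_settings("azureopendatastorage", "holidaydatacontainer", "Processed")
print(path)      # wasbs://holidaydatacontainer@azureopendatastorage.blob.core.windows.net/Processed
print(conf_key)  # fs.azure.sas.holidaydatacontainer.azureopendatastorage.blob.core.windows.net
```

Note the ordering: the container comes before the account in both strings, which is easy to swap by mistake when adapting the snippet to another storage account.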

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import PublicHolidays

from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_spark_dataframe()

# Display top 5 rows
display(hol_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
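
After the temporary view `source` is registered, it can be queried with more selective SQL than `SELECT *`. The following sketch builds such a query as text, so it can be checked without a Spark session. `holiday_query` is a hypothetical helper, and it assumes two-letter uppercase country codes like the samples in the column table.

```python
import re


def holiday_query(country_code: str, year: int, view: str = "source") -> str:
    """Build a Spark SQL query for one country code and calendar year.

    Hypothetical helper. Inputs are validated before interpolation
    because the values are placed directly into the SQL text.
    """
    if not re.fullmatch(r"[A-Z]{2}", country_code):
        raise ValueError("country_code must be two uppercase letters")
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", view):
        raise ValueError("view must be a simple identifier")
    return (
        f"SELECT `date`, holidayName, isPaidTimeOff FROM {view} "
        f"WHERE countryRegionCode = '{country_code}' AND year(`date`) = {int(year)} "
        "ORDER BY `date`"
    )


query = holiday_query("SE", 2025)
print(query)
# In a Synapse or Databricks notebook: display(spark.sql(query))
```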

Next steps

View the rest of the datasets in the Open Datasets catalog.
