
US Population by ZIP Code

2024-09-03

US population by sex and race for each US ZIP code, sourced from the 2000 and 2010 decennial censuses.

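This dataset is sourced from the United States Census Bureau's Decennial Census Dataset APIs. For the terms and conditions that apply to the use of this dataset, review the Terms of Service and the Policies and Notices.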

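Note

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees, or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses resulting from your use of the datasets, including direct, consequential, special, indirect, incidental, or punitive damages or losses.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.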

Volume and retention

This dataset is stored in Parquet format and contains data for the year 2010.

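Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity, so that compute runs close to the data.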

Related datasets

US Population by County

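Columns

Name	Data type	Unique	Values (sample)	Description
decennialTime	string	1	2010	The time of the decennial census, for example 2010 or 2000.
maxAge	int	23	54, 21	Maximum of the age range. If null, the row covers all ages or the range has no upper limit, for example age > 85.
minAge	int	23	45, 30	Minimum of the age range. If null, the row covers all ages.
population	int	29,274	1, 2	Population of this segment.
race	string	8	SOME OTHER RACE ALONE, BLACK OR AFRICAN AMERICAN ALONE	Race category in the census data. If null, the row covers all races.
sex	string	3	Female, Male	Male or female. If null, the row covers both sexes.
year	int	1	2010	Year of the decennial census, as an integer.
zipCode	string	33,120	39218, 87420	5-digit ZIP Code Tabulation Area (ZCTA5).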
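
The null conventions for minAge and maxAge are easy to misread, so a small helper can make them explicit. The following is a minimal sketch, not part of the dataset tooling: the function name `describe_age_range` is hypothetical, and the handling of a missing minAge with a present maxAge is an assumption, because the column descriptions do not cover that case.

```python
from typing import Optional


def describe_age_range(min_age: Optional[int], max_age: Optional[int]) -> str:
    """Return a readable label for a (minAge, maxAge) pair.

    Conventions taken from the column descriptions:
    - both values null -> the row covers all ages
    - only maxAge null -> the range has no upper limit
    """
    if min_age is None and max_age is None:
        return "all ages"
    if max_age is None:
        return f"{min_age}+"
    if min_age is None:
        # Assumption: this combination is not described in the column table.
        return f"up to {max_age}"
    return f"{min_age}-{max_age}"


# Print labels for a bounded range, an open-ended range, and an all-ages row.
for pair in [(15, 17), (85, None), (None, None)]:
    print(pair, "->", describe_age_range(*pair))
```

Values loaded through pandas may arrive as NaN rather than None, so they would likely need converting before being passed to a helper like this.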

Preview

decennialTime	zipCode	population	race	sex	minAge	maxAge	year
2010	77477	265	WHITE ALONE	Female	15	17	2010
2010	77477	107	SOME OTHER RACE ALONE	Female	15	17	2010
2010	77477	12	SOME OTHER RACE ALONE	Female	65	66	2010
2010	77477	101	ASIAN ALONE	Female	60	61	2010
2010	77477	221	ASIAN ALONE	Male	10	14	2010
2010	77478	256	WHITE ALONE	Female	15	17	2010
2010	77478	17	SOME OTHER RACE ALONE	Female	15	17	2010
2010	77478	3	SOME OTHER RACE ALONE	Female	65	66	2010
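
Because each row is one segment (ZIP code, race, sex, age range), totals per ZIP code have to be summed across rows. The sketch below does this in plain Python over the eight preview rows. The function name `population_by_zip` is hypothetical, and skipping rows with a null race, sex, or minAge is an assumption: the column descriptions say null means "all", so such rows would presumably be aggregates that double count if added to the detailed segments.

```python
from collections import defaultdict

# Rows from the preview: (zipCode, population, race, sex, minAge, maxAge)
preview_rows = [
    ("77477", 265, "WHITE ALONE", "Female", 15, 17),
    ("77477", 107, "SOME OTHER RACE ALONE", "Female", 15, 17),
    ("77477", 12, "SOME OTHER RACE ALONE", "Female", 65, 66),
    ("77477", 101, "ASIAN ALONE", "Female", 60, 61),
    ("77477", 221, "ASIAN ALONE", "Male", 10, 14),
    ("77478", 256, "WHITE ALONE", "Female", 15, 17),
    ("77478", 17, "SOME OTHER RACE ALONE", "Female", 15, 17),
    ("77478", 3, "SOME OTHER RACE ALONE", "Female", 65, 66),
]


def population_by_zip(rows):
    """Sum the population of detailed segments for each ZIP code."""
    totals = defaultdict(int)
    for zip_code, population, race, sex, min_age, _max_age in rows:
        # Assumption: a null race, sex, or minAge marks an aggregate row.
        if race is None or sex is None or min_age is None:
            continue
        totals[zip_code] += population
    return dict(totals)


print(population_by_zip(preview_rows))
```

The preview is only a small excerpt, so these sums are not the real totals for either ZIP code. On the full pandas data frame, the same idea could be expressed with `df.groupby("zipCode")["population"].sum()` after filtering out the null rows.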

Data access

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_pandas_dataframe()

population_df.info()

# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "censusdatacontainer"
folder_name = "release/us_population_zip/"

# BlobServiceClient is the entry point of the azure-storage-blob v12 SDK
from azure.storage.blob import BlobServiceClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
# This URL identifies the storage account; the container is selected below
account_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    account_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    # readinto() replaces the deprecated download_to_stream()
    blob_client.download_blob().readinto(local_file)

# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)

# You can add your own filters here
print('Loaded as a Pandas data frame: ')
df
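
The download step above picks one Parquet file by sorting blob names in reverse order and taking the first match. A minimal sketch of that selection rule as a standalone function follows, so it can be checked without network access. The function name `pick_latest_parquet` and the sample blob names are hypothetical; the real folder layout may differ. Unlike the sample, this sketch returns None instead of an empty string when nothing matches.

```python
from typing import Iterable, Optional


def pick_latest_parquet(blob_names: Iterable[str], folder: str) -> Optional[str]:
    """Return the last Parquet blob name (in sort order) under `folder`."""
    for name in sorted(blob_names, reverse=True):
        if name.startswith(folder) and name.endswith(".parquet"):
            return name
    return None


# Made-up blob names, for illustration only.
sample_names = [
    "release/us_population_zip/_SUCCESS",
    "release/us_population_zip/year=2010/part-00000.parquet",
    "release/us_population_zip/year=2010/part-00001.parquet",
    "release/other/part-99999.parquet",
]

print(pick_latest_parquet(sample_names, "release/us_population_zip/"))
```

Note that this rule downloads a single file. If the folder holds several Parquet parts, one file is only a slice of the dataset.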

Azure Databricks

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_spark_dataframe()

display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_zip/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
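
The Spark sample builds two strings from the same three values: the `wasbs://` path and the configuration key that carries the SAS token. A minimal sketch that isolates this string construction follows, so it can be inspected without a Spark session. The function name `build_wasbs_settings` is hypothetical and is not part of any Azure or Spark library.

```python
from typing import Tuple


def build_wasbs_settings(container: str, account: str, relative_path: str) -> Tuple[str, str]:
    """Return (wasbs_path, sas_conf_key) for a blob container path."""
    host = f"{account}.blob.core.windows.net"
    wasbs_path = f"wasbs://{container}@{host}/{relative_path}"
    sas_conf_key = f"fs.azure.sas.{container}.{host}"
    return wasbs_path, sas_conf_key


path, conf_key = build_wasbs_settings(
    "censusdatacontainer", "azureopendatastorage", "release/us_population_zip/")
print("Remote blob path:", path)
print("SAS configuration key:", conf_key)
```

In a notebook with a Spark session, these values would be used as in the sample above: `spark.conf.set(conf_key, blob_sas_token)` followed by `spark.read.parquet(path)`.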

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_spark_dataframe()

# Display top 5 rows
display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_zip/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Next steps

View the rest of the datasets in the Open Datasets catalog.