US Population by ZIP Code

2024-09-03

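US population by gender and race for each US ZIP code, sourced from the 2000 and 2010 Decennial Census.

This dataset is sourced from the United States Census Bureau's Decennial Census Dataset APIs. Review the Terms of Service and the Policies and Notices for the terms and conditions related to the use of this dataset.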

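Note

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees, or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.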

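Volume and retention

This dataset is stored in Parquet format and contains data for the year 2010.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.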

US Population by County

Columns

Name	Data type	Unique	Values (sample)	Description
decennialTime	string	1	2010	The time when the decennial census was conducted, for example 2010 or 2000.
maxAge	int	23	54, 21	The maximum of the age range. If null, the row covers all ages or the age range has no upper bound (for example, age > 85).
minAge	int	23	45, 30	The minimum of the age range. If null, the row covers all ages.
population	int	29,274	1, 2	The population of this segment.
race	string	8	SOME OTHER RACE ALONE, BLACK OR AFRICAN AMERICAN ALONE	The race category in the Census data. If null, the row covers all races.
sex	string	3	Female, Male	Male or female. If null, the row covers both sexes.
year	int	1	2010	The year of the decennial time, as an integer.
zipCode	string	33,120	39218, 87420	The 5-digit ZIP Code Tabulation Area (ZCTA5).
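
The null conventions described in the column table imply that a row whose race, sex, minAge, and maxAge are all null acts as the total for its ZIP code. The following is a minimal sketch of selecting such rows with pandas. The `sample` frame and the `zip_totals` helper are hypothetical illustrations, and the numbers are made up rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical rows that mimic the schema above; the values are invented.
sample = pd.DataFrame(
    {
        "zipCode": ["00001", "00001", "00001", "00002"],
        "race": [None, "WHITE ALONE", None, None],
        "sex": [None, "Female", "Male", None],
        "minAge": [None, 15, None, None],
        "maxAge": [None, 17, None, None],
        "population": [1000, 40, 480, 250],
    }
)


def zip_totals(df: pd.DataFrame) -> pd.Series:
    """Return population per ZIP code from rows where every breakdown column is null."""
    breakdown_cols = ["race", "sex", "minAge", "maxAge"]
    is_total_row = df[breakdown_cols].isna().all(axis=1)
    return df.loc[is_total_row].set_index("zipCode")["population"]


totals = zip_totals(sample)
print(totals)
```

Whether the real dataset contains exactly one such all-null row per ZIP code is an assumption worth checking before relying on this filter.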

Preview

decennialTime	zipCode	population	race	sex	minAge	maxAge	year
2010	77477	265	WHITE ALONE	Female	15	17	2010
2010	77477	107	SOME OTHER RACE ALONE	Female	15	17	2010
2010	77477	12	SOME OTHER RACE ALONE	Female	65	66	2010
2010	77477	101	ASIAN ALONE	Female	60	61	2010
2010	77477	221	ASIAN ALONE	Male	10	14	2010
2010	77478	256	WHITE ALONE	Female	15	17	2010
2010	77478	17	SOME OTHER RACE ALONE	Female	15	17	2010
2010	77478	3	SOME OTHER RACE ALONE	Female	65	66	2010
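
To show how segment rows combine, here is a short sketch that sums the population column of the preview rows by ZIP code and by race. It copies only the zipCode, population, and race values from the preview and assumes pandas is available. The sums cover just these eight previewed segments, not the full population of either ZIP code.

```python
import io

import pandas as pd

# zipCode, population, and race values copied from the preview rows above.
preview_tsv = (
    "zipCode\tpopulation\trace\n"
    "77477\t265\tWHITE ALONE\n"
    "77477\t107\tSOME OTHER RACE ALONE\n"
    "77477\t12\tSOME OTHER RACE ALONE\n"
    "77477\t101\tASIAN ALONE\n"
    "77477\t221\tASIAN ALONE\n"
    "77478\t256\tWHITE ALONE\n"
    "77478\t17\tSOME OTHER RACE ALONE\n"
    "77478\t3\tSOME OTHER RACE ALONE\n"
)

# Read zipCode as a string so that leading zeros in other ZIP codes would survive.
preview = pd.read_csv(io.StringIO(preview_tsv), sep="\t", dtype={"zipCode": str})

# Sum the previewed segments per ZIP code, then per ZIP code and race.
by_zip = preview.groupby("zipCode")["population"].sum()
by_zip_race = preview.groupby(["zipCode", "race"])["population"].sum()

print(by_zip)
print(by_zip_race)
```

On the full dataset, a plain sum like this could double count if rows with null breakdown columns (which already represent totals) are mixed in with the segment rows, so filter those out first.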

데이터 액세스

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_pandas_dataframe()

population_df.info()

# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "censusdatacontainer"
folder_name = "release/us_population_zip/"

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Set azure_storage_account_name and azure_storage_sas_token before running this cell.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    # readinto() streams the blob into the open file (download_to_stream is deprecated)
    blob_client.download_blob().readinto(local_file)

# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)

# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
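
As one possible filter for the loaded frame, the sketch below keeps only the rows for selected ZIP codes and, optionally, one sex. The `filter_population` helper and the `sample_df` stand-in are hypothetical. In practice you would pass the frame returned by `pd.read_parquet` instead of the invented rows used here.

```python
import pandas as pd

# Stand-in for the frame loaded from Parquet; these rows are invented.
sample_df = pd.DataFrame(
    {
        "zipCode": ["77477", "77477", "77478", "77478"],
        "sex": ["Female", "Male", "Female", None],
        "population": [265, 221, 256, 900],
    }
)


def filter_population(frame, zip_codes, sex=None):
    """Keep rows for the given ZIP codes, optionally restricted to one sex value."""
    mask = frame["zipCode"].isin(zip_codes)
    if sex is not None:
        # Rows with a null sex compare as False and are dropped here.
        mask &= frame["sex"] == sex
    return frame.loc[mask].reset_index(drop=True)


females_77477 = filter_population(sample_df, ["77477"], sex="Female")
all_77478 = filter_population(sample_df, ["77478"])
print(females_77477)
print(all_77478)
```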

Azure Databricks

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_spark_dataframe()

display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_zip/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_spark_dataframe()

# Display top 5 rows
display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_zip/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Next steps

View the rest of the datasets in the Open Datasets catalog.