US Population by ZIP code

Article
08/28/2024

US population by gender and race for each US ZIP code, sourced from the 2000 and 2010 Decennial Census.

This dataset is sourced from the United States Census Bureau’s Decennial Census Dataset APIs. Review Terms of Service and Policies and Notices for the terms and conditions related to the use of this dataset.

Note

Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Volume and retention

This dataset is stored in Parquet format and has data for the year 2010.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Related datasets

US Population by County

Columns

Name	Data type	Unique	Values (sample)	Description
decennialTime	string	1	2010	The time when the decennial census happened, for example, 2010 or 2000.
maxAge	int	23	54, 21	Max of the age range. If it’s null, it’s across all ages or the age range has no upper bound, for example, age > 85.
minAge	int	23	45, 30	Min of the age range. If it’s null, it’s across all ages.
population	int	29,274	1, 2	Population of this segment.
race	string	8	SOME OTHER RACE ALONE, BLACK OR AFRICAN AMERICAN ALONE	Race category in Census data. If it’s null, it’s across all races.
sex	string	3	Female, Male	Male or female. If it’s null, it’s across both sexes.
year	int	1	2010	Year (as an integer) of the decennial time.
zipCode	string	33,120	39218, 87420	5-Digit ZIP Code Tabulation Area (ZCTA5).
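Because a null in race, sex, or minAge marks a row that spans all values of that column, summing the population column directly can count the same people more than once. The following is a minimal sketch of one way to separate the fully specified rows from the aggregate rows. It assumes that aggregate rows are stored alongside the detail rows; the first three rows are copied from the Preview table, while the last row and the helper name detail_rows are made up for illustration.

```python
import pandas as pd

columns = ["decennialTime", "zipCode", "population", "race",
           "sex", "minAge", "maxAge", "year"]
rows = [
    ("2010", "77477", 265, "WHITE ALONE", "Female", 15, 17, 2010),
    ("2010", "77477", 107, "SOME OTHER RACE ALONE", "Female", 15, 17, 2010),
    ("2010", "77477", 221, "ASIAN ALONE", "Male", 10, 14, 2010),
    # HYPOTHETICAL aggregate row (all races, both sexes, all ages),
    # invented here only to show how null-coded rows behave.
    ("2010", "77477", 593, None, None, None, None, 2010),
]
df = pd.DataFrame(rows, columns=columns)


def detail_rows(frame):
    """Keep rows that name one race, one sex, and one age range.

    maxAge is deliberately not checked: per the column description,
    a null maxAge can also mean an open-ended range such as age > 85.
    """
    mask = (frame["race"].notna()
            & frame["sex"].notna()
            & frame["minAge"].notna())
    return frame[mask]


naive_total = int(df["population"].sum())                 # mixes detail and aggregate rows
detail_total = int(detail_rows(df)["population"].sum())   # detail rows only
print(naive_total, detail_total)
```

In this toy frame the naive sum is twice the detail sum, because the hypothetical aggregate row repeats the three detail rows. Whether the real files contain such rows for every ZIP code is worth checking before relying on either total.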

Preview

decennialTime	zipCode	population	race	sex	minAge	maxAge	year
2010	77477	265	WHITE ALONE	Female	15	17	2010
2010	77477	107	SOME OTHER RACE ALONE	Female	15	17	2010
2010	77477	12	SOME OTHER RACE ALONE	Female	65	66	2010
2010	77477	101	ASIAN ALONE	Female	60	61	2010
2010	77477	221	ASIAN ALONE	Male	10	14	2010
2010	77478	256	WHITE ALONE	Female	15	17	2010
2010	77478	17	SOME OTHER RACE ALONE	Female	15	17	2010
2010	77478	3	SOME OTHER RACE ALONE	Female	65	66	2010

Data access

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_pandas_dataframe()

population_df.info()

# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "censusdatacontainer"
folder_name = "release/us_population_zip/"

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().readinto(local_file)

# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)

# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
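As one possible filter, the sketch below selects rows by ZIP code, sex, and age range, then totals the population per ZIP code. So that it runs without the download step, it uses a stand-in frame built from rows of the Preview table; the helper name select_segment and its parameters are hypothetical, not part of any library. Note that zipCode is a string column, so it is compared against strings.

```python
import pandas as pd

# Stand-in for the downloaded frame: rows copied from the Preview table.
df = pd.DataFrame(
    [
        ("77477", 265, "WHITE ALONE", "Female", 15, 17),
        ("77477", 107, "SOME OTHER RACE ALONE", "Female", 15, 17),
        ("77477", 221, "ASIAN ALONE", "Male", 10, 14),
        ("77478", 256, "WHITE ALONE", "Female", 15, 17),
        ("77478", 17, "SOME OTHER RACE ALONE", "Female", 15, 17),
    ],
    columns=["zipCode", "population", "race", "sex", "minAge", "maxAge"],
)


def select_segment(frame, zip_codes, sex=None, min_age=None):
    """Hypothetical helper: filter by ZIP code and, optionally, sex and minAge."""
    mask = frame["zipCode"].isin(zip_codes)
    if sex is not None:
        mask &= frame["sex"] == sex
    if min_age is not None:
        mask &= frame["minAge"] == min_age
    return frame[mask]


# Females in the 15-17 age range for two ZIP codes, totaled per ZIP code.
segment = select_segment(df, ["77477", "77478"], sex="Female", min_age=15)
by_zip = segment.groupby("zipCode")["population"].sum()
print(by_zip)
```

The same helper can be applied to the frame loaded from the Parquet file, provided its column names match the Columns table.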

Azure Databricks

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_spark_dataframe()

display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_zip/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import UsPopulationZip

population = UsPopulationZip()
population_df = population.to_spark_dataframe()

# Display top 5 rows
display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_zip/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Next steps

View the rest of the datasets in the Open Datasets catalog.
