US Population by County

2024-09-03

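US population by sex and race for each US county, sourced from the 2000 and 2010 Decennial Census.

This dataset is sourced from the United States Census Bureau's Decennial Census Dataset APIs. For the terms and conditions that apply to this dataset, review the Terms of Service and the Policies and Notices.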

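Note

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees, or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.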

Volume and retention

This dataset is stored in Parquet format and contains data for the years 2000 and 2010.

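Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Related datasets

US Population by ZIP Code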

Columns

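Name	Data type	Unique	Values (sample)	Description
countyName	string	1,960	Washington County, Jefferson County	County name.
decennialTime	string	2	2010, 2000	Decennial census year, for example 2010 or 2000.
maxAge	int	23	9, 66	Upper bound of the age range. Null means either all ages or a range with no upper bound, such as age > 85.
minAge	int	23	35, 67	Lower bound of the age range. Null means all ages.
population	int	47,229	1, 2	Population of this segment.
race	string	8	ASIAN ALONE, TWO OR MORE RACES	Race category in the census data. Null means all races.
sex	string	3	Male, Female	Male or female. Null means both sexes.
stateName	string	52	Texas, Georgia	Name of the US state.
year	int	2	2010, 2000	Decennial census year as an integer.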
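
The null conventions in the table determine which rows are totals and which are breakdowns, so a short illustration may help. The following is a minimal sketch, assuming the data has been loaded into a pandas DataFrame with the column names listed above and that a row with every breakdown column null exists for each county and year. The rows in `toy` are made up for illustration and are not real census figures; `county_totals` and `by_sex` are hypothetical helper names.

```python
import pandas as pd

# Toy rows that mimic the schema above. The numbers are invented for illustration.
toy = pd.DataFrame(
    [
        # countyName, year, race, sex, minAge, maxAge, population
        ("Example County", 2010, None, None, None, None, 100),    # everyone
        ("Example County", 2010, None, "Male", None, None, 48),   # all males
        ("Example County", 2010, None, "Female", None, None, 52), # all females
        ("Example County", 2010, None, "Male", 5, 9, 4),          # males aged 5 to 9
    ],
    columns=["countyName", "year", "race", "sex", "minAge", "maxAge", "population"],
)


def county_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows where every breakdown column is null (the grand totals)."""
    breakdown = ["race", "sex", "minAge", "maxAge"]
    is_total = df[breakdown].isna().all(axis=1)
    return df.loc[is_total, ["countyName", "year", "population"]].reset_index(drop=True)


def by_sex(df: pd.DataFrame) -> pd.DataFrame:
    """Rows broken down by sex only: sex is set, race and both age bounds are null."""
    mask = df["sex"].notna() & df[["race", "minAge", "maxAge"]].isna().all(axis=1)
    return df.loc[mask, ["countyName", "year", "sex", "population"]].reset_index(drop=True)


totals = county_totals(toy)
sexes = by_sex(toy)
print(totals)
print(sexes)
```

Summing the `population` column without such a mask would count the same people several times, because the total rows and the breakdown rows overlap.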

Preview

decennialTime	stateName	countyName	population	race	sex	minAge	maxAge	year
2010	Texas	Crockett County	123	WHITE ALONE	Male	5	9	2010
2010	Texas	Crockett County	1	ASIAN ALONE	Female	67	69	2010
2010	Texas	Crockett County	111	WHITE ALONE	Female	55	59	2010
2010	Texas	Crockett County	64	TWO OR MORE RACES	null			2010
2010	Texas	Crockett County	18	null	Male	85		2010
2010	Texas	Crockett County	16	AMERICAN INDIAN AND ALASKA NATIVE ALONE	Female			2010
2010	Texas	Crockett County	7	WHITE ALONE	Male	21	21	2010
2010	Texas	Crockett County	45	null	Female	85		2010
2010	Texas	Crockett County	0	NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE	Female	67	69	2010
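
The preview rows show three shapes of age range: both bounds set, only a lower bound (85 with an empty upper bound), and both empty. As a rough sketch of how these could be turned into readable labels, the hypothetical helper `age_band` below treats both bounds as inclusive and writes an open-ended range as "85+". Both choices are assumptions made for illustration, not something the dataset documentation specifies.

```python
import math
from typing import Optional, Union

Number = Union[int, float]


def _none_if_missing(value: Optional[Number]) -> Optional[int]:
    """Normalize None and NaN (how pandas often stores nulls) to None."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return int(value)


def age_band(min_age: Optional[Number], max_age: Optional[Number]) -> str:
    """Build a label for an age range from minAge and maxAge (assumed inclusive)."""
    lo = _none_if_missing(min_age)
    hi = _none_if_missing(max_age)
    if lo is None and hi is None:
        return "all ages"
    if hi is None:
        return f"{lo}+"          # no upper bound
    if lo is None:
        return f"up to {hi}"     # not described in the table; handled defensively
    if lo == hi:
        return str(lo)           # single year of age
    return f"{lo}-{hi}"


labels = [age_band(5, 9), age_band(21, 21), age_band(85, None), age_band(None, None)]
print(labels)
```

With a pandas DataFrame, the helper could be applied row by row, for example `df.apply(lambda r: age_band(r["minAge"], r["maxAge"]), axis=1)`.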

Data access

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_pandas_dataframe()

population_df.info()

# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "censusdatacontainer"
folder_name = "release/us_population_county/"

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide the account name and SAS token for your Azure Storage account.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
# This is the storage account endpoint; the container is selected below.
account_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    account_url, credential=azure_storage_sas_token or None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    # readinto() replaces the deprecated download_to_stream()
    blob_client.download_blob().readinto(local_file)

# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)

# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
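
As one possible filter for the loaded data frame, the sketch below selects a single state and census year and returns the population per race category. It assumes the column names from the Columns table and that race-only rows (race set, sex and both age bounds null) are present. `race_breakdown` is a hypothetical helper, and the rows in `sample`, including the race labels "RACE X" and "RACE Y", are placeholders rather than real values.

```python
import pandas as pd


def race_breakdown(df: pd.DataFrame, state: str, year: int) -> pd.DataFrame:
    """Population per race for one state and year, summed across its counties."""
    mask = (
        (df["stateName"] == state)
        & (df["year"] == year)
        & df["race"].notna()                               # broken down by race...
        & df[["sex", "minAge", "maxAge"]].isna().all(axis=1)  # ...and by nothing else
    )
    return (
        df.loc[mask]
        .groupby("race", as_index=False)["population"]
        .sum()
        .sort_values("population", ascending=False)
        .reset_index(drop=True)
    )


# Placeholder rows standing in for the real data frame `df`.
sample = pd.DataFrame(
    [
        ("Texas", "County A", 2010, "RACE X", None, None, None, 10),
        ("Texas", "County B", 2010, "RACE X", None, None, None, 5),
        ("Texas", "County A", 2010, "RACE Y", None, None, None, 20),
        ("Texas", "County A", 2010, "RACE Y", "Male", None, None, 9),  # excluded: sex breakdown
        ("Texas", "County A", 2000, "RACE X", None, None, None, 7),    # excluded: other year
        ("Georgia", "County C", 2010, "RACE X", None, None, None, 3),  # excluded: other state
    ],
    columns=["stateName", "countyName", "year", "race", "sex", "minAge", "maxAge", "population"],
)

result = race_breakdown(sample, "Texas", 2010)
print(result)
```

On the real data, the call would be `race_breakdown(df, "Texas", 2010)`.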

Azure Databricks

# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()

display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
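
Once the `source` temporary view is registered, it can be queried with ordinary Spark SQL. The sketch below builds a query that sums county totals into state totals for one census year. It assumes that rows with `race`, `sex`, `minAge`, and `maxAge` all null hold each county's overall population, as the column descriptions suggest. `state_totals_query` is a hypothetical helper, and the final call is left as a comment because `spark` and `display` are only predefined inside a notebook.

```python
def state_totals_query(year: int, view: str = "source") -> str:
    """Build a Spark SQL query that aggregates county totals per state."""
    # Guard against accidental injection through the view name.
    if not view.isidentifier():
        raise ValueError(f"Invalid view name: {view!r}")
    return (
        "SELECT stateName, SUM(population) AS totalPopulation "
        f"FROM {view} "
        f"WHERE year = {int(year)} "
        "AND race IS NULL AND sex IS NULL "
        "AND minAge IS NULL AND maxAge IS NULL "
        "GROUP BY stateName "
        "ORDER BY totalPopulation DESC"
    )


query = state_totals_query(2010)
print(query)

# In a notebook where `spark` and `display` are predefined:
# display(spark.sql(query))
```

Filtering on the null columns keeps only the total rows, which avoids counting the same residents again through the per-race, per-sex, and per-age rows.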

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()

# Display top 5 rows
display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Spark reads Parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Next steps

View the rest of the datasets in the Open Datasets catalog.
