US Population by County

09/03/2024

US population by sex and race for each US county, sourced from the 2000 and 2010 Decennial Census.

This dataset is sourced from the United States Census Bureau's Decennial Census Dataset APIs. See Terms of Service and Policies and Notices for the terms and conditions that apply to the use of this dataset.

Note

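Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.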

Volume and retention

This dataset is stored in Parquet format and contains data for the years 2000 and 2010.

Storage location

This dataset is stored in the East US Azure region. We recommend allocating compute resources in East US for affinity.

Related datasets

US Population by ZIP Code

Columns

Name	Data type	Unique	Values (sample)	Description
countyName	string	1,960	Washington County, Jefferson County	County name.
decennialTime	string	2	2010, 2000	The time the decennial census took place, for example 2010 or 2000.
maxAge	int	23	9, 66	Maximum of the age range. If null, the row covers all ages or the age range has no upper limit (for example, age > 85).
minAge	int	23	35, 67	Minimum of the age range. If null, the row covers all ages.
population	int	47,229	1, 2	Population of this segment.
race	string	8	ASIAN ALONE, TWO OR MORE RACES	Race category in the census data. If null, the row covers all races.
sex	string	3	Male, Female	Male or Female. If null, the row covers both sexes.
stateName	string	52	Texas, Georgia	Name of the US state.
year	int	2	2010, 2000	The decennial census year, as an integer.
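Because a null in race, sex, minAge, or maxAge means "across all values", the table mixes aggregate rows with breakdown rows, so summing every row would likely count the same people more than once. The following is a minimal sketch of how the aggregate rows might be selected with pandas. The sample rows, the county name, and the population figures are hypothetical, and the sketch assumes that a row with all four breakdown columns null is the county total, as the column descriptions suggest.

```python
import pandas as pd

# Hypothetical rows that mimic the schema; the numbers are made up for illustration.
base = {"stateName": "Texas", "countyName": "Example County", "year": 2010}
sample = pd.DataFrame(
    [
        # County total: every breakdown column is null.
        {**base, "race": None, "sex": None, "minAge": None, "maxAge": None, "population": 100},
        # Breakdown by sex only.
        {**base, "race": None, "sex": "Male", "minAge": None, "maxAge": None, "population": 48},
        {**base, "race": None, "sex": "Female", "minAge": None, "maxAge": None, "population": 52},
        # Breakdown by sex and age band.
        {**base, "race": None, "sex": "Male", "minAge": 5, "maxAge": 9, "population": 4},
    ]
)

breakdown_cols = ["race", "sex", "minAge", "maxAge"]

# Rows where every breakdown column is null (assumed to be the overall totals).
totals = sample[sample[breakdown_cols].isna().all(axis=1)]

# Rows split by sex only: sex is set, the other breakdown columns are null.
other_cols = ["race", "minAge", "maxAge"]
by_sex = sample[sample["sex"].notna() & sample[other_cols].isna().all(axis=1)]

total_population = int(totals["population"].sum())
population_by_sex = by_sex.groupby("sex")["population"].sum().to_dict()

print(total_population)
print(population_by_sex)
```

The same masks should apply to the data frame loaded in the Data access section, since `isna()` treats both `None` and `NaN` as missing.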

Preview

decennialTime	stateName	countyName	population	race	sex	minAge	maxAge	year
2010	Texas	Crockett County	123	WHITE ALONE	Male	5	9	2010
2010	Texas	Crockett County	1	ASIAN ALONE	Female	67	69	2010
2010	Texas	Crockett County	111	WHITE ALONE	Female	55	59	2010
2010	Texas	Crockett County	64	TWO OR MORE RACES	null			2010
2010	Texas	Crockett County	18	null	Male	85		2010
2010	Texas	Crockett County	16	AMERICAN INDIAN AND ALASKA NATIVE ALONE	Female			2010
2010	Texas	Crockett County	7	WHITE ALONE	Male	21	21	2010
2010	Texas	Crockett County	45	null	Female	85		2010
2010	Texas	Crockett County	0	NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE	Female	67	69	2010
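The preview rows show three shapes of age range: a closed band (5 to 9), an open-ended band (85 with no maximum), and no age restriction at all (both values empty). As an illustration, a small helper could turn a minAge/maxAge pair into a readable label. The function name `age_band_label` and the label wording are hypothetical, and the helper accepts `NaN` as well as `None` on the assumption that pandas may represent missing ages as `NaN`.

```python
import math
from typing import Optional, Union

Number = Union[int, float]


def _is_missing(value: Optional[Number]) -> bool:
    """Treat both None and NaN as a missing age bound."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def age_band_label(min_age: Optional[Number], max_age: Optional[Number]) -> str:
    """Build a readable label from a (minAge, maxAge) pair."""
    lo = None if _is_missing(min_age) else int(min_age)
    hi = None if _is_missing(max_age) else int(max_age)

    if lo is None and hi is None:
        return "all ages"      # no age restriction on this row
    if hi is None:
        return f"{lo}+"        # open-ended band, such as minAge 85 in the preview
    if lo is None:
        return f"up to {hi}"   # not described in the column notes; fallback only
    if lo == hi:
        return str(lo)         # single year of age, such as 21 in the preview
    return f"{lo}-{hi}"


print(age_band_label(5, 9))
print(age_band_label(85, None))
print(age_band_label(None, None))
```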

Data access

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_pandas_dataframe()

population_df.info()

# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "censusdatacontainer"
folder_name = "release/us_population_county/"

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)

# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)

# Add your own filters below
print('Loaded as a Pandas data frame: ')
df
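The blob-selection loop in the snippet above sorts the blob names in descending order and takes the first one that sits under the folder and ends in `.parquet`. The same rule can be written as a small pure function, which makes it easy to check without a storage connection. This is a sketch only: `pick_parquet_blob` is a hypothetical helper, and the blob names in the usage lines are invented for illustration.

```python
from typing import Iterable, Optional


def pick_parquet_blob(blob_names: Iterable[str], folder: str) -> Optional[str]:
    """Return the .parquet blob under `folder` whose name sorts last, or None."""
    candidates = [
        name
        for name in blob_names
        if name.startswith(folder) and name.endswith(".parquet")
    ]
    # Taking the maximum matches "sort descending, then take the first match".
    return max(candidates, default=None)


# Hypothetical blob names, for illustration only.
names = [
    "release/us_population_county/_SUCCESS",
    "release/us_population_county/part-00000.snappy.parquet",
    "release/us_population_county/part-00001.snappy.parquet",
    "release/other/part-99999.parquet",
]
chosen = pick_parquet_blob(names, "release/us_population_county/")
print(chosen)
```

Returning `None` when nothing matches lets the caller raise a clear error instead of trying to download a blob with an empty name. Note that this approach reads a single file; if the folder holds several Parquet part files, the resulting data frame would cover only one of them.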

Azure Databricks

# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()

display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""

# Allow Spark to read from Blob storage remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
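The snippet above builds two strings from the storage details: the `wasbs://` path that Spark reads from and the configuration key under which the SAS token is registered. As a sketch of that string layout, the two helpers below produce the same values using f-strings. The function names `wasbs_path` and `sas_conf_key` are hypothetical and are not part of any Azure or Spark library.

```python
def wasbs_path(container: str, account: str, relative_path: str) -> str:
    """Build the wasbs:// URL for a path inside a Blob storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{relative_path}"


def sas_conf_key(container: str, account: str) -> str:
    """Build the Spark configuration key that holds the container's SAS token."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


path = wasbs_path(
    "censusdatacontainer", "azureopendatastorage", "release/us_population_county/"
)
key = sas_conf_key("censusdatacontainer", "azureopendatastorage")
print(path)
print(key)
```

Note the argument order: the container comes before the account in both strings, which is easy to reverse by mistake when formatting by position.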

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()

# Display top 5 rows
display(population_df.limit(5))

# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""

# Allow Spark to read from Blob storage remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Next steps

View the rest of the datasets in the Open Datasets catalog.
