All New York City 311 service requests from 2010 to the present.
Note
Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.
This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.
Volume and retention
This dataset is stored in Parquet format. It is updated daily and contained about 12M rows (500 MB) in total as of 2019.
This dataset contains historical records accumulated from 2010 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
Storage location
This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.
Columns

| Name | Type | Unique | Values (sample) | Description |
| --- | --- | --- | --- | --- |
| address | string | | | House number of incident address provided by submitter. |
| category | string | 446 | Noise - Residential; HEAT/HOT WATER | The first level of a hierarchy identifying the topic of the incident or condition (Complaint Type). It may have a corresponding subcategory (Descriptor) or may stand alone. |
| dataSubtype | string | 1 | 311_All | "311_All" |
| dataType | string | 1 | Safety | "Safety" |
| dateTime | timestamp | 17,332,609 | 2013-01-24 00:00:00; 2015-01-08 00:00:00 | Date the Service Request was created. |
| latitude | double | 1,513,691 | 40.89187241649303; 40.72195913199264 | Geo-based latitude of the incident location. |
| longitude | double | 1,513,713 | -73.86016845296459; -73.80969682426189 | Geo-based longitude of the incident location. |
| status | string | 13 | Closed; Pending | Status of the submitted Service Request. |
| subcategory | string | 1,716 | Loud Music/Party; ENTIRE BUILDING | Associated with the category (Complaint Type); provides further detail on the incident or condition. Its values depend on the Complaint Type and are not always required in a Service Request. |
# This is a package in preview.
from azureml.opendatasets import NycSafety
from dateutil import parser

end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
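The azure-storage sample below expects azure_storage_account_name, azure_storage_sas_token, container_name, and folder_name to be defined already. A minimal setup sketch, assuming the public Azure Open Datasets storage account and the city-safety folder layout used elsewhere in these docs (verify the values for this dataset):

# Azure storage access info (assumed values; confirm before use)
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""  # public dataset; no SAS token required
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"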
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, credential=azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(name_starts_with=folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)

# Pick the most recent parquet blob under the folder
target_blob_name = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        target_blob_name = blob.name
        break

print('Target blob to download: ' + target_blob_name)
_, filename = os.path.split(target_blob_name)
blob_client = container_client.get_blob_client(target_blob_name)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
# Read the parquet file into Pandas data frame
import pandas as pd
print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
# You can add your own filtering logic below
print('Loaded as a Pandas data frame: ')
df
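For example, a minimal filter sketch using columns from the schema table above (the values are illustrative):

# Example: restrict to residential noise complaints that are not yet closed
noise = df[(df['category'] == 'Noise - Residential') & (df['status'] != 'Closed')]
print(noise[['dateTime', 'category', 'subcategory', 'status']].head())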
Sample not available for this platform/package combination.
# This is a package in preview.
# You need to pip install azureml-opendatasets on your Azure Databricks cluster. https://learn.microsoft.com/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety
from dateutil import parser

end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
display(safety.limit(5))
Sample not available for this platform/package combination.
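The PySpark sample below reads from wasbs_path, which must point at the dataset's Parquet folder. A minimal setup sketch, assuming the public storage account and path conventions used across Azure Open Datasets (confirm the container and relative path for this dataset):

# Azure storage access info (assumed values; confirm before use)
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""  # public dataset; no SAS token required

# Build the remote path and let Spark read from the blob
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (
    blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
print('Remote blob path: ' + wasbs_path)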
# Read the parquet files with Spark; this is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
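With the temporary view registered, you can run ad-hoc SQL against it; for example, a quick breakdown by complaint category (a sketch using the category column from the schema above):

# Example: top complaint categories by request count
display(spark.sql(
    "SELECT category, COUNT(*) AS requests "
    "FROM source GROUP BY category ORDER BY requests DESC LIMIT 10"))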
# This is a package in preview.
from azureml.opendatasets import NycSafety
from dateutil import parser

end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
# Display top 5 rows
display(safety.limit(5))
Sample not available for this platform/package combination.
# Read the parquet files with Spark; this is lazy, so no data is loaded yet.
# wasbs_path is built as in the setup sketch shown earlier.
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
Use Azure Data Explorer to characterize an unfamiliar dataset. Explore the schema of the data, run queries to examine the range of the data, and share these insights with others.
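If you'd rather script that exploration than use the Azure Data Explorer web UI, here is a minimal sketch with the azure-kusto-data Python package; the cluster URL, database, and table names are placeholders for wherever you have ingested this dataset:

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Hypothetical cluster/database/table; replace with your own ADX ingestion of this dataset
cluster = "https://<your-cluster>.kusto.windows.net"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# Characterize the data: row count and the range of the dateTime column
query = "NycSafety | summarize Rows = count(), MinDate = min(dateTime), MaxDate = max(dateTime)"
response = client.execute("<your-database>", query)
for row in response.primary_results[0]:
    print(row)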