Редактиране

Споделяне чрез


NYC Taxi & Limousine Commission - For-Hire Vehicle (FHV) trip records

The For-Hire Vehicle (“FHV”) trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record submissions made by bases.

Note

Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms that Microsoft received source data. The dataset may include data sourced from Microsoft.

Volume and retention

This dataset is stored in Parquet format. There are about 500M rows (5 GB) as of 2018.

This dataset contains historical records accumulated from 2009 to 2018. You can use parameter settings in our SDK to fetch data within a specific time range.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

NYC Taxi and Limousine Commission (TLC):

The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.

View the original dataset location and the original terms of use.

Columns

Name Data type Unique Values (sample) Description
dispatchBaseNum string 1,144 B02510 B02764 The TLC Base License Number of the base that dispatched the trip
doLocationId string 267 265 132 TLC Taxi Zone in which the trip ended.
dropOffDateTime timestamp 57,110,352 2017-07-31 23:59:00 2017-10-15 00:44:34 The date and time of the trip dropoff.
pickupDateTime timestamp 111,270,396 2016-08-16 00:00:00 2016-08-17 00:00:00 The date and time of the trip pick-up.
puLocationId string 266 79 161 TLC Taxi Zone in which the trip began.
puMonth int 12 1 12
puYear int 5 2018 2017
srFlag string 44 1 2 Indicates if the trip was a part of a shared ride chain offered by a High Volume FHV company (for example, Uber Pool, Lyft Line). For shared trips, the value is 1. For non-shared rides, this field is null. NOTE: For most High Volume FHV companies, only shared rides that were requested AND matched to another shared-ride request over the course of the journey are flagged. However, Lyft (base license numbers B02510 + B02844) also flags rides for which a shared ride was requested but another passenger was not successfully matched to share the trip—therefore, trips records with SR_Flag=1 from those two bases could indicate EITHER a first trip in a shared trip chain OR a trip for which a shared ride was requested but never matched. Users should anticipate an overcount of successfully shared trips completed by Lyft.

Preview

dispatchBaseNum pickupDateTime dropOffDateTime puLocationId doLocationId srFlag puYear puMonth
B03157 6/30/2019 11:59:57 PM 7/1/2019 12:07:21 AM 264 null null 2019 6
B01667 6/30/2019 11:59:56 PM 7/1/2019 12:28:06 AM 264 null null 2019 6
B02849 6/30/2019 11:59:55 PM 7/1/2019 12:14:10 AM 264 null null 2019 6
B02249 6/30/2019 11:59:53 PM 7/1/2019 12:15:53 AM 264 null null 2019 6
B00887 6/30/2019 11:59:48 PM 7/1/2019 12:29:29 AM 264 null null 2019 6
B01626 6/30/2019 11:59:45 PM 7/1/2019 12:18:20 AM 264 null null 2019 6
B01259 6/30/2019 11:59:44 PM 7/1/2019 12:03:15 AM 264 null null 2019 6
B01145 6/30/2019 11:59:43 PM 7/1/2019 12:11:15 AM 264 null null 2019 6
B00887 6/30/2019 11:59:42 PM 7/1/2019 12:34:21 AM 264 null null 2019 6
B00821 6/30/2019 11:59:40 PM 7/1/2019 12:02:57 AM 264 null null 2019 6

Data access

Azure Notebooks

# This is a package in preview.
from azureml.opendatasets import NycTlcFhv

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2018-06-06')
start_date = parser.parse('2018-05-01')
nyc_tlc = NycTlcFhv(start_date=start_date, end_date=end_date)
nyc_tlc_df = nyc_tlc.to_pandas_dataframe()

nyc_tlc_df.info()

Azure Databricks

# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://learn.microsoft.com/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycTlcFhv

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2018-06-06')
start_date = parser.parse('2018-05-01')
nyc_tlc = NycTlcFhv(start_date=start_date, end_date=end_date)
nyc_tlc_df = nyc_tlc.to_spark_dataframe()

display(nyc_tlc_df.limit(5))

Azure Synapse

# This is a package in preview.
from azureml.opendatasets import NycTlcFhv

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2018-06-06')
start_date = parser.parse('2018-05-01')
nyc_tlc = NycTlcFhv(start_date=start_date, end_date=end_date)
nyc_tlc_df = nyc_tlc.to_spark_dataframe()

# Display top 5 rows
display(nyc_tlc_df.limit(5))

Next steps

View the rest of the datasets in the Open Datasets catalog.