NYC Taxi & Limousine Commission - For-Hire Vehicle (FHV) trip records
The For-Hire Vehicle (“FHV”) trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record submissions made by bases.
Note
Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.
This dataset is provided under the original terms that Microsoft received source data. The dataset may include data sourced from Microsoft.
Volume and retention
This dataset is stored in Parquet format. There are about 500M rows (5 GB) as of 2018.
This dataset contains historical records accumulated from 2009 to 2018. You can use parameter settings in our SDK to fetch data within a specific time range.
Storage location
This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.
Additional information
NYC Taxi and Limousine Commission (TLC):
The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
View the original dataset location and the original terms of use.
Columns
Name | Data type | Unique | Values (sample) | Description |
---|---|---|---|---|
dispatchBaseNum | string | 1,144 | B02510 B02764 | The TLC Base License Number of the base that dispatched the trip |
doLocationId | string | 267 | 265 132 | TLC Taxi Zone in which the trip ended. |
dropOffDateTime | timestamp | 57,110,352 | 2017-07-31 23:59:00 2017-10-15 00:44:34 | The date and time of the trip dropoff. |
pickupDateTime | timestamp | 111,270,396 | 2016-08-16 00:00:00 2016-08-17 00:00:00 | The date and time of the trip pick-up. |
puLocationId | string | 266 | 79 161 | TLC Taxi Zone in which the trip began. |
puMonth | int | 12 | 1 12 | |
puYear | int | 5 | 2018 2017 | |
srFlag | string | 44 | 1 2 | Indicates if the trip was a part of a shared ride chain offered by a High Volume FHV company (for example, Uber Pool, Lyft Line). For shared trips, the value is 1. For non-shared rides, this field is null. NOTE: For most High Volume FHV companies, only shared rides that were requested AND matched to another shared-ride request over the course of the journey are flagged. However, Lyft (base license numbers B02510 + B02844) also flags rides for which a shared ride was requested but another passenger was not successfully matched to share the trip—therefore, trips records with SR_Flag=1 from those two bases could indicate EITHER a first trip in a shared trip chain OR a trip for which a shared ride was requested but never matched. Users should anticipate an overcount of successfully shared trips completed by Lyft. |
Preview
dispatchBaseNum | pickupDateTime | dropOffDateTime | puLocationId | doLocationId | srFlag | puYear | puMonth |
---|---|---|---|---|---|---|---|
B03157 | 6/30/2019 11:59:57 PM | 7/1/2019 12:07:21 AM | 264 | null | null | 2019 | 6 |
B01667 | 6/30/2019 11:59:56 PM | 7/1/2019 12:28:06 AM | 264 | null | null | 2019 | 6 |
B02849 | 6/30/2019 11:59:55 PM | 7/1/2019 12:14:10 AM | 264 | null | null | 2019 | 6 |
B02249 | 6/30/2019 11:59:53 PM | 7/1/2019 12:15:53 AM | 264 | null | null | 2019 | 6 |
B00887 | 6/30/2019 11:59:48 PM | 7/1/2019 12:29:29 AM | 264 | null | null | 2019 | 6 |
B01626 | 6/30/2019 11:59:45 PM | 7/1/2019 12:18:20 AM | 264 | null | null | 2019 | 6 |
B01259 | 6/30/2019 11:59:44 PM | 7/1/2019 12:03:15 AM | 264 | null | null | 2019 | 6 |
B01145 | 6/30/2019 11:59:43 PM | 7/1/2019 12:11:15 AM | 264 | null | null | 2019 | 6 |
B00887 | 6/30/2019 11:59:42 PM | 7/1/2019 12:34:21 AM | 264 | null | null | 2019 | 6 |
B00821 | 6/30/2019 11:59:40 PM | 7/1/2019 12:02:57 AM | 264 | null | null | 2019 | 6 |
Data access
Azure Notebooks
# This is a package in preview.
from azureml.opendatasets import NycTlcFhv
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2018-06-06')
start_date = parser.parse('2018-05-01')
nyc_tlc = NycTlcFhv(start_date=start_date, end_date=end_date)
nyc_tlc_df = nyc_tlc.to_pandas_dataframe()
nyc_tlc_df.info()
Azure Databricks
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://learn.microsoft.com/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycTlcFhv
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2018-06-06')
start_date = parser.parse('2018-05-01')
nyc_tlc = NycTlcFhv(start_date=start_date, end_date=end_date)
nyc_tlc_df = nyc_tlc.to_spark_dataframe()
display(nyc_tlc_df.limit(5))
Azure Synapse
# This is a package in preview.
from azureml.opendatasets import NycTlcFhv
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2018-06-06')
start_date = parser.parse('2018-05-01')
nyc_tlc = NycTlcFhv(start_date=start_date, end_date=end_date)
nyc_tlc_df = nyc_tlc.to_spark_dataframe()
# Display top 5 rows
display(nyc_tlc_df.limit(5))
Next steps
View the rest of the datasets in the Open Datasets catalog.