Power Query SDK test framework part 2 - Test data
This multi-part reference covers the setup and running of a standard suite of tests for Power Query connector developers. The reference is meant to be followed sequentially to ensure that your environment is set-up for testing your extension connector.
In this section, you:
- Learn about the dataset used in the standard suite of the tests
- Download a copy of the test data
- Upload the data to your choice of database for the connector you intend to test
This section covers the test data that is used by the test suite to verify your connector.
Overview of the dataset
The test data used for the framework is a modified version of the NewYork City Taxi & Limousine Commission (TLC) green trip record data.
The February 2023 green trip data was converted into "NycTaxiGreen" CSV file and 10,000 rows were sampled from it. Similarly, the "TaxiZoneLookup" CSV file contains 265 rows from the taxi zone lookup table. CSV data format was chosen due to its large applicability to nearly all other data sources, making it easier to import.
The definition for each field might be slightly different from the information found in NYC Taxi and Limousine green dataset - Azure Open Datasets and data_dictionary_trip_records_green.pdf (nyc.gov).
The dataset for our testing is split into two csv files: NycTaxiGreen and TaxiZoneLookup.
Getting the test data
The test data is placed in the Data Connectors git repo. Test data is provided in the form of CSV along with the schema definition for the NycTaxiGreen and TaxiZoneLookup tables. This data should be loaded to your data source as separate tables and you need to ensure that the schema corresponds to the datatypes defined in your data source.
Note
While uploading to the data source, all decimal values should have a scale of 2. That is, the number of digits after the decimal point should be 2.
All timestamp values should be uploaded to the datasouce in MM/DD/YYYY HH24:MI:SS
format.
You can also review the details regarding the test data.
Resources:
NycTaxiGreen table description
The following table provides further information about each field in the "NycTaxiGreen" table:
Name | Type | Description |
---|---|---|
RecordID | Integer | The original record ID order. The CSV starts with it out-of-order due to the random sampling. This field can be used as the primary key of the table. |
VendorID | Integer | Code indicating provider. |
lpep_pickup_datetime | Timestamp | Timezone information isn't included. If the data source requires one, UTC should be used. |
lpep_dropoff_datetime | Timestamp | Timezone information isn't included. If the data source requires one, UTC should be used. |
store_and_fwd_flag | Boolean | A true/false field. Some data sources might import this field as a character or string if the field type isn't specified. For the purposes of testing, this field should be specified as Boolean. |
RatecodeID | Integer | Rate code in effect at end of trip. |
PULocationID | Integer | TLC Taxi Zone when the taximeter engaged. |
DOLocationID | Integer | TLC Taxi Zone when the taximeter disengaged. |
passenger_count | Integer | Number of passengers in vehicle. |
trip_distance | Double | Elapsed trip distance in miles. |
fare_amount | Double | Time-and-distance fare. Two decimal places. |
extra | Double | Misc charges. Two decimal places. |
mta_tax | Double | MTA tax added. Two decimal places. |
tip_amount | Double | Tip amount. Two decimal places. |
tolls_amount | Double | Total number of tolls paid on the trip. Two decimal places. |
improvement_surcharge | Double | An improvement surcharge. Two decimal places. |
total_amount | Double | Total price charged to customers. Doesn't include tips. Two decimal places. |
payment_type | Integer | A numeric code for how the passenger paid. |
trip_type | Integer | A numeric code for whether the taxi was hailed or dispatched. |
congestion_surcharge | Double | An extra charge for congestion. Two decimal places. |
TaxiZoneLookup table description
The following table provides further information about each field in the "TaxiZoneLookup" table:
Name | Type | Description |
---|---|---|
LocationId | Integer | Location ID for any location. It maps to PULocationID and DOLocationID in the "NycTaxiGreen" table. This field can be used as the primary key of the table. |
Borough | String | Borough for the location. |
Zone | String | Name of the Zone for the location. |
service_zone | String | Name of the Service Zone for the location. |
Conclusion
This section described the test data used within the standardized test suite and its schema. Your database should now be populated with tables based on the specified CSV files with the correct schema.
In the next section, you gain an understanding of different sections of the test suite that helps you validate the setup of the dataset and your extension connector.