Running batches with Prompt flow
If you want to test your flow with multiple inputs, you can use the batch run feature. It runs your flow over a list of inputs from a CSV, TSV, or JSON Lines file and saves all of the outputs to another JSON Lines file. In the next article, you can use that output file to evaluate your flow.
To get started, you must first create a JSON lines file that contains sample inputs and the correct ground truth. The following sections will walk you through how to do this.
This article uses the completed solution from the previous article. If you want to follow along, you can use the following samples in the public documentation repository.
Note
Today Prompt flow is only available in Python, so this article will only show how to use Prompt flow to evaluate plugins using Python.
Language | Link to completed solution |
---|---|
C# | Not available |
Java | Not available |
Python | Open solution in GitHub |
Create benchmark data for your Prompt flow
All benchmark data can be found in the data.jsonl file. This file contains a list of JSON objects, one per line, each with an input and the correct ground truth. Let's update the data.jsonl file with data that we can use to evaluate our plugin.
{"text": "How many sheep would you have if you started with 3 and got 2 more?", "groundtruth": "5"}
{"text": "What would be the area of a rectangle with a sides of 2ft and 3ft?", "groundtruth": "6"}
{"text": "What would you have left if you spent $3 when you only had $2 to begin with", "groundtruth": "-1"}
{"text": "How many slices of pizza would everyone get if you split 12 slices equally among 3 people", "groundtruth": "4"}
{"text": "What is the sum of 5 and 3?", "groundtruth": "8"}
{"text": "Subtract 7 from 10.", "groundtruth": "3"}
{"text": "Multiply 6 by 4.", "groundtruth": "24"}
{"text": "Divide 20 by 5.", "groundtruth": "4"}
{"text": "What is the square of 7?", "groundtruth": "49"}
{"text": "What is the square root of 81?", "groundtruth": "9"}
Run your Prompt flow with the benchmark data
Now that we have our benchmark data, we can run our flow over the data to see how well it performs. You can run a flow with either the VS Code extension or the CLI.
Use the visual editor to run a batch of inputs
Open the visual editor.
Select the Batch run icon (the beaker icon).
Select Local JSON Lines File.
Select the data.jsonl file in the file picker.
Select the Run button in the new file.
Use the CLI to run a batch of inputs
Navigate to the root of the flow folder.
cd ./perform_math
Run the following command in your terminal; we'll use the --name parameter to name the run perform_math.
pf run create --flow . --data data.jsonl --stream --name perform_math
Important
The name of each run must be unique. If you run the same evaluation twice, you need to use a different name; otherwise, the second run will fail.
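One way to avoid name collisions is to append a timestamp to a base name each time you start a run. The sketch below only builds the argument list for the `pf run create` command shown above; the `build_batch_run_command` helper and the timestamp suffix format are illustrative assumptions, not a Prompt flow convention.

```python
from datetime import datetime, timezone


def build_batch_run_command(base_name, data_file="data.jsonl", now=None):
    """Build the `pf run create` argument list with a timestamped run name."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming scheme: <base_name>_<YYYYMMDD>_<HHMMSS> in UTC.
    run_name = f"{base_name}_{now.strftime('%Y%m%d_%H%M%S')}"
    return [
        "pf", "run", "create",
        "--flow", ".",
        "--data", data_file,
        "--stream",
        "--name", run_name,
    ]


command = build_batch_run_command("perform_math")
print(" ".join(command))
```

To launch the run from Python, you could pass the list to `subprocess.run(command, check=True)` from the root of the flow folder. Two runs started within the same second would still collide, so add more precision if you start runs in quick succession.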
Viewing the results
If you used the CLI's --name parameter to name your batch run, you can use the following commands to get the results afterwards.
pf run show-details -n perform_math
pf run visualize -n perform_math
After running the pf run show-details command, you should see the following output.
If you are running the planner with GPT-3.5-turbo, you'll likely run into a few errors, so only some of the results will come back (notice that line 3 failed) and a few of the results may be incorrect or missing (for example, lines 1, 2, and 4).
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| | inputs.text | inputs.line_number | outputs.result |
+====+===========================================================================================+======================+===================+
| 0 | How many sheep would you have if you started with 3 and got 2 more? | 0 | 5.0 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 1 | What would be the area of a rectangle with a sides of 2ft and 3ft? | 1 | 2.449489742783178 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 2 | What would you have left if you spent $3 when you only had $2 to begin with | 2 | |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 3 | How many slices of pizza would everyone get if you split 12 slices equally among 3 people | 3 | (Failed) |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 4 | What is the sum of 5 and 3? | 4 | 5.0 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 5 | Subtract 7 from 10. | 5 | 3.0 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 6 | Multiply 6 by 4. | 6 | 24.0 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 7 | Divide 20 by 5. | 7 | 4.0 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 8 | What is the square of 7? | 8 | 49.0 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
| 9 | What is the square root of 81? | 9 | 9.0 |
+----+-------------------------------------------------------------------------------------------+----------------------+-------------------+
As you can see, the results are not yet perfect. In the next article, we'll use Prompt flow's evaluation feature to quantify how well our flow is performing and then we'll update our plugin and planner to improve the results.
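For a quick sanity check before the full evaluation, you can compare the outputs in the table against the ground truth from data.jsonl by hand. The sketch below is not Prompt flow's evaluation feature; it copies the two columns from above into lists, and the `is_correct` helper is a hypothetical name that treats empty or failed outputs as wrong.

```python
import math

# Ground truth values from data.jsonl, in file order.
ground_truth = ["5", "6", "-1", "4", "8", "3", "24", "4", "49", "9"]

# The outputs.result column from the table above, in the same order.
outputs = ["5.0", "2.449489742783178", "", "(Failed)", "5.0",
           "3.0", "24.0", "4.0", "49.0", "9.0"]


def is_correct(output, expected):
    """Compare numerically; anything that is not a number counts as wrong."""
    try:
        return math.isclose(float(output), float(expected))
    except ValueError:
        return False


results = [is_correct(o, e) for o, e in zip(outputs, ground_truth)]
wrong_lines = [i for i, ok in enumerate(results) if not ok]
print(f"{sum(results)} of {len(results)} correct; check lines {wrong_lines}")
# prints: 6 of 10 correct; check lines [1, 2, 3, 4]
```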
Using the VS Code extension to view the results
If you don't know the name of your run, you can also use the Prompt flow VS Code extension to see a history of all your previous runs and visualize them. To do this, follow these steps:
Select the Prompt flow icon in the app bar in VS Code.
In the Batch run history section, select the refresh button.
Select the run you want to view.
Select Visualize & analyze.
Afterwards, you'll get the same visualization that you'd see if you had run pf run visualize -n perform_math in your terminal.
View the logs
To see what the flow is doing, you can open and view the logs of the run. To do this, follow these steps.
Run the following command to view the details of the run.
pf run stream -n perform_math
Before any of the errors are output to the terminal, you should see a run summary:
======= Run Summary =======
Run name: "perform_math"
Run status: "Completed"
Start time: "2023-09-07 11:22:05.160936"
Duration: "0:00:13.736032"
Output path: "/Users/<user>/.promptflow/.runs/perform_math"
Copy the value of the Output path property.
Navigate to the output path. You should see a folder that looks like the following.
Open the node_artifacts/math_planner folder.
Open one of the JSON Lines files. Each file contains the logs of a single run of your custom node, so you can see what your planner is doing. You should see results like the following.
{ "NodeName": "math_planner", "line_number": 3, "run_info": { "node": "math_planner", "flow_run_id": "perform_math", "run_id": "perform_math_math_planner_3", "status": "Completed", "inputs": { "input1": "How many slices of pizza would everyone get if you split 12 slices equally among 3 people" }, "output": "4.0", "metrics": null, "error": null, "parent_run_id": "perform_math_3", "start_time": "2023-09-05T14:40:55.159904Z", "end_time": "2023-09-05T14:41:02.668920Z", "index": 3, "api_calls": [ { "name": "my_python_tool", "type": "Tool", "inputs": { "input1": "How many slices of pizza would everyone get if you split 12 slices equally among 3 people" }, "output": "4.0", "start_time": 1693921255.159942, "end_time": 1693921262.668671, "error": null, "children": null, "node_name": "math_planner" } ], "variant_id": "", "cached_run_id": null, "cached_flow_run_id": null, "logs": { "stdout": "[2023-09-05T14:41:02+0000] Function: MathPlugin.Divide\n[2023-09-05T14:41:02+0000] Input vars: {'input': '12', 'denominator': '3'}\n[2023-09-05T14:41:02+0000] Output vars: ['RESULT__SLICES_PER_PERSON']\n[2023-09-05T14:41:02+0000] Result: 4.0\n", "stderr": "" }, "system_metrics": { "duration": 7.509016 }, "result": "4.0" }, "start_time": "2023-09-05T14:40:55.159904", "end_time": "2023-09-05T14:41:02.668920", "status": "Completed" }
Any print statements in your code will be logged in the run_info.logs.stdout property.
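If you'd rather not read the raw JSON, a few lines of Python can pull the printed output out of a node artifact file. This is a minimal sketch that assumes each line of the file has the shape shown above (a `line_number` field and a `run_info.logs.stdout` string); the `stdout_lines` helper is a hypothetical name, and the sample record is a trimmed copy of the example log.

```python
import json


def stdout_lines(jsonl_text):
    """Yield (line_number, log_line) pairs from node artifact JSON Lines text."""
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines between records
        record = json.loads(raw)
        stdout = record["run_info"]["logs"]["stdout"]
        for log_line in stdout.splitlines():
            yield record["line_number"], log_line


# Trimmed copy of the example record, serialized as one JSON line.
sample = json.dumps({
    "NodeName": "math_planner",
    "line_number": 3,
    "run_info": {
        "logs": {
            "stdout": "[2023-09-05T14:41:02+0000] Function: MathPlugin.Divide\n"
                      "[2023-09-05T14:41:02+0000] Result: 4.0\n",
            "stderr": "",
        }
    },
})

for line_number, log_line in stdout_lines(sample):
    print(f"input {line_number}: {log_line}")
```

To use it on a real run, read one of the files in the node_artifacts/math_planner folder into a string and pass that string to `stdout_lines`.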
Next steps
Now that you know how to run a batch of inputs on your flow, you can use the evaluation feature to quantify the actual performance of your flow.