predict_onnx_fl()
Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel
The function predict_onnx_fl() is a user-defined function (UDF) that predicts using an existing trained machine learning model. The model must already be converted to ONNX format, serialized to a string, and saved in a standard table.
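Because the function decodes the stored model with binascii.unhexlify(), the string saved in the table is expected to be a hex encoding of the ONNX model bytes. The following is a minimal sketch of that round trip using only the Python standard library. The helper names are hypothetical, and the placeholder bytes stand in for a real exported ONNX model.

```python
import binascii


def serialize_model(model_bytes: bytes) -> str:
    """Hex-encode raw model bytes so they can be stored in a string column."""
    return binascii.hexlify(model_bytes).decode("ascii")


def deserialize_model(model_str: str) -> bytes:
    """Reverse of serialize_model; mirrors the unhexlify call in the function."""
    return binascii.unhexlify(model_str)


# Placeholder bytes, NOT a real ONNX model (assumption for illustration only).
fake_model_bytes = b"\x08\x07\x12\x04demo"

model_str = serialize_model(fake_model_bytes)  # value for the 'model' column
restored = deserialize_model(model_str)        # what the function reconstructs
```

In practice, the bytes would come from whatever tool exported the trained model to ONNX, and the resulting string would be ingested into the models table together with the model name and training timestamp.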
Prerequisites
- The Python plugin must be enabled on the cluster. This is required for the inline Python used in the function.
- The Python plugin must be enabled on the database. This is required for the inline Python used in the function.
Syntax
T | invoke predict_onnx_fl(
models_tbl,
model_name,
features_cols,
pred_col)
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
models_tbl | string | ✔️ | The name of the table that contains all serialized models. The table must have the following columns: name (the model name), timestamp (the time of model training), and model (the string representation of the serialized model). |
model_name | string | ✔️ | The name of the specific model to use. |
features_cols | dynamic | ✔️ | An array containing the names of the feature columns that the model uses for prediction. |
pred_col | string | ✔️ | The name of the column that stores the predictions. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Define the function using the following let statement. No permissions are required.
Important
A let statement can't run on its own. It must be followed by a tabular expression statement. To run a working example of predict_onnx_fl(), see Example.
let predict_onnx_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    // Pick the most recently trained model with the requested name.
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:

        import binascii
        import onnxruntime as rt

        # Arguments passed from the query arrive in the kargs dictionary.
        smodel = kargs["smodel"]
        features_cols = kargs["features_cols"]
        pred_col = kargs["pred_col"]

        # The model is stored as a hex string; decode it back to bytes.
        bmodel = binascii.unhexlify(smodel)

        sess = rt.InferenceSession(bmodel)
        input_name = sess.get_inputs()[0].name
        label_name = sess.get_outputs()[0].name

        # df is the input table; np and pd are preloaded by the Python plugin.
        df1 = df[features_cols]
        predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]

        result = df
        result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])

    ```;
    samples | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.
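The first statement in the function body selects the most recently trained model among the rows whose name matches model_name. A rough Python equivalent of that selection is sketched below. The sample rows and the select_latest_model helper are hypothetical and only illustrate the logic; they are not part of the function.

```python
from datetime import datetime
from typing import Dict, List, Optional


def select_latest_model(rows: List[Dict], model_name: str) -> Optional[str]:
    """Return the 'model' string of the newest row named model_name, or None."""
    matching = [r for r in rows if r["name"] == model_name]
    if not matching:
        return None  # no model with that name is stored
    # Equivalent in spirit to: top 1 by timestamp desc | project model
    return max(matching, key=lambda r: r["timestamp"])["model"]


# Hypothetical contents of a models table (hex strings are placeholders).
rows = [
    {"name": "ONNX-Occupancy", "timestamp": datetime(2023, 1, 1), "model": "aa01"},
    {"name": "ONNX-Occupancy", "timestamp": datetime(2023, 6, 1), "model": "bb02"},
    {"name": "Other-Model", "timestamp": datetime(2024, 1, 1), "model": "cc03"},
]

latest = select_latest_model(rows, "ONNX-Occupancy")
```

One consequence of this design is that retraining a model and saving it under the same name is enough for later queries to pick up the newer version.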
Example
The following example uses the invoke operator to run the function.
To use a query-defined function, invoke it after the embedded function definition.
let predict_onnx_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    // Pick the most recently trained model with the requested name.
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:

        import binascii
        import onnxruntime as rt

        # Arguments passed from the query arrive in the kargs dictionary.
        smodel = kargs["smodel"]
        features_cols = kargs["features_cols"]
        pred_col = kargs["pred_col"]

        # The model is stored as a hex string; decode it back to bytes.
        bmodel = binascii.unhexlify(smodel)

        sess = rt.InferenceSession(bmodel)
        input_name = sess.get_inputs()[0].name
        label_name = sess.get_outputs()[0].name

        # df is the input table; np and pd are preloaded by the Python plugin.
        df1 = df[features_cols]
        predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]

        result = df
        result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])

    ```;
    samples | evaluate python(typeof(*), code, kwargs)
};
//
// Predicts room occupancy from sensor measurements and calculates the confusion matrix.
//
// Occupancy Detection is an open dataset from the UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature, Humidity, Light, and CO2.
// Ground-truth labels were obtained from time-stamped pictures that were taken every minute.
//
OccupancyDetection
| where Test == 1
| extend pred_Occupancy=bool(0)
| invoke predict_onnx_fl(ML_Models, 'ONNX-Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy
Output
Occupancy | pred_Occupancy | n |
---|---|---|
TRUE | TRUE | 3006 |
FALSE | TRUE | 112 |
TRUE | FALSE | 15 |
FALSE | FALSE | 9284 |
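The four counts in this output form a confusion matrix, from which the usual classification metrics follow directly. As a small worked example (the variable names are illustrative and not part of the function):

```python
# Counts copied from the output table (Occupancy is the ground truth).
tp = 3006  # Occupancy TRUE,  pred_Occupancy TRUE
fp = 112   # Occupancy FALSE, pred_Occupancy TRUE
fn = 15    # Occupancy TRUE,  pred_Occupancy FALSE
tn = 9284  # Occupancy FALSE, pred_Occupancy FALSE

total = tp + fp + fn + tn

accuracy = (tp + tn) / total   # share of all rows predicted correctly
precision = tp / (tp + fp)     # share of predicted-occupied rows that were occupied
recall = tp / (tp + fn)        # share of occupied rows that were detected

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

For these counts the script reports an accuracy of about 0.990, a precision of about 0.964, and a recall of about 0.995 over 12,417 test rows.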