Connect(); 2017

Volume 32 Number 13

Machine Learning - Deliver On-Device Machine Learning Solutions

By Larry O'Brien | Connect(); 2017

You’ve read the headlines: Artificial intelligence and machine learning (AI/ML) technologies are rewriting the benchmarks across a vast swath of hard problems. Whether it’s AlphaGo besting the best human Go player, AlphaGo Zero beating AlphaGo after just three days of learning the game from scratch, or Microsoft Research setting a new benchmark for conversational speech recognition, every week seems to bring some new advance built on “deep learning” and “artificial neural networks.” Or perhaps you’ve been more interested in the headlines about bidding wars on the salaries and signing bonuses for developers with ML competence. Either way, the AI/ML train is leaving the station and you don’t want to be left behind.

While AI/ML research is advancing at a truly giddy pace, a less celebrated but equally exciting trend is the availability of easy-to-use libraries for delivering ML functionality on mobile and edge devices. CoreML on iOS and Tensorflow Android Inference on Android are straightforward and consistent once you understand the tools and workflow. As a career strategy, competence in ML technologies is one of the hottest ways toward career flexibility and higher compensation. (From Twitter: “It’s AI when you’re raising money, it’s ML when you’re hiring devs.”)

It’s easy to be intimidated by the latest headlines about AI systems achieving superhuman performance in voice transcription or game-playing, but as Satya Nadella writes in his book, “Hit Refresh” (HarperBusiness, 2017), “Every organization today needs new cloud-based infrastructure and applications that can convert vast amounts of data into predictive and analytical power through the use of advanced analytics, machine learning, and AI.”

Many articles and demos about device-based ML focus on vision tasks. This is an area where there has been truly astounding advancement in the past decade. Object detection, captioning and image-to-image style transfer have all advanced at a blistering pace. Azure Cognitive Services CustomVision makes it ridiculously simple to develop custom classification models that can be deployed on iOS using the techniques described in this article.

While visual and audio understanding are both inherently important and historically challenging, the “deep learning” revolution in ML goes well beyond these areas. Pattern recognition is at the core of modern ML. Many developers work in areas where recognizing patterns in complex and noisy data is central to their business’ value proposition. Classifying data, time-series projection, sequence modeling and even sequence-to-sequence construction are all areas where modern ML may lead to competitive advantage. Many developers, for instance, face the problem of “time-series prediction,” in which they must reason from large amounts of data that have some structure but are very noisy or have many factors contributing to the ups and downs.

A historically important time-series prediction problem is tide prediction. Predicting the tides was, of course, crucially important in the Ages of Sail and Steam both for merchants and the military. One of the most important artifacts of the early days of computing is Lord Kelvin’s tide-predicting machine, described in Charles Petzold’s work-in-progress “Computer of the Tides” as “a magnificent assemblage of brass and wood that stands as tall as a person, as gorgeous as it is mysterious.” The timing of tides is primarily dictated by the geometry of the earth, moon, and sun and by the complex flooding and draining of bays. The forces are so complex that modern accurate tide prediction uses more than 30 site-specific harmonic components, whose values are derived from hundreds of tide gauges spread across the globe. A complete cycle of the system takes 19 years.

Predicting tides is a reasonably difficult task for a modern ML approach. Given 200 historical readings of water level taken every three hours, how accurately can the tide be predicted up to 300 hours into the future?

Modeling the Problem

As chance would have it, I have several thousands of lines of tide-predicting F# code in one of my “finish someday” projects, and it was trivial for me to generate data based on a real harbor (which I’ll call “Contoso Harbor” so that no one is tempted to use this code for navigation). To make the task both more difficult and more real-to-life, I added random noise to the training and validation sets (normally distributed, with a standard deviation of 1.5”).
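
The data-generation step can be sketched roughly as follows. This is not the author's F# code: it's a minimal Python stand-in that sums a handful of cosine constituents and adds normally distributed noise with a standard deviation of 1.5 inches (0.125 feet). The periods are the standard values for the M2, S2, K1 and O1 constituents; the amplitudes, phases and all function names are invented for illustration and describe no real harbor.

```python
import math
import random

# (name, period in hours, amplitude in feet, phase in radians).
# Periods are standard constituent periods; amplitudes and phases are
# hypothetical values chosen only to make a plausible-looking curve.
CONSTITUENTS = [
    ("M2", 12.4206, 3.0, 0.0),
    ("S2", 12.0000, 1.0, 0.5),
    ("K1", 23.9345, 0.8, 1.0),
    ("O1", 25.8193, 0.5, 2.0),
]

NOISE_SD_FT = 1.5 / 12.0  # 1.5 inches expressed in feet
STEP_HOURS = 3.0          # one reading every three hours


def clean_level(t_hours):
    """Noise-free water level (feet) at t_hours, as a sum of cosines."""
    return sum(
        amplitude * math.cos(2.0 * math.pi * t_hours / period + phase)
        for _, period, amplitude, phase in CONSTITUENTS
    )


def noisy_series(count, seed=0):
    """Return `count` readings, three hours apart, with Gaussian noise added."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [
        clean_level(i * STEP_HOURS) + rng.gauss(0.0, NOISE_SD_FT)
        for i in range(count)
    ]
```

Seeding the generator keeps the training and validation files reproducible from run to run, which helps when comparing experiments.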

There are many deep-learning libraries available to developers. The nitty-gritty of deep learning involves lots of parallel multiplication and sums over very large arrays of floating-point numbers. GPU support noticeably speeds up even trivial ML projects, and low-level performance is an area where the various large projects compete with each other, just as with graphic shaders. (Interestingly, ML doesn’t generally require high precision, and the emerging field of Tensor Processing Units [TPUs] will probably trade away word size in exchange for increased parallelism and power efficiency.)

However, given an efficient low-level foundation, most non-­research-level ML architectures can be described using much higher-level abstractions. Some libraries, such as Keras, provide those abstractions on top of various low-level libraries; other libraries, such as Microsoft’s Cognitive Toolkit, provide both high-level and low-level abstractions.

While there are several interchange formats striving to gain mindshare, at the moment there's considerable lock-in to the library you choose for training. If you train in Tensorflow, you most likely have to inference in Tensorflow; if you train in Caffe, you most likely have to inference in Caffe.

Classical neural networks do not have any “memory” of their previous inputs and outputs. This is a serious shortcoming when it comes to time-series prediction! Recurrent Neural Networks (RNNs), as their name implies, combine their current input with previous results that are looped back as additional inputs. This allows RNNs to recognize patterns in sequential data. The Long Short Term Memory (LSTM) cell, developed by Sepp Hochreiter and Jürgen Schmidhuber in 1997, is a form of RNN that uses internal “gates” to selectively amplify (remember) or damp (forget) these recurrent connections. Although LSTMs are somewhat old-fashioned and do not train as fast as modern variants, they have the distinct advantage of being widely supported. An LSTM cell is at the core of my model.
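
As a rough illustration of the gating idea, here is one time step of a single-unit LSTM cell in plain Python. It follows the commonly used formulation with forget, input and output gates. The function names and the scalar weight layout are invented for this sketch; real libraries apply the same arithmetic to vectors and matrices.

```python
import math


def sigmoid(x):
    """Squash x into the range (0, 1); used for the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """Advance a single-unit LSTM cell by one time step.

    `weights` maps each of 'f', 'i', 'o' and 'g' to a tuple of
    (input weight, recurrent weight, bias). Returns (h, c).
    """
    def pre_activation(name):
        w_x, w_h, bias = weights[name]
        return w_x * x + w_h * h_prev + bias

    forget_gate = sigmoid(pre_activation("f"))  # how much old state to keep
    input_gate = sigmoid(pre_activation("i"))   # how much new input to admit
    output_gate = sigmoid(pre_activation("o"))  # how much state to expose
    candidate = math.tanh(pre_activation("g"))  # proposed new content

    c = forget_gate * c_prev + input_gate * candidate  # updated cell state
    h = output_gate * math.tanh(c)                     # updated output
    return h, c


def run_sequence(inputs, weights):
    """Feed a sequence through the cell, starting from a zeroed state."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, weights)
    return h, c
```

The looped-back values `h` and `c` are the “memory”: each step sees the current input plus whatever the gates allowed to survive from earlier steps.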

Because Tensorflow is necessary for deployment on Android, I chose Keras on top of Tensorflow to develop the model for this project. The Keras high-level description of my model looks like this:

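from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
def build_model(lookback_length, input_feature_count, lstm_layer_output_dimensions,
  prediction_length, dropout_pct):
    model = Sequential()
    model.add(LSTM(lstm_layer_output_dimensions, input_shape=(lookback_length, input_feature_count)))
    # The source listing is truncated here; the next two layers are reconstructed
    # from the description below, and the placement of Dropout is an assumption
    model.add(Dropout(dropout_pct))
    model.add(Dense(prediction_length))
    model.compile(loss='mse', optimizer='rmsprop')
    return model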

The Long Short Term Memory (LSTM) cell looks back lookback_length samples at an input that has input_feature_count features. In this case, I have only one input feature: the input water levels at previous three-hour intervals. The output of the LSTM layer feeds into a densely interconnected layer that maps from an array of size lstm_layer_output_dimensions to an array of prediction_length that contains the model’s predictions of the water level at future intervals. Figure 1 shows a schematic of the architecture.

Figure 1 Schematic of the Tide-Prediction Neural Network

This is about as plain-vanilla a model as one could imagine for a time-series prediction problem. The LSTM cell is a kind of RNN.

Figure 2 shows how the model is built and trained. I use Pandas to read the training and validation data from the file contoso_noisy.txt and set the constants for a particular training experiment (in this case, looking back 200 steps and looking forward 100, with a 128-element hidden layer). The dropout_pct value sets a random percentage of input data to zero during training, which is immensely helpful for avoiding over-fitting (the problem of the model learning the specific training data and not generalizing to new situations). I convert the input data_frame to inputs and outputs for training and testing (the data-munging function dataframe_to_matrices isn’t shown, but is available in the source code distribution). I call the previously discussed build_model function and then call the model’s fit function. This hours-long call adjusts the model’s internal values after every batch of 100 samples, and repeats either for 2,500 epochs or until the error of the model drops below 0.12 (12 percent of 1 foot), holding back 15 percent of the data for the validation step. The first few epochs of a typical run are shown in Figure 3 and training and validation errors are shown in Figure 4.

Figure 2 Data Initialization and Model Training

import pandas as pd
data_frame = pd.read_csv("contoso_noisy.txt", names = ["level"])
input_count = 200 # How far to look back
output_count = 100 # How many steps forward to predict
lstm_layer_output_dimensions = 128 # Size of LSTM output
dropout_pct = 0.15 # Dropout density to avoid over-fitting
# dataframe_to_matrices is the data-munging helper from the source code distribution
(training_inputs, training_targets, test_input) = dataframe_to_matrices(
  data_frame, input_count, output_count)
# How many input features? In this case, 1, but changes from model-to-model
features = training_inputs.shape[2]
model = build_model(input_count, features, lstm_layer_output_dimensions,
  output_count, dropout_pct)
# Train (Experimentally, ~0.12 seems to be an "elbow" --
# lower ThresholdStop to gain accuracy by spending training time)
# ThresholdStop is a custom callback that isn't shown in this listing
training_history = model.fit(training_inputs, training_targets,
  epochs=2500, batch_size=100, validation_split=0.15, callbacks=[ThresholdStop(0.12)])
# Predict and output results, using input data held back from training
predicted = model.predict(test_input)
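
Because dataframe_to_matrices isn't shown, the following is only a guess at the kind of windowing it performs, written in plain Python rather than Pandas. The function name series_to_windows is hypothetical. It slides a window over the series so that each training input holds input_count consecutive readings and the matching target holds the output_count readings that follow. It doesn't reproduce whatever the real helper does to hold back test_input.

```python
def series_to_windows(levels, input_count, output_count):
    """Turn a flat list of readings into (inputs, targets) training pairs.

    inputs[k] is a list of input_count one-element lists (one feature per
    time step); targets[k] is the list of output_count readings that
    immediately follow that input window.
    """
    inputs, targets = [], []
    last_start = len(levels) - input_count - output_count
    for start in range(last_start + 1):
        split = start + input_count
        # Wrap each reading in its own list: samples x steps x features
        inputs.append([[value] for value in levels[start:split]])
        targets.append(list(levels[split:split + output_count]))
    return inputs, targets
```

The nesting mirrors the three-dimensional shape that the listing relies on when it reads the feature count from training_inputs.shape[2].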

Figure 3 A Typical Training Session Begins

Using TensorFlow backend.
Train on 2295 samples, validate on 405 samples
Epoch 1/2500
2017-10-30 21:51:49.576493: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-30 21:51:50.155264: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\] Found device 0 with properties:
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.0975
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.65GiB
2017-10-30 21:51:50.166001: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
2295/2295 [======================] - 9s - loss: 4.4804 - val_loss: 2.8267
Epoch 2/2500
2295/2295 [======================] - 6s - loss: 3.0078 - val_loss: 2.8101
Epoch 3/2500
2295/2295 [======================] - 6s - loss: 2.8734 - val_loss: 2.6333
Epoch 4/2500
2295/2295 [======================] - 6s - loss: 2.5907 - val_loss: 2.2159
Epoch 5/2500
2295/2295 [======================] - 6s - loss: 1.8314 - val_loss: 1.1734
Epoch 6/2500
2295/2295 [======================] - 6s - loss: 0.9937 - val_loss: 0.7333
Epoch 7/2500
2295/2295 [======================] - 6s - loss: 0.7608 - val_loss: 0.6626
Epoch 8/2500
2295/2295 [======================] - 6s - loss: 0.6948 - val_loss: 0.6373
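
The ThresholdStop callback used in Figure 2 isn't listed in the article, so the following is a guess at its behavior, not the author's code. It's written as a plain Python class so that it runs without Keras installed; with Keras you would derive it from keras.callbacks.Callback, which supplies self.model. Monitoring val_loss is an assumption, since the article doesn't say which loss the callback watches. The sample values are the val_loss figures from the eight epochs shown in Figure 3.

```python
from types import SimpleNamespace


class ThresholdStop:
    """Ask the training loop to stop once a monitored loss is low enough."""

    def __init__(self, threshold, monitor="val_loss"):
        self.threshold = threshold
        self.monitor = monitor  # assumption: the watched metric is not stated
        self.model = None       # Keras sets this itself on a real Callback

    def on_epoch_end(self, epoch, logs=None):
        value = (logs or {}).get(self.monitor)
        if value is not None and value < self.threshold:
            # Keras checks this flag after each epoch and ends training
            self.model.stop_training = True


# Validation losses for the first eight epochs, copied from Figure 3
FIGURE_3_VAL_LOSS = [2.8267, 2.8101, 2.6333, 2.2159,
                     1.1734, 0.7333, 0.6626, 0.6373]


def epochs_run(val_losses, threshold):
    """Simulate a training loop and report how many epochs it executes."""
    callback = ThresholdStop(threshold)
    callback.model = SimpleNamespace(stop_training=False)
    for epoch, val_loss in enumerate(val_losses):
        callback.on_epoch_end(epoch, {"val_loss": val_loss})
        if callback.model.stop_training:
            return epoch + 1
    return len(val_losses)
```

With a threshold of 0.12, none of the eight epochs in Figure 3 triggers the stop, which is consistent with the run continuing well past the portion shown.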

Figure 4 A Typical Training Run

Experienced ML developers might raise their eyebrows at the curves in Figure 4, which show the validation error (the test of the model against data held back from training) being less than that on the training data for quite a while; but the training loss curve includes the data held back by dropout, which artificially raises the error. The graph truncates the Y axis and so it doesn’t show the error for the first few dozen epochs. In general, it’s a pretty good curve, with no sign of overfitting, and pretty rapid convergence on an error around 0.15 foot (15 percent of a foot), which is a little less than 2 inches. Pretty good for a noisy training set!

Even better, Figure 5 shows the model’s predictions and the ground truth going forward 100 timesteps, which for the three-hour timesteps translates into 12 1/2 days. While the amplitudes are off, the timing and general waxing and waning of spring and neap tides are clearly captured by the model. (It’s typical that the water level doesn't average to zero over such a short time period.)

Figure 5 The Neural Net Captures Tide Cycles

The trained model can be saved in the preferred Keras HDF5 format and reloaded and used as necessary to predict tides in Contoso Harbor given recent or historical tide gauge readings.

Conversion and Deployment to Mobile Devices

While many ML scenarios can use a cloud-based Web service for the calculation, there are several reasons why on-device inferencing might be preferable.

First and foremost is performance. While mobile devices pale in horsepower compared to the GPUs on Azure N-Series machines, inference is vastly less costly than training. The latest iPhones have the A11 Bionic chip, with hardware dedicated to neural net operations, and the Pixel 2 Pixel Visual Core points the way to similar accelerated capabilities on Android.

While my experience is that on-device inference with typical hardware can take upward of a second with large models, good async programming practices can lead to apps with excellent responsiveness. See, for instance, the CoreMLAzureModel and CoreMLVision samples, both of which perform inference on video streams. The Stopwatch class can be invaluable for understanding the computational cost of your on-device inferencing.

Second, data volumes can be significant. In scenarios that involve continuous inferencing, audio and image data (much less video streams) can chew up bandwidth quickly.

Finally, there will always be the possibility that users just plain don’t have Internet access at the moment.

On-device inferencing was introduced in the CoreML framework in iOS 11 and macOS, while Android users can use Tensorflow Android Inference (TAI). In the future, Google’s just-announced Neural Networks API will likely be preferred over this library.

Whether targeting CoreML or TAI, you have to convert the Keras HDF5 file to compatible formats. Conversion to CoreML is simple:

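import coremltools
# Convert to CoreML, naming the model's input and output features
coreml_model = coremltools.converters.keras.convert(
  "keras_model_lstm.hdf5", ["readings"], ["predicted_tide_ft"])
# Attach metadata, then write the .mlmodel file
coreml_model.author = 'Larry O\'Brien'
coreml_model.license = 'MIT'
coreml_model.save('LSTM_TidePrediction.mlmodel')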

The CoreML code relies on the coremltools package written by Apple, whose source code is available under the 3-Clause BSD License. CoreML works with a large number of ML models, including non-neural network models, such as Support Vector Machines, tree ensembles, and linear and logistic regression models. (See the table at Xamarin’s CoreML API documentation homepage.)

Because this model was trained using Tensorflow, I can extract the underlying Tensorflow computational graph and save the weights, as shown in Figure 6. This code is derived from the work of Amir Abdi.

Figure 6 Extracting and Saving Underlying Tensorflow Data from a Keras Model

# Derived from code by Amir H. Abdi released under the MIT Public License
input_fld = '.'
weight_file = 'keras_model_lstm.hdf5'
num_output = 1
write_graph_def_ascii_flag = True
prefix_output_node_names_of_final_network = 'output_node'
output_graph_name = 'TF_LSTM_Inference.pb'
from keras.models import load_model
import tensorflow as tf
import os
import os.path as osp
from keras import backend as K
output_fld = input_fld
if not os.path.isdir(output_fld):
  os.mkdir(output_fld)  # create the output folder if it doesn't exist yet
weight_file_path = osp.join(input_fld, weight_file)
net_model = load_model(weight_file_path)
pred = [None]*num_output
pred_node_names = [None]*num_output
for i in range(num_output):
  pred_node_names[i] = prefix_output_node_names_of_final_network+str(i)
  pred[i] = tf.identity(net_model.output[i], name=pred_node_names[i])
print('output nodes names are: ', pred_node_names)
sess = K.get_session()
if write_graph_def_ascii_flag:
  f = 'only_the_graph_def.pb.ascii'
  tf.train.write_graph(sess.graph.as_graph_def(), output_fld, f, as_text=True)
  print('saved the graph definition in ascii format at: ', osp.join(output_fld, f))
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import graph_io
constant_graph = graph_util.convert_variables_to_constants(
  sess, sess.graph.as_graph_def(), pred_node_names)
graph_io.write_graph(constant_graph, output_fld, output_graph_name, as_text=False)
print('saved the constant graph (ready for inference) at: ', osp.join(
  output_fld, output_graph_name))

Writing the Apps

With a well-performing model in hand and converted to device formats, it’s time to develop the app. Although the ML inferencing specifics differ between iOS and Android, naturally I'll have a single source for UI code via Xamarin.Forms.

The complete solution contains four projects: the Xamarin.Forms shared-code project that defines the UX, its iOS and Android implementations, and an Android binding project for the TAI library.

Although Tensorflow is associated with Google, shipping versions of Android don’t have built-in support for ML models. Instead, this project uses the easy-to-use TensorFlowInferenceInterface class, which is defined in the Org.Tensorflow.Contrib.Android namespace and distributed in an 11MB shared library. Projects under Tensorflow’s Contrib directory aren’t officially supported, but this project appears to have an active community of committers and I suspect it will continue into the future.

Although binding an Android or iOS native library sometimes has complexities, in this case the binding project is trivial. It’s literally just a matter of putting the Java files in the appropriate place in the /Jars subdirectory and letting the Xamarin infrastructure take care of the rest.

The Android project’s /Assets directory should contain a copy of the Tensorflow protobuf weight file, TF_LSTM_Inference.pb, generated by the code in Figure 6. To use the .mlmodel file produced by Keras2CoreML, though, an additional step is required. While the .mlmodel file is good for sharing, actual runtime use requires it to be compiled into a different format.

On a Macintosh that has Xcode installed, you convert a .mlmodel using xcrun coremlcompiler compile LSTM_TidePrediction.mlmodel LSTM_TidePrediction. The result is a directory called LSTM_TidePrediction.mlmodelc that contains several model-defining files. This folder should be moved to the iOS project’s Resources folder. (The source-code distribution has already performed all the steps necessary and the TAI library, Tensorflow and CoreML models are all in their proper locations.)

With the project structure in place, let’s briefly discuss the UI. This isn’t my strong suit, as Figure 7 amply demonstrates. The app consists of a scrolling graph of predictions (using Aloïs Deniel’s Microcharts package, available on NuGet), three buttons that allow you to load and predict tides based on any of three datasets, and the prediction values themselves.

Figure 7 Xamarin.Forms Under iOS (left) and Android (right), Using CoreML and Tensorflow Android Inference

You’ll notice that while the general shape of the overall prediction is consistent, the fine-grained predictions are often in disagreement by as much as a few inches. There are several possible reasons for this. The output of a deep neural net is the product of thousands of floating-point multiplications and sums; differences in low-level representations could easily accumulate to significant levels. Another possibility is differences between CoreML and Tensorflow in the implementation of the LSTM or feed-forward nodes. A third possibility is that the differences are introduced during the conversion of the Keras model to CoreML. One way or the other, the differences highlight the importance of validating the model’s behavior on the device.
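
One way to make that validation concrete is to run the same input through the training-side model and through the device, then compare the two prediction arrays. The sketch below is an assumption about how such a check might look, not part of the article's source code. The function name and the 0.25-foot tolerance in the usage note are invented for illustration.

```python
import math


def compare_predictions(reference, device, tolerance_ft):
    """Summarize how far device predictions stray from reference ones.

    Both arguments are equal-length sequences of water levels in feet.
    """
    if len(reference) != len(device):
        raise ValueError("prediction arrays differ in length")
    if not reference:
        raise ValueError("prediction arrays are empty")
    diffs = [abs(r - d) for r, d in zip(reference, device)]
    max_diff = max(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "max_abs_diff": max_diff,
        "rmse": rmse,
        "within_tolerance": max_diff <= tolerance_ft,
    }
```

Calling compare_predictions(keras_output, device_output, 0.25) would flag any point that differs by more than 3 inches. What tolerance is acceptable depends entirely on the application.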

Device-Based Inferencing

Compared to the complexity of developing the model, on-device inferencing is quite simple and consistent between projects:

Load the model into the library’s inferencing class. Once on the device, the model is essentially a blackbox, with little support for reading or modifying internal state. The inferencing functions can be expressed as an interface in the Xamarin.Forms base project:

public interface ITidePredictor
{
  /// <summary>
  /// From 200 input sea levels, in feet, taken every 3 hours,
  /// predict 100 new sea levels (next 300 hours)
  /// </summary>
  /// <returns>100 levels, in feet, representing the predicted tide
  /// from the final input time to + i * 3 hours</returns>
  /// <param name="seaLevelInputs">200 water levels, measured in feet,
  /// taken every 3 hours</param>
  float[] Predict(float[] seaLevelInputs);
}

The native Android class for inferencing is Org.Tensorflow.Contrib.Android.TensorFlowInferenceInterface and the iOS class is CoreML.MLModel. Both are loaded in their device project’s constructor, as shown in Figures 8 and 9.

Figure 8 The Android Inferencing Class

public class TensorflowInferencePredictor : ITidePredictor
{
  // The "file:///android_asset/" prefix tells the library to read from the app's assets
  const string MODEL_FILE_URL = "file:///android_asset/TF_LSTM_Inference.pb";
  const string INPUT_ARGUMENT_NAME = "lstm_1_input";
  const string OUTPUT_VARIABLE_NAME = "output_node0";
  const int OUTPUT_SIZE = 100;
  TensorFlowInferenceInterface inferenceInterface;
  public TensorflowInferencePredictor(AssetManager assetManager)
  {
    inferenceInterface = new TensorFlowInferenceInterface(
      assetManager, MODEL_FILE_URL);
  }
  public float[] Predict(float[] inputSeaLevels)
  {
    // Trailing arguments give the input shape: 200 rows x 1 feature x 1 value
    inferenceInterface.Feed(INPUT_ARGUMENT_NAME, inputSeaLevels,
      inputSeaLevels.Length, 1, 1);
    inferenceInterface.Run(new string[] { OUTPUT_VARIABLE_NAME });
    float[] predictions = new float[OUTPUT_SIZE];
    inferenceInterface.Fetch(OUTPUT_VARIABLE_NAME, predictions);
    return predictions;
  }
}

Figure 9 The iOS Inferencing Class

public class CoreMLTidePredictor : NSObject, ITidePredictor
{
  public event EventHandler<EventArgsT<String>> ErrorOccurred = delegate { };
  MLModel model;
  const int OUTPUT_SIZE = 100;
  const string OUTPUT_FIELD_NAME = "predicted_tide_ft";
  public CoreMLTidePredictor()
  {
    // Load the compiled ML model from the app bundle
    var bundle = NSBundle.MainBundle;
    var assetPath = bundle.GetUrlForResource("LSTM_TidePrediction", "mlmodelc");
    NSError mlErr;
    model = MLModel.Create(assetPath, out mlErr);
    if (mlErr != null)
    {
      ErrorOccurred(this, new EventArgsT<string>(mlErr.ToString()));
    }
  }
  public float[] Predict(float[] seaLevelInputs)
  {
    var inputs = new TideInput(seaLevelInputs);
    NSError mlErr;
    var prediction = model.GetPrediction(inputs, out mlErr);
    if (mlErr != null)
    {
      ErrorOccurred(this, new EventArgsT<string>(mlErr.ToString()));
    }
    // Copy the named output feature into a managed array
    var predictionMultiArray = prediction.GetFeatureValue(
      OUTPUT_FIELD_NAME).MultiArrayValue;
    var predictedLevels = new float[OUTPUT_SIZE];
    for (int i = 0; i < OUTPUT_SIZE; i++)
      predictedLevels[i] = predictionMultiArray[i].FloatValue;
    return predictedLevels;
  }
}

In both cases, the model is loaded from a file, but both CoreML and TAI support loading a model from a Web-based URL. While a Web-based model is obviously easier to update, the tradeoff is that many ML models are extremely large. The tide prediction model is only a few hundred kilobytes in size, but image-recognition models often weigh in at hundreds of megabytes.

Configure input data. Because ML libraries have their own internal datatypes and structures, almost all interact with calling programs using dictionaries of strings to input and output data. Although the names can be set in Keras, conversion to CoreML and Tensorflow has a tendency to mangle them. In the case of Android, the input sea levels are associated with the string “lstm_1_input” and the output predictions with “output_node0.” Configuring input is easy in Android, as conversion isn’t necessary. As you can see in the call to Feed in Figure 8, the input array is passed, followed by inputSeaLevels.Length, 1, 1. This encodes the shape of the input data: 200 rows, each containing 1 feature defined by 1 value.

CoreML input and output is more complex. While TAI takes and returns managed arrays, CoreML works with MLFeatureValue datatypes defined in the CoreML namespace and presumably tuned to Apple hardware. The inputs to the model are defined in the TideInput class, shown in Figure 10. Note that TideInput is defined as implementing the IMLFeatureProvider interface. The MLModel object knows the names and types of its expected inputs, and uses the IMLFeatureProvider interface to retrieve that data. The FeatureNames property must mimic the set of expected variable names, and the GetFeatureValue method must provide the data for the relevant string.

Figure 10 Configuring CoreML Input

class TideInput : NSObject, IMLFeatureProvider
{
  MLFeatureValue readings;
  MLMultiArray lstm_1_h_in, lstm_1_c_in;
  const int INPUT_SIZE = 200;
  const int MIDDLE_SIZE = 128;
  // The names the model expects to be able to request
  public NSSet<NSString> FeatureNames
  {
    get
    {
      return new NSSet<NSString>(
        new NSString("readings"),
        new NSString("lstm_1_h_in"),
        new NSString("lstm_1_c_in")
      );
    }
  }
  public MLFeatureValue GetFeatureValue(string featureName)
  {
    switch (featureName)
    {
      case "readings": return readings;
      case "lstm_1_h_in": return MLFeatureValue.Create(lstm_1_h_in);
      case "lstm_1_c_in": return MLFeatureValue.Create(lstm_1_c_in);
      default: throw new ArgumentOutOfRangeException();
    }
  }
  public TideInput(float[] tideInputData)
  {
    // 200 elements, 1 batch, 1 feature
    NSError mlErr;
    var ma = new MLMultiArray(new nint[] { INPUT_SIZE, 1, 1 },
      MLMultiArrayDataType.Double, out mlErr);
    for (int i = 0; i < INPUT_SIZE; i++)
      ma[i] = tideInputData[i];
    readings = MLFeatureValue.Create(ma);
    // Initial LSTM output ("h") and cell state ("c"), both zeroed
    lstm_1_h_in = new MLMultiArray(new nint[] { MIDDLE_SIZE },
      MLMultiArrayDataType.Double, out mlErr);
    lstm_1_c_in = new MLMultiArray(new nint[] { MIDDLE_SIZE },
      MLMultiArrayDataType.Double, out mlErr);
    for (int i = 0; i < MIDDLE_SIZE; i++)
      lstm_1_h_in[i] = lstm_1_c_in[i] = new NSNumber(0.0);
  }
}

When converting the Keras tide-prediction model to CoreML, the converter told us that the model takes as input three MLMultiArray objects. The TideInput class needs to initialize those objects. The first is the expected readings input with its [200, 1, 1] shape:

var ma = new MLMultiArray(new nint[] { INPUT_SIZE, 1, 1 },
  MLMultiArrayDataType.Double, out mlErr);
for (int i = 0; i < INPUT_SIZE; i++)
  ma[i] = tideInputData[i];
readings = MLFeatureValue.Create(ma);

The other expected inputs (lstm_1_h_in and lstm_1_c_in of shape [128, 1, 1]) are more surprising, but “h” is often used for an LSTM’s output and “c” for the cell’s state. Setting all the values for these inputs to 0 results in correct predictions, so that’s what TideInput does.

Call the inferencing function. With everything configured, it’s time for the magic! The Android call inferenceInterface.Run(new string[] { OUTPUT_VARIABLE_NAME });, and the iOS call, var prediction = model.GetPrediction(inputs, out mlErr);, perform the actual inferencing. In both cases, this is a synchronous call. Timing, of course, varies from machine to machine and model to model, and is very quick on the small tide-prediction model. With image-recognition models, though, I’ve seen this function take many hundreds of milliseconds. When writing apps that work with video frames, I’ve used simple background processing and a simple flag to throttle requests to the ML library. In iOS 11.1, Apple added new Pressure Level APIs for its video subsystem that can interrupt capture sessions if the hardware gets too hot, which supports the intuition that things like continuous image recognition and augmented reality are pretty darn processor-intense!
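
The “background processing plus a flag” pattern is independent of platform. The following Python sketch shows the idea under stated assumptions: the class name, the predict callable and the choice to drop frames (rather than queue them) are all invented for illustration. It isn't the code from the article's apps, which are written in C#.

```python
import threading


class ThrottledPredictor:
    """Run a slow, synchronous predict call on a background thread,
    dropping any frame that arrives while a prediction is in flight."""

    def __init__(self, predict):
        self._predict = predict        # the blocking inference function
        self._busy = threading.Lock()  # the "simple flag": held while working
        self.latest = None             # most recently completed result

    def submit(self, frame):
        """Start inference on `frame` unless one is already running.

        Returns the worker thread, or None if the frame was dropped.
        """
        if not self._busy.acquire(blocking=False):
            return None  # still busy, so skip this frame
        worker = threading.Thread(target=self._run, args=(frame,), daemon=True)
        worker.start()
        return worker

    def _run(self, frame):
        try:
            self.latest = self._predict(frame)
        finally:
            self._busy.release()  # clear the flag even if predict raised
```

Dropping frames keeps the UI thread free and stops work from piling up when inference is slower than the capture rate. A real app would also marshal the result back to the UI thread.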

Retrieve the output variable or variables. Just as with the inputs, the outputs of the model are associated with strings. In the case of Android:

float[] predictions = new float[OUTPUT_SIZE];
inferenceInterface.Fetch(OUTPUT_VARIABLE_NAME, predictions);

These populate the predictions array, while the iOS code is only slightly more complex:

var predictionMultiArray =
  prediction.GetFeatureValue(OUTPUT_FIELD_NAME).MultiArrayValue;
var predictedLevels = new float[OUTPUT_SIZE];
for (int i = 0; i < OUTPUT_SIZE; i++)
  predictedLevels[i] = predictionMultiArray[i].FloatValue;

Both CoreML and TAI are performance-optimized libraries that do little or no input validation. Data is basically treated as raw C buffers; mistakes in input size, shape or format can result in diagnostic-free crashes or, even more confusingly, complaint-free “Garbage In, Garbage Out” results.
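
Since the libraries won't check the input for you, a small guard in front of the inferencing call is cheap insurance. The sketch below is one possible shape for such a guard, shown in Python for brevity. The function name and error messages are hypothetical, and the expected count of 200 comes from the tide model described in this article.

```python
import math

INPUT_SIZE = 200  # readings expected by the tide-prediction model


def validate_readings(readings, expected_count=INPUT_SIZE):
    """Return the readings as a list of floats, or raise ValueError.

    Rejects the wrong number of readings and any NaN or infinite value,
    which are mistakes the inferencing libraries would accept silently.
    """
    if len(readings) != expected_count:
        raise ValueError(
            "expected %d readings, got %d" % (expected_count, len(readings)))
    cleaned = []
    for index, value in enumerate(readings):
        number = float(value)  # raises for values that aren't numeric
        if not math.isfinite(number):
            raise ValueError("reading %d is not a finite number" % index)
        cleaned.append(number)
    return cleaned
```

Failing loudly before the call is much easier to debug than a crash inside native code or a plausible-looking but meaningless prediction.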

The field of ML is developing at a blistering pace, and it would be a full-time job just to evaluate the latest research papers and industry announcements. While the cutting edge is fascinating, huge amounts of value can be unlocked using techniques and architectures that have been around practically forever (that is, years).

There is a rough progression of difficulty in ML tasks from pattern recognition to classification to sequence modeling to sequence-to-sequence generation. Of course, the signal needs to be present in the data, and recognizing a pattern in a very noisy stream may be more difficult than generating something that acts like a simple Markov model (I’m looking at you, “Neural Net Generates Funny Boat Names!” articles).

Keep It Simple

My ML models are almost exclusively done with high-level abstractions and well-known architectures. Even though I can rarely resist pre-processing my data to emphasize necessary features, this often turns out to be premature optimization. It’s amazing how often “the simplest architecture that could possibly work” manages to extract a signal from unprocessed data. Lord Kelvin’s tide computer relied on the results of a Fourier analysis of water levels, but the simple model I developed reproduces at least the short-term epicycles pretty well. (I plan on seeing if I can generate the classic harmonic components from raw data—stay tuned to my Twitter feed.)

Whether your ambitions involve super-human competence, in-the-field business intelligence, or simply boosting your career prospects, modern ML is a fascinating field with powerful capabilities and seemingly unlimited potential for delivering value. Delivering that ML capability on devices is straightforward with Microsoft’s Xamarin technologies and the techniques described in this article. 

8 Key Considerations

Use these eight steps to deliver an AI/ML solution on mobile devices:

  1. Choose an ML library compatible with the device OS or systems you’re targeting.
  2. Develop a data-wrangling training pipeline that allows you to rapidly explore your data and iterate your model.
  3. Consider using cloud-based resources for final training.
  4. Convert your model to device format.
  5. Convert your on-device data to the form expected by the model.
  6. Treat the on-device inferencing as a “black box” function call.
  7. Validate that the on-device inferencing matches your training results.
  8. Consider implementing the ability to download a new model, but be aware of the size implications.

Larry O’Brien is a senior content publication manager for Microsoft’s Xamarin technologies. His first published technical article was “Developing a Neural Network in C++” for AI Expert in 1989. He’s seen ’em come and he’s seen ’em go. He’s on Twitter: @lobrien.

Thanks to the following Microsoft technical experts for reviewing this article: Anuj Bhatia and Alexander Kyte.
Alexander Kyte is a software engineer on the Mono compiler team and a published author, who has been following the recent advancements in Machine Learning.

Anuj Bhatia is an architect on the Azure team at Microsoft: a biochemist by trade, technologist by day, and dinner party host by night.
