Getting started with AI experimentation

Data scientists and AI engineers collaborate during the experimentation phase of an AI project. They work on exploratory data analysis, prototyping ML approaches, feature engineering, and testing hypotheses.

While hypothesis-driven development and agile approaches are encouraged for ML project management, the focus here is on the engineering aspects of experimentation.

For the data science aspects of experimentation, see Model Experimentation.

💡 Key outcomes of engineering experimentation:

  • Normalization, transformation, and other required pre-processing have been applied to the data (or a subset of the data). The pre-processed data has been evaluated for feasibility and suitability for solving the business problem.
  • The data has been enriched or augmented to improve its suitability and may even be partially or fully synthetic.
  • A hypothesis-driven approach has been applied, with experiments (and potentially compute) tracked centrally.
  • Experiments are documented, shared, and can be easily reproduced.
  • ML approaches, libraries, and algorithms have been tested and evaluated, and the best-performing approach has been selected. The best-performing approach is not necessarily the most accurate model; it could be a trade-off between ease of implementation and accuracy. For example, using AutoML for rapid prototyping and development, or exporting to an ONNX model for deployment to an Edge device.
  • The data distribution is recorded and stored as a reference to measure future drift as the data changes.
  • An automated pipeline has been designed and, potentially, partially applied to the experiments.
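
As an illustration of the outcome about recording the data distribution as a drift reference, the sketch below shows one possible way to do it for a single numeric feature. It is a minimal example under stated assumptions, not a prescribed method: the Population Stability Index (PSI) is used here as one common drift measure, the function names (build_reference, population_stability_index), the choice of 10 equal-width bins, and the sample data are all hypothetical, and only the Python standard library is used.

```python
import json
import math
from bisect import bisect_right


def _proportions(values, edges):
    """Share of values falling into each bin defined by the interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def build_reference(values, n_bins=10):
    """Summarize a numeric feature as interior bin edges plus per-bin proportions.

    Equal-width binning is an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    # Interior edges only: values outside [lo, hi] land in the first or last bin.
    edges = [lo + i * width for i in range(1, n_bins)]
    return {"edges": edges, "proportions": _proportions(values, edges)}


def population_stability_index(reference, values, eps=1e-6):
    """PSI between the stored reference and new data binned on the same edges."""
    actual = _proportions(values, reference["edges"])
    psi = 0.0
    for expected, observed in zip(reference["proportions"], actual):
        # Clamp empty bins so the logarithm stays defined.
        expected, observed = max(expected, eps), max(observed, eps)
        psi += (observed - expected) * math.log(observed / expected)
    return psi


# Hypothetical experiment data: a feature spread uniformly over 0..49.
training = [float(i % 50) for i in range(1000)]

# Store the reference as JSON so it can be kept alongside the experiment record.
stored = json.dumps(build_reference(training))

# Later, compare new data against the stored reference.
reference = json.loads(stored)
psi_same = population_stability_index(reference, training)
psi_shifted = population_stability_index(reference, [v + 20 for v in training])
print(f"PSI (unchanged data): {psi_same:.3f}")
print(f"PSI (shifted data):   {psi_shifted:.3f}")
```

The unchanged data yields a PSI of zero, while the shifted data yields a clearly larger value. Which threshold should count as meaningful drift depends on the project and is left open here.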

Experimentation topics

Experimentation guidance is provided in the following articles

Other resources

  • The Data Science Toolkit is an open-source collection of proven ML and AI implementation accelerators. Accelerators enable the automation of commonly repeated development processes to allow data science practitioners to focus on delivering complex business value and spend less time on basic setup.