Getting started with AI experimentation
Data scientists and AI engineers collaborate during the experimentation phase of an AI project. They work on exploratory data analysis, prototyping ML approaches, feature engineering, and testing hypotheses.
While hypothesis-driven development and agile approaches are encouraged for ML project management, the focus here is on the engineering aspects of experimentation.
For the data science aspects of experimentation, see Model Experimentation.
💡 Key outcomes of engineering experimentation:
- Normalization, transformation, and other required pre-processing has been applied to the data (or a subset of the data). Pre-processed data is evaluated for feasibility and suitability for solving the business problem.
- The data has been enriched or augmented to improve its suitability and may even be partially or fully synthetic.
- A hypothesis-driven approach has been applied, with centrally tracked experiments and, potentially, centrally managed compute.
- Experiments are documented, shared, and can be easily reproduced.
- ML approaches, libraries, and algorithms are tested and evaluated, and the best-performing approach is selected. The best-performing approach isn't necessarily the most accurate model; it might instead be a trade-off between ease of implementation and accuracy. For example, using AutoML for rapid prototyping and development, or exporting to an ONNX model for deployment to an edge device.
- The data distribution is recorded and stored as a reference to measure future drift as the data changes.
- An automated pipeline has been designed and, at least partially, applied to the experiments.
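As a concrete illustration of the pre-processing outcome above, here is a minimal sketch of applying normalization (z-score standardization) to a feature before evaluating its suitability. It uses only the Python standard library; the function name `standardize` is illustrative, not from a specific toolkit.

```python
from statistics import mean, stdev

def standardize(values):
    """Scale a numeric feature to zero mean and unit variance (z-score).

    A common pre-processing step applied before evaluating whether the
    data is suitable for the business problem.
    """
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]

# Example: standardize a small sample of a feature column.
scaled = standardize([12.0, 15.0, 11.0, 19.0, 13.0])
```

In practice you would apply the same fitted parameters (`mu`, `sigma`) to validation and test splits rather than recomputing them, so the transformation learned during experimentation transfers cleanly into the automated pipeline.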
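The outcome about recording the data distribution as a drift reference can be sketched as capturing summary statistics and persisting them as JSON alongside the experiment. This is a minimal standard-library sketch; the statistics chosen and the function name `distribution_summary` are illustrative assumptions, and production systems typically use richer drift metrics.

```python
import json
from statistics import mean, stdev, quantiles

def distribution_summary(values):
    """Capture summary statistics of a numeric feature as a drift baseline."""
    q1, median, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "mean": mean(values),
        "std": stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }

# Record the baseline at experimentation time; later runs compare
# new batches of data against this stored reference to measure drift.
baseline = distribution_summary([0.8, 1.1, 0.9, 1.4, 1.0, 1.2, 0.7, 1.3])
baseline_json = json.dumps(baseline, indent=2)
```

Storing the serialized baseline with the experiment artifacts makes drift checks reproducible: a monitoring job can recompute the same summary on fresh data and compare the two.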
Experimentation topics
Experimentation guidance is provided in the following articles:
- Automatically finding an AI algorithm: Prototyping approaches and use of Automated Machine Learning (AutoML)
- Using MLOps during experimentation: How to design experiments to be ready for the Model Development phase.
- Exploratory Data Analysis (EDA): The process of understanding what data we have
- Feature engineering: Transforming and enriching data to forms that better support models that address the business problem
- Responsible AI: Ensuring ML solutions follow responsible AI best practices
- Synthetic data generation: Creating training data with similar statistical properties as the real data while securing privacy
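To make the synthetic data generation topic above concrete, here is a minimal sketch of drawing synthetic samples from a Gaussian fitted to real data, so the synthetic values share the real data's mean and spread without exposing individual records. This is a deliberately simple standard-library illustration; the function name `synthesize` is hypothetical, and real synthetic-data tooling models far richer structure than a single Gaussian.

```python
import random
from statistics import mean, stdev

def synthesize(real_values, n, seed=0):
    """Draw n synthetic samples from a Gaussian fitted to the real data.

    The synthetic values approximate the real data's statistical
    properties (mean, standard deviation) without copying any record.
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    mu, sigma = mean(real_values), stdev(real_values)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Example: generate 1000 synthetic points from a small real sample.
synthetic = synthesize([1.0, 2.0, 3.0, 4.0], n=1000)
```

Note that matching only marginal statistics is a weak privacy and fidelity guarantee; it serves here only to illustrate the idea of "similar statistical properties" mentioned in the topic description.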
Other resources
- The Data Science Toolkit is an open-source collection of proven ML and AI implementation accelerators. Accelerators enable the automation of commonly repeated development processes to allow data science practitioners to focus on delivering complex business value and spend less time on basic setup.