Reproducible Data Science with Machine Learning

Explaining your own code a few months after you wrote it is hard. Imagine having to explain the decisions of an AI algorithm a few years after it ran! However, it is relatively easy to set up your development workflow to make that possible, as long as you realize that the way we build ML and AI is fundamentally different from traditional software engineering. In a nutshell, it is all about reproducible research, development, and deployment. This is made possible by a clever use of modern notebook environments, including Azure ML Compute Instances, as opposed to more traditional IDEs, like Visual Studio Code. Rafal Lukawiecki has been actively working in data science, machine learning, and data mining for well over a decade, and he formally studied and used artificial intelligence long before it was popular, back in the '90s. Watch this episode to find out how he organizes his reproducible workflow.

Jump To:

  • [02:30] Learn reproducible research with Rafal Lukawiecki
  • [03:01] Modelling and exploration vs software development
  • [09:28] Steps to a reproducible workflow
  • [15:20] Demo: Workflow using RStudio and RMarkdown running locally
  • [22:25] Demo: RMarkdown notebooks in an Azure ML Compute Instance

More Information:

Don't miss new episodes: subscribe to the AI Show.