Coding for biology using Jupyter Notebooks

Guest post by Dr Benjamin Hall, University of Cambridge


Part of the new world of biology is understanding how best to use computers to do research. Whilst the field of computational biology isn’t new, novel ideas in computer science and mathematics are constantly opening up new ways of working and addressing problems. Adopting these techniques can be daunting without some examples to work from, and that’s a big part of what my classes are about.

As part of the systems biology course at the University of Cambridge, I talk about how to write simulation engines. That is, software tools that explore how cells grow and die, or how molecules move in space, without doing experiments in a laboratory. In lectures I discuss the motivations and caveats of developing your own code: when to build bespoke systems, and when to use other people’s. We talk about balancing risks and benefits, and, working from examples, how you can understand complex datasets.

During the associated computing-laboratory practical, I introduce two concepts that the students haven’t seen before: functional programming in F#, and constraint solving using the theorem prover Z3. They’ve indirectly used both of these in previous studies: in an earlier practical they use the BioModelAnalyzer, which is written with both F# and Z3.

I teach them these less common approaches to show some of the advantages and opportunities that come from solving problems in a different way. One example is type checking and units of measure in F#; these features effectively rule out some bugs by preventing code that gives the wrong final units. This is massively useful in physical simulators, where the units can be checked after transposing complex functions. Similarly, variable immutability in functional programming closes another class of bugs. These examples and others each show how writing their code differently can offer unexpected advantages; insights that can be reused in future programming.

I updated the practical to run in Jupyter Notebooks, and made the underlying code freely available under the MIT license.

This practical is intended as a brief introduction to the F# programming language and the SMT solver Z3. In the course of this practical you will be performing two types of biological simulation: you will be writing a small Gillespie simulator for the single progenitor model of epithelial stem cells, and editing and exploring logical models of small biological networks. This practical builds on the discussions of F# and Z3 in lecture 4, and the demonstrations in the associated supervision. The goal of this practical is to allow you to see how you can model different systems using a functional programming language (F#) and formal logic (using Z3). The final questions in each section are more open ended, so aim to spend about 1.5 hours on each component.
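To give a flavour of the first exercise, here is a minimal sketch of a Gillespie simulator for the single progenitor model. It is written in Python rather than the F# used in the practical, and it is not the course code. It assumes the usual form of the model: each progenitor cell (P) divides at a fixed rate into P+P with probability r, D+D with probability r, or P+D otherwise, and each differentiated cell (D) is lost at a fixed rate. The function name, parameter names and the values in the usage line are illustrative assumptions only.

```python
import random


def gillespie_single_progenitor(p0, d0, division_rate, loss_rate, r, t_end, seed=None):
    """Simulate one clone; return a list of (time, progenitors, differentiated).

    Illustrative sketch: names and structure are assumptions, not the course code.
    """
    rng = random.Random(seed)
    t, p, d = 0.0, p0, d0
    trajectory = [(t, p, d)]
    while True:
        # Propensities: each P divides at division_rate, each D is lost at loss_rate.
        a_div = division_rate * p
        a_loss = loss_rate * d
        a_total = a_div + a_loss
        if a_total == 0.0:
            break  # no cells left (or both rates zero): nothing more can happen
        # The waiting time to the next event is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which event fires, in proportion to its propensity.
        if rng.random() * a_total < a_div:
            u = rng.random()
            if u < r:          # P -> P + P
                p += 1
            elif u < 2 * r:    # P -> D + D
                p -= 1
                d += 2
            else:              # P -> P + D
                d += 1
        else:                  # D -> lost from the tissue
            d -= 1
        trajectory.append((t, p, d))
    return trajectory


# Example run with made-up parameter values, starting from a single progenitor.
clone = gillespie_single_progenitor(1, 0, division_rate=1.0, loss_rate=1.0,
                                    r=0.1, t_end=10.0, seed=1)
```

Each run gives one possible history of a clone, so in practice you would repeat the simulation many times and look at the distribution of clone sizes rather than any single trajectory.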

Parts of this tutorial are available as an Azure Notebook, but you can also download the notebooks from GitHub and install them on a Microsoft Data Science Virtual Machine. Click the button below to launch these in Azure Notebooks:

Azure Notebooks

Jupyter Notebooks are a new way of writing code where code, detailed comments (using Markdown and LaTeX for formatting) and images coexist in a single page, accessed through a web browser. They’re used by researchers in my lab to write code and generate visualisations of datasets, and are an increasingly popular way of coding and teaching coding.

For me, the primary advantage of Jupyter was that the documentation for the course was embedded around the code. This includes LaTeX-formatted math symbols, and allowed the documents to stand alone without handouts: the students could sit down, open a web browser, and get started. A further benefit was the portability; installing Anaconda and the F# kernel takes minutes and makes it easy to work outside of the lab. This encourages keen students to tinker with the parts of the code that interest them most once they’ve left. The same portability can be taken advantage of in Azure Notebooks, a cloud-based Jupyter instance. This further reduces the barriers, as all a student needs is a browser and an internet connection.

This is the first year I’ve run the practical using Jupyter Notebooks, and it’s been a big hit. With strong feedback from the students and researchers, I’m excited to see how we will continue to use the technology in future!

Biology in the Cloud

GitHub resources for the course

Getting started with F#

Reference for using F# in Jupyter Notebooks

Getting started with Azure Notebooks