Verify your code locally
Whenever you change any code in your machine learning project, you want to verify the code and model quality.
During continuous integration, you create and verify assets for your application. As a data scientist, you'll probably focus on creating scripts used for data preparation and model training. A machine learning engineer later uses these scripts in pipelines to automate these processes.
Two common tasks help you verify your scripts:
- Linting: Check for programmatic or stylistic errors in Python or R scripts.
- Unit testing: Check whether the code in your scripts works as intended.
Verifying your code helps prevent bugs or issues when the model is deployed. You can verify your code locally by running linters and unit tests in an IDE like Visual Studio Code.
You can also run linters and unit tests in an automated workflow with Azure Pipelines or GitHub Actions.
You'll learn how to run linters and unit tests in Visual Studio Code.
Lint your code
The quality of your code depends on the standards you and your team agree on. To ensure that the agreed-upon quality is met, you can run linters that check whether the code conforms to your team's standards.
Depending on the programming language you use, there are several options to lint your code. For example, if you work with Python, you can use either Flake8 or Pylint.
Use Flake8 to lint your code
To use Flake8 locally with Visual Studio Code:
- Install Flake8 with pip install flake8.
- Create a configuration file named .flake8 and store the file in your repo.
- Configure Visual Studio Code to use Flake8 as the linter: open your settings (Ctrl+,) and search for flake8.
- Enable Python > Linting > Flake8 Enabled.
- If your .flake8 file isn't in the root of your workspace, point Flake8 to it, for example by adding --config=<path to your .flake8 file> under Python > Linting > Flake8 Args. (The Flake8 path setting refers to the flake8 executable, not to the configuration file.)
To specify your team's standards for code quality, you can configure the Flake8 linter. A common way to define the standards is to create a .flake8 file and store it with your code.
The .flake8 file should start with [flake8], followed by any of the configurations you want to use.
Tip
A full list of possible configuration parameters can be found in the Flake8 documentation.
For example, to specify that no line can be longer than 80 characters, add the following to your .flake8 file:
[flake8]
max-line-length = 80
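To make the max-line-length setting concrete, here is a simplified sketch of the kind of check it controls. The find_long_lines function is hypothetical and only illustrates the idea behind Flake8's line-length error (E501). Flake8's real check comes from pycodestyle and handles extra cases, such as lines marked with # noqa.

```python
# Hypothetical, simplified illustration of a line-length check like E501.
# This is NOT Flake8's implementation.
def find_long_lines(source: str, max_line_length: int = 80) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than max_line_length."""
    violations = []
    # splitlines() drops the newline characters, so they don't count toward the length
    for line_number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_line_length:
            violations.append((line_number, len(line)))
    return violations


code = "x = 1\n" + "y = '" + "a" * 80 + "'\n"
print(find_long_lines(code, max_line_length=80))  # prints [(2, 86)]
```

A line of exactly 80 characters passes. Only longer lines are reported, which matches a rule of "no more than 80 characters".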
Flake8 has a predefined list of errors it can return. Many of these error codes are based on the PEP 8 style guide. For example, some codes refer to the proper use of indentation or whitespace.
From the default list of error codes, you can either select the codes the linter checks for (select) or choose which codes to ignore (ignore).
As a result, your .flake8 configuration file may look like the following example:
[flake8]
ignore =
    W504,
    C901,
    E41
max-line-length = 79
exclude =
    .git,
    .cache,
per-file-ignores =
    code/__init__.py:D104
max-complexity = 10
import-order-style = pep8
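The .flake8 file uses an INI-style format, which Flake8 reads with Python's standard configparser module. The sketch below parses the example configuration with configparser and splits the list options on commas and newlines. This is roughly how Flake8 treats comma-separated lists, but the split_list_option helper is hypothetical and not part of Flake8. The sketch also shows why continuation lines of multi-line values, such as the codes under ignore, must be indented: configparser only treats indented lines as part of the previous value. Some entries in the example also depend on Flake8 plugins. D104 is reported by flake8-docstrings, and import-order-style is an option of flake8-import-order.

```python
import configparser

# The example .flake8 file, as a string (continuation lines are indented).
FLAKE8_CONFIG = """\
[flake8]
ignore =
    W504,
    C901,
    E41
max-line-length = 79
exclude =
    .git,
    .cache,
per-file-ignores =
    code/__init__.py:D104
max-complexity = 10
import-order-style = pep8
"""


def split_list_option(value: str) -> list[str]:
    """Hypothetical helper: split a comma- and/or newline-separated option into a list."""
    items = value.replace(",", "\n").splitlines()
    return [item.strip() for item in items if item.strip()]


# RawConfigParser: no %-interpolation, so values are read literally
parser = configparser.RawConfigParser()
parser.read_string(FLAKE8_CONFIG)
section = parser["flake8"]

print(split_list_option(section["ignore"]))   # ['W504', 'C901', 'E41']
print(section.getint("max-line-length"))       # 79
print(split_list_option(section["exclude"]))  # ['.git', '.cache']
```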
Tip
For an overview of the error codes you can use, review the Flake8 error code list.
After you configure Visual Studio Code to lint your code, open any code file to review the lint results. Any warnings or errors are underlined. Select View Problem to inspect an issue and understand the error.
Lint with Azure Pipelines or GitHub Actions
You can also run the linter automatically with Azure Pipelines or GitHub Actions. To have the agent provided by either platform run the linter:
- Create a configuration file named .flake8 and store the file in your repo.
- Define the continuous integration pipeline or workflow in YAML.
- As a task or step, install Flake8 with python -m pip install flake8.
- As a task or step, run the flake8 command to lint your code.
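A pipeline or workflow runs each step as a command and, by default, stops when a command exits with a non-zero code. Flake8 exits with a non-zero code when it finds violations, which is why lint errors fail the run. The hypothetical run_steps helper below is a minimal local sketch of that behavior. It is not how Azure Pipelines or GitHub Actions are implemented, and the LINT_STEPS list simply mirrors the lint steps above.

```python
import subprocess
import sys

# Hypothetical CI steps mirroring the list above. A real pipeline defines these in YAML.
LINT_STEPS = [
    [sys.executable, "-m", "pip", "install", "flake8"],
    [sys.executable, "-m", "flake8"],
]


def run_steps(steps: list[list[str]]) -> int:
    """Run each command in order and stop at the first one that fails.

    Returns 0 if every step succeeds; otherwise returns the failing step's
    exit code, which is the condition that makes a pipeline run fail.
    """
    for step in steps:
        print("Running:", " ".join(step))
        result = subprocess.run(step)
        if result.returncode != 0:
            return result.returncode
    return 0
```

To try the lint steps locally, you could call sys.exit(run_steps(LINT_STEPS)) from a small script. The same helper also works for the unit test steps, for example [sys.executable, "-m", "pytest", "tests/"].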
Unit tests
Where linting verifies how you wrote the code, unit tests check how your code works. A unit is a small, testable piece of the code you create, such as a function. Unit testing is also known as code testing.
As a best practice, your code should consist mostly of functions, whether they prepare data or train a model. You can apply unit testing to, for example:
- Check that column names are right.
- Check the prediction level of the model on new datasets.
- Check the distribution of prediction levels.
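As a sketch of the first and last checks in this list, the tests below are pytest-style test functions that use plain assert statements. Everything in them is hypothetical: load_dataset stands in for one of your data preparation functions, predict_proba stands in for a trained model, and the sample data, expected column names, and thresholds are illustrative assumptions that you'd replace with your own.

```python
import csv
import io
import statistics

# Hypothetical sample data and expectations
SAMPLE_CSV = "age,bmi,label\n25,22.5,0\n47,31.0,1\n33,27.4,0\n"
EXPECTED_COLUMNS = ["age", "bmi", "label"]


def load_dataset(csv_text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Hypothetical data preparation function: return the header and the rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def predict_proba(rows: list[dict[str, str]]) -> list[float]:
    """Hypothetical stand-in for a trained model: one probability per row."""
    return [min(1.0, float(row["bmi"]) / 40) for row in rows]


def test_column_names():
    columns, _ = load_dataset(SAMPLE_CSV)
    assert columns == EXPECTED_COLUMNS


def test_prediction_distribution():
    _, rows = load_dataset(SAMPLE_CSV)
    preds = predict_proba(rows)
    # Every prediction must be a valid probability
    assert all(0.0 <= p <= 1.0 for p in preds)
    # The average prediction must stay within an expected band (assumed thresholds)
    assert 0.4 <= statistics.mean(preds) <= 0.9
```

By default, pytest collects functions whose names start with test_ from files named test_*.py. Saving functions like these in a test file in your tests folder is enough for pytest to run them.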
When you work with Python, you can use pytest to run your tests, together with the assert functions in NumPy's numpy.testing module to check the results. To learn more about working with pytest, see how to write tests with pytest.
Tip
Review a more detailed walkthrough of Python testing in Visual Studio Code.
Imagine you created a training script train.py, which contains the following function:
from sklearn.linear_model import Ridge

# Train a Ridge regression model on data["train"] and return the fitted model
def train_model(data, ridge_args):
    reg_model = Ridge(**ridge_args)
    reg_model.fit(data["train"]["X"], data["train"]["y"])
    return reg_model
Assume you stored the training script at src/model/train.py within your repo. To test the train_model function, you must import the function from src.model.train.
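Depending on your repo layout, pytest may fail to import src.model.train when you run pytest tests/ from the repo root. One workaround is to run python -m pytest tests/, which also adds the current directory to sys.path. Another is a conftest.py file at the repo root. In pytest's default import mode, pytest adds the folder of such a root-level conftest.py to sys.path itself, so even an empty file usually works. The sketch below only makes that step explicit, and it assumes that conftest.py sits next to the src and tests folders.

```python
# conftest.py: hypothetical file at the root of the repo, next to src/ and tests/
import sys
from pathlib import Path

# Folder that contains this conftest.py, assumed to be the repo root
REPO_ROOT = Path(__file__).resolve().parent

# Make "from src.model.train import train_model" importable from tests/test_train.py
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))
```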
Create the test_train.py file in the tests folder. One way to test Python code is to use NumPy. NumPy's numpy.testing module offers several assert functions to compare arrays, strings, objects, or items.
Tip
Learn more about NumPy's testing guidelines and its test support.
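The numpy.testing assert functions differ in how strict they are, and that matters when you compare model predictions, which are floating-point values. The sketch below uses made-up values to compare three of them. assert_almost_equal checks that abs(desired - actual) < 1.5 * 10**(-decimal), with decimal defaulting to 7. assert_allclose uses a relative tolerance (rtol, 1e-07 by default). assert_array_equal requires exact equality. assert_equal also compares non-array objects, such as strings or dicts.

```python
import numpy as np

actual = np.array([0.1 + 0.2, 1.0 / 3.0])
expected = np.array([0.3, 0.3333333])

# Passes: each difference is below 1.5e-7 (decimal=7)
np.testing.assert_almost_equal(actual, expected)

# Passes: the relative difference is below the chosen rtol
np.testing.assert_allclose(actual, expected, rtol=1e-6)

# Passes: assert_equal also works on strings and dicts
np.testing.assert_equal({"alpha": 1.2, "model": "ridge"}, {"alpha": 1.2, "model": "ridge"})

# Fails: 0.1 + 0.2 is not exactly 0.3 in floating point
array_equal_failed = False
try:
    np.testing.assert_array_equal(actual, expected)
except AssertionError:
    array_equal_failed = True

print(array_equal_failed)  # True
```

For predictions, an approximate comparison such as assert_almost_equal or assert_allclose is usually the safer choice, because tiny floating-point differences would break an exact comparison.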
For example, to test the train_model function, you can use a small training dataset and an assert function to verify whether the predictions are almost equal to predefined expected values.
import numpy as np

from src.model.train import train_model


def test_train_model():
    X_train = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
    y_train = np.array([10, 9, 8, 8, 6, 5])
    data = {"train": {"X": X_train, "y": y_train}}

    reg_model = train_model(data, {"alpha": 1.2})

    preds = reg_model.predict([[1], [2]])
    np.testing.assert_almost_equal(preds, [9.93939393939394, 9.03030303030303])
To test your code in Visual Studio Code using the UI:
- Install all libraries necessary to run the training script.
- Ensure pytest is installed and enabled within Visual Studio Code.
- Install the Python extension for Visual Studio Code.
- Select the train.py script you want to test.
- Select the Testing tab from the left menu.
- Configure Python testing by selecting pytest and setting the test directory to your tests/ folder.
- Run all tests by selecting the play button and review the results.
To run the tests in Azure Pipelines or GitHub Actions:
- Ensure all libraries necessary to run the training script are installed. Ideally, list all libraries in a requirements.txt file and install them with pip install -r requirements.txt.
- Install pytest with pip install pytest.
- Run the tests with pytest tests/.
The results of the tests will show in the output of the pipeline or workflow you run.
Note
If linting or unit testing returns an error, the CI pipeline may fail. It's therefore better to verify your code locally before you trigger the CI pipeline.