Exercise - Get the rock sample data into Visual Studio Code
Now it's time to get data from the Lunar Sample and Photo Catalog into Visual Studio Code. By doing so, you can use Python to gain insight from the thousands of samples collected across the six Apollo missions that have landed on the Moon.
Set up your coding environment
In this module, you'll learn how to cleanse and manipulate data related to Moon rock samples. To do the following exercises, you need some kind of Python notebook development environment. If you haven't set up a data science development environment yet, the easiest way to do this is to use GitHub Codespaces.
To start using GitHub Codespaces, follow these steps:
- Make sure you're signed in to GitHub.
- Go to Codespaces.
- In the Explore quick start templates section, locate the Jupyter Notebook template and select Use this template.
Note
If the Jupyter Notebook template isn't displayed in the quick start templates, select See all to list all available templates.
The Jupyter Notebook Codespace launches and should look similar to the following screenshot:
The Jupyter Notebook Codespace provides a workspace with Visual Studio Code, Python, and Jupyter Notebooks in your browser.
Tip
For personal GitHub accounts, you can use 120 core hours of Codespaces a month for free. For more information on billing, see About billing for GitHub Codespaces. For more information on how to use Codespaces, see GitHub Codespaces overview.
Note
Instead of using GitHub Codespaces, you can set up a development environment on your own computer by following the instructions in the Visual Studio Code data science setup docs.
After you've created your Codespace, follow these steps to get your environment ready:
- Create a folder called
over-the-moon
. - Open the folder in your Codespace.
- Create a folder inside that folder called
sample-return
. - Create a folder inside that folder called
data
. - Create a file called
sample-return.ipynb
in thesample-return
folder. - Open the
sample-return.ipynb
file in your Codespace.
Your environment should look like this:
Collect and import data
The data that you'll explore during this module is a file full of all the samples collected from the six Apollo missions that landed on the Moon. The rocksamples.csv file was created using information from the Lunar Sample and Photo Catalog.
Copy the contents of the rocksamples.csv
file in GitHub and save to a new CSV file in Codespaces by following these steps:
- Open the
rocksamples.csv
file in GitHub. - In the upper-right corner, select the copy icon. The file contents are copied.
- In Codespaces, select the
over-the-moon/sample-return/data
folder and then select New File. - Name the file
rocksamples.csv
. - Paste the copied data into the file. A message may appear asking you if you want to paste data from another source into your file. Select Allow to paste the data into your file.
Your Codespaces environment should look like this:
In the first Python cell in your sample-return.ipynb
file, import pandas and read the data file in it as a pandas DataFrame:
import pandas as pd
rock_samples = pd.read_csv('data/rocksamples.csv')
Note
If you're prompted to choose a Kernel, select Python Environments and then the recommended version of Python, such as Python 3.10.4.
To make sure that everything is loaded in correctly, print the top five lines of the new DataFrame by using head()
and the information summary by using info()
:
rock_samples.head()
Row | ID | Mission | Type | Subtype | Weight (g) | Pristine (%) |
---|---|---|---|---|---|---|
0 | 10001 | Apollo11 | Soil | Unsieved | 125.8 | 88.36 |
1 | 10002 | Apollo11 | Soil | Unsieved | 5629.0 | 93.73 |
2 | 10003 | Apollo11 | Basalt | Ilmenite | 213.0 | 65.56 |
3 | 10004 | Apollo11 | Core | Unsieved | 44.8 | 71.76 |
4 | 10005 | Apollo11 | Core | Unsieved | 53.4 | 40.31 |
rock_samples.info()
Note
The output is truncated to include only the table summary of data.
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 2229 non-null int64
1 Mission 2229 non-null object
2 Type 2229 non-null object
3 Subtype 2226 non-null object
4 Weight(g) 2229 non-null float64
5 Pristine(%) 2229 non-null float64
From this output, we can see that 2,229 samples were collected from the Apollo missions. Looking at a sample of the data, we can see that each row contains:
- ID - The unique ID used to keep track of the sample at NASA.
- Mission - The mission responsible for retrieving the sample.
- Type - The type of sample (type of rock or other classification).
- Subtype - A more specific type classification.
- Weight (g) - The original weight of the sample, in grams.
- Pristine (%) - The percentage of the sample that remains (some sample is used up during research).