Exercise - Prioritize Moon rock sample gathering based on data
Determining which types of samples to collect from the Moon requires expertise, but we can start to make some assumptions to learn how to clean and manipulate data.
First, we can determine how much remains of each sample that was returned from the Apollo missions by multiplying the weight of the sample that was originally collected by the percentage of remaining pristine sample.
rock_samples['Remaining (kg)'] = rock_samples['Weight (kg)'] * (rock_samples['Pristine (%)'] * .01)
rock_samples.head()
Note
You need to multiply the Pristine (%) column by 0.01 because it was being represented as a whole number.
Looking at the head()
or info()
of the rock_samples
DataFrame isn't useful for examining more than 2,000 samples. To get a better understanding of what the dataset contains, you can use the describe()
function:
rock_samples.describe()
ID | Remaining (kg) | Weight (kg) | Pristine (%) | Remaining (kg) |
---|---|---|---|---|
count | 2229.000000 | 2229.000000 | 2229.000000 | 2229.000000 |
mean | 52058.432032 | 0.168253 | 84.512764 | 0.138103 |
std | 26207.651471 | 0.637286 | 22.057299 | 0.525954 |
min | 10001.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 15437.000000 | 0.003000 | 80.010000 | 0.002432 |
50% | 65527.000000 | 0.010200 | 92.300000 | 0.008530 |
75% | 72142.000000 | 0.093490 | 98.140000 | 0.078240 |
max | 79537.000000 | 11.729000 | 180.000000 | 11.169527 |
This information helps us see that, on average, each sample weighs about .16 kg and has about 84% of the original amount remaining. We can use this knowledge to determine which samples are likely running low, which means that they have been used a lot by researchers.
low_samples = rock_samples.loc[(rock_samples['Weight (kg)'] >= .16) & (rock_samples['Pristine (%)'] <= 50)]
low_samples.head()
Index | ID | Mission | Type | Subtype | Weight (kg) | Pristine (%) | Remaining (kg) |
---|---|---|---|---|---|---|---|
11 | 10017 | Apollo11 | Basalt | Ilmenite | 0.973 | 43.71 | 0.425298 |
14 | 10020 | Apollo11 | Basalt | Ilmenite | 0.425 | 27.88 | 0.118490 |
15 | 10021 | Apollo11 | Breccia | Regolith | 0.250 | 30.21 | 0.075525 |
29 | 10045 | Apollo11 | Basalt | Olivine | 0.185 | 12.13 | 0.022441 |
37 | 10057 | Apollo11 | Basalt | Ilmenite | 0.919 | 35.15 | 0.323028 |
low_samples.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 27 non-null int64
1 Mission 27 non-null object
2 Type 27 non-null object
3 Subtype 27 non-null object
4 Weight (kg) 27 non-null float64
5 Pristine (%) 27 non-null float64
6 Remaining (kg) 27 non-null float64
Twenty-seven samples seem like a small amount to base a recommendation on. We can probably find some other samples that are needed for more research here on Earth. To discover them, we can use the unique()
function to see how many unique types we have across the low_samples
and rock_samples
DataFrames.
low_samples.Type.unique()
array(['Basalt', 'Breccia', 'Soil', 'Core'], dtype=object)
rock_samples.Type.unique()
array(['Soil', 'Basalt', 'Core', 'Breccia', 'Special', 'Crustal'], dtype=object)
We can see that, although six unique types were collected across all samples, the samples that are running low are from only four unique types. But this data doesn't tell us everything about the samples we might want to focus on. For example, in our low_samples
DataFrame, how many of each type are considered low?
low_samples.groupby('Type')['Weight (kg)'].count()
Note
Here we are using the Weight (kg) column to count the number of rows for each type that we've grouped by. The actual weight has no impact.
Type
Basalt 14
Breccia 8
Core 1
Soil 4
Name: Weight (kg), dtype: int64
Notice that there are more Basalt and Breccia type rocks with low samples than those of Core and Soil. Additionally, because the likelihood is high that every mission has some Core and Soil collection requirements, we can focus on the Basalt and Breccia rock types for the samples that we need to have collected:
needed_samples = low_samples[low_samples['Type'].isin(['Basalt', 'Breccia'])]
needed_samples.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 22 non-null int64
1 Mission 22 non-null object
2 Type 22 non-null object
3 Subtype 22 non-null object
4 Weight (kg) 22 non-null float64
5 Pristine (%) 22 non-null float64
6 Remaining (kg) 22 non-null float64
But are Basalt and Breccia the only two types of rocks we want to look for?