Exercise - Prioritize Moon rock sample gathering based on data

Determining which types of samples to collect from the Moon requires expertise, but we can start to make some assumptions to learn how to clean and manipulate data.

First, we can determine how much remains of each sample that was returned from the Apollo missions by multiplying the weight of the sample that was originally collected by the percentage of remaining pristine sample.

rock_samples['Remaining (kg)'] = rock_samples['Weight (kg)'] * (rock_samples['Pristine (%)'] * .01)
rock_samples.head()

Note

You need to multiply the Pristine (%) column by 0.01 because it stores percentages as whole-number values (for example, 43.71 means 43.71%), not as fractions.
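As a quick check of the percentage conversion, here is a minimal sketch on a toy DataFrame. The name toy_samples is an assumption made for illustration; the three rows reuse Apollo 11 values that appear in this exercise's output.

```python
import pandas as pd

# Toy data (illustrative name): three Apollo 11 rows taken from this exercise.
toy_samples = pd.DataFrame({
    'ID': [10017, 10020, 10021],
    'Weight (kg)': [0.973, 0.425, 0.250],
    'Pristine (%)': [43.71, 27.88, 30.21],
})

# Convert the percentage to a fraction (43.71 -> 0.4371) before multiplying.
toy_samples['Remaining (kg)'] = (
    toy_samples['Weight (kg)'] * (toy_samples['Pristine (%)'] * 0.01)
)

print(toy_samples.round(6))
```

Without the 0.01 factor, the first row would report roughly 42.5 kg remaining from a sample that weighed less than 1 kg.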

Looking at the head() or info() of the rock_samples DataFrame isn't enough to examine more than 2,000 samples. To get a better understanding of what the dataset contains, you can use the describe() method:

rock_samples.describe()
ID Weight (kg) Pristine (%) Remaining (kg)
count 2229.000000 2229.000000 2229.000000 2229.000000
mean 52058.432032 0.168253 84.512764 0.138103
std 26207.651471 0.637286 22.057299 0.525954
min 10001.000000 0.000000 0.000000 0.000000
25% 15437.000000 0.003000 80.010000 0.002432
50% 65527.000000 0.010200 92.300000 0.008530
75% 72142.000000 0.093490 98.140000 0.078240
max 79537.000000 11.729000 180.000000 11.169527
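The summary returned by describe() is itself a DataFrame indexed by statistic name, so individual values can be read from it instead of being copied by hand. The sketch below assumes a small toy DataFrame (toy_samples, with four illustrative rows), not the full dataset.

```python
import pandas as pd

# Toy data (illustrative name and rows), not the full rock_samples dataset.
toy_samples = pd.DataFrame({
    'Weight (kg)': [0.973, 0.425, 0.250, 0.185],
    'Pristine (%)': [43.71, 27.88, 30.21, 12.13],
})

summary = toy_samples.describe()

# Rows of the summary are labeled 'count', 'mean', 'std', 'min', '25%', ...
mean_weight = summary.loc['mean', 'Weight (kg)']
median_pristine = summary.loc['50%', 'Pristine (%)']

print(mean_weight, median_pristine)
```

Reading a threshold such as the mean weight this way keeps later filters in step with the data if the dataset changes.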

This summary shows that, on average, each sample weighed roughly 0.16 kg when collected (the mean is 0.168 kg) and has about 84% of its original amount remaining in pristine condition. We can use these figures to flag samples that are likely running low: samples that started at or above the average weight but now have 50% or less remaining have probably been used heavily by researchers.

low_samples = rock_samples.loc[(rock_samples['Weight (kg)'] >= .16) & (rock_samples['Pristine (%)'] <= 50)]
low_samples.head()
Index ID Mission Type Subtype Weight (kg) Pristine (%) Remaining (kg)
11 10017 Apollo11 Basalt Ilmenite 0.973 43.71 0.425298
14 10020 Apollo11 Basalt Ilmenite 0.425 27.88 0.118490
15 10021 Apollo11 Breccia Regolith 0.250 30.21 0.075525
29 10045 Apollo11 Basalt Olivine 0.185 12.13 0.022441
37 10057 Apollo11 Basalt Ilmenite 0.919 35.15 0.323028
low_samples.info()
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             27 non-null     int64  
 1   Mission        27 non-null     object 
 2   Type           27 non-null     object 
 3   Subtype        27 non-null     object 
 4   Weight (kg)     27 non-null     float64
 5   Pristine (%)    27 non-null     float64
 6   Remaining (kg)  27 non-null     float64
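The low_samples filter combines two conditions with the & operator. A minimal sketch of how that works follows, using a hypothetical four-row DataFrame (the IDs and values are made up for illustration). Each comparison yields a Boolean Series, and the parentheses are required because & binds more tightly than >= and <= in Python.

```python
import pandas as pd

# Hypothetical rows for illustration only; IDs are not real sample IDs.
toy_samples = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Weight (kg)': [0.973, 0.050, 0.500, 0.185],
    'Pristine (%)': [43.71, 20.00, 95.00, 12.13],
})

# Each comparison produces one True/False value per row.
is_heavy = toy_samples['Weight (kg)'] >= 0.16
is_depleted = toy_samples['Pristine (%)'] <= 50

# & keeps only the rows where both conditions are True.
toy_low = toy_samples.loc[is_heavy & is_depleted]

print(toy_low)
```

Naming the two masks separately is a readability choice; writing both conditions inline, as the exercise does, gives the same result.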

Twenty-seven samples seems like a small number to base a recommendation on. We can probably find other samples that are needed for more research here on Earth. To discover them, we can use the unique() method to see which unique types appear in the low_samples and rock_samples DataFrames.

low_samples.Type.unique()
array(['Basalt', 'Breccia', 'Soil', 'Core'], dtype=object)
rock_samples.Type.unique()
array(['Soil', 'Basalt', 'Core', 'Breccia', 'Special', 'Crustal'], dtype=object)
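Rather than comparing the two arrays by eye, one option is a set difference. This sketch rebuilds the two results above as standalone Series (low_types and all_types are illustrative names) so that it runs on its own.

```python
import pandas as pd

# Stand-ins for low_samples.Type and rock_samples.Type, using the values shown above.
low_types = pd.Series(['Basalt', 'Breccia', 'Soil', 'Core'])
all_types = pd.Series(['Soil', 'Basalt', 'Core', 'Breccia', 'Special', 'Crustal'])

# Types that exist in the full dataset but never appear among the low samples.
missing_from_low = sorted(set(all_types.unique()) - set(low_types.unique()))

print(missing_from_low)
```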

We can see that, although six unique types were collected across all samples, the samples that are running low are from only four unique types. But this data doesn't tell us everything about the samples we might want to focus on. For example, in our low_samples DataFrame, how many of each type are considered low?

low_samples.groupby('Type')['Weight (kg)'].count()

Note

Here we are using the Weight (kg) column to count the number of rows for each type that we've grouped by. The actual weight has no impact.

Type
Basalt     14
Breccia     8
Core        1
Soil        4
Name: Weight (kg), dtype: int64
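One caveat about counting through a column: count() skips missing values, whereas size() counts every row in the group. The sketch below uses a hypothetical DataFrame with one deliberately missing weight to show the difference. The summary statistics earlier in this exercise show 2229 values in every numeric column, so the two should agree on this dataset.

```python
import pandas as pd

# Hypothetical rows; one weight is deliberately missing to show the difference.
toy_low = pd.DataFrame({
    'Type': ['Basalt', 'Basalt', 'Breccia', 'Soil'],
    'Weight (kg)': [0.973, float('nan'), 0.250, 0.200],
})

# count() counts non-null values in the chosen column for each group.
by_count = toy_low.groupby('Type')['Weight (kg)'].count()

# size() counts rows per group, regardless of missing values.
by_size = toy_low.groupby('Type').size()

print(by_count)
print(by_size)
```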

Notice that far more of the low samples are Basalt and Breccia rocks than Core or Soil. Additionally, because every mission is likely to have its own Core and Soil collection requirements, we can focus on the Basalt and Breccia rock types for the samples that we need to have collected:

needed_samples = low_samples[low_samples['Type'].isin(['Basalt', 'Breccia'])]
needed_samples.info()
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   ID             22 non-null     int64  
 1   Mission        22 non-null     object 
 2   Type           22 non-null     object 
 3   Subtype        22 non-null     object 
 4   Weight (kg)     22 non-null     float64
 5   Pristine (%)    22 non-null     float64
 6   Remaining (kg)  22 non-null     float64

But are Basalt and Breccia the only two types of rocks we want to look for?