Column renaming and column dropping
Watch it
See the accompanying YouTube video at the link here.
Remember our candybars.csv dataframe?
Let’s bring it back and save it as an object named candy.
import pandas as pd
candy = pd.read_csv('candybars.csv')
candy
| | name | weight | chocolate | peanuts | caramel | nougat | cookie_wafer_rice | coconut | white_chocolate | multi | available_canada_america |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Coffee Crisp | 50 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Canada |
1 | Butterfinger | 184 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | America |
2 | Skor | 39 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Both |
3 | Smarties | 45 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | Canada |
4 | Twix | 58 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | Both |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20 | Take 5 | 43 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | America |
21 | Whatchamacallits | 45 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | America |
22 | Almond Joy | 46 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | America |
23 | Oh Henry | 51 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | Both |
24 | Cookies and Cream | 43 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | Both |
25 rows × 11 columns
Column Renaming
There will be times when you are unsatisfied with the column names and want to change them.
The proper syntax for that is .rename().
The column name available_canada_america is a bit long.
Perhaps it would be a good idea to change it to something shorter, like availability.
Here is how we can accomplish that.
candy = candy.rename(columns={'available_canada_america':'availability'})
candy
| | name | weight | chocolate | peanuts | caramel | nougat | cookie_wafer_rice | coconut | white_chocolate | multi | availability |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Coffee Crisp | 50 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Canada |
1 | Butterfinger | 184 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | America |
2 | Skor | 39 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Both |
3 | Smarties | 45 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | Canada |
4 | Twix | 58 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | Both |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20 | Take 5 | 43 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | America |
21 | Whatchamacallits | 45 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | America |
22 | Almond Joy | 46 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | America |
23 | Oh Henry | 51 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | Both |
24 | Cookies and Cream | 43 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | Both |
25 rows × 11 columns
This code uses something we’ve never seen before: {} curly braces, also called curly brackets.
These have a special meaning, but for now you only need to concentrate on the fact that the argument columns needs to have the format shown below.
columns={'old column name':'new column name'}
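As a minimal sketch of that format: the curly braces build a Python dictionary that maps each old column name to its new name. The two-row dataframe `small_candy` below is made up for illustration and is not the full candybars.csv.

```python
import pandas as pd

# Hypothetical two-row stand-in for the candy dataframe
small_candy = pd.DataFrame({
    'name': ['Coffee Crisp', 'Skor'],
    'available_canada_america': ['Canada', 'Both'],
})

# Dictionary keys are the old names; values are the new names
new_names = {'available_canada_america': 'availability'}
renamed = small_candy.rename(columns=new_names)

print(list(renamed.columns))
```

Building the dictionary on its own line first is only a readability choice; passing it directly inside .rename() behaves the same.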
You can also rename multiple columns at once by separating the old and new column pairs with commas inside the curly brackets.
It’s important to always assign the result back to an object when making column changes; otherwise the changes will not be saved in our dataframe.
candy = candy.rename(columns={'available_canada_america':'availability',
                              'weight':'weight_g'})
candy.head()
| | name | weight_g | chocolate | peanuts | caramel | nougat | cookie_wafer_rice | coconut | white_chocolate | multi | availability |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Coffee Crisp | 50 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Canada |
1 | Butterfinger | 184 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | America |
2 | Skor | 39 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Both |
3 | Smarties | 45 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | Canada |
4 | Twix | 58 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | Both |
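One detail worth noting: by this point available_canada_america had already been renamed, so that key no longer matches any column. By default, .rename() silently skips keys it cannot find, and passing errors='raise' makes it raise a KeyError instead. Here is a small sketch of both behaviours, using a made-up dataframe called `small_candy` rather than the real data:

```python
import pandas as pd

# Hypothetical stand-in where the long column name is already gone
small_candy = pd.DataFrame({
    'name': ['Coffee Crisp', 'Skor'],
    'weight': [50, 39],
    'availability': ['Canada', 'Both'],
})

mapping = {'available_canada_america': 'availability',
           'weight': 'weight_g'}

# Default behaviour: the missing key is ignored, 'weight' is still renamed
renamed = small_candy.rename(columns=mapping)
print(list(renamed.columns))

# Stricter behaviour: ask pandas to complain about keys it cannot find
try:
    small_candy.rename(columns=mapping, errors='raise')
    raised = False
except KeyError:
    raised = True
print(raised)
```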
Column Dropping
.drop() is the verb we use to delete columns in a dataframe.
Let’s delete the column coconut by specifying it in the columns argument of the drop verb.
candy.drop(columns='coconut')
| | name | weight_g | chocolate | peanuts | caramel | nougat | cookie_wafer_rice | white_chocolate | multi | availability |
---|---|---|---|---|---|---|---|---|---|---|
0 | Coffee Crisp | 50 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | Canada |
1 | Butterfinger | 184 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | America |
2 | Skor | 39 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | Both |
3 | Smarties | 45 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | Canada |
4 | Twix | 58 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | Both |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20 | Take 5 | 43 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | America |
21 | Whatchamacallits | 45 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | America |
22 | Almond Joy | 46 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | America |
23 | Oh Henry | 51 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | Both |
24 | Cookies and Cream | 43 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | Both |
25 rows × 10 columns
If you look again at the code we just wrote, you’ll notice we didn’t save over the dataframe object, so the dataframe candy still contains the coconut column.
candy.head()
| | name | weight_g | chocolate | peanuts | caramel | nougat | cookie_wafer_rice | coconut | white_chocolate | multi | availability |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Coffee Crisp | 50 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Canada |
1 | Butterfinger | 184 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | America |
2 | Skor | 39 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Both |
3 | Smarties | 45 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | Canada |
4 | Twix | 58 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | Both |
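The behaviour above can be reproduced on a small scale. In this sketch the dataframe `small_candy` and its values are made up for illustration. The point is only that .drop() returns a new dataframe and leaves the original alone.

```python
import pandas as pd

# Hypothetical two-row stand-in for the candy dataframe
small_candy = pd.DataFrame({
    'name': ['Coffee Crisp', 'Skor'],
    'nougat': [0, 0],
    'coconut': [0, 0],
})

# .drop() returns a new dataframe; small_candy itself is untouched
without_coconut = small_candy.drop(columns='coconut')

print('coconut' in without_coconut.columns)  # the returned copy
print('coconut' in small_candy.columns)      # the original
```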
Let’s overwrite the dataframe and remove multiple columns at the same time.
Let’s drop nougat and coconut together.
candy = candy.drop(columns=['nougat', 'coconut'])
candy.head()
| | name | weight_g | chocolate | peanuts | caramel | cookie_wafer_rice | white_chocolate | multi | availability |
---|---|---|---|---|---|---|---|---|---|
0 | Coffee Crisp | 50 | 1 | 0 | 0 | 1 | 0 | 0 | Canada |
1 | Butterfinger | 184 | 1 | 1 | 1 | 0 | 0 | 0 | America |
2 | Skor | 39 | 1 | 0 | 1 | 0 | 0 | 0 | Both |
3 | Smarties | 45 | 1 | 0 | 0 | 0 | 0 | 1 | Canada |
4 | Twix | 58 | 1 | 0 | 1 | 1 | 0 | 1 | Both |
We put the columns we want to drop in square brackets, and this time we remembered to overwrite the candy object.
Now when we call candy.head(), it reflects the dropped columns. They’re no longer there.
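As a rough sketch of the same pattern, here is a list of column names passed to .drop(), with the result assigned back to the same name. The dataframe `small_candy` is a made-up stand-in, not the real candy data.

```python
import pandas as pd

# Hypothetical two-row stand-in for the candy dataframe
small_candy = pd.DataFrame({
    'name': ['Coffee Crisp', 'Skor'],
    'caramel': [0, 1],
    'nougat': [0, 0],
    'coconut': [0, 0],
})

# A list in square brackets drops several columns at once;
# assigning back to small_candy makes the change stick
small_candy = small_candy.drop(columns=['nougat', 'coconut'])

print(list(small_candy.columns))
```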
Let’s apply what we learned!
Here is our fruit_salad dataframe once again.
| | name | colour | location | seed | shape | sweetness | water-content | weight |
---|---|---|---|---|---|---|---|---|
0 | apple | red | canada | True | round | True | 84 | 100 |
1 | banana | yellow | mexico | False | long | True | 75 | 120 |
2 | cantaloupe | orange | spain | True | round | True | 90 | 1360 |
3 | dragon-fruit | magenta | china | True | round | False | 96 | 600 |
4 | elderberry | purple | austria | False | round | True | 80 | 5 |
5 | fig | purple | turkey | False | oval | False | 78 | 40 |
6 | guava | green | mexico | True | oval | True | 83 | 450 |
7 | huckleberry | blue | canada | True | round | True | 73 | 5 |
8 | kiwi | brown | china | True | round | True | 80 | 76 |
9 | lemon | yellow | mexico | False | oval | False | 83 | 65 |
Let’s say we run the following code:
fruit_salad.drop(columns=['colour', 'shape', 'sweetness'])
fruit_salad = fruit_salad.rename(columns={'location':'country',
                                          'weight':'weight_g'})
Use the dataframe and code above to answer the next 2 questions.
1. After running the code above, how many columns (not including the index) are there in fruit_salad?
a) 9
b) 4
c) 8
2. After running the code above, which of the following is a column in the dataframe fruit_salad?
a) country
b) location
Solutions!
1. c) 8
2. a) country
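To check these answers yourself, here is a minimal sketch that rebuilds the first three rows of fruit_salad from the table above (the remaining rows are left out for brevity, which does not affect the column count) and runs the same two lines of code.

```python
import pandas as pd

# First three rows of the fruit_salad table; the other rows are omitted
fruit_salad = pd.DataFrame({
    'name': ['apple', 'banana', 'cantaloupe'],
    'colour': ['red', 'yellow', 'orange'],
    'location': ['canada', 'mexico', 'spain'],
    'seed': [True, False, True],
    'shape': ['round', 'long', 'round'],
    'sweetness': [True, True, True],
    'water-content': [84, 75, 90],
    'weight': [100, 120, 1360],
})

# The result of .drop() is not assigned, so nothing is removed
fruit_salad.drop(columns=['colour', 'shape', 'sweetness'])

# The result of .rename() is assigned, so the new names are kept
fruit_salad = fruit_salad.rename(columns={'location':'country',
                                          'weight':'weight_g'})

n_columns = len(fruit_salad.columns)
print(n_columns)
print('country' in fruit_salad.columns)
print('location' in fruit_salad.columns)
```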