Column renaming and column dropping

Watch it

See the accompanying YouTube video at the link here.

Remember our candybars.csv dataframe?

Let’s bring it back and save it as an object named candy.

candy = pd.read_csv('candybars.csv')
candy
name weight chocolate peanuts caramel nougat cookie_wafer_rice coconut white_chocolate multi available_canada_america
0 Coffee Crisp 50 1 0 0 0 1 0 0 0 Canada
1 Butterfinger 184 1 1 1 0 0 0 0 0 America
2 Skor 39 1 0 1 0 0 0 0 0 Both
3 Smarties 45 1 0 0 0 0 0 0 1 Canada
4 Twix 58 1 0 1 0 1 0 0 1 Both
... ... ... ... ... ... ... ... ... ... ... ...
20 Take 5 43 1 1 1 0 1 0 0 0 America
21 Whatchamacallits 45 1 1 0 0 1 0 0 0 America
22 Almond Joy 46 1 0 0 0 0 1 0 0 America
23 Oh Henry 51 1 1 1 0 0 0 0 0 Both
24 Cookies and Cream 43 0 0 0 0 1 0 1 0 Both

25 rows × 11 columns

Column Renaming

There will be times when you are unsatisfied with the column names and want to change them.

The verb we use to do that is .rename().

The column name available_canada_america is a bit long.

Perhaps it would be a good idea to change it to something shorter like availability.

Here is how we can accomplish that.

candy = candy.rename(columns={'available_canada_america':'availability'})
candy
name weight chocolate peanuts caramel nougat cookie_wafer_rice coconut white_chocolate multi availability
0 Coffee Crisp 50 1 0 0 0 1 0 0 0 Canada
1 Butterfinger 184 1 1 1 0 0 0 0 0 America
2 Skor 39 1 0 1 0 0 0 0 0 Both
3 Smarties 45 1 0 0 0 0 0 0 1 Canada
4 Twix 58 1 0 1 0 1 0 0 1 Both
... ... ... ... ... ... ... ... ... ... ... ...
20 Take 5 43 1 1 1 0 1 0 0 0 America
21 Whatchamacallits 45 1 1 0 0 1 0 0 0 America
22 Almond Joy 46 1 0 0 0 0 1 0 0 America
23 Oh Henry 51 1 1 1 0 0 0 0 0 Both
24 Cookies and Cream 43 0 0 0 0 1 0 1 0 Both

25 rows × 11 columns

This code uses something we’ve never seen before - {} curly braces, also called curly brackets.

These have a special meaning, but for now you only need to focus on the fact that the columns argument needs to have the format shown here:

 columns={'old column name':'new column name'}
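As a rough illustration of that format, the sketch below builds the old-to-new mapping first and then passes it to .rename(). The candy_demo dataframe and the new_names variable are made-up stand-ins for this example, not part of the course data.

```python
import pandas as pd

# Hypothetical three-row stand-in for the candy dataframe
candy_demo = pd.DataFrame({
    'name': ['Coffee Crisp', 'Skor', 'Twix'],
    'weight': [50, 39, 58],
    'available_canada_america': ['Canada', 'Both', 'Both'],
})

# Curly braces create a dictionary: the old name is the key, the new name is the value
new_names = {'available_canada_america': 'availability'}

# Pass the dictionary to the columns argument and save the result
renamed = candy_demo.rename(columns=new_names)
print(list(renamed.columns))  # ['name', 'weight', 'availability']
```

Writing the dictionary directly inside .rename(), as in the code above, does the same thing; naming it first just makes the old/new pairing easier to see.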

You can also rename multiple columns at once by separating the old and new column pairs with commas inside the curly brackets.

.rename() returns a new dataframe rather than changing the original, so we always need to save the result to an object; otherwise the changes will not be saved in our dataframe.

candy = candy.rename(columns={'available_canada_america':'availability',
                        'weight':'weight_g'})
candy.head()
name weight_g chocolate peanuts caramel nougat cookie_wafer_rice coconut white_chocolate multi availability
0 Coffee Crisp 50 1 0 0 0 1 0 0 0 Canada
1 Butterfinger 184 1 1 1 0 0 0 0 0 America
2 Skor 39 1 0 1 0 0 0 0 0 Both
3 Smarties 45 1 0 0 0 0 0 0 1 Canada
4 Twix 58 1 0 1 0 1 0 0 1 Both
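Notice that the code above lists available_canada_america again even though that column was already renamed earlier. The sketch below, using a made-up candy_demo dataframe, suggests why this does not cause a problem: by default .rename() skips names it cannot find, and the errors='raise' option can be used if you would rather be told.

```python
import pandas as pd

# Hypothetical stand-in where 'availability' has already been renamed
candy_demo = pd.DataFrame({
    'name': ['Coffee Crisp', 'Skor'],
    'weight': [50, 39],
    'availability': ['Canada', 'Both'],
})

# 'available_canada_america' no longer exists, so that pair is skipped
# (the default is errors='ignore'); 'weight' is still renamed.
candy_demo = candy_demo.rename(columns={'available_canada_america': 'availability',
                                        'weight': 'weight_g'})
print(list(candy_demo.columns))  # ['name', 'weight_g', 'availability']

# With errors='raise', a missing old name produces a KeyError instead
raised = False
try:
    candy_demo.rename(columns={'no_such_column': 'x'}, errors='raise')
except KeyError:
    raised = True
print(raised)  # True
```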

Column Dropping

.drop() is the verb we use to delete columns in a dataframe.

Let’s delete the column coconut by specifying it in the columns argument of the drop verb.

candy.drop(columns='coconut')
name weight_g chocolate peanuts caramel nougat cookie_wafer_rice white_chocolate multi availability
0 Coffee Crisp 50 1 0 0 0 1 0 0 Canada
1 Butterfinger 184 1 1 1 0 0 0 0 America
2 Skor 39 1 0 1 0 0 0 0 Both
3 Smarties 45 1 0 0 0 0 0 1 Canada
4 Twix 58 1 0 1 0 1 0 1 Both
... ... ... ... ... ... ... ... ... ... ...
20 Take 5 43 1 1 1 0 1 0 0 America
21 Whatchamacallits 45 1 1 0 0 1 0 0 America
22 Almond Joy 46 1 0 0 0 0 0 0 America
23 Oh Henry 51 1 1 1 0 0 0 0 Both
24 Cookies and Cream 43 0 0 0 0 1 1 0 Both

25 rows × 10 columns

If you look again at the code we just wrote, you’ll notice we didn’t save over the dataframe object, so the dataframe candy still contains the coconut column.

candy.head()
name weight_g chocolate peanuts caramel nougat cookie_wafer_rice coconut white_chocolate multi availability
0 Coffee Crisp 50 1 0 0 0 1 0 0 0 Canada
1 Butterfinger 184 1 1 1 0 0 0 0 0 America
2 Skor 39 1 0 1 0 0 0 0 0 Both
3 Smarties 45 1 0 0 0 0 0 0 1 Canada
4 Twix 58 1 0 1 0 1 0 0 1 Both
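A small sketch of the same behaviour, assuming a made-up two-row candy_demo dataframe: .drop() hands back a new dataframe, and the original keeps its column unless we save over it.

```python
import pandas as pd

# Hypothetical stand-in for the candy dataframe
candy_demo = pd.DataFrame({
    'name': ['Almond Joy', 'Skor'],
    'coconut': [1, 0],
    'caramel': [0, 1],
})

# The result is saved under a different name, so candy_demo is untouched
without_coconut = candy_demo.drop(columns='coconut')

print('coconut' in without_coconut.columns)  # False
print('coconut' in candy_demo.columns)       # True
```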

Let’s overwrite the dataframe and remove multiple columns at the same time.

Let’s drop nougat and coconut together.

candy = candy.drop(columns=['nougat', 'coconut'])
candy.head()
name weight_g chocolate peanuts caramel cookie_wafer_rice white_chocolate multi availability
0 Coffee Crisp 50 1 0 0 1 0 0 Canada
1 Butterfinger 184 1 1 1 0 0 0 America
2 Skor 39 1 0 1 0 0 0 Both
3 Smarties 45 1 0 0 0 0 1 Canada
4 Twix 58 1 0 1 1 0 1 Both

We put the columns we want to drop in square brackets, and this time we remembered to overwrite the candy object.

Now when we call candy.head() it reflects the dropped columns. They’re no longer there.
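One thing to watch for once the dataframe has been overwritten: running the same drop a second time fails, because the columns are already gone. The sketch below shows this on a made-up candy_demo dataframe; the to_drop variable is just an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for the candy dataframe
candy_demo = pd.DataFrame({
    'name': ['Almond Joy', 'Skor'],
    'nougat': [0, 0],
    'coconut': [1, 0],
    'caramel': [0, 1],
})

# Square brackets make a list, so several columns can be dropped at once
to_drop = ['nougat', 'coconut']
candy_demo = candy_demo.drop(columns=to_drop)
print(list(candy_demo.columns))  # ['name', 'caramel']

# Dropping the same columns again raises a KeyError, since they no longer exist
second_drop_failed = False
try:
    candy_demo.drop(columns=to_drop)
except KeyError:
    second_drop_failed = True
print(second_drop_failed)  # True
```

If you see a KeyError after re-running a cell like this, reloading the dataframe from the csv file and running the cells in order should clear it up.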

Let’s apply what we learned!

Here is our fruit_salad dataframe once again.

           name    colour    location    seed   shape  sweetness   water-content  weight
0         apple       red     canada    True   round     True          84         100
1        banana    yellow     mexico   False    long     True          75         120
2    cantaloupe    orange      spain    True   round     True          90        1360
3  dragon-fruit   magenta      china    True   round    False          96         600
4    elderberry    purple    austria   False   round     True          80           5
5           fig    purple     turkey   False    oval    False          78          40
6         guava     green     mexico    True    oval     True          83         450
7   huckleberry      blue     canada    True   round     True          73           5
8          kiwi     brown      china    True   round     True          80          76
9         lemon    yellow     mexico   False    oval    False          83          65

Let’s say we run the following code:

fruit_salad.drop(columns = ['colour', 'shape', 'sweetness'])
fruit_salad = fruit_salad.rename(columns={'location':'country',
                                          'weight':'weight_g'})

Use the dataframe and code above to answer the next 2 questions.

1. After running the code above, how many columns (not including the index) are there in fruit_salad?
a) 9
b) 4
c) 8

2. After running the code above, which of the following is a column in the dataframe fruit_salad?
a) country
b) location