Column renaming and column dropping

Watch it

See the accompanying YouTube video at the link here.

Remember our candybars.csv dataframe?

Let’s bring it back and save it as an object named candy.

import pandas as pd

candy = pd.read_csv('candybars.csv')
candy

	name	weight	chocolate	peanuts	caramel	nougat	cookie_wafer_rice	coconut	white_chocolate	multi	available_canada_america
0	Coffee Crisp	50	1	0	0	0	1	0	0	0	Canada
1	Butterfinger	184	1	1	1	0	0	0	0	0	America
2	Skor	39	1	0	1	0	0	0	0	0	Both
3	Smarties	45	1	0	0	0	0	0	0	1	Canada
4	Twix	58	1	0	1	0	1	0	0	1	Both
...	...	...	...	...	...	...	...	...	...	...	...
20	Take 5	43	1	1	1	0	1	0	0	0	America
21	Whatchamacallits	45	1	1	0	0	1	0	0	0	America
22	Almond Joy	46	1	0	0	0	0	1	0	0	America
23	Oh Henry	51	1	1	1	0	0	0	0	0	Both
24	Cookies and Cream	43	0	0	0	0	1	0	1	0	Both

25 rows × 11 columns

Column Renaming

There will be times where you are unsatisfied with the column names and you may want to change them.

The method for doing that is .rename().

The column name available_canada_america is a bit long.

Perhaps it would be a good idea to change it to something shorter like availability.

Here is how we can accomplish that.

candy = candy.rename(columns={'available_canada_america':'availability'})
candy

	name	weight	chocolate	peanuts	caramel	nougat	cookie_wafer_rice	coconut	white_chocolate	multi	availability
0	Coffee Crisp	50	1	0	0	0	1	0	0	0	Canada
1	Butterfinger	184	1	1	1	0	0	0	0	0	America
2	Skor	39	1	0	1	0	0	0	0	0	Both
3	Smarties	45	1	0	0	0	0	0	0	1	Canada
4	Twix	58	1	0	1	0	1	0	0	1	Both
...	...	...	...	...	...	...	...	...	...	...	...
20	Take 5	43	1	1	1	0	1	0	0	0	America
21	Whatchamacallits	45	1	1	0	0	1	0	0	0	America
22	Almond Joy	46	1	0	0	0	0	1	0	0	America
23	Oh Henry	51	1	1	1	0	0	0	0	0	Both
24	Cookies and Cream	43	0	0	0	0	1	0	1	0	Both

25 rows × 11 columns

This code uses something we’ve never seen before - {} curly braces, also called curly brackets.

These have a special meaning, but for now you only need to focus on the fact that the columns argument must have the format shown below.

 columns={'old column name':'new column name'}
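The curly braces build a Python dictionary that maps each old name to its new name. As a minimal sketch, the mapping can be stored in a variable first and then passed to .rename(). The names candy_small, new_names and renamed are made up for this illustration, and the two-row dataframe is only a stand-in for the full candy data.

```python
import pandas as pd

# Hypothetical two-row stand-in for the candy dataframe
candy_small = pd.DataFrame({
    'name': ['Coffee Crisp', 'Skor'],
    'available_canada_america': ['Canada', 'Both'],
})

# The mapping is an ordinary dictionary: old name on the left, new name on the right
new_names = {'available_canada_america': 'availability'}

renamed = candy_small.rename(columns=new_names)
print(list(renamed.columns))  # ['name', 'availability']
```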

You can also rename multiple columns at once by adding a comma between the new and old column pairs within the curly brackets.

It’s important to always assign the result back to an object when making column changes. .rename() returns a new dataframe and leaves the original untouched, so without the assignment the changes are lost.

candy = candy.rename(columns={'available_canada_america':'availability',
                        'weight':'weight_g'})
candy.head()

	name	weight_g	chocolate	peanuts	caramel	nougat	cookie_wafer_rice	coconut	white_chocolate	multi	availability
0	Coffee Crisp	50	1	0	0	0	1	0	0	0	Canada
1	Butterfinger	184	1	1	1	0	0	0	0	0	America
2	Skor	39	1	0	1	0	0	0	0	0	Both
3	Smarties	45	1	0	0	0	0	0	0	1	Canada
4	Twix	58	1	0	1	0	1	0	0	1	Both
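To see why the assignment matters, here is a small sketch that renames once without saving and once with saving. The dataframe candy_small is a hypothetical two-row excerpt of the candy data, used only so the example runs on its own.

```python
import pandas as pd

# Hypothetical two-row excerpt of the candy data
candy_small = pd.DataFrame({
    'name': ['Coffee Crisp', 'Butterfinger'],
    'weight': [50, 184],
    'available_canada_america': ['Canada', 'America'],
})

# Calling .rename() without assignment leaves candy_small untouched
candy_small.rename(columns={'weight': 'weight_g'})
print(list(candy_small.columns))
# ['name', 'weight', 'available_canada_america']

# Assigning the result back keeps the new names
candy_small = candy_small.rename(columns={'available_canada_america': 'availability',
                                          'weight': 'weight_g'})
print(list(candy_small.columns))
# ['name', 'weight_g', 'availability']
```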

Column Dropping

.drop() is the verb we use to delete columns in a dataframe.

Let’s delete the column coconut by specifying it in the columns argument of the drop verb.

candy.drop(columns='coconut')

	name	weight_g	chocolate	peanuts	caramel	nougat	cookie_wafer_rice	white_chocolate	multi	availability
0	Coffee Crisp	50	1	0	0	0	1	0	0	Canada
1	Butterfinger	184	1	1	1	0	0	0	0	America
2	Skor	39	1	0	1	0	0	0	0	Both
3	Smarties	45	1	0	0	0	0	0	1	Canada
4	Twix	58	1	0	1	0	1	0	1	Both
...	...	...	...	...	...	...	...	...	...	...
20	Take 5	43	1	1	1	0	1	0	0	America
21	Whatchamacallits	45	1	1	0	0	1	0	0	America
22	Almond Joy	46	1	0	0	0	0	0	0	America
23	Oh Henry	51	1	1	1	0	0	0	0	Both
24	Cookies and Cream	43	0	0	0	0	1	1	0	Both

25 rows × 10 columns

If you look again at the code we just wrote, you’ll notice we didn’t save over the dataframe object, so the dataframe candy still contains the coconut column.

candy.head()

	name	weight_g	chocolate	peanuts	caramel	nougat	cookie_wafer_rice	coconut	white_chocolate	multi	availability
0	Coffee Crisp	50	1	0	0	0	1	0	0	0	Canada
1	Butterfinger	184	1	1	1	0	0	0	0	0	America
2	Skor	39	1	0	1	0	0	0	0	0	Both
3	Smarties	45	1	0	0	0	0	0	0	1	Canada
4	Twix	58	1	0	1	0	1	0	0	1	Both

Let’s overwrite the dataframe and remove multiple columns at the same time.

Let’s drop nougat and coconut together.

candy = candy.drop(columns=['nougat', 'coconut'])
candy.head()

	name	weight_g	chocolate	peanuts	caramel	cookie_wafer_rice	white_chocolate	multi	availability
0	Coffee Crisp	50	1	0	0	1	0	0	Canada
1	Butterfinger	184	1	1	1	0	0	0	America
2	Skor	39	1	0	1	0	0	0	Both
3	Smarties	45	1	0	0	0	0	1	Canada
4	Twix	58	1	0	1	1	0	1	Both

We put the columns we want to drop in square brackets, and this time we remembered to overwrite the candy object.

Now when we call candy.head() it reflects the dropped columns. They’re no longer there.
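The two ways of calling .drop() can be compared in a short, self-contained sketch. It assumes a hypothetical two-row excerpt named candy_small rather than the full candy dataframe, and the helper names without_coconut and still_has_coconut exist only for this illustration.

```python
import pandas as pd

# Hypothetical two-row excerpt of the candy data
candy_small = pd.DataFrame({
    'name': ['Coffee Crisp', 'Twix'],
    'nougat': [0, 0],
    'coconut': [0, 0],
    'multi': [0, 1],
})

# A single column can be passed as a string; candy_small itself is not changed
without_coconut = candy_small.drop(columns='coconut')
still_has_coconut = 'coconut' in candy_small.columns
print(still_has_coconut)  # True

# Several columns go in a list; assigning back makes the change stick
candy_small = candy_small.drop(columns=['nougat', 'coconut'])
print(list(candy_small.columns))  # ['name', 'multi']
```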

Let’s apply what we learned!

Here is our fruit_salad dataframe once again.

           name    colour    location    seed   shape  sweetness   water-content  weight
       apple       red     canada    True   round     True          84         100
      banana    yellow     mexico   False    long     True          75         120
  cantaloupe    orange      spain    True   round     True          90        1360
dragon-fruit   magenta      china    True   round    False          96         600
  elderberry    purple    austria   False   round     True          80           5
         fig    purple     turkey   False    oval    False          78          40
       guava     green     mexico    True    oval     True          83         450
 huckleberry      blue     canada    True   round     True          73           5
        kiwi     brown      china    True   round     True          80          76
       lemon    yellow     mexico   False    oval    False          83          65

Let’s say we run the following code:

fruit_salad.drop(columns = ['colour', 'shape', 'sweetness'])
fruit_salad = fruit_salad.rename(columns={'location':'country',
                                          'weight':'weight_g'})

Use the dataframe and code above to answer the next 2 questions.

1. After running the code above, how many columns (not including the index) are there in fruit_salad?
a) 9
b) 4
c) 8

2. After running the code above, which of the following is a column in the dataframe fruit_salad?
a) country
b) location

Solutions!

1. c) 8 (the result of .drop() was never saved back to fruit_salad, so no columns were removed)
2. a) country
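If you would like to check these answers yourself, the sketch below rebuilds the first three rows of the fruit_salad table by hand and runs the same two statements. Typing the rows in manually is an assumption made for this example; the course may load the data differently.

```python
import pandas as pd

# First three rows of the fruit_salad table, typed in by hand
fruit_salad = pd.DataFrame({
    'name': ['apple', 'banana', 'cantaloupe'],
    'colour': ['red', 'yellow', 'orange'],
    'location': ['canada', 'mexico', 'spain'],
    'seed': [True, False, True],
    'shape': ['round', 'long', 'round'],
    'sweetness': [True, True, True],
    'water-content': [84, 75, 90],
    'weight': [100, 120, 1360],
})

# The result of this drop is discarded because it is not assigned to anything
fruit_salad.drop(columns=['colour', 'shape', 'sweetness'])

# The rename is saved because the result is assigned back
fruit_salad = fruit_salad.rename(columns={'location': 'country',
                                          'weight': 'weight_g'})

print(fruit_salad.shape[1])               # 8
print('country' in fruit_salad.columns)   # True
print('location' in fruit_salad.columns)  # False
```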

Programming in Python for Data Science
