Slicing only columns using .loc[]¶
Watch it
See the accompanied youtube video at the link here.
What happens now if we wanted all the rows of the dataframe but only the
columns calories
to fiber
?
We can use :
in the row postion of the .loc[]
call to indicate we
want all the rows. So here we write cereal.loc[:, 'calories':'fiber']
.
cereal.loc[:, 'calories':'fiber']
calories | protein | fat | sodium | fiber | |
---|---|---|---|---|---|
0 | 70 | 4 | 1 | 130 | 10.0 |
1 | 120 | 3 | 5 | 15 | 2.0 |
2 | 70 | 4 | 1 | 260 | 9.0 |
3 | 50 | 4 | 0 | 140 | 14.0 |
4 | 110 | 2 | 2 | 200 | 1.0 |
... | ... | ... | ... | ... | ... |
72 | 110 | 2 | 1 | 250 | 0.0 |
73 | 110 | 1 | 1 | 140 | 0.0 |
74 | 100 | 3 | 1 | 230 | 3.0 |
75 | 100 | 3 | 1 | 200 | 3.0 |
76 | 110 | 2 | 1 | 200 | 1.0 |
77 rows × 5 columns
So Far¶
Let’s talk about what we have covered so far.
.loc[]
is used to slice columns and rows by label and within an interval.We always specify row indexing first, then columns.
cereal.loc['row name start':'row name end', 'column name start':'column name end']
If we aren’t slicing any columns, but we are slicing rows we can shorten that to:
df.loc[ 'row name start':'row name end']
However, the reverse is not true. If we want all the rows with only specific columns, we specify we want all the row first with just a colon
:
followed by interval of the columns:
df.loc[:, 'column name start':'column name end']
We can read
:
as “to”.If the indices are labeled with numbers, we do not need “quotations” when calling them. This is only when the labels are using letters.
Let’s apply what we learned!
Using my dataframe object named fruit_salad
, let’s answer some slicing questions.
name colour location seed shape sweetness water-content weight
0 apple red canada True round True 84 100
1 banana yellow mexico False long True 75 120
2 cantaloupe orange spain True round True 90 1360
3 dragon-fruit magenta china True round False 96 600
4 elderberry purple austria False round True 80 5
5 fig purple turkey False oval False 78 40
6 guava green mexico True oval True 83 450
7 huckleberry blue canada True round True 73 5
8 kiwi brown china True round True 80 76
9 lemon yellow mexico False oval False 83 65
1. If you wanted all the rows and only columns seeds
, shape
, sweetness
and water-content,
what would your code look like using index labels?
a) fruit_salad.loc[:, "seed":"weight"]
b) fruit_salad[:, "seed":"water-content"]
c) fruit_salad[0:9, "seed":"water-content"]
d) fruit_salad.loc[:, "seed":"water-content"]
Solutions!
d)
fruit_salad.loc[:, "seed":"water-content"]