Slicing Rows using .loc[]¶
Watch it
See the accompanied youtube video at the link here.
Congratulations on writing your first code!
We have read in our data, and we know the dimensions. Well, now what?
Let’s go over how we would index, slice, and select certain columns or rows of our data.
Let’s start by importing pandas and loading in a dataset named
cereal.csv,
and we will save it as cereal.
import pandas as pd
cereal = pd.read_csv('cereal.csv')
cereal.head()
name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100% Bran | N | Cold | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
1 | 100% Natural Bran | Q | Cold | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.00 | 33.983679 |
2 | All-Bran | K | Cold | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
3 | All-Bran with Extra Fiber | K | Cold | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.50 | 93.704912 |
4 | Almond Delight | R | Cold | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | 1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
Attribution:
“80 Cereals” (c) by
Chris Crawford is licensed
under Creative Commons Attribution-ShareAlike 3.0
Unported.
We can see all the columns and the first 5 rows of the dataframe using
.head()
However, let’s say we only want certain rows of the dataframe or certain columns.
We talked about how .head()
will generate the first few rows of a
dataframe (5 as default), but what if we wanted the rows from 5-10?
The first column of this dataframe is called the index.
Each row has a label (the index) as well as a position. In this case, the index label of an observation is the same as it’s position.
This doesn’t always have to be the case. We can assign another column as the index; however, we will wait to discuss this in the next module.
Here are the first 15 rows of the dataframe.
cereal.head(15)
name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100% Bran | N | Cold | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.00 | 0.33 | 68.402973 |
1 | 100% Natural Bran | Q | Cold | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.00 | 1.00 | 33.983679 |
2 | All-Bran | K | Cold | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.00 | 0.33 | 59.425505 |
3 | All-Bran with Extra Fiber | K | Cold | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.00 | 0.50 | 93.704912 |
4 | Almond Delight | R | Cold | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | 1 | 25 | 3 | 1.00 | 0.75 | 34.384843 |
5 | Apple Cinnamon Cheerios | G | Cold | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.00 | 0.75 | 29.509541 |
6 | Apple Jacks | K | Cold | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.00 | 1.00 | 33.174094 |
7 | Basic 4 | G | Cold | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |
8 | Bran Chex | R | Cold | 90 | 2 | 1 | 200 | 4.0 | 15.0 | 6 | 125 | 25 | 1 | 1.00 | 0.67 | 49.120253 |
9 | Bran Flakes | P | Cold | 90 | 3 | 0 | 210 | 5.0 | 13.0 | 5 | 190 | 25 | 3 | 1.00 | 0.67 | 53.313813 |
10 | Cap'n'Crunch | Q | Cold | 120 | 1 | 2 | 220 | 0.0 | 12.0 | 12 | 35 | 25 | 2 | 1.00 | 0.75 | 18.042851 |
11 | Cheerios | G | Cold | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 1 | 1.00 | 1.25 | 50.764999 |
12 | Cinnamon Toast Crunch | G | Cold | 120 | 1 | 3 | 210 | 0.0 | 13.0 | 9 | 45 | 25 | 2 | 1.00 | 0.75 | 19.823573 |
13 | Clusters | G | Cold | 110 | 3 | 2 | 140 | 2.0 | 13.0 | 7 | 105 | 25 | 3 | 1.00 | 0.50 | 40.400208 |
14 | Cocoa Puffs | G | Cold | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 55 | 25 | 2 | 1.00 | 1.00 | 22.736446 |
Let’s talk about observation 4 which is named Almond Delight
. Its
index label is 4
as well as it’s index position.
If you just went and counted those again and started screaming “5! It’s the fifth position”, that’s OK. In the Python language, we start counting at position 0 (then 1, 2, 3, and 4 for Almond Delight).
Now let’s say we want all 5 rows past Almond Delight
. That means we
want the rows from Apple Cinnamon Cheerios
to Cap'n'Crunch
.
We will use .loc[]
with square brackets to cut the dataframe from
“Apple Cinnamon Cheerios” to “Cap’n’Crunch,” keeping the columns and
everything between.
This code is written as cereal.loc[5:10]
, and it can be interpreted as
“Obtain the rows in the dataframe located from 5
to 10
.”
cereal.loc[5:10]
name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Apple Cinnamon Cheerios | G | Cold | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.00 | 0.75 | 29.509541 |
6 | Apple Jacks | K | Cold | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.00 | 1.00 | 33.174094 |
7 | Basic 4 | G | Cold | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |
8 | Bran Chex | R | Cold | 90 | 2 | 1 | 200 | 4.0 | 15.0 | 6 | 125 | 25 | 1 | 1.00 | 0.67 | 49.120253 |
9 | Bran Flakes | P | Cold | 90 | 3 | 0 | 210 | 5.0 | 13.0 | 5 | 190 | 25 | 3 | 1.00 | 0.67 | 53.313813 |
10 | Cap'n'Crunch | Q | Cold | 120 | 1 | 2 | 220 | 0.0 | 12.0 | 12 | 35 | 25 | 2 | 1.00 | 0.75 | 18.042851 |
What if we only wanted certain columns now?
Perhaps we were only interested in the calories
to fiber
columns of
the “Apple Cinnamon Cheerios” to “Cap’n’Crunch” rows?
We put in the interval of our desired rows first, and then the columns, and we separate these values with a comma.
.loc[]
is used to slice columns and rows by label, and within an
interval.
cereal.loc[5:10, 'calories':'fiber']
calories | protein | fat | sodium | fiber | |
---|---|---|---|---|---|
5 | 110 | 2 | 2 | 180 | 1.5 |
6 | 110 | 2 | 0 | 125 | 1.0 |
7 | 130 | 3 | 2 | 210 | 2.0 |
8 | 90 | 2 | 1 | 200 | 4.0 |
9 | 90 | 3 | 0 | 210 | 5.0 |
10 | 120 | 1 | 2 | 220 | 0.0 |
The general format to slice both rows and columns together looks like this:
cereal.loc['row name start':'row name end', 'column name start':'column name end']
There is a handy shortcut for slices that include the beginning of a dataframe to a specified row or column label or a specified row or column label to the end of a dataframe.
For example if we want all the rows up to “Apple Jacks” which has a
label equal to 6, we could omit the first label in the code all
together. So we can write cereal.loc[:6]
.
cereal.loc[:6]
name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100% Bran | N | Cold | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
1 | 100% Natural Bran | Q | Cold | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.00 | 33.983679 |
2 | All-Bran | K | Cold | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
3 | All-Bran with Extra Fiber | K | Cold | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.50 | 93.704912 |
4 | Almond Delight | R | Cold | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | 1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
5 | Apple Cinnamon Cheerios | G | Cold | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.0 | 0.75 | 29.509541 |
6 | Apple Jacks | K | Cold | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.00 | 33.174094 |
Or we can do something similar for the end of a dataframe. Let’s say now
we want all the rows up to Apple Jacks
and only the columns from
sugars
onward to the end of the dataframe. What we would write in this
case is cereal.loc[:6, 'sugars':]
.
We would omit the ending label this time after the :
(colon) .
cereal.loc[:6, 'sugars':]
sugars | potass | vitamins | shelf | weight | cups | rating | |
---|---|---|---|---|---|---|---|
0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
1 | 8 | 135 | 0 | 3 | 1.0 | 1.00 | 33.983679 |
2 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
3 | 0 | 330 | 25 | 3 | 1.0 | 0.50 | 93.704912 |
4 | 8 | 1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
5 | 10 | 70 | 25 | 1 | 1.0 | 0.75 | 29.509541 |
6 | 14 | 30 | 25 | 2 | 1.0 | 1.00 | 33.174094 |
Let’s apply what we learned!
My dataframe object name is fruit_salad
.
name colour location seed shape sweetness water-content weight
0 apple red canada True round True 84 100
1 banana yellow mexico False long True 75 120
2 cantaloupe orange spain True round True 90 1360
3 dragon-fruit magenta china True round False 96 600
4 elderberry purple austria False round True 80 5
5 fig purple turkey False oval False 78 40
6 guava green mexico True oval True 83 450
7 huckleberry blue canada True round True 73 5
8 kiwi brown china True round True 80 76
9 lemon yellow mexico False oval False 83 65
1. If you wanted only the rows from cantaloupe
to kiwi
, what would your code look like using index labels?
a) fruit_salad.loc[2, 8]
b) fruit_salad[2, 8]
c) fruit_salad[2:8]
d) fruit_salad.loc[2:8]
2. If you wanted all the rows between cantaloupe
and fig
and only columns name
to seed
, what would your code look like using index labels?
a) fruit_salad.loc[2:5, "colour":"seed"]
b) fruit_salad.loc[2:5, "name":"seed"]
c) fruit_salad.loc["name":"seed", 2:5]
d) fruit_salad[2:8, "name":"seed"]
Solutions!
d)
fruit_salad.loc[2:8]
b)
fruit_salad.loc[2:5, "name":"seed"]