Sorting dataframes

Watch it

See the accompanying YouTube video at the link here.

When we read in our data, it is generally ordered in the same way it is stored.

We can easily sort the rows of a dataframe based on the values within a column.

The verb that we use for that is .sort_values().

cereal.sort_values(by='rating')
     name                       mfr  type  calories  protein  fat  sodium  ...  sugars  potass  vitamins  shelf  weight  cups     rating
 10  Cap'n'Crunch                 Q  Cold       120        1    2     220  ...      12      35        25      2    1.00  0.75  18.042851
 12  Cinnamon Toast Crunch        G  Cold       120        1    3     210  ...       9      45        25      2    1.00  0.75  19.823573
 35  Honey Graham Ohs             Q  Cold       120        1    2     220  ...      11      45        25      2    1.00  1.00  21.871292
 18  Count Chocula                G  Cold       110        1    1     180  ...      13      65        25      2    1.00  1.00  22.396513
 14  Cocoa Puffs                  G  Cold       110        1    1     180  ...      13      55        25      2    1.00  1.00  22.736446
...  ...                        ...   ...       ...      ...  ...     ...  ...     ...     ...       ...    ...     ...   ...        ...
 63  Shredded Wheat               N  Cold        80        2    0       0  ...       0      95         0      1    0.83  1.00  68.235885
  0  100% Bran                    N  Cold        70        4    1     130  ...       6     280        25      3    1.00  0.33  68.402973
 65  Shredded Wheat spoon size    N  Cold        90        3    0       0  ...       0     120         0      1    1.00  0.67  72.801787
 64  Shredded Wheat 'n'Bran       N  Cold        90        3    0       0  ...       0     140         0      1    1.00  0.67  74.472949
  3  All-Bran with Extra Fiber    K  Cold        50        4    0     140  ...       0     330        25      3    1.00  0.50  93.704912

77 rows × 16 columns

For example, to order the cereals by rating, we pass the column name to the argument by within the .sort_values() verb.

By default, .sort_values() sorts in ascending order, so the cereals with the lowest ratings appear at the top.
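To try this without the full cereal data, here is a minimal sketch on a small made-up dataframe. The name toy_cereal and its values are hypothetical stand-ins, not rows from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the cereal dataframe, with made-up values.
toy_cereal = pd.DataFrame({
    "name": ["Cereal A", "Cereal B", "Cereal C", "Cereal D"],
    "rating": [40.5, 18.0, 93.7, 22.4],
})

# No ascending argument is given, so the default (ascending order) applies
# and the lowest rating ends up in the first row.
lowest_first = toy_cereal.sort_values(by="rating")
print(lowest_first)
```

Each row keeps its original index label after sorting, which is why the index column in the cereal output is no longer in numerical order.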

What if we wanted the cereals with higher ratings at the top?

Then we would order them in descending order by setting the argument ascending=False.

sorted_ratings = cereal.sort_values(by='rating', ascending=False)
sorted_ratings
     name                       mfr  type  calories  protein  fat  sodium  ...  sugars  potass  vitamins  shelf  weight  cups     rating
  3  All-Bran with Extra Fiber    K  Cold        50        4    0     140  ...       0     330        25      3    1.00  0.50  93.704912
 64  Shredded Wheat 'n'Bran       N  Cold        90        3    0       0  ...       0     140         0      1    1.00  0.67  74.472949
 65  Shredded Wheat spoon size    N  Cold        90        3    0       0  ...       0     120         0      1    1.00  0.67  72.801787
  0  100% Bran                    N  Cold        70        4    1     130  ...       6     280        25      3    1.00  0.33  68.402973
 63  Shredded Wheat               N  Cold        80        2    0       0  ...       0      95         0      1    0.83  1.00  68.235885
...  ...                        ...   ...       ...      ...  ...     ...  ...     ...     ...       ...    ...     ...   ...        ...
 14  Cocoa Puffs                  G  Cold       110        1    1     180  ...      13      55        25      2    1.00  1.00  22.736446
 18  Count Chocula                G  Cold       110        1    1     180  ...      13      65        25      2    1.00  1.00  22.396513
 35  Honey Graham Ohs             Q  Cold       120        1    2     220  ...      11      45        25      2    1.00  1.00  21.871292
 12  Cinnamon Toast Crunch        G  Cold       120        1    3     210  ...       9      45        25      2    1.00  0.75  19.823573
 10  Cap'n'Crunch                 Q  Cold       120        1    2     220  ...      12      35        25      2    1.00  0.75  18.042851

77 rows × 16 columns

Perfect, now we have the highest-rated cereals at the top of the dataframe.
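The descending sort can be sketched the same way on a small made-up dataframe. As before, toy_cereal and its values are hypothetical and only serve to illustrate the call.

```python
import pandas as pd

# Hypothetical stand-in for the cereal dataframe, with made-up values.
toy_cereal = pd.DataFrame({
    "name": ["Cereal A", "Cereal B", "Cereal C", "Cereal D"],
    "rating": [40.5, 18.0, 93.7, 22.4],
})

# ascending=False reverses the default order: highest rating first.
highest_first = toy_cereal.sort_values(by="rating", ascending=False)
print(highest_first)

# .sort_values() returns a new dataframe, so the original order is untouched.
print(toy_cereal)
```

Because the sorted result is a new object, we need to save it under a name (as we did with sorted_ratings) if we want to use it later.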