What is Pandas?

Pandas is an add-on library to Python.

It lets us do more with our code, particularly when working with dataframes, which are tables of data arranged in rows and columns.

Importing pandas

To load .csv files and analyze them as dataframes, we first need to bring the pandas library into Python.

Before we write any code for loading or analyzing data, we import pandas with the following code.

import pandas as pd

Reading in Data

Next, we can bring in our data, which is stored in a .csv file named candybars.csv.

candy = pd.read_csv('candybars.csv')
candy
    name                      weight  chocolate  peanuts  caramel  available_canada_america
0   Coffee Crisp              50      1          0        0        Canada
1   Butterfinger              184     1          1        1        America
2   Skor                      39      1          0        1        Both
3   Smarties                  45      1          0        0        Canada
4   Twix                      58      1          0        1        Both
5   Reeses Peanutbutter Cups  43      1          1        0        Both
6   3 Musketeers              54      1          0        0        America
7   Kinder Surprise           20      1          0        0        Canada
8   M & M                     48      1          1        0        Both
9   Glosettes                 50      1          0        0        Canada
10  KitKat                    45      1          0        0        Both
11  Babe Ruth                 60      1          1        1        America
12  Caramilk                  52      1          0        1        Canada
13  Aero                      42      1          0        0        Canada
14  Mars                      51      1          0        1        Both
15  Payday                    52      0          1        1        America
16  Snickers                  48      1          1        1        Both
17  Crunchie                  26      1          0        0        Canada
18  Wonderbar                 58      1          1        1        Canada
19  100 Grand                 43      1          0        1        America
20  Take 5                    43      1          1        1        America
21  Whatchamacallits          45      1          1        0        America
22  Almond Joy                46      1          0        0        America
23  Oh Henry                  51      1          1        1        Both
24  Cookies and Cream         43      0          0        0        Both

Let’s break this down:

  • pd is the short form for pandas, which we are using to manipulate our dataframe.

  • read_csv() is the tool that does the job and, in this case, it is reading in the csv file named candybars.csv.

  • candy is the name of the object in which the dataframe is now saved.

Because the dataframe is stored in an object named candy, we can inspect it by “calling” the object name.
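The three pieces described above can be tried without the original file. The following is a minimal sketch, assuming a hypothetical in-memory stand-in for candybars.csv that holds only the first three rows of the table; the names csv_text and candy_sample are illustrative and not part of the course material.

```python
import io

import pandas as pd

# Hypothetical stand-in for candybars.csv: the first three rows of the
# candy table, written out as CSV text.
csv_text = """name,weight,chocolate,peanuts,caramel,available_canada_america
Coffee Crisp,50,1,0,0,Canada
Butterfinger,184,1,1,1,America
Skor,39,1,0,1,Both
"""

# read_csv accepts a file path or a file-like object such as StringIO.
candy_sample = pd.read_csv(io.StringIO(csv_text))

# Print the object to inspect the dataframe.
print(candy_sample)
```

With the real file, the call would simply be pd.read_csv('candybars.csv'), as in the lesson.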

In these sections, we can tell the code that we typed in, which has a light grey background, apart from its output, which has a dark grey background.

From this dataframe, we can see that there are 25 rows, one for each candy bar, and 6 columns.

We can obtain the names of the columns using .columns, and if we want to see the dimensions of the whole dataframe, we can use .shape after the dataframe name.

candy.columns
Index(['name', 'weight', 'chocolate', 'peanuts', 'caramel',
       'available_canada_america'],
      dtype='object')
candy.shape
(25, 6)

Breaking up the code, we interpret this as:

“From our dataframe that we saved as candy, tell me the column names and the shape.”
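As a small illustration of how these two results can be used, here is a sketch that assumes a hypothetical three-row miniature of the candy dataframe (candy_sample is an illustrative name, with values copied from the table). It shows that .shape is a (rows, columns) pair that can be unpacked, and that .columns can be converted to a plain list.

```python
import pandas as pd

# Hypothetical miniature of the candy dataframe, built by hand.
candy_sample = pd.DataFrame({
    "name": ["Coffee Crisp", "Butterfinger", "Skor"],
    "weight": [50, 184, 39],
    "available_canada_america": ["Canada", "America", "Both"],
})

# .shape is a (rows, columns) tuple, so it can be unpacked into two names.
n_rows, n_cols = candy_sample.shape

# .columns is an Index object; list() turns it into a plain Python list.
column_names = list(candy_sample.columns)

print(n_rows, n_cols)
print(column_names)
```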

What if we don’t want to output the whole table when displaying a dataframe?

We can specify how many rows of the dataset to show with .head() syntax.

.head(2) will output the first 2 rows of the dataframe.

candy.head(2)
    name                      weight  chocolate  peanuts  caramel  available_canada_america
0   Coffee Crisp              50      1          0        0        Canada
1   Butterfinger              184     1          1        1        America

We can specify any number of rows within the parentheses, or we can leave them empty, which defaults to the first 5 rows.

candy.head()
    name                      weight  chocolate  peanuts  caramel  available_canada_america
0   Coffee Crisp              50      1          0        0        Canada
1   Butterfinger              184     1          1        1        America
2   Skor                      39      1          0        1        Both
3   Smarties                  45      1          0        0        Canada
4   Twix                      58      1          0        1        Both

This can be really useful when we have dataframes that are hundreds or thousands of rows long.
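To see why this helps with larger data, here is a minimal sketch that assumes a made-up dataframe of 1,000 rows; the name big and its columns row_id and value are hypothetical and only stand in for a large dataset.

```python
import pandas as pd

# Hypothetical dataframe with 1,000 rows, standing in for a large dataset.
big = pd.DataFrame({"row_id": range(1000), "value": range(1000, 2000)})

first_five = big.head()    # no argument: defaults to the first 5 rows
first_three = big.head(3)  # explicit argument: the first 3 rows

# Only three rows are printed, even though big has 1,000.
print(first_three)
```

The original dataframe is left unchanged; .head() only returns a shorter view of it for display.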

Functions/Methods and Attributes

Something you may have noticed is that when we use pd.read_csv() we put our instructions within the parentheses, whereas, when we use .shape or .head() the object that we are operating on comes before our desired command.

In Python, we use functions, methods and attributes. Functions and methods are special words that take instructions (we call these arguments) and do something, while attributes describe an object without taking any instructions.

Attributes

Attributes can be distinguished from methods and functions as they do not have parentheses.

They can be thought of as nouns or adjectives that describe an object.

Take candy.shape as an example.

In this case, our dataframe candy is our object and .shape is the attribute describing it.

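Functions and Methods

Functions and methods have parentheses.
They can be thought of as verbs that complete an action.

In the example of pd.read_csv(), this function does the action of reading in our data. Similarly, candy.head() is a method that does the action of showing the first rows of candy.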

This will be discussed in more detail later in the course, but for now, simply be aware of the way we write the different instructions.
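The three ways of writing instructions can be put side by side. This is a rough sketch, assuming a hypothetical two-row dataframe read from in-memory text (csv_text, df, top and dims are illustrative names); it uses Python's built-in callable() to check which names can be followed by parentheses.

```python
import io

import pandas as pd

# Hypothetical two-row dataset, with values taken from the candy table.
csv_text = "name,weight\nSkor,39\nTwix,58\n"

# Function: called from the pandas library; the data goes inside the parentheses.
df = pd.read_csv(io.StringIO(csv_text))

# Method: written after the object it acts on, with parentheses.
top = df.head(1)

# Attribute: written after the object it describes, without parentheses.
dims = df.shape

print(callable(df.head))   # True: a method can be called
print(callable(df.shape))  # False: the shape is just a tuple of numbers
```

Writing df.shape() by mistake raises an error, because a tuple is a description of the dataframe and not an action that can be called.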

Comments

While we write code, it’s often useful to annotate it or include information for humans that we do not want to be executed.

The easiest way to do this is with a hash (#) symbol. This creates a single-line comment and prevents anything written after it on that line from being executed by Python.

# This line does not execute anything. 
candy.shape  # This will output the shape of the dataframe
(25, 6)

We use comments frequently in the exercises to help you understand what to do and what our intentions are.

It’s good practice to use them to explain our code so that it’s easier to understand if we or someone else reads it at a later date.
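A few more cases may help when reading commented code. The sketch below uses made-up names (weights, heaviest, label) and plain Python only; it shows a full-line comment, an inline comment, and the fact that a hash inside quotation marks is ordinary text and does not start a comment.

```python
# A full-line comment: Python skips this line entirely.
weights = [50, 184, 39]  # an inline comment: only the code before the hash runs

# print("this line is commented out, so it never runs")
heaviest = max(weights)

# A hash inside quotation marks is part of the text, not a comment.
label = "# heaviest bar"

print(heaviest)
print(label)
```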

Let’s apply what we learned!

1. What is Pandas?
a) A useful tool for data manipulation in Python
b) A programming language
c) A datatype

2. Which of the following statements is true?
a) Attributes and methods can be thought of as nouns and functions as verbs
b) Attributes can be thought of as nouns and functions and methods as verbs
c) Functions and methods can be thought of as nouns and attributes as verbs