Programming in Python for Data Science
Welcome to Programming in Python for Data Science (The Book)!
This course is part of the Key Capabilities for Data Science program and will teach you how to conduct data analysis in Python. During the course, you will work with powerful Python packages made for data science, including pandas for processing tabular data, Altair for data visualization, and NumPy for working with numerical data types. You will also learn about iteration, flow control, and the data types relevant to data exploration and analysis. You will leave this course able to process raw data into a format suitable for analysis, write your own analysis functions, and derive data-driven insights by creating interactive visualizations and summary tables.
This course is designed to give you a solid foundation of coding in Python.
No prior Python knowledge is needed for this course.
Course Learning Outcomes
By the end of the course, students are expected to:
1. Define tidy data and explain why it is an optimal format for data analysis.
2. Transform data into the tidy data format using pandas.
3. Demonstrate fundamental programming concepts such as loops and conditionals.
4. Understand the key data structures in Python.
5. Read data into Python from standard plain-text files (e.g., .csv) and non-standard plain-text files, as well as from common spreadsheet file types (e.g., .xls).
6. Construct simple plots using Altair.
7. Manipulate a single data table by:
   7.1 Filtering rows based on a criterion or combination of criteria.
   7.2 Selecting variables.
   7.3 Creating new variables and modifying pre-existing ones.
   7.4 Rearranging the observations or variables by sorting.
8. Work with dates and times, missing values, and categorical variables, and rename dataframe columns.
9. Produce human-readable code that incorporates best practices of programming and coding style.
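To give a concrete feel for the single-table manipulations listed above (filtering, selecting, creating or modifying variables, and sorting), here is a minimal pandas sketch. The `fruits` table, its columns, and its values are hypothetical and exist only for illustration; the course may use different datasets and a different coding style for the same operations.

```python
import pandas as pd

# Hypothetical example table: the fruit names, colours, prices and stock are made up.
fruits = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry", "mango"],
    "colour": ["red", "yellow", "red", "orange"],
    "price": [1.20, 0.50, 3.00, 2.00],
    "stock": [40, 120, 10, 25],
})

# Filter rows on a combination of criteria: red fruits that cost less than 2.
red_and_cheap = fruits[(fruits["colour"] == "red") & (fruits["price"] < 2)]

# Select a subset of variables (columns), then create a new variable
# and modify a pre-existing one in a single assign() call.
subset = fruits[["fruit", "price", "stock"]].assign(
    stock_value=lambda d: d["price"] * d["stock"],  # new column: price times stock
    fruit=lambda d: d["fruit"].str.title(),         # modify existing column: "apple" -> "Apple"
)

# Rearrange the observations by sorting on the new variable, largest first.
sorted_fruits = subset.sort_values("stock_value", ascending=False)

print(red_and_cheap)
print(sorted_fruits)
```

Using `assign` returns a new dataframe instead of modifying `subset` in place, which is one common way to avoid pandas' chained-assignment warnings when working on a column subset.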