Pandas Jupyter Notebook Tutorial


The Python pandas package is used for data manipulation and analysis, designed to let you work with labeled or relational data in an intuitive way. The pandas package offers spreadsheet functionality, but because you’re working in Python, it is typically faster and more efficient than a traditional graphical spreadsheet program, especially on large data sets.

In this tutorial, we’ll go over setting up a large data set to work with, the groupby() and pivot_table() functions of pandas, and finally how to visualize data. To get familiar with the pandas package, you can read our tutorial An Introduction to the pandas Package and its Data Structures in Python 3.

This guide will cover how to work with data in pandas on either a local desktop or a remote server. Working with large datasets can be memory intensive, so in either case, the computer will need at least 2GB of memory to perform some of the calculations in this guide.

For this tutorial, we’ll be using Jupyter Notebook to work with the data. If you do not have it already, you should follow our tutorial to install and set up Jupyter Notebook for Python 3.

Setting Up Data

For this tutorial, we’re going to be working with United States Social Security data on baby names, which is available from the Social Security website as an 8MB zip file.

Let’s activate our Python 3 programming environment on our local machine, or on our server, from the correct directory.

Now let’s create a new directory for our project. We can call it names and then move into the directory. Within this directory, we can pull the zip file from the Social Security website with the curl command.

Once the file is downloaded, let’s verify that we have all the packages installed that we’ll be using: numpy to support multi-dimensional arrays, matplotlib to visualize data, pandas for our data analysis, and seaborn to make our matplotlib statistical graphics more aesthetic. If you don’t have any of these packages installed, install them with pip, for example pip install pandas. The numpy package will also be installed if you don’t have it already.

Now we can start Jupyter Notebook with the jupyter notebook command. Once you are on the web interface of Jupyter Notebook, you’ll see the names.zip file there.