Improved architecture and an enthusiastic user base are driving uptake of the open-source web tool.
Introduction to Jupyter Notebook
Jupyter is a free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text and multimedia resources in a single document. Jupyter is neither the first nor the only example of computational notebooks. As early as the 1980s, notebook interfaces were available through software such as Wolfram Mathematica and MATLAB.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modelling, data visualization, machine learning, and much more.
The Jupyter Notebook is a living online notebook that lets almost anyone write up information (code, data, statistics) alongside narrative, multimedia, and graphs. It can be used for analysis, statistics, machine learning and much more. Faculty can use it to set up interactive textbooks, full of explanations and examples which students can test out right from their browsers.
Students can use it to explain their reasoning, show their work, and draw connections between their classwork and the world outside. Scientists, journalists, and researchers can use it to open up their data, share the stories behind their computations, and enable future collaboration and innovation.
“To me, it’s about storytelling…We all see this tool being used in different ways, but I see it as a fundamental way of telling stories, both for the faculty to tell stories to the students, but also for students to tell stories back to us.”
–Doug Blank, Bryn Mawr College Computer Science Professor
Installing Jupyter Notebook
Anaconda is a free, open-source distribution of Python and R that comes with more than 1,400 packages and the Conda package manager for installing additional ones. After installing Anaconda, you can use Anaconda Navigator to install new packages. To download and install Anaconda, go to the Anaconda website.
Because of its comparative simplicity and ease of use for beginners, this article uses Jupyter Notebook as the software for running notebook files. It’s easiest to use Anaconda to install Jupyter Notebook, but if you already have Python installed on your system and don’t want to deal with the large Anaconda package, you can run
pip3 install jupyter (for Python 3).
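After installing, it can be reassuring to confirm that Python can actually find the Jupyter packages. The following is a minimal sketch, assuming the pip installation above pulled in the `notebook` and `jupyter_core` modules; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Module names assumed to be provided by the Jupyter installation.
for name in ("notebook", "jupyter_core"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, rerunning the install command in the same Python environment is usually the first thing to try.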
The Jupyter architecture
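Jupyter is not just a tool: it’s a platform, an ecosystem, that enables others to build tools on top of it. While the internals are not essential for everyday use, it helps to know the parts Jupyter is made of. Alongside the notebook front end that you see in the browser, two components do the work behind the scenes.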
- The Jupyter server is either a relatively simple application that runs on your laptop or a multi-user server. The Jupyter project’s JupyterHub is the most widely used multi-user server for Jupyter.
- The kernel protocol allows the server to offload the task of running code to a language-specific kernel. Jupyter ships with a Python kernel (IPython), but kernels for many other languages are available.
Opening the Jupyter Notebook
A notebook is made up of cells: boxes that contain code or human-readable text. Every cell has a type, which can be selected from the drop-down options in the menu. The default option is “Code”; human-readable text boxes should use the “Markdown” type, and will need to be written using Markdown formatting conventions. To learn more about Markdown, see the “Getting Started With Markdown” Programming Historian lesson.
- The Jupyter Notebook file browser interface is the main way to open a Jupyter notebook (.ipynb) file.
- To view a notebook through the Jupyter interface, you have to launch Jupyter Notebook first and open the file from within Jupyter Notebook. (Using the command line, you can also directly launch a specific notebook, e.g.
jupyter notebook example.ipynb.)
Writing Scripts on the Jupyter Notebook
In this tutorial, we will be focusing on using Python as the scripting language.
- We can use the popular pandas library to read a CSV file in Python.
- Using the pandas library, many kinds of statistical analysis can be performed. This is particularly useful for Big Data analysis and for data visualisation and exploration.
- Beyond statistical analysis, Jupyter Notebook also supports machine learning, interactive data visualisation, data cleaning and transformation, numerical simulation and much more.
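The first two points above can be sketched in a single notebook cell. This is only an illustration: the CSV content is made up and held in memory so the cell runs on its own, and the column names are hypothetical. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = """city,year,population
Oslo,2019,681000
Oslo,2020,693000
Bergen,2019,281000
Bergen,2020,283000
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Summary statistics (count, mean, std, quartiles) for one column.
summary = df["population"].describe()

# Average population per city across the years in the table.
mean_by_city = df.groupby("city")["population"].mean()

print(df.head())
print(summary)
print(mean_by_city)
```

In a notebook, leaving `df` or `summary` as the last line of a cell renders it as a formatted table, which is often more convenient than `print`.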
Data Visualisation on the Jupyter Notebook
Apart from data analysis, we can also perform quick and extensive data visualisation. Rich interactive computing is what I love most about Jupyter Notebook. Besides, it’s a perfect web-based environment for performing exploratory analysis.
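As a small illustration of a quick plot, here is a sketch using matplotlib, which is one common choice rather than the only option. The monthly figures are invented for the example.

```python
import matplotlib.pyplot as plt

# Invented example data: visitors per month.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
visitors = [120, 135, 160, 155, 190]

# Create a figure with a single set of axes and draw a line chart.
fig, ax = plt.subplots()
ax.plot(months, visitors, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Visitors")
ax.set_title("Visitors per month (example data)")
```

When this runs in a notebook cell, the chart normally appears directly beneath the cell, so you can adjust the code and rerun it to refine the figure.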
Saving Jupyter notebook files
Jupyter autosaves your work periodically and keeps “checkpoints” in a hidden .ipynb_checkpoints folder in the same directory as the notebook file. If something goes wrong with your notebook, you can revert to a previous checkpoint by going to “File”, then “Revert to Checkpoint”, and choosing a timestamp. That said, it’s still important to save your notebook yourself before closing it, because changes made since the last save, along with any variables held in the kernel’s memory, are lost when you shut the notebook down.
You can also download the notebook (File > Download as) in several different file formats. Downloading the Notebook format (.ipynb) is useful if you want to share your code in its full notebook format.
Converting Python Code to Jupyter Notebook format
Even if you like the idea of using Jupyter Notebooks, any format conversion requires additional work. If you already have your code written as Python scripts, conversion to Jupyter Notebooks is fairly straightforward. You can copy and paste the code from your .py file into a single code cell of a new notebook, and then split the code cell into segments and add additional Markdown cells as needed.
There are also tools like the p2j package that automatically convert existing Python code to Jupyter notebooks.
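To see what such a conversion involves, it helps to know that an .ipynb file is a JSON document containing a list of cells. The sketch below is a hand-rolled illustration using only the standard library, not how p2j works. It assumes the script marks cell boundaries with `# %%` comment lines (a convention some editors use), and the function name `script_to_notebook` is made up for this example.

```python
import json


def script_to_notebook(script: str) -> dict:
    """Split a script on '# %%' marker lines and wrap each piece as a code cell."""
    cells = []
    current = []

    def flush():
        # Turn the lines collected so far into one code cell, skipping empty ones.
        source = "\n".join(current).strip()
        if source:
            cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            })
        current.clear()

    for line in script.splitlines():
        if line.strip().startswith("# %%"):
            flush()
        else:
            current.append(line)
    flush()

    # Minimal top-level structure of a version 4 notebook.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


script = "import math\n\n# %%\nradius = 2\n\n# %%\nprint(math.pi * radius ** 2)\n"
notebook = script_to_notebook(script)
notebook_json = json.dumps(notebook, indent=1)
print(len(notebook["cells"]), "cells created")
```

Writing `notebook_json` to a file with the .ipynb extension should give a notebook that Jupyter can open, after which you can add Markdown cells by hand as described above.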
The Future of Jupyter
As we’ve seen, Jupyter Notebook files are handy. The interface allows you to navigate using your mouse with dropdown menus and buttons or by keyboard shortcuts. Notebooks allow you to run small code segments at a time, save them in their current state, or restart and have them return to their original state. In addition to running code, we can also use Markdown to organize our notebooks neatly so that they are presentable to others. Many kinds of analysis, from statistics and finance to machine learning and automation, can be performed using Jupyter Notebook.
Moreover, from experimenting with code to documenting workflows to scholarly publication, Jupyter notebooks are a flexible, multi-purpose tool that can support digital research in many different contexts.
Even if you aren’t sure how exactly you’ll use them, it’s fairly easy to install the Jupyter Notebook software and download and explore existing notebooks, or experiment with a few of your own. This article hopefully serves as a quick tutorial on how anyone can set up and run Jupyter Notebook on their local machine without many dependencies.