24.08.2018 · Salla Rönkä

JupyterLab – a great tool for data scientists!

Get the hang of JupyterLab and its features.

Jupyter Notebook is a familiar tool for many data scientists, for a good reason. In a single document, you can explore data, create your machine learning models, add comments in Markdown format, run commands and plot graphs. This way all the phases of your analysis can be documented – both the executable code and the results – which makes it easier for others to understand your work. It also lets you shine in meetings: you can present your results in an interactive way and answer questions right then and there by running the code with changed parameters. A clear improvement over static reports and promises to get back with answers the next day.
 

What is JupyterLab, then?

Developed as a community-driven initiative with its roots in the IPython notebook, Jupyter Notebook has evolved into a more IDE-like environment, now called JupyterLab. Here is a list of some of the great things you can now do with it:

 

  • Instead of opening notebooks in multiple browser windows, you can have several notebooks open at the same time in a single browser window.
  • You can flexibly drag and drop panels in the view and arrange them the way you like. You can even open the same notebook twice and work on different parts of the document instead of scrolling back and forth.
  • In addition to Python, Julia and R notebooks, you can also open a terminal, a text editor or a data viewer. Imagine pushing to git directly instead of opening a separate terminal connection!
  • On the side you have a full file explorer.
  • You can use syntax completion with the Tab key and inspect object tooltips with Shift-Tab. Writing ”function?” prints information about the function.
  • You can collaborate in real time on files hosted on Google Drive. This might not be for enterprise use, but cool, right?
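To make the introspection shortcuts concrete, here is a small sketch of how they look in practice. Note that this is IPython/notebook syntax, not plain Python, and `math.sqrt` is just an example object:

```python
import math

# Pressing Tab after "math." completes attribute names,
# and Shift-Tab with the cursor on math.sqrt pops up its signature.

# Appending ? prints the docstring and signature of an object:
math.sqrt?
```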

 
Magics were available already in Jupyter Notebook, but if you have not used them, here is the idea:

Line magics start with a single % sign (cell magics with %%) and are specific to and provided by the IPython kernel. For example, %timeit can be used to compare the efficiency of two alternative pieces of code. %who lists every variable defined in your workspace, and %whos also shows each variable's type and contents. %debug lets you step into the frame where an exception was thrown. Incidentally, the results of magics can be saved into variables! If you want to learn more, %lsmagic lists all the magics, or you can see a complete list here.
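As a sketch, here is how the magics above could look in a notebook cell (again IPython syntax, not plain Python; the variable names are just examples):

```python
numbers = list(range(1000))

%timeit sum(numbers)   # benchmark the statement over many runs
%who                   # list the variables defined in the workspace
%whos                  # same, plus each variable's type and contents

# The result of a magic can be captured; -o makes %timeit return its result:
timing = %timeit -o sum(numbers)
```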


Why add JupyterLab to your toolbox?

The notebooks are good for exploratory data science work. With some more effort you could also create dashboards using widgets. This is not your BI tool, though, like Tableau, Power BI or Qlik! What it's good for is making the phases of your analysis more transparent and reproducible. With JupyterLab you can create, for example, executable Python code and try out different models. You can then, with some effort, move the code to another tool of your choice, like PyCharm or any text editor, and expose your AI applications to the world through APIs. Even some data engineers use notebooks before turning them into scripts for production use.

 

How to collaborate in JupyterLab

At this point you’re probably eager to hear about the collaboration possibilities in JupyterLab. If the Google Drive collaboration doesn’t fit your needs, the chaotic way to do it is for everyone to run their own notebooks and send them around by e-mail or on memory sticks. This does not scale, and you can imagine how clueless you’ll be about who has which version of the work. A better way is to have a common directory that everyone uses, for example on an EC2 machine or in AWS S3, that holds the latest files. If you are used to working with a git repository, that could be an option as well.
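If you go the git route, the workflow could look like the following sketch, run from JupyterLab's built-in terminal (the repository URL and file name here are placeholders, not a real repository):

```shell
git clone git@example.com:team/notebooks.git
cd notebooks
# ...edit analysis.ipynb in JupyterLab...
git add analysis.ipynb
git commit -m "Update analysis notebook"
git push
```

One caveat worth knowing: notebook files are JSON, so diffs and merge conflicts on them can be hard to read in plain git.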


Awesome, how can I try this?

You could install JupyterLab to run on your desktop computer, but together with my colleague we’ve created instructions on how to set it up on AWS: either with Docker, using jupyter/docker-stacks (hierarchical stacks of ready-to-run Jupyter applications on Docker) on a default Amazon EC2 instance, or with the Amazon Deep Learning AMI. We’ll get to the instructions in another blog post.
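In the meantime, if you just want a quick local try-out, a minimal setup could look like this (assuming Python and pip, or alternatively Docker, are already installed):

```shell
# Option 1: install with pip and start the server in your browser
pip install jupyterlab
jupyter lab

# Option 2: run a ready-made image from jupyter/docker-stacks
docker run -p 8888:8888 jupyter/datascience-notebook
```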

Category
Analytics Technology
