Deploy JupyterHub for Big Data and AI Collaboration in your team

Jupyter is an open source web application that deploys interactive notebooks containing code, text, data visualizations and more. Users can quickly develop and share big data and machine learning programs, without the need to constantly install the required libraries and frameworks.

Jupyter deployments have sharply increased as enterprises embrace big data and machine learning technologies. JupyterHub is a multi-user server for Jupyter; it helps large research teams and companies serve notebooks that rely on the same shared services.

In this post, learn how to quickly deploy JupyterHub Lite, a streamlined version of JupyterHub for 0 to 100 users, on Google Cloud Platform in under 5 minutes.

Why Jupyter?

At JupyterCon 2017 in New York, William Merchan from Datascience.com postulated three reasons for the recent surge in Jupyter adoption in large corporations:

  1. Data science and machine learning are now critical business functions: Every company is a data company. Managing and manipulating large volumes of data impacts every department in large corporations across the globe. Shareable tools like Jupyter notebooks encourage faster development and testing for every employee in every department.
  2. Data communities and academics rally around Jupyter: Several existing open source libraries and frameworks cover a significant share of the required tasks without the need for dedicated big data and machine learning teams. Shared access to these libraries and frameworks fosters teamwork and agile development.
  3. Python is king for data, and Jupyter works with over 40 other languages as well: Almost every machine learning library has dedicated documentation and application programming interfaces (APIs) for the Python language. When other languages are required, Jupyter supports more than 40 of them, including Ruby, JavaScript, Scala and C.

Jupyter will help your enterprise keep research scientists and functional employees aligned, develop scalable and shareable big data and AI systems, and build cutting edge technology that differentiates your business.

The JupyterLab dashboard lets you deploy Notebooks, launch the terminal, or create text files with ease.

Why Google Cloud Platform?

Google Cloud Platform (GCP) is a great cloud services provider for deploying virtual computing instances using powerful processors with ample RAM and available GPUs. You can also deploy your applications in containers using Kubernetes, which keeps the operating system and kernel consistent and reduces operational burden.

When you create a new account with Google Cloud Platform, you will receive up to $300 in free credits, which you can use to set up your JupyterHub Lite!

So let’s begin.

Create a Cloud Compute Instance

First we need to set up a GCP Compute Engine instance, with RAM, GPU and other performance specifications as required.

  1. Log into Google Cloud Console with your Google Account. On the top left, to the right of the logo, make sure that you select the project you want to work on, or create a new project.
  2. Open the navigation menu on the top left corner to view all the available GCP products. Hover over Compute Engine and select VM Instances. If you are using GCP for the first time, you might need to click an Enable Billing button.
  3. Click the Create button to start a server to run JupyterHub Lite.
  4. Assign your new instance the properties and preferences that you need for your JupyterHub Lite server. For the purpose of this guide, I recommend making the following decisions:
    1. Under Name, give your server any memorable name, perhaps “team-name-jupyter”. Instance names may only contain lowercase letters, digits and hyphens, so avoid underscores.
    2. For Region, specify a physical location that is geographically close to where your server’s users will be. Note that your decision here may affect the cost of your instance. For Zone, pick any of the options.
    3. Under Machine Type, select the standard instance, with 1 CPU and 3.75 GB of memory (RAM), and make sure the GPU checkbox is deselected. For GPU-accelerated workloads, such as training models with TensorFlow or similar libraries, consider adding a GPU. You can always scale your instance later.
    4. Under Boot Disk, click the change button and select Ubuntu 18.04 LTS.
    5. Change the Boot Disk Type to SSD, and give it a size of 25 GB. The Standard persistent disk type gives you a slower but cheaper disk, similar to a hard drive. The SSD persistent disk type gives you a faster but more expensive disk, similar to an SSD. Click Select to close this popup.
  5. Under Identity and API access, select No Service Account. This ensures that no one on your Jupyter server has access to your other GCP services, increasing security.
  6. Under Firewall, ensure that both HTTP and HTTPS traffic are allowed by checking those boxes.
  7. Click the Management, disks, networking, SSH keys link to configure your instance even further. Copy and paste the following code into the Startup script text box. Be sure to replace <admin-user-name> with your preferred username. Then click the Create button.
#!/bin/bash
curl https://raw.githubusercontent.com/geoffmomin/the-littlest-jupyterhub/master/bootstrap/bootstrap.py \
  | sudo python3 - \
    --admin <admin-user-name>

Now that your server has been deployed, you can access your JupyterHub Lite application by clicking the external IP.

Now we can continue within the JupyterHub Lite interface.
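
If you would rather script the setup than click through the console, the steps above can be sketched roughly as follows. This is a dry run under stated assumptions, not a definitive recipe: the zone, instance name and admin user are placeholders, the helper function names are hypothetical, the `ubuntu-1804-lts` image family may no longer be offered in your project, and the `http-server` and `https-server` tags only open ports if matching firewall rules already exist (the console checkboxes create them for you).

```shell
#!/usr/bin/env bash
# Sketch: the console steps expressed as shell helpers. Nothing is created;
# the gcloud command is only printed so you can review it first.

# Compute Engine names: lowercase letters, digits and hyphens, starting with
# a letter and not ending with a hyphen, at most 63 characters.
valid_instance_name() {
  local re='^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$'
  [[ "$1" =~ $re ]]
}

# Print the article's startup script with the admin user name filled in.
render_startup_script() {
  local admin="$1"
  # Assumption: keep admin names to lowercase letters, digits, '-' and '_'.
  local re='^[a-z][a-z0-9_-]*$'
  if [[ ! "$admin" =~ $re ]]; then
    echo "invalid admin user name: $admin" >&2
    return 1
  fi
  cat <<EOF
#!/bin/bash
curl https://raw.githubusercontent.com/geoffmomin/the-littlest-jupyterhub/master/bootstrap/bootstrap.py \\
  | sudo python3 - \\
    --admin ${admin}
EOF
}

# Print (do not run) a gcloud command that mirrors the console choices:
# 1 vCPU / 3.75 GB machine, Ubuntu 18.04, 25 GB SSD, no service account,
# HTTP/HTTPS tags, and the startup script read from a local file.
print_create_command() {
  local name="$1" zone="$2" startup_file="$3"
  local cmd=(
    gcloud compute instances create "$name"
    --zone="$zone"
    --machine-type=n1-standard-1
    --image-family=ubuntu-1804-lts
    --image-project=ubuntu-os-cloud
    --boot-disk-size=25GB
    --boot-disk-type=pd-ssd
    --no-service-account
    --no-scopes
    --tags=http-server,https-server
    --metadata-from-file=startup-script="$startup_file"
  )
  printf '%s\n' "${cmd[*]}"
}

# Example with placeholder values: save the first output as startup.sh,
# then review and run the printed gcloud command yourself.
if valid_instance_name "team-name-jupyter"; then
  render_startup_script "alice"
  print_create_command "team-name-jupyter" "us-central1-a" "startup.sh"
fi
```

Printing the command instead of running it keeps the sketch safe to try on any machine, and lets you adjust the machine type or disk size before anything is billed.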

Configure JupyterHub Lite

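To log into your JupyterHub Lite service, use the username you added above and any password of your choice. The password you enter on this first login is the one you will use from then on, and this account becomes your administration account.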

To add more users to your JupyterHub, you need to go into your service’s control panel:

  1. Click the Control Panel button on the top right corner.
  2. Select the Admin tab in the top left, and then you can add users with the Add Users button. 
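
If you have many users to add, the same result can probably be scripted from a terminal on the server. The sketch below only prints commands; it assumes that the `users.allowed` setting of `tljh-config` is an acceptable substitute for the Add Users button on your installation, and the function name and user names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn a list of user names (one per line on stdin) into
# tljh-config commands, followed by a reload so the change takes effect.
# The commands are printed, not executed, so they can be reviewed first.
print_add_user_commands() {
  local user
  while IFS= read -r user; do
    # Skip blank lines in the input.
    [[ -z "$user" ]] && continue
    printf 'sudo -E tljh-config add-item users.allowed %s\n' "$user"
  done
  printf 'sudo -E tljh-config reload\n'
}

# Example with placeholder names.
printf 'alice\nbob\n' | print_add_user_commands
```

Once the printed commands look right, they can be pasted into the server terminal or piped to `bash`.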

When you logged into Jupyter for the first time, you may have noticed the warning message stating that you are accessing your server through HTTP and not HTTPS, which is more secure. Let’s set that up now.

Secure your JupyterHub Lite with HTTPS

As an admin user, open Terminal by selecting it from the dropdown on the right:

Make sure that you have registered a domain name for your JupyterHub Lite server, and have configured its DNS correctly to forward to your external IP address. Then enter the following code in Terminal:

sudo -E tljh-config set https.enabled true
sudo -E tljh-config set https.letsencrypt.email you@example.com
sudo -E tljh-config add-item https.letsencrypt.domains yourdomain.com

Then you can verify your configuration with:

sudo -E tljh-config show

Which should look like:

https:
  enabled: true
  letsencrypt:
    email: you@example.com
    domains:
    - yourdomain.com

And then reload the service with:

sudo -E tljh-config reload proxy

Now your JupyterHub Lite is secure! You can access it with https://yourdomain.com.
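
If the certificate is not issued, a common cause is that the domain does not yet resolve to the server's external IP. A minimal sketch for checking this is shown below; the helper name `ip_in_list` is hypothetical, the addresses are documentation placeholders, and the usage comment assumes `dig` is installed on the machine where you run it.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the expected IP is one of the addresses on stdin.
# -F treats the IP as a fixed string, -x requires a whole-line match,
# -q suppresses output so only the exit status matters.
ip_in_list() {
  local expected="$1"
  grep -Fxq -- "$expected"
}

# Real-world usage (placeholders for your domain and external IP):
#   dig +short yourdomain.com | ip_in_list 203.0.113.10 && echo "DNS OK"

# Self-contained example with sample resolver output.
if printf '203.0.113.10\n' | ip_in_list 203.0.113.10; then
  echo "DNS points at the server"
else
  echo "DNS does not point at the server yet"
fi
```

Reading the resolved addresses from stdin keeps the check independent of any particular DNS tool, so `dig`, `host` or `getent` output can be fed in after trimming it to one address per line.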

If you’ll be providing your team with access to third-party, open source libraries and frameworks, you can install them by using the Python pip package installer in Terminal. For example:

sudo -E pip install numpy

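Note that we add -E after the sudo command to preserve the current environment. This way pip resolves to the shared JupyterHub environment rather than the system Python, and the package becomes available to every user on the server.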
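
To install several packages in one go, a small wrapper along these lines may help. It is only a sketch: the function name `install_for_all_users` and the `RUN` switch are hypothetical, and by default it prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: build one "sudo -E pip install" command for a list of packages.
# Prints the command by default; set RUN=1 to actually execute it.
install_for_all_users() {
  if [[ $# -eq 0 ]]; then
    echo "usage: install_for_all_users PACKAGE..." >&2
    return 2
  fi
  local cmd=(sudo -E pip install "$@")
  if [[ "${RUN:-0}" == "1" ]]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[*]}"
  fi
}

# Example with common data science packages.
install_for_all_users numpy pandas matplotlib
```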

And that’s it! Now you have a secure, configurable JupyterHub Lite server for your development team to begin working on the next great big data or AI technology!


Geoffrey Momin is an Engineer and Technology Consultant. He is actively researching the application of blockchain, artificial intelligence and conversational interfaces to improve human capital and enterprise management.
