1. Hosting Jupyter Notebooks on EC2 for Data Science Work


    To host Jupyter Notebooks on EC2 for data science work, you need an EC2 instance that runs the notebook server. We'll use Pulumi's AWS provider to define and create the resources this setup requires.

    The setup has three steps:

    1. Create an EC2 instance.
    2. Install Jupyter Notebook software on the instance.
    3. Configure security groups to allow access to the Jupyter Notebook server.

    First, we create an EC2 instance using Pulumi's AWS provider. The instance type should suit the workload: t2.medium (2 vCPUs, 4 GiB of memory) is a reasonable balance of cost and performance for lightweight tasks, while larger datasets or heavier computation call for a more powerful type.
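
    As a rough illustration of matching an instance type to a workload, the sketch below picks the smallest type from a short candidate list that meets a memory and vCPU floor. The candidate list and the helper name pick_instance_type are assumptions made for this example; the vCPU and memory figures are the published specs for those types, and pricing is deliberately left out.

```python
from typing import NamedTuple, Optional


class InstanceSpec(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# Hypothetical shortlist, ordered from smallest to largest.
CANDIDATES = [
    InstanceSpec("t2.medium", 2, 4),
    InstanceSpec("t2.large", 2, 8),
    InstanceSpec("m5.xlarge", 4, 16),
    InstanceSpec("m5.2xlarge", 8, 32),
]


def pick_instance_type(min_memory_gib: int, min_vcpus: int = 1) -> Optional[str]:
    """Return the first (smallest) candidate meeting both minimums, or None."""
    for spec in CANDIDATES:
        if spec.memory_gib >= min_memory_gib and spec.vcpus >= min_vcpus:
            return spec.name
    return None


if __name__ == "__main__":
    # A notebook that needs roughly 12 GiB of memory for its data.
    print(pick_instance_type(min_memory_gib=12))
```

    The returned name can be assigned to the instance_type variable used in the Pulumi program.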

    Next, to install Jupyter, we pass a user data script that EC2 executes when the instance is first launched. The script runs as the root user, so it can install packages, write files, and configure settings without sudo.
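
    One way to avoid hardcoding a token in the user data script is to generate it and build the script in Python. The sketch below is only an illustration: build_user_data is a hypothetical helper, and it assumes Amazon Linux 2 with Python 3 installed from the standard yum repositories. The --allow-root flag is included because the script runs as root.

```python
import secrets


def build_user_data(token: str, port: int = 8888) -> str:
    """Return a bash user data script that installs and starts Jupyter Notebook."""
    lines = [
        "#!/bin/bash",  # the shebang must be the very first line
        "yum update -y",
        "yum install -y python3",  # assumption: Amazon Linux 2 package name
        "pip3 install jupyter",
        # Listen on all interfaces; --allow-root is needed because user data runs as root.
        f"jupyter notebook --ip=0.0.0.0 --port={port} --no-browser "
        f"--allow-root --NotebookApp.token='{token}' &",
    ]
    return "\n".join(lines) + "\n"


# token_urlsafe only emits letters, digits, '-' and '_', so it is safe inside single quotes.
token = secrets.token_urlsafe(32)
user_data = build_user_data(token)
```

    The resulting string can be passed as the user_data argument of the instance. Keep in mind that user data is stored unencrypted and is readable from within the instance, so a token embedded this way should be treated as a convenience rather than a strong secret.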

    Finally, we must ensure that the security group attached to our EC2 instance allows inbound traffic to the default Jupyter Notebook server port (port 8888).
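
    If you would rather not expose these ports to every address, the ingress rules can be generated from a single allowed CIDR. The following sketch assumes an IPv4 range and uses a hypothetical helper, build_ingress_rules; it validates the CIDR with the standard ipaddress module and returns rule dictionaries in the same shape the security group's ingress argument uses.

```python
import ipaddress
from typing import Dict, List, Sequence


def build_ingress_rules(allowed_cidr: str, ports: Sequence[int] = (22, 8888)) -> List[Dict]:
    """Build one TCP ingress rule per port, restricted to allowed_cidr."""
    # IPv4Network raises ValueError for malformed input or when host bits are set.
    network = ipaddress.IPv4Network(allowed_cidr)
    return [
        {
            "protocol": "tcp",
            "from_port": port,
            "to_port": port,
            "cidr_blocks": [str(network)],
        }
        for port in ports
    ]


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address; substitute your own public IP.
    for rule in build_ingress_rules("203.0.113.7/32"):
        print(rule)
```

    A /32 range limits access to a single address, which suits a notebook server used by one person.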

    Below is a Pulumi program that accomplishes these tasks:

    import pulumi
    import pulumi_aws as aws

    # Choose an EC2 instance size; change this to fit the data science workload.
    instance_type = "t2.medium"

    # Look up the most recent Amazon Linux 2 AMI published by Amazon.
    ami = aws.ec2.get_ami(
        most_recent=True,
        owners=["amazon"],
        filters=[{"name": "name", "values": ["amzn2-ami-hvm-*-x86_64-gp2"]}],
    )

    # User data script that installs and starts Jupyter Notebook at first boot.
    # The shebang must be the first line. The script runs as root, so Jupyter
    # needs --allow-root.
    user_data = """#!/bin/bash
    yum update -y
    yum install -y python3
    pip3 install jupyter
    jupyter notebook --generate-config
    jupyter notebook --ip=0.0.0.0 --no-browser --allow-root --NotebookApp.token='YourSecureToken' &
    """

    # Security group allowing SSH (22) and Jupyter Notebook (8888) traffic.
    # 0.0.0.0/0 opens these ports to any address; narrow it to your own IP range where possible.
    security_group = aws.ec2.SecurityGroup(
        "jupyter-notebook-sg",
        description="Allow SSH and Jupyter Notebook inbound access",
        ingress=[
            {"protocol": "tcp", "from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
            {"protocol": "tcp", "from_port": 8888, "to_port": 8888, "cidr_blocks": ["0.0.0.0/0"]},
        ],
        egress=[{"protocol": "-1", "from_port": 0, "to_port": 0, "cidr_blocks": ["0.0.0.0/0"]}],
    )

    # Launch the instance that runs Jupyter Notebook.
    instance = aws.ec2.Instance(
        "jupyter-notebook-instance",
        instance_type=instance_type,
        vpc_security_group_ids=[security_group.id],
        ami=ami.id,
        user_data=user_data,
        tags={"Name": "JupyterNotebookInstance"},
    )

    # Export the public IP of the instance so the notebook server can be reached.
    pulumi.export("jupyter_notebook_instance_ip", instance.public_ip)

    This program launches an Amazon Linux 2 EC2 instance, installs Python 3, and starts a Jupyter Notebook server that accepts connections from any IP address. Because user data runs only once, at first boot, the server is not restarted automatically if the instance reboots. Before deploying, replace 'YourSecureToken' with a long, random token of your own, since anyone who can reach port 8888 and knows the token can run code on the instance.

    Once you run pulumi up with this program, Pulumi will provision the resources as defined. When the provisioning is complete, you will see the public IP address of the EC2 instance output in your terminal. You can then access the Jupyter Notebook server by navigating to http://<EC2_PUBLIC_IP>:8888 in your web browser and entering the token you set in the user data script.
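
    The user data script may still be running for a few minutes after Pulumi reports success, so the server is not always reachable right away. The sketch below shows one possible way to check the port and build a URL that carries the token as a query parameter, which Jupyter accepts in place of typing the token into the login page. Both function names are hypothetical, and the IP address is a documentation-only placeholder.

```python
import socket
from urllib.parse import urlencode


def notebook_url(public_ip: str, token: str, port: int = 8888) -> str:
    """Build the notebook URL with the token passed as a query parameter."""
    return f"http://{public_ip}:{port}/?{urlencode({'token': token})}"


def is_port_open(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    ip = "203.0.113.7"  # placeholder; use the exported jupyter_notebook_instance_ip value
    if is_port_open(ip):
        print(notebook_url(ip, "YourSecureToken"))
    else:
        print("Port 8888 is not reachable yet; the install may still be running.")
```

    The exported address can be read at any time with pulumi stack output jupyter_notebook_instance_ip.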

    Ensure you have the AWS CLI configured and that you have installed Pulumi and set up the Pulumi project and stack before running this program.
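
    A quick way to confirm the command-line prerequisites is to check that both executables are on the PATH. This is only a minimal sketch: missing_tools is a hypothetical helper, and it does not verify that AWS credentials or a Pulumi stack are actually configured.

```python
import shutil
from typing import List, Sequence


def missing_tools(required: Sequence[str] = ("pulumi", "aws")) -> List[str]:
    """Return the names of required executables that are not found on the PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("pulumi and aws are both available.")
```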