Docker & VS Code Setup - CLI Pathway

Command-line bioinformatics with reproducible containerized workflows

CLI Pathway Students Only

This guide is for students using the command-line pathway (Docker containers with terminal-based bioinformatics tools).

GUI pathway students: You don't need Docker! Instead, go to GUI Tools Setup to download graphical applications.

Prerequisites - Complete These First!

Before starting Docker installation, make sure you've completed:

GitHub Account Created

Using your @ucr.edu email address

Student Developer Pack Applied

You can continue while waiting for approval (1-3 days)

Haven't completed GitHub setup? Go to GitHub Setup Guide first!

What is Docker?

Docker is a platform that packages applications and their dependencies into containers. For this course, we've created a Docker container with all the bioinformatics tools you need pre-installed (Python, R, BLAST, MAFFT, IQ-TREE, etc.).

Why Docker for bioinformatics?

  • No manual installation of complex software dependencies
  • Identical computing environment for all students (no "works on my machine" issues)
  • Reproducible research - your analyses can be replicated exactly
  • Industry standard for computational biology workflows
  • Works on Windows, Mac, and Linux

Container Distribution: Docker Hub vs GitHub Container Registry (GHCR)

Our course containers are available through two methods. Understanding when to use each helps you choose the best option for your system.

What Are Container Registries?

Container registries are storage repositories for Docker images (the templates used to create containers). Think of them like app stores for containers.

Docker Hub (Recommended for Windows/Mac)

What is it?

Docker Hub is the default, official registry for Docker images. It's maintained by Docker Inc.

Pros:

  • Works seamlessly with Docker Desktop
  • Best integration with Windows and Mac
  • Better handling of networking and volume mounts on Windows
  • Most widely used (largest image library)
  • No GitHub token needed (a free Docker Hub login is enough)

How to use:

docker pull cosmelab/dna-barcoding-analysis:latest

GitHub Container Registry (GHCR) (Alternative for Linux/Advanced Users)

What is it?

GHCR stores container images directly on GitHub alongside the source code. Access requires GitHub authentication.

Pros:

  • No separate Docker Hub account needed
  • Integrated with GitHub (same authentication)
  • Works well with Podman (Docker alternative)
  • Lightweight for Linux users
  • Version control integration

Cons:

  • Requires GitHub authentication
  • Less integration with Docker Desktop on Windows/Mac
  • May require manual setup for volume mounts

How to use:

# Authenticate with GitHub (one-time setup)
echo "YOUR_GITHUB_TOKEN" | docker login ghcr.io -u YOUR_USERNAME --password-stdin

# Pull the image
docker pull ghcr.io/cosmelab/dna-barcoding-analysis:latest
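Typing the token directly into the command leaves it in your shell history. A minimal sketch of one way around that, assuming you keep the token in an environment variable. The names GHCR_USER, GHCR_TOKEN, and ghcr_login are hypothetical and chosen only for this example; they are not part of the course materials.

```shell
# Log in to GHCR with a token read from the environment.
# GHCR_USER and GHCR_TOKEN are hypothetical variable names used for this sketch.
ghcr_login() {
  if [ -z "${GHCR_USER:-}" ] || [ -z "${GHCR_TOKEN:-}" ]; then
    echo "Set GHCR_USER and GHCR_TOKEN first" >&2
    return 1
  fi
  # --password-stdin reads the token from standard input instead of a command-line argument
  printf '%s' "$GHCR_TOKEN" | docker login ghcr.io -u "$GHCR_USER" --password-stdin
}
```

In bash or zsh you could set GHCR_USER=YOUR_USERNAME, run read -rs GHCR_TOKEN to type the token without it appearing on screen, and then call ghcr_login.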

Which Should You Use?

Windows Users: Use Docker Hub

Docker Hub is Docker Desktop's default registry, so on Windows (including with WSL 2 integration) the course image pulls without any extra registry setup.

Command: docker pull cosmelab/dna-barcoding-analysis:latest

Mac Users: Either Works

Both Docker Hub and GHCR work well on macOS. Docker Hub is easier (no GitHub token needed).

Recommended: docker pull cosmelab/dna-barcoding-analysis:latest (Docker Hub)

Linux Users: GHCR with Podman (Lightweight)

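Linux users can use Podman (a Docker alternative that doesn't require a daemon) with GHCR. Podman is lightweight, runs containers without root privileges, and accepts the same commands as Docker. Installing it still requires sudo; the install command below is for Debian/Ubuntu systems: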

sudo apt-get install podman
podman pull ghcr.io/cosmelab/dna-barcoding-analysis:latest

For This Course: We recommend Docker Hub for all students unless you have specific reasons to use GHCR. Docker Hub is simpler and works consistently across all platforms.
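The two registries differ only in the image reference you pass to docker pull. A small sketch that keeps both references in one place, defaulting to Docker Hub as recommended above. The function name image_ref is hypothetical.

```shell
# Print the image reference for the chosen registry (default: Docker Hub).
# image_ref is a hypothetical helper name used only in this sketch.
image_ref() {
  case "${1:-dockerhub}" in
    dockerhub) echo "cosmelab/dna-barcoding-analysis:latest" ;;
    ghcr)      echo "ghcr.io/cosmelab/dna-barcoding-analysis:latest" ;;
    *)         echo "unknown registry: $1" >&2; return 1 ;;
  esac
}
```

With this defined, docker pull "$(image_ref)" pulls from Docker Hub and podman pull "$(image_ref ghcr)" pulls from GHCR.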

Docker Installation by Operating System

Windows Installation

1. Check System Requirements

  • Windows 10 64-bit: Home, Pro, Enterprise, or Education (Build 19041 or higher)
  • Windows 11 64-bit
  • Enable WSL 2 (Windows Subsystem for Linux)

2. Open PowerShell as Administrator

Right-click the Start button and select "Windows PowerShell (Admin)" or "Terminal (Admin)"

3. Enable WSL 2

In the PowerShell window, run:

wsl --install

Restart your computer when prompted.

4. Install Ubuntu

On recent Windows versions, wsl --install also installs Ubuntu by default, so Ubuntu may already be set up after the restart. If it is not, open PowerShell again and run:

wsl --install -d Ubuntu

When Ubuntu launches for the first time:

  • Create a username (use lowercase, no spaces)
  • Create a password (you won't see it as you type - this is normal!)
  • Remember these credentials - you'll need them!

Alternative: You can also install Ubuntu from the Microsoft Store by searching for "Ubuntu".

5. Download Docker Desktop

Visit Docker Desktop for Windows and download the installer.

6. Install Docker Desktop

  • Run the installer
  • Ensure "Use WSL 2 instead of Hyper-V" is selected
  • Restart your computer when installation completes

7. Verify Installation

Open PowerShell and run:

docker --version
docker run hello-world

Success! If you see version info and "Hello from Docker!", Docker is properly installed!

Verify Docker Installation

After installing Docker, verify it's working correctly by running these commands in your terminal (PowerShell on Windows, Terminal on Mac/Linux):

Step 1: Check Docker version

docker --version

Expected output: Docker version XX.X.X, build ...

Step 2: Run test container

docker run hello-world

You should see a "Hello from Docker!" message explaining that Docker is working correctly.

Both commands worked? Great! Docker is properly installed and ready to use.

Having issues? See the Troubleshooting Guide for common Docker installation problems.
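A common failure is that Docker is installed but the engine is not running. The sketch below separates the two cases, assuming a bash or zsh terminal (Mac, Linux, or WSL). The function name check_docker and its messages are hypothetical.

```shell
# Distinguish "Docker not installed" from "Docker installed but not running".
# check_docker is a hypothetical helper name used only in this sketch.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH: install Docker first" >&2
    return 1
  fi
  docker --version || return 1
  # docker info contacts the Docker engine, so it fails when the engine is not running
  if ! docker info >/dev/null 2>&1; then
    echo "Docker is installed but not running: start Docker Desktop (or the Docker service)" >&2
    return 2
  fi
  echo "Docker is installed and running"
}
```

Run check_docker before docker run hello-world; a return code of 2 means you only need to start Docker, not reinstall it.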

VS Code Integration with Docker Containers

VS Code can connect directly to Docker containers, allowing you to edit code, run terminal commands, and use all your extensions inside the container. This gives you a seamless development experience!

Why Connect VS Code to Containers?

  • Edit files directly inside the container environment
  • Run code with all course tools pre-installed (Python, R, BLAST, etc.)
  • Access the container's terminal from VS Code
  • Use VS Code extensions (including Copilot!) inside the container
  • Your changes are saved to your computer (not lost when container stops)

Install VS Code

  1. Go to code.visualstudio.com/download
  2. Download the installer for your operating system (Windows, Mac, or Linux)
  3. Run the installer and follow the prompts
  4. Launch VS Code

Install Dev Containers Extension

  1. Open VS Code
  2. Click the Extensions icon in the left sidebar (or press Ctrl+Shift+X / Cmd+Shift+X)
  3. Search for "Dev Containers"
  4. Click "Install" on the extension by Microsoft

Direct link: Dev Containers Extension

Recommended VS Code Extensions

These extensions will make your coding experience better! Install them from the Extensions marketplace in VS Code:

Essential Extensions (Strongly Recommended)

  • Dev Containers (ms-vscode-remote.remote-containers) - Work inside Docker containers
  • Python (ms-python.python) - Python language support, debugging, linting
  • Jupyter (ms-toolsai.jupyter) - Run Jupyter notebooks in VS Code
  • Docker (ms-azuretools.vscode-docker) - Manage Docker containers and images

Helpful Extensions (Optional but Awesome!)

  • GitHub Copilot (GitHub.copilot) - AI coding assistant (requires Student Developer Pack)
  • Dracula Official (dracula-theme.theme-dracula) - Dark theme matching the course website!
  • GitLens (eamodio.gitlens) - Visualize git history, see who changed what
  • Rainbow CSV (mechatroner.rainbow-csv) - Color-code CSV columns for easy reading
  • Prettier (esbenp.prettier-vscode) - Auto-format your code

How to Install Extensions:

  1. Click the Extensions icon in the VS Code sidebar (or press Ctrl+Shift+X / Cmd+Shift+X)
  2. Search for the extension name (e.g., "Dracula Official")
  3. Click "Install"
  4. Some extensions require reloading VS Code - click "Reload" if prompted

Dracula Theme Setup: After installing Dracula Official, press Ctrl+K Ctrl+T (or Cmd+K Cmd+T on Mac) and select "Dracula" from the list. Now your VS Code matches the course website!

Understanding Volume Mounts

Volume mounts let you share files between your computer and the container. This is crucial for saving your work!

What is a volume mount?

A volume mount (Docker calls this kind a bind mount) makes a folder on your computer (the "host") appear as a folder inside the Docker container. Nothing is copied or synced: both sides see the same files, so a change made in either location is immediately visible in the other.

Example volume mount command:

docker run -v /Users/yourname/assignment:/workspace cosmelab/dna-barcoding-analysis:latest

Breaking it down:

  • -v flag tells Docker to create a volume mount
  • /Users/yourname/assignment - folder on your computer (host path)
  • :/workspace - folder inside the container (container path)
  • Any changes in /workspace inside the container are saved to your computer!

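Why this matters: Files created inside a container live only in that container, and they are deleted when the container is removed (the --rm flag used later in this guide removes the container as soon as it exits). Volume mounts ensure your files persist on your computer.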
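The host side of -v should be an absolute path to an existing folder. A minimal sketch that builds the host:container pair for you, assuming bash or zsh. The helper name mount_arg is hypothetical.

```shell
# Build the value for docker's -v flag: <absolute host path>:<container path>.
# mount_arg is a hypothetical helper name used only in this sketch.
mount_arg() {
  local host_dir=${1:-.}
  local container_dir=${2:-/workspace}
  if [ ! -d "$host_dir" ]; then
    echo "not a directory: $host_dir" >&2
    return 1
  fi
  # Resolve to an absolute path: a bare name such as "assignment" would be
  # treated by Docker as a named volume rather than a folder on your computer
  local abs_dir
  abs_dir=$(cd "$host_dir" && pwd)
  printf '%s:%s\n' "$abs_dir" "$container_dir"
}
```

For example, docker run -it --rm -v "$(mount_arg ~/assignment)" cosmelab/dna-barcoding-analysis:latest mounts that folder at /workspace, and the helper reports an error first if the folder does not exist.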

Create Docker Account & Login

Before pulling containers, you need a Docker account (free) to access Docker Hub.

Step 1: Create Docker Account

  1. Go to hub.docker.com/signup
  2. Create a free account using your @ucr.edu email
  3. Choose a username (you'll need this for login)
  4. Verify your email address

Step 2: Log In via Terminal

After Docker is installed and running, log in from your terminal:

docker login

You'll be prompted for:

  • Username: Your Docker Hub username (not email)
  • Password: Your Docker Hub password

Success message: Login Succeeded

You only need to log in once - Docker will remember your credentials!

Alternative: Log In via Docker Desktop GUI

If you prefer, you can also log in through Docker Desktop:

  1. Open the Docker Desktop application
  2. Click "Sign In" in the top-right corner
  3. Enter your Docker Hub username and password

Note: You must be logged in to Docker Hub to pull the course container. If you see "unauthorized" errors when pulling, make sure you've run docker login first!

Pull the ENTM201L Container

After logging in to Docker Hub, pull our course container:

docker pull cosmelab/dna-barcoding-analysis:latest

What's included in this container?

  • Python 3.11 with BioPython for sequence manipulation
  • R 4.3 with phylogenetic packages (ape, phangorn, tidyverse)
  • BLAST+ for sequence similarity searches
  • MAFFT and MUSCLE for sequence alignment
  • IQ-TREE for phylogenetic tree building
  • Tracy for chromatogram quality control
  • Jupyter notebooks for interactive analysis
  • All dependencies and libraries pre-configured

First-time download: The container is about 2-3 GB and may take 10-20 minutes to download depending on your internet speed. This is a one-time download!
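Because the download is large, it can help to check whether the image is already on your machine before pulling. A minimal sketch using docker image inspect, which exits with an error when the image is not stored locally. The function name pull_if_missing is hypothetical.

```shell
# Pull an image only when no local copy exists.
# pull_if_missing is a hypothetical helper name used only in this sketch.
pull_if_missing() {
  # docker image inspect exits non-zero when the image is not stored locally
  if docker image inspect "$1" >/dev/null 2>&1; then
    echo "Already downloaded: $1"
  else
    docker pull "$1"
  fi
}
```

Usage: pull_if_missing cosmelab/dna-barcoding-analysis:latest. Note that this only checks for a local copy; to fetch a newer version of the :latest tag, run docker pull directly.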

Run Your First Container

Let's test the container by starting a Jupyter server!

Step 1: Run the container with Jupyter

docker run -it --rm -p 8888:8888 cosmelab/dna-barcoding-analysis:latest

What do these flags mean?

  • -it - Interactive terminal (lets you interact with the container)
  • --rm - Automatically remove container when it stops (keeps things clean)
  • -p 8888:8888 - Publish port 8888 of the container on port 8888 of your computer so your browser can reach Jupyter (the format is host_port:container_port)

Step 2: Access Jupyter in your browser

After running the command, look for output like this:

http://127.0.0.1:8888/?token=abc123def456...

Copy the entire URL and paste it into your web browser. You should see the Jupyter interface!

Success! If you see the Jupyter interface in your browser, your Docker container is working perfectly!

Note: To stop Jupyter and exit the container, press Ctrl+C twice in the terminal where the container is running.
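If port 8888 is already in use on your computer (for example, by another Jupyter server), Docker will refuse to start the container. A sketch of one workaround, assuming the image starts Jupyter on container port 8888 as described above. The function name run_jupyter is hypothetical.

```shell
# Start the course container with Jupyter on a chosen host port (default 8888).
# run_jupyter is a hypothetical helper name used only in this sketch.
run_jupyter() {
  local host_port=${1:-8888}
  # -p takes HOST:CONTAINER, so only the left-hand number changes
  docker run -it --rm -p "${host_port}:8888" cosmelab/dna-barcoding-analysis:latest
}
```

After run_jupyter 8889, the URL printed in the terminal will still mention port 8888, because Jupyter only knows about the port inside the container. Replace 8888 with 8889 when you paste the URL into your browser.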

Using Zsh Shell in the Container

The course container comes with zsh (Z Shell) pre-installed and configured with useful features like syntax highlighting and auto-suggestions. This makes working in the terminal much more pleasant!

What is Zsh?

Zsh is an advanced shell. As configured in the course container, it enhances the command-line experience with:

  • Syntax highlighting - Commands are colored as you type (green = valid, red = invalid)
  • Auto-suggestions - Suggests commands based on your history (press → to accept)
  • Better tab completion - Smarter file and command completion
  • Git integration - Shows current branch and status in your prompt
  • Command history - Search through previous commands with Ctrl+R

Starting a Zsh Session

Interactive shell (recommended for exploration):

docker run -it --rm -v "$(pwd)":/workspace -w /workspace cosmelab/dna-barcoding-analysis:latest zsh

This gives you a zsh shell inside the container where you can run multiple commands interactively!

One-off commands (for running specific scripts):

docker run --rm --entrypoint="" -v "$(pwd)":/workspace -w /workspace \
  cosmelab/dna-barcoding-analysis:latest python3 script.py

The --entrypoint="" flag clears the image's default startup program, so the command you give (here python3 script.py) runs directly and the container exits when it finishes.

What Do These Flags Mean?

  • -it - Interactive terminal (keeps the shell open so you can type commands)
  • --rm - Automatically remove container when you exit (keeps things clean)
  • -v "$(pwd)":/workspace - Mount current directory to /workspace in container
  • -w /workspace - Set working directory to /workspace inside container
  • --entrypoint="" - Override default entry point (use for one-off commands)

Pro tip: $(pwd) expands to the path of your current directory, so run the command from the folder you want to share with the container. On Windows PowerShell, use ${PWD} instead.
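Since the one-off command always uses the same flags, you may prefer to wrap it in a shell function. A minimal sketch for bash or zsh on Mac/Linux; the function name in_container is hypothetical.

```shell
# Run any command inside the course container with the current folder mounted.
# in_container is a hypothetical helper name used only in this sketch.
in_container() {
  if [ "$#" -eq 0 ]; then
    echo "usage: in_container COMMAND [ARGS...]" >&2
    return 1
  fi
  # "$@" passes the command and its arguments through unchanged
  docker run --rm --entrypoint="" \
    -v "$(pwd)":/workspace -w /workspace \
    cosmelab/dna-barcoding-analysis:latest "$@"
}
```

With this in your shell session, in_container python3 script.py is equivalent to the longer one-off command shown above.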

Example: Building a Phylogenetic Tree

Here's how you'd run a real analysis from the DNA barcoding repository:

# Mac/Linux:
docker run --rm --entrypoint="" -v "$(pwd)":/workspace -w /workspace \
  cosmelab/dna-barcoding-analysis:latest \
  python3 modules/04_phylogeny/build_tree.py \
    results/tutorial/03_alignment/aligned_sequences.fasta \
    results/tutorial/04_phylogeny/

# Windows PowerShell:
docker run --rm --entrypoint="" -v "${PWD}:/workspace" -w /workspace `
  cosmelab/dna-barcoding-analysis:latest `
  python3 modules/04_phylogeny/build_tree.py `
    results/tutorial/03_alignment/aligned_sequences.fasta `
    results/tutorial/04_phylogeny/

Why this works: The container has Python, BioPython, IQ-TREE, and all dependencies pre-installed. Your files are mounted at /workspace, so the script can read inputs and save outputs directly to your computer!

Complete Workflow Example: Working on Assignments

Here's the complete workflow for using Docker containers with VS Code for your assignments:

DNA Barcoding Analysis Repository

Before starting assignments, clone the course analysis repository. This contains tutorials, example data, and Python scripts for DNA barcoding workflows:

https://github.com/cosmelab/dna-barcoding-analysis

# Clone the repository
git clone https://github.com/cosmelab/dna-barcoding-analysis.git
cd dna-barcoding-analysis

# Explore the tutorials and example data
ls modules/        # Analysis scripts organized by module
ls data/           # Example FASTA files and sequences

This repository contains everything you need to learn the bioinformatics workflows!

1. Accept GitHub Classroom assignment

Click the assignment invitation link from your instructor. GitHub Classroom creates a private repository for you.

2. Clone the repository to your computer

git clone https://github.com/entm201l-fall2025/assignment-name-yourUsername.git
cd assignment-name-yourUsername

3. Open the folder in VS Code

code .

Or use File → Open Folder in VS Code

4. Reopen in Dev Container

VS Code will detect the .devcontainer configuration and prompt you to "Reopen in Container". Click it!

Or use Command Palette (Ctrl+Shift+P / Cmd+Shift+P) → "Dev Containers: Reopen in Container"

5. Work on your assignment

  • Read the README.md for instructions
  • Edit code files in VS Code
  • Run commands in the integrated terminal (all tools are available!)
  • Use Jupyter notebooks if provided
  • GitHub Copilot can help with coding (if you have the Student Developer Pack)

6. Save your work to GitHub

git add .
git commit -m "Completed DNA barcoding analysis"
git push

What these commands do:

  • git add . - Stage all changed files for commit
  • git commit -m "message" - Save changes with a descriptive message
  • git push - Upload changes to GitHub (submits your work!)

Assignment submitted! Your instructor can now see your work on GitHub. You can push updates until the deadline.

Pro tip: Commit and push frequently! This ensures you don't lose work and lets your instructor see your progress.
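If you commit often, the three git commands can be combined into one helper that skips the commit when nothing has changed. This is a sketch, not part of the course repository; the function name save_work and its default message are hypothetical.

```shell
# Stage everything, commit only if something is staged, then push.
# save_work is a hypothetical helper name used only in this sketch.
save_work() {
  local message=${1:-"Update assignment"}
  git add .
  # git diff --cached --quiet exits 0 when nothing is staged
  if git diff --cached --quiet; then
    echo "Nothing new to commit"
  else
    git commit -m "$message" || return 1
  fi
  git push
}
```

Usage: save_work "Finished alignment step". The final git push still runs when there is nothing new to commit, so earlier commits that were never pushed are uploaded too.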

Setup Complete! What's Next?

Congratulations! You now have Docker, VS Code, and all the tools needed for command-line bioinformatics.

Ready to Start Analyzing DNA Barcoding Data?

Head to the DNA Barcoding Analysis repository for tutorials and workflows:

DNA Barcoding Analysis Repository

Need Help Troubleshooting?

If you encounter any issues with Docker, VS Code, or containers:

Troubleshooting Guide

Quick Reference - Essential Commands

Repository URL: https://github.com/cosmelab/dna-barcoding-analysis

Essential Docker commands:

# Pull latest course container
docker pull cosmelab/dna-barcoding-analysis:latest

# List downloaded images
docker images

# Run container with Jupyter
docker run -it --rm -p 8888:8888 cosmelab/dna-barcoding-analysis:latest

# Start interactive zsh shell in container (Mac/Linux)
docker run -it --rm -v "$(pwd)":/workspace -w /workspace \
  cosmelab/dna-barcoding-analysis:latest zsh

# Start interactive zsh shell in container (Windows PowerShell)
docker run -it --rm -v "${PWD}:/workspace" -w /workspace `
  cosmelab/dna-barcoding-analysis:latest zsh

# Run a specific Python script (Mac/Linux)
docker run --rm --entrypoint="" -v "$(pwd)":/workspace -w /workspace \
  cosmelab/dna-barcoding-analysis:latest \
  python3 modules/04_phylogeny/build_tree.py input.fasta output/

# Run a specific Python script (Windows PowerShell)
docker run --rm --entrypoint="" -v "${PWD}:/workspace" -w /workspace `
  cosmelab/dna-barcoding-analysis:latest `
  python3 modules/04_phylogeny/build_tree.py input.fasta output/

# List running containers
docker ps

# Stop a running container
docker stop CONTAINER_ID
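
The CONTAINER_ID above comes from the first column of docker ps. A sketch that looks up and stops every running container started from the course image, so you do not have to copy IDs by hand. The function name stop_course_containers is hypothetical.

```shell
# Stop all running containers that were started from the course image.
# stop_course_containers is a hypothetical helper name used only in this sketch.
stop_course_containers() {
  local ids
  # -q prints only container IDs; the ancestor filter matches containers created from this image
  ids=$(docker ps -q --filter "ancestor=cosmelab/dna-barcoding-analysis:latest")
  if [ -z "$ids" ]; then
    echo "No course containers are running"
    return 0
  fi
  # Stop the containers one ID at a time
  printf '%s\n' "$ids" | while read -r id; do
    docker stop "$id"
  done
}
```

Containers started with --rm are removed automatically once they stop.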