Getting Started

Set up your development environment and deploy the MinION pathogen screening pipeline.

Prerequisites

Local Tools

Required software on your machine

  • AWS CLI 2.0+

    Command line interface for AWS

  • Terraform 1.0+

    Infrastructure as Code tool

  • Python 3.9+

    Programming language runtime

  • Git

    Version control system
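Before going further, you can confirm that each required tool is installed and on your PATH. A minimal preflight check:

```shell
# Report which of the required local tools are installed.
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "MISSING: $1"
  fi
}
need aws
need terraform
need python3
need git
```

Run `aws --version`, `terraform version`, and `python3 --version` afterwards to confirm the minimum versions listed above.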

AWS Resources

Required AWS access and quotas

  • AWS Account

    IAM credentials with AdministratorAccess or PowerUserAccess

  • Service Quotas

    Sufficient quota for GPU EC2 instances and Lambda concurrent executions

  • Region

    Recommended: ap-northeast-1
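You can check the relevant quotas ahead of time with the Service Quotas API. The JMESPath filter below matches the EC2 GPU instance families (P and G); adjust it if your pipeline uses other instance types. It assumes your AWS credentials are already configured:

```shell
# List current EC2 On-Demand quotas for GPU instance families (P and G).
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --region ap-northeast-1 \
  --query "Quotas[?contains(QuotaName, 'P instances') || contains(QuotaName, 'G instances')].[QuotaName, Value]" \
  --output table

# Lambda "Concurrent executions" quota (quota code L-B99A9384).
aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384 \
  --region ap-northeast-1 \
  --query "Quota.Value"
```

If either value is too low for your workload, request an increase from the Service Quotas console before deploying.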

Installation

1. Clone the Repository

git clone https://github.com/masterleopold/metagenome.git
cd metagenome

2. Install Python Dependencies

pip install -r requirements.txt

3. Configure AWS Credentials

aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Default region name: ap-northeast-1
# Default output format: json

4. Set Environment Variables

export AWS_REGION=ap-northeast-1
export ENVIRONMENT=production
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
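A quick sanity check that all three variables are set before moving on (bash-specific; uses indirect expansion):

```shell
# Flag any deployment variable that is missing or empty.
for v in AWS_REGION ENVIRONMENT AWS_ACCOUNT_ID; do
  if [ -n "${!v}" ]; then
    echo "OK: $v=${!v}"
  else
    echo "UNSET: $v"
  fi
done
```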

Deploy Infrastructure

Initialize Terraform

cd infrastructure/terraform
terraform init

Plan Deployment

terraform plan \
  -var="environment=$ENVIRONMENT" \
  -var="region=$AWS_REGION" \
  -out=tfplan

Apply Infrastructure

terraform apply tfplan

# Save outputs for reference
terraform output -json > outputs.json

This will take approximately 10-15 minutes to complete.
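The terraform output -json command writes a map of the form {name: {"value": ...}}. A small helper to list the saved outputs by name (the guard avoids an error if the file has not been written yet):

```shell
# Print every output saved in outputs.json as "name = value".
if [ -f outputs.json ]; then
  python3 - <<'EOF'
import json

with open("outputs.json") as f:
    outputs = json.load(f)
for name, item in outputs.items():
    print(f"{name} = {item['value']}")
EOF
else
  echo "outputs.json not found; run 'terraform output -json > outputs.json' first"
fi
```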

Set Up Databases

After infrastructure deployment, set up reference databases on EFS:

# Run database setup script
./tools/database_setup.sh --all

# Verify installation
./tools/database_setup.sh --check

This downloads and installs the Kraken2, RVDB, and BLAST databases, plus the PMDA pathogen sequence set. Total download size: ~150 GB; allow 2-4 hours for completion.

Verify Deployment

# Run deployment validation
./tools/deployment_script.sh validate

# Check all services status
./tools/deployment_script.sh status

Deployment Successful

If all checks pass, your MinION pipeline is ready to use!

Next steps: Run your first workflow or explore the API documentation.

Run Your First Workflow

Upload test FAST5 files and start an analysis workflow:

# Upload FAST5 files to S3
aws s3 cp test-data/sample.fast5 s3://minion-data-production/runs/TEST-001/fast5/

# Start workflow via CLI
./tools/workflow_cli.py start \
  --run-id TEST-001 \
  --bucket minion-data-production \
  --input-prefix runs/TEST-001/fast5/

# Monitor progress
./tools/workflow_cli.py status --run-id TEST-001 --watch
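Once the run reports completion, results can be pulled back from S3. The results/ prefix below is an assumption about the pipeline's output layout, so list the run's bucket contents first and adjust accordingly:

```shell
# See what the workflow wrote for this run.
aws s3 ls s3://minion-data-production/runs/TEST-001/ --recursive

# Sync the outputs locally; "results/" is an assumed prefix -- adjust to
# whatever the listing above actually shows.
aws s3 sync s3://minion-data-production/runs/TEST-001/results/ ./results/TEST-001/
```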

Next Steps

Architecture Overview

Learn about system architecture and data flow

API Reference

Explore API endpoints and integration options