Getting Started

Set up your development environment and deploy the MinION pathogen screening pipeline.

Prerequisites

Local Tools

Required software on your machine

  • AWS CLI 2.0+

    Command line interface for AWS

  • Terraform 1.0+

    Infrastructure as Code tool

  • Python 3.9+

    Programming language runtime

  • Git

    Version control system
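Before going further, you can confirm that each required tool is installed and on your PATH. A minimal preflight check:

```shell
# Report which of the required local tools are installed.
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "MISSING: $1"
  fi
}
need aws
need terraform
need python3
need git
```

Run `aws --version`, `terraform version`, and `python3 --version` afterwards to confirm the minimum versions listed above.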

AWS Resources

Required AWS access and quotas

  • AWS Account

    IAM credentials with AdministratorAccess or PowerUserAccess

  • Service Quotas

    Sufficient quota for GPU EC2 instances and Lambda concurrent executions

  • Region

    Recommended: ap-northeast-1
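You can check the relevant quotas ahead of time with the Service Quotas API. The JMESPath filter below matches the EC2 GPU instance families (P and G); adjust it if your pipeline uses other instance types. It assumes your AWS credentials are already configured:

```shell
# List current EC2 On-Demand quotas for GPU instance families (P and G).
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --region ap-northeast-1 \
  --query "Quotas[?contains(QuotaName, 'P instances') || contains(QuotaName, 'G instances')].[QuotaName, Value]" \
  --output table

# Lambda "Concurrent executions" quota (quota code L-B99A9384).
aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384 \
  --region ap-northeast-1 \
  --query "Quota.Value"
```

If either value is too low for your workload, request an increase from the Service Quotas console before deploying.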

Installation

1. Clone the Repository

git clone https://github.com/masterleopold/metagenome.git
cd metagenome

2. Install Python Dependencies

pip install -r requirements.txt

3. Configure AWS Credentials

aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Default region name: ap-northeast-1
# Default output format: json

4. Set Environment Variables

export AWS_REGION=ap-northeast-1
export ENVIRONMENT=production
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
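A quick sanity check that all three variables are set before moving on (bash-specific; uses indirect expansion):

```shell
# Flag any deployment variable that is missing or empty.
for v in AWS_REGION ENVIRONMENT AWS_ACCOUNT_ID; do
  if [ -n "${!v}" ]; then
    echo "OK: $v=${!v}"
  else
    echo "UNSET: $v"
  fi
done
```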

Deploy Infrastructure

Initialize Terraform

cd infrastructure/terraform
terraform init

Plan Deployment

terraform plan \
  -var="environment=$ENVIRONMENT" \
  -var="region=$AWS_REGION" \
  -out=tfplan

Apply Infrastructure

terraform apply tfplan

# Save outputs for reference
terraform output -json > outputs.json

This will take approximately 10-15 minutes to complete.
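The terraform output -json command writes a map of the form {name: {"value": ...}}. A small helper to list the saved outputs by name (the guard avoids an error if the file has not been written yet):

```shell
# Print every output saved in outputs.json as "name = value".
if [ -f outputs.json ]; then
  python3 - <<'EOF'
import json

with open("outputs.json") as f:
    outputs = json.load(f)
for name, item in outputs.items():
    print(f"{name} = {item['value']}")
EOF
else
  echo "outputs.json not found; run 'terraform output -json > outputs.json' first"
fi
```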

Set Up Databases

After infrastructure deployment, set up reference databases on EFS:

# Run database setup script
./tools/database_setup.sh --all

# Verify installation
./tools/database_setup.sh --check

This downloads and installs the Kraken2, RVDB, and BLAST databases, plus the PMDA pathogen sequence set. Total download size: ~150 GB; allow 2-4 hours for completion.

Verify Deployment

# Run deployment validation
./tools/deployment_script.sh validate

# Check all services status
./tools/deployment_script.sh status

Deployment Successful

If all checks pass, your MinION pipeline is ready to use!

Next steps: Run your first workflow or explore the API documentation.

Run Your First Workflow

Upload test FAST5 files and start an analysis workflow:

# Upload FAST5 files to S3
aws s3 cp test-data/sample.fast5 s3://minion-data-production/runs/TEST-001/fast5/

# Start workflow via CLI
./tools/workflow_cli.py start \
  --run-id TEST-001 \
  --bucket minion-data-production \
  --input-prefix runs/TEST-001/fast5/

# Monitor progress
./tools/workflow_cli.py status --run-id TEST-001 --watch
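Once the run reports completion, results can be pulled back from S3. The results/ prefix below is an assumption about the pipeline's output layout, so list the run's bucket contents first and adjust accordingly:

```shell
# See what the workflow wrote for this run.
aws s3 ls s3://minion-data-production/runs/TEST-001/ --recursive

# Sync the outputs locally; "results/" is an assumed prefix -- adjust to
# whatever the listing above actually shows.
aws s3 sync s3://minion-data-production/runs/TEST-001/results/ ./results/TEST-001/
```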

Next Steps

Architecture Overview

Learn about system architecture and data flow

API Reference

Explore API endpoints and integration options