Deployment Guide

A step-by-step guide to deploying the MinION pipeline to AWS.

Prerequisites

AWS Account

  • Active AWS account with admin/PowerUser access
  • AWS CLI 2.0+ installed and configured
  • Region: ap-northeast-1 (Tokyo) recommended
  • Service quotas verified for GPU instances

Local Environment

  • Terraform 1.0+ installed
  • Python 3.9+ with pip
  • Git for repository access
  • Docker (optional, for local testing)
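
The tool requirements above can be checked before starting. The following is a minimal sketch, assuming a POSIX shell; the helper names `require_cmd` and `check_prerequisites` are illustrative and not part of the pipeline repository. It only confirms that each command is on the PATH, so the version requirements still need to be checked by hand.

```shell
# Hypothetical helper: succeeds only if the named command is on the PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing prerequisite: $1" >&2
  return 1
}

# Check every tool passed as an argument and report all that are missing.
check_prerequisites() {
  rc=0
  for tool in "$@"; do
    require_cmd "$tool" || rc=1
  done
  return "$rc"
}

# Example usage (uncomment to run):
# check_prerequisites aws terraform python3 git
# aws --version && terraform version && python3 --version
```

Docker is left out of the example call because the list above marks it as optional.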

Step 1: Environment Configuration

Set Environment Variables

# Set AWS region and environment
export AWS_REGION=ap-northeast-1
export ENVIRONMENT=production
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Create deployment configuration file
cat > deployment.env << EOF
AWS_REGION=$AWS_REGION
ENVIRONMENT=$ENVIRONMENT
AWS_ACCOUNT_ID=$AWS_ACCOUNT_ID
ALERT_EMAIL=admin@your-domain.com
DOMAIN_NAME=api.your-domain.com  # Optional
EOF

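# Load configuration (set -a exports every variable the file defines,
# so ALERT_EMAIL and DOMAIN_NAME are also visible to child processes)
set -a
source deployment.env
set +a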
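
Later steps fail in confusing ways if one of these variables is empty, for example when `aws sts get-caller-identity` could not authenticate. A minimal sketch of a guard, assuming a POSIX shell; `require_vars` is a hypothetical helper name, not something the repository provides.

```shell
# Hypothetical helper: fail if any named variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Required variable is not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (uncomment to run):
# require_vars AWS_REGION ENVIRONMENT AWS_ACCOUNT_ID ALERT_EMAIL || exit 1
```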

Step 2: Deploy Infrastructure with Terraform

Initialize Terraform

cd infrastructure/terraform

# Initialize Terraform backend
terraform init \
  -backend-config="bucket=terraform-state-$AWS_ACCOUNT_ID" \
  -backend-config="key=minion-pipeline/$ENVIRONMENT/terraform.tfstate" \
  -backend-config="region=$AWS_REGION"
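
`terraform init` with an S3 backend expects the state bucket to exist already. If the bucket named above has not been created by other means, something like the following could create it first. This is a sketch under assumptions: the function name `ensure_state_bucket` is hypothetical, enabling versioning is a common precaution rather than a requirement stated by this guide, and the `LocationConstraint` argument as written only applies to regions other than us-east-1.

```shell
# Hypothetical helper: create the Terraform state bucket if it does not exist.
# Relies on AWS_ACCOUNT_ID and AWS_REGION from Step 1.
ensure_state_bucket() {
  bucket="terraform-state-$AWS_ACCOUNT_ID"

  # head-bucket succeeds only if the bucket exists and is accessible.
  if aws s3api head-bucket --bucket "$bucket" 2>/dev/null; then
    echo "State bucket $bucket already exists"
    return 0
  fi

  aws s3api create-bucket \
    --bucket "$bucket" \
    --region "$AWS_REGION" \
    --create-bucket-configuration "LocationConstraint=$AWS_REGION" || return 1

  # Versioning makes it possible to recover an earlier state file.
  aws s3api put-bucket-versioning \
    --bucket "$bucket" \
    --versioning-configuration Status=Enabled
}

# Example usage (uncomment to run before terraform init):
# ensure_state_bucket
```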

Plan Infrastructure

# Generate execution plan
terraform plan \
  -var="environment=$ENVIRONMENT" \
  -var="region=$AWS_REGION" \
  -out=tfplan

# Review the plan
terraform show tfplan

Review the planned changes carefully before applying. The plan creates roughly 30 AWS resources, including VPC, S3, RDS, and Lambda resources.

Apply Infrastructure

# Apply the infrastructure
terraform apply tfplan

# Save outputs for later use
terraform output -json > outputs.json

# View key outputs
terraform output

Step 3: Build AMIs

Basecalling AMI (GPU-enabled)

cd ec2_setup

# Build GPU-optimized AMI with Dorado
./build_basecalling_ami.sh

# Record the AMI ID printed by the build script (replace the placeholder below)
export BASECALLING_AMI_ID=ami-xxxxxxxxx

This AMI includes CUDA drivers, NVIDIA Docker, and the Dorado basecaller. Build time: ~20 minutes.

Analysis AMI

# Build general analysis AMI
./build_analysis_ami.sh

# Record the AMI ID printed by the build script (replace the placeholder below)
export ANALYSIS_AMI_ID=ami-yyyyyyyyy

This AMI includes Kraken2, BLAST, Minimap2, SAMtools, and all Python dependencies. Build time: ~15 minutes.

Step 4: Setup Reference Databases

Install Databases on EFS

Kraken2, RVDB, BLAST, PMDA databases

# Launch temporary EC2 instance for database setup
# (run from infrastructure/terraform so that `terraform output` can read the state)
aws ec2 run-instances \
  --image-id $ANALYSIS_AMI_ID \
  --instance-type t3.xlarge \
  --subnet-id $(terraform output -raw private_subnet_id) \
  --security-group-ids $(terraform output -raw security_group_id) \
  --iam-instance-profile Name=MinIONEC2Role \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=database-setup}]'

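# Look up the EFS DNS name now; the Terraform state is not available on the instance
terraform output -raw efs_dns

# Get instance ID and connect via SSM
INSTANCE_ID=i-xxxxxxxxx
aws ssm start-session --target $INSTANCE_ID

# On the EC2 instance (replace EFS_DNS_NAME with the value printed above):
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 EFS_DNS_NAME:/ /mnt/efs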

cd /opt/minion
./tools/database_setup.sh --all

# Verify installation
./tools/database_setup.sh --check
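
Rather than copying the instance ID by hand, the launch command can return it directly. The sketch below wraps the same `run-instances` call in a hypothetical function, `launch_database_setup_instance`, and waits until the instance is running. The instance may need a little longer after that before it registers with SSM and accepts a session.

```shell
# Hypothetical helper: launch the database-setup instance and print its ID.
# $1 = AMI ID, $2 = subnet ID, $3 = security group ID
launch_database_setup_instance() {
  instance_id=$(aws ec2 run-instances \
    --image-id "$1" \
    --instance-type t3.xlarge \
    --subnet-id "$2" \
    --security-group-ids "$3" \
    --iam-instance-profile Name=MinIONEC2Role \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=database-setup}]' \
    --query 'Instances[0].InstanceId' \
    --output text) || return 1

  # Block until EC2 reports the instance as running.
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  echo "$instance_id"
}

# Example usage (uncomment to run from the Terraform directory):
# INSTANCE_ID=$(launch_database_setup_instance "$ANALYSIS_AMI_ID" \
#   "$(terraform output -raw private_subnet_id)" \
#   "$(terraform output -raw security_group_id)")
# When setup is finished, stop paying for the temporary instance:
# aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
```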

Databases Installed

  • ✓ Kraken2 Standard Database (~50GB)
  • ✓ RVDB v30.0 Viral Database (~10GB)
  • ✓ BLAST PMDA Pathogen Database (~5GB)
  • ✓ Sus scrofa Reference Genome (~3GB)
  • ✓ PERV Reference Sequences (~1MB)

Step 5: Deploy Lambda Functions

Package and Deploy Functions

cd lambda

# Package all Lambda functions
./package_functions.sh

# Deploy orchestration functions
aws lambda create-function \
  --function-name minion-pipeline-orchestrator-$ENVIRONMENT \
  --runtime python3.9 \
  --role arn:aws:iam::$AWS_ACCOUNT_ID:role/MinIONLambdaRole \
  --handler pipeline_orchestrator.lambda_handler \
  --zip-file fileb://orchestration.zip \
  --timeout 30 \
  --memory-size 256 \
  --environment "Variables={ENVIRONMENT=$ENVIRONMENT,STATE_MACHINE_ARN=arn:aws:states:$AWS_REGION:$AWS_ACCOUNT_ID:stateMachine:minion-pipeline-$ENVIRONMENT}"

# Deploy remaining functions (repeat for each)
# - EC2 management functions (3)
# - Data processing functions (3)
# - Monitoring functions (4)
# - Reporting functions (3)
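
`aws lambda create-function` fails if the function already exists, which makes redeployment awkward. One way to handle the remaining functions is a small create-or-update wrapper. This is a sketch, not the repository's deployment method: `deploy_lambda` is a hypothetical name, and the runtime, role, timeout, and memory values are copied from the orchestrator example above and may not suit every function.

```shell
# Hypothetical helper: create the function, or update its code if it exists.
# $1 = function name, $2 = handler, $3 = path to the zip file
deploy_lambda() {
  if aws lambda get-function --function-name "$1" >/dev/null 2>&1; then
    # Function exists: upload the new package only.
    aws lambda update-function-code \
      --function-name "$1" \
      --zip-file "fileb://$3"
  else
    aws lambda create-function \
      --function-name "$1" \
      --runtime python3.9 \
      --role "arn:aws:iam::$AWS_ACCOUNT_ID:role/MinIONLambdaRole" \
      --handler "$2" \
      --zip-file "fileb://$3" \
      --timeout 30 \
      --memory-size 256
  fi
}

# Example usage (uncomment to run):
# deploy_lambda "minion-pipeline-orchestrator-$ENVIRONMENT" \
#   pipeline_orchestrator.lambda_handler orchestration.zip
```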

Step 6: Configure S3 Event Triggers

S3 to Lambda Integration

# Create S3 event notification configuration
cat > s3-notification.json << EOF
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:$AWS_REGION:$AWS_ACCOUNT_ID:function:minion-pipeline-orchestrator-$ENVIRONMENT",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {"Name": "prefix", "Value": "runs/"},
            {"Name": "suffix", "Value": ".fast5"}
          ]
        }
      }
    }
  ]
}
EOF

# Apply to S3 bucket
aws s3api put-bucket-notification-configuration \
  --bucket minion-data-$ENVIRONMENT \
  --notification-configuration file://s3-notification.json
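
S3 validates the notification target, so `put-bucket-notification-configuration` is rejected unless the Lambda function's resource policy already allows S3 to invoke it. If the Terraform configuration does not grant that permission, it can be added with `aws lambda add-permission` before the notification is applied. The function name `allow_s3_invoke` and the statement ID below are hypothetical.

```shell
# Hypothetical helper: allow an S3 bucket to invoke a Lambda function.
# $1 = function name, $2 = bucket name. Relies on AWS_ACCOUNT_ID from Step 1.
allow_s3_invoke() {
  aws lambda add-permission \
    --function-name "$1" \
    --statement-id s3-invoke-orchestrator \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn "arn:aws:s3:::$2" \
    --source-account "$AWS_ACCOUNT_ID"
}

# Example usage (uncomment to run, then apply the notification):
# allow_s3_invoke "minion-pipeline-orchestrator-$ENVIRONMENT" "minion-data-$ENVIRONMENT"
```

Restricting the permission with both `--source-arn` and `--source-account` keeps other buckets and accounts from triggering the function.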

Step 7: Validation

Verify Deployment

# Run deployment validation
./tools/deployment_script.sh validate

# Check all services status
./tools/deployment_script.sh status

# Test with sample data
echo "test" > test.fast5
aws s3 cp test.fast5 s3://minion-data-$ENVIRONMENT/runs/TEST-001/fast5/

# Start test workflow
./tools/workflow_cli.py start \
  --run-id TEST-001 \
  --bucket minion-data-$ENVIRONMENT \
  --input-prefix runs/TEST-001/fast5/

# Monitor progress
./tools/workflow_cli.py status --run-id TEST-001 --watch

Post-Deployment

Generate API Keys

# Generate secure API key
API_KEY=$(openssl rand -hex 32)

# Store in AWS Secrets Manager
aws secretsmanager create-secret \
  --name minion-api-key-$ENVIRONMENT \
  --secret-string "$API_KEY"

# Retrieve when needed
aws secretsmanager get-secret-value \
  --secret-id minion-api-key-$ENVIRONMENT \
  --query SecretString --output text

Setup Cost Alerts

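# Create billing alarm
# Billing metrics are published only in us-east-1 and require the Currency dimension;
# billing alerts must also be enabled in the account's billing preferences.
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name minion-cost-alarm-$ENVIRONMENT \
  --alarm-description "MinION pipeline cost alert" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 86400 \
  --evaluation-periods 1 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold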
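
An alarm without an action changes state silently. To have it send mail to the address stored in `ALERT_EMAIL`, an SNS topic can be created and passed to `put-metric-alarm` through `--alarm-actions`. The sketch below assumes the topic lives in us-east-1, the region where billing metrics are published, since an alarm and its SNS topic must share a region. The function name and the topic name in the usage example are hypothetical.

```shell
# Hypothetical helper: create an SNS topic with an email subscription
# and print the topic ARN.
# $1 = topic name, $2 = email address to notify
create_cost_alert_topic() {
  topic_arn=$(aws sns create-topic \
    --region us-east-1 \
    --name "$1" \
    --query TopicArn \
    --output text) || return 1

  # The recipient must confirm the subscription from the email they receive.
  aws sns subscribe \
    --region us-east-1 \
    --topic-arn "$topic_arn" \
    --protocol email \
    --notification-endpoint "$2" >/dev/null || return 1

  echo "$topic_arn"
}

# Example usage (uncomment to run):
# TOPIC_ARN=$(create_cost_alert_topic "minion-cost-alerts-$ENVIRONMENT" "$ALERT_EMAIL")
# Then add this option to the put-metric-alarm command:
#   --alarm-actions "$TOPIC_ARN"
```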

Configure Monitoring Dashboard

Access the CloudWatch dashboard at the following URL, substituting your region and environment name:

https://console.aws.amazon.com/cloudwatch/home?region=$AWS_REGION#dashboards:name=minion-pipeline-$ENVIRONMENT

Troubleshooting

Terraform State Lock

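# Release a stale lock; LOCK_ID is shown in the lock error message.
# Only run this when no other Terraform process is using the state.
terraform force-unlock LOCK_ID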

EFS Mount Failed

# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxxxxxx

# Add NFS rule (port 2049)
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxxx \
  --protocol tcp \
  --port 2049 \
  --source-group sg-xxxxxxxxx

Lambda Timeout

# Increase Lambda timeout
aws lambda update-function-configuration \
  --function-name FUNCTION_NAME \
  --timeout 300