Deployment Guide
Step-by-step guide to deploy the MinION pipeline to AWS.
Deployment Time & Cost
Full deployment takes approximately 30-45 minutes, not counting the reference database download in Step 4 (2-4 hours). AWS infrastructure will incur charges (estimated $100-300/month for a development environment).
Prerequisites
AWS Account
- Active AWS account with admin/PowerUser access
- AWS CLI 2.0+ installed and configured
- Region: ap-northeast-1 (Tokyo) recommended
- Service quotas verified for GPU instances
Local Environment
- Terraform 1.0+ installed
- Python 3.9+ with pip
- Git for repository access
- Docker (optional, for local testing)
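The prerequisites above can be checked from a shell before starting. The following is a minimal sketch and not part of the pipeline's tooling: `require_tool` and `version_ge` are hypothetical helper names, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helpers for checking the prerequisites listed above.

# require_tool NAME: succeed if NAME is on PATH, otherwise report it and fail.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# version_ge ACTUAL MINIMUM: succeed if ACTUAL >= MINIMUM.
# Relies on `sort -V` (version sort), a GNU coreutils extension.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

For example, `require_tool terraform` reports whether Terraform is installed, and `version_ge "$(python3 -c 'import platform; print(platform.python_version())')" 3.9` succeeds when the local Python meets the 3.9 minimum.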
Step 1: Environment Configuration
Set Environment Variables
# Set AWS region and environment
export AWS_REGION=ap-northeast-1
export ENVIRONMENT=production
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Create deployment configuration file
cat > deployment.env << EOF
AWS_REGION=$AWS_REGION
ENVIRONMENT=$ENVIRONMENT
AWS_ACCOUNT_ID=$AWS_ACCOUNT_ID
ALERT_EMAIL=admin@your-domain.com
DOMAIN_NAME=api.your-domain.com # Optional
EOF
# Load configuration
source deployment.env
Step 2: Deploy Infrastructure with Terraform
Initialize Terraform
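Before running `terraform init`, it may be worth confirming that the variables from Step 1 are set and that the state bucket named in the backend configuration exists. The sketch below assumes the bucket is not created elsewhere, which this guide does not state. `require_vars` and `ensure_state_bucket` are hypothetical helper names, and enabling versioning is a suggestion rather than a requirement of the pipeline.

```shell
# Hypothetical pre-flight helpers; not part of the repository.

# require_vars NAME...: fail if any named variable is unset or empty.
require_vars() {
  rv_missing=0
  for rv_name in "$@"; do
    eval "rv_value=\${$rv_name:-}"
    if [ -z "$rv_value" ]; then
      echo "unset: $rv_name" >&2
      rv_missing=1
    fi
  done
  return "$rv_missing"
}

# ensure_state_bucket: create the Terraform state bucket if it is absent.
# Assumes AWS_REGION is not us-east-1, which rejects a LocationConstraint.
ensure_state_bucket() {
  es_bucket="terraform-state-$AWS_ACCOUNT_ID"
  if aws s3api head-bucket --bucket "$es_bucket" 2>/dev/null; then
    echo "exists: $es_bucket"
    return 0
  fi
  aws s3api create-bucket \
    --bucket "$es_bucket" \
    --region "$AWS_REGION" \
    --create-bucket-configuration "LocationConstraint=$AWS_REGION" &&
  aws s3api put-bucket-versioning \
    --bucket "$es_bucket" \
    --versioning-configuration Status=Enabled
}
```

A typical call would be `require_vars AWS_REGION ENVIRONMENT AWS_ACCOUNT_ID && ensure_state_bucket`.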
cd infrastructure/terraform
# Initialize Terraform backend
terraform init \
-backend-config="bucket=terraform-state-$AWS_ACCOUNT_ID" \
-backend-config="key=minion-pipeline/$ENVIRONMENT/terraform.tfstate" \
-backend-config="region=$AWS_REGION"
Plan Infrastructure
# Generate execution plan
terraform plan \
-var="environment=$ENVIRONMENT" \
-var="region=$AWS_REGION" \
-out=tfplan
# Review the plan
terraform show tfplan
Review the planned changes carefully. This will create ~30 AWS resources including VPC, S3, RDS, Lambda, etc.
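Terraform's human-readable plan ends with a summary line of the form `Plan: N to add, N to change, N to destroy.` As one way to make the review less error-prone, the sketch below reads that line and prints the destroy count, which a first deployment would normally expect to be zero. `plan_destroy_count` is a hypothetical helper, and the summary format is assumed to match the Terraform release in use.

```shell
# plan_destroy_count: read plan text on stdin and print the number of
# resources Terraform intends to destroy (0 if no summary line is found).
plan_destroy_count() {
  sed -n 's/^Plan: .* \([0-9][0-9]*\) to destroy\..*$/\1/p' | tail -n 1 | {
    read -r count || count=0
    echo "${count:-0}"
  }
}
```

It can be fed from the saved plan with `terraform show -no-color tfplan | plan_destroy_count`; the `-no-color` flag keeps terminal escape codes out of the text being parsed.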
Apply Infrastructure
# Apply the infrastructure
terraform apply tfplan
# Save outputs for later use
terraform output -json > outputs.json
# View key outputs
terraform output
This step takes 10-15 minutes to complete. Terraform will create all AWS resources defined in the configuration.
Step 3: Build AMIs
Basecalling AMI (GPU-enabled)
cd ec2_setup
# Build GPU-optimized AMI with Dorado
./build_basecalling_ami.sh
# Note the AMI ID from output
export BASECALLING_AMI_ID=ami-xxxxxxxxx
This AMI includes CUDA drivers, NVIDIA Docker, and Dorado basecaller. Build time: ~20 minutes.
Analysis AMI
# Build general analysis AMI
./build_analysis_ami.sh
# Note the AMI ID from output
export ANALYSIS_AMI_ID=ami-yyyyyyyyy
This AMI includes Kraken2, BLAST, Minimap2, SAMtools, and all Python dependencies. Build time: ~15 minutes.
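The values `ami-xxxxxxxxx` and `ami-yyyyyyyyy` are placeholders and must be replaced with the IDs printed by the build scripts. As a guard against leaving a placeholder in place, a sketch like the following checks the ID format (`ami-` followed by 8 or 17 hexadecimal characters) before later steps use it. `valid_ami_id` is a hypothetical helper, not part of the repository.

```shell
# valid_ami_id ID: succeed if ID looks like a real AMI ID
# (ami- plus 8 or 17 lowercase hexadecimal characters).
valid_ami_id() {
  printf '%s\n' "$1" | grep -Eq '^ami-([0-9a-f]{8}|[0-9a-f]{17})$'
}
```

One possible use is `valid_ami_id "$BASECALLING_AMI_ID" && aws ec2 wait image-available --image-ids "$BASECALLING_AMI_ID"`, which blocks until the image is ready to launch.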
Step 4: Setup Reference Databases
Install Databases on EFS
Kraken2, RVDB, BLAST, PMDA databases
Database download is ~150GB and takes 2-4 hours. Ensure stable internet connection.
# Launch temporary EC2 instance for database setup
# (run from infrastructure/terraform so that `terraform output` can read the state)
aws ec2 run-instances \
--image-id $ANALYSIS_AMI_ID \
--instance-type t3.xlarge \
--subnet-id $(terraform output -raw private_subnet_id) \
--security-group-ids $(terraform output -raw security_group_id) \
--iam-instance-profile Name=MinIONEC2Role \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=database-setup}]'
# Get instance ID and connect via SSM
INSTANCE_ID=i-xxxxxxxxx
aws ssm start-session --target $INSTANCE_ID
# On the EC2 instance:
sudo mkdir -p /mnt/efs
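# Terraform state is not available on the instance: run `terraform output -raw efs_dns` locally and paste the value here
EFS_DNS=fs-xxxxxxxxx.efs.ap-northeast-1.amazonaws.com
sudo mount -t nfs4 "$EFS_DNS":/ /mnt/efs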
cd /opt/minion
./tools/database_setup.sh --all
# Verify installation
./tools/database_setup.sh --check
Databases Installed
- ✓ Kraken2 Standard Database (~50GB)
- ✓ RVDB v30.0 Viral Database (~10GB)
- ✓ BLAST PMDA Pathogen Database (~5GB)
- ✓ Sus scrofa Reference Genome (~3GB)
- ✓ PERV Reference Sequences (~1MB)
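Because the download is roughly 150GB, it can help to confirm free space before starting `database_setup.sh`. The sketch below assumes the script stages files on a local or EBS volume, which this guide does not state; an EFS mount usually reports a very large nominal capacity, so the check is less informative there. `has_free_gb` is a hypothetical helper.

```shell
# has_free_gb PATH GB: succeed if the filesystem holding PATH has at least
# GB gibibytes available, according to POSIX-format `df` output.
has_free_gb() {
  hf_avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "${hf_avail_kb:-0}" -ge $(( $2 * 1024 * 1024 )) ]
}
```

For instance, `has_free_gb /tmp 150 || echo "not enough scratch space"` warns before a long download fails partway through.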
Step 5: Deploy Lambda Functions
Package and Deploy Functions
cd lambda
# Package all Lambda functions
./package_functions.sh
# Deploy orchestration functions
aws lambda create-function \
--function-name minion-pipeline-orchestrator-$ENVIRONMENT \
--runtime python3.9 \
--role arn:aws:iam::$AWS_ACCOUNT_ID:role/MinIONLambdaRole \
--handler pipeline_orchestrator.lambda_handler \
--zip-file fileb://orchestration.zip \
--timeout 30 \
--memory-size 256 \
--environment Variables="{\"ENVIRONMENT\":\"$ENVIRONMENT\",\"STATE_MACHINE_ARN\":\"arn:aws:states:$AWS_REGION:$AWS_ACCOUNT_ID:stateMachine:minion-pipeline-$ENVIRONMENT\"}"
# Deploy remaining functions (repeat for each)
# - EC2 management functions (3)
# - Data processing functions (3)
# - Monitoring functions (4)
# - Reporting functions (3)
Step 6: Configure S3 Event Triggers
S3 to Lambda Integration
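S3 can deliver events only to a function whose resource policy allows the S3 service to invoke it; without that permission, `put-bucket-notification-configuration` generally fails when it validates the destination. This guide does not show that step, so the following is a sketch under the assumption that Terraform has not already created the permission. `grant_s3_invoke` and the statement ID are hypothetical, and `AWS_CLI` is an illustrative override that lets the command be previewed with `echo`.

```shell
# grant_s3_invoke FUNCTION BUCKET ACCOUNT_ID: allow S3 events from BUCKET
# to invoke FUNCTION. Set AWS_CLI=echo to print the command instead of
# running it.
grant_s3_invoke() {
  gs_function=$1
  gs_bucket=$2
  gs_account=$3
  ${AWS_CLI:-aws} lambda add-permission \
    --function-name "$gs_function" \
    --statement-id s3-invoke-minion-data \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn "arn:aws:s3:::$gs_bucket" \
    --source-account "$gs_account"
}
```

With the variables from Step 1 loaded, a call might look like `grant_s3_invoke "minion-pipeline-orchestrator-$ENVIRONMENT" "minion-data-$ENVIRONMENT" "$AWS_ACCOUNT_ID"`.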
# Create S3 event notification configuration
cat > s3-notification.json << EOF
{
"LambdaFunctionConfigurations": [
{
"LambdaFunctionArn": "arn:aws:lambda:$AWS_REGION:$AWS_ACCOUNT_ID:function:minion-pipeline-orchestrator-$ENVIRONMENT",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [
{"Name": "prefix", "Value": "runs/"},
{"Name": "suffix", "Value": ".fast5"}
]
}
}
}
]
}
EOF
# Apply to S3 bucket
aws s3api put-bucket-notification-configuration \
--bucket minion-data-$ENVIRONMENT \
--notification-configuration file://s3-notification.json
Step 7: Validation
Verify Deployment
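The notification configured in Step 6 fires only for object keys that start with `runs/` and end with `.fast5`, and S3 compares both literally and case-sensitively. To reason about which uploads will trigger the orchestrator, the rules can be mirrored in a shell pattern. `matches_trigger` is a hypothetical helper that only approximates S3's matching locally.

```shell
# matches_trigger KEY: succeed if KEY has the prefix "runs/" and the
# suffix ".fast5", like the filter rules in s3-notification.json.
matches_trigger() {
  case "$1" in
    runs/*.fast5) return 0 ;;
    *) return 1 ;;
  esac
}
```

By this check, `runs/TEST-001/fast5/test.fast5` matches while `runs/TEST-001/fast5/test.FAST5` does not. The test upload may therefore already invoke the orchestrator on its own, before the workflow is started by hand.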
# Run deployment validation
./tools/deployment_script.sh validate
# Check all services status
./tools/deployment_script.sh status
# Test with sample data
echo "test" > test.fast5
aws s3 cp test.fast5 s3://minion-data-$ENVIRONMENT/runs/TEST-001/fast5/
# Start test workflow
./tools/workflow_cli.py start \
--run-id TEST-001 \
--bucket minion-data-$ENVIRONMENT \
--input-prefix runs/TEST-001/fast5/
# Monitor progress
./tools/workflow_cli.py status --run-id TEST-001 --watch
Deployment Complete!
If all validation checks pass, your MinION pipeline is ready for production use.
Post-Deployment
Generate API Keys
# Generate secure API key
API_KEY=$(openssl rand -hex 32)
# Store in AWS Secrets Manager
aws secretsmanager create-secret \
--name minion-api-key-$ENVIRONMENT \
--secret-string "$API_KEY"
# Retrieve when needed
aws secretsmanager get-secret-value \
--secret-id minion-api-key-$ENVIRONMENT \
--query SecretString --output text
Setup Cost Alerts
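An alarm notifies someone only when it has an action attached. The sketch below creates an SNS topic and subscribes the `ALERT_EMAIL` address from `deployment.env`; the printed topic ARN can then be passed to `put-metric-alarm` through `--alarm-actions`. It assumes the topic belongs in us-east-1, where AWS publishes billing metrics. `create_alert_topic` and the topic name are hypothetical, and `AWS_CLI=echo` previews the commands instead of running them.

```shell
# create_alert_topic ENVIRONMENT EMAIL: create an SNS topic for cost alerts,
# subscribe EMAIL to it, and print the topic ARN on stdout.
create_alert_topic() {
  ct_env=$1
  ct_email=$2
  ct_arn=$(${AWS_CLI:-aws} sns create-topic \
    --region us-east-1 \
    --name "minion-cost-alerts-$ct_env" \
    --query TopicArn --output text) || return 1
  # Subscription details go to stderr so stdout carries only the ARN.
  ${AWS_CLI:-aws} sns subscribe \
    --region us-east-1 \
    --topic-arn "$ct_arn" \
    --protocol email \
    --notification-endpoint "$ct_email" >&2 || return 1
  printf '%s\n' "$ct_arn"
}
```

The recipient has to confirm the subscription from the email SNS sends before any alarm notification is delivered.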
# Create billing alarm
# Billing metrics are published only in us-east-1, and billing alerts must be enabled for the account
aws cloudwatch put-metric-alarm \
--region us-east-1 \
--alarm-name minion-cost-alarm-$ENVIRONMENT \
--alarm-description "MinION pipeline cost alert" \
--metric-name EstimatedCharges \
--namespace AWS/Billing \
--dimensions Name=Currency,Value=USD \
--statistic Maximum \
--period 86400 \
--evaluation-periods 1 \
--threshold 500 \
--comparison-operator GreaterThanThreshold
Configure Monitoring Dashboard
Access CloudWatch dashboard at:
https://console.aws.amazon.com/cloudwatch/home?region=$AWS_REGION#dashboards:name=minion-pipeline-$ENVIRONMENT
Troubleshooting
Terraform State Lock
# LOCK_ID is shown in the lock error message; confirm no other Terraform run is active first
terraform force-unlock LOCK_ID
EFS Mount Failed
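Before changing security group rules, it is worth confirming whether the file system is in fact unmounted, since a failed script can have other causes. A minimal sketch, assuming a Linux instance that exposes `/proc/mounts`; `is_mounted` is a hypothetical helper and `MOUNTS_FILE` is an illustrative override for testing.

```shell
# is_mounted PATH: succeed if PATH is listed as a mount point.
# Reads /proc/mounts unless MOUNTS_FILE points somewhere else.
is_mounted() {
  awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
    "${MOUNTS_FILE:-/proc/mounts}"
}
```

On the instance, `is_mounted /mnt/efs || echo "EFS is not mounted"` separates a mount problem from a problem inside the database scripts.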
# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxxxxxx
# Add NFS rule (port 2049)
aws ec2 authorize-security-group-ingress \
--group-id sg-xxxxxxxxx \
--protocol tcp \
--port 2049 \
--source-group sg-xxxxxxxxx
Lambda Timeout
# Increase Lambda timeout
aws lambda update-function-configuration \
--function-name FUNCTION_NAME \
--timeout 300
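Lambda accepts timeouts from 1 to 900 seconds, so a larger value is rejected by `update-function-configuration`. The sketch below validates a value before it is sent; `valid_lambda_timeout` is a hypothetical helper.

```shell
# valid_lambda_timeout SECONDS: succeed if SECONDS is a whole number
# within Lambda's allowed range of 1 to 900.
valid_lambda_timeout() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 900 ]
}
```

Work that needs longer than 15 minutes does not fit in a single invocation and is probably better placed in the Step Functions state machine or on the EC2 instances this pipeline already uses.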