# Getting Started

Set up your development environment and deploy the MinION pathogen screening pipeline.
## Prerequisites
### Local Tools

Required software on your machine:

- AWS CLI 2.0+: command-line interface for AWS
- Terraform 1.0+: infrastructure-as-code tool
- Python 3.9+: programming language runtime
- Git: version control system
### AWS Resources

Required AWS access and quotas:

- AWS Account: with Administrator or PowerUser access
- Service Quotas: sufficient limits for GPU instances and Lambda concurrency
- Region: ap-northeast-1 (recommended)
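Before continuing, it can help to confirm the local tools meet these minimum versions. A minimal sketch; the `version_ge` and `check` helpers are illustrative (not part of the repository), and the version-extraction commands may need adjusting for your platform:

```bash
#!/usr/bin/env bash
# version_ge A B -> succeeds if dotted version A >= version B (uses GNU sort -V)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

check() {  # check NAME MINIMUM ACTUAL
  if version_ge "$3" "$2"; then
    echo "OK   $1 $3 (>= $2)"
  else
    echo "FAIL $1 ${3:-missing} (< $2)"
  fi
}

check "aws-cli"   2.0 "$(aws --version 2>&1 | sed 's|aws-cli/\([0-9.]*\).*|\1|')"
check "terraform" 1.0 "$(terraform version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -n1)"
check "python"    3.9 "$(python3 -c 'import platform; print(platform.python_version())')"
```

A `FAIL` line means the tool is missing or too old; install or upgrade it before moving on.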
## Installation

### 1. Clone the Repository

```bash
git clone https://github.com/masterleopold/metagenome.git
cd metagenome
```

### 2. Install Python Dependencies
```bash
pip install -r requirements.txt
```

### 3. Configure AWS Credentials
```bash
aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Default region name: ap-northeast-1
# Default output format: json
```

### 4. Set Environment Variables
```bash
export AWS_REGION=ap-northeast-1
export ENVIRONMENT=production
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
```

## Deploy Infrastructure
> **Cost Warning:** Applying this Terraform configuration provisions billable AWS resources, including GPU instances. Run `terraform destroy` when you no longer need the stack.
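Before running Terraform, it is worth failing fast if any of the variables exported above are unset. A small bash sketch; the `require_env` helper is illustrative, not part of the repository:

```bash
#!/usr/bin/env bash
# Succeed only if every named environment variable is set and non-empty.
require_env() {
  local name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then          # bash indirect expansion
      echo "ERROR: $name is not set" >&2
      return 1
    fi
  done
}

require_env AWS_REGION ENVIRONMENT AWS_ACCOUNT_ID \
  && echo "All required variables are set." \
  || echo "Set the missing variables before continuing."
```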
### Initialize Terraform

```bash
cd infrastructure/terraform
terraform init
```

### Plan Deployment
```bash
terraform plan \
  -var="environment=$ENVIRONMENT" \
  -var="region=$AWS_REGION" \
  -out=tfplan
```

### Apply Infrastructure
```bash
terraform apply tfplan

# Save outputs for reference
terraform output -json > outputs.json
```

This will take approximately 10-15 minutes to complete.
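The saved `outputs.json` follows the `terraform output -json` format: an object mapping each output name to `{"value": ..., "type": ..., "sensitive": ...}`. A sketch for listing its contents (the `show_outputs` helper is illustrative, not part of the repository):

```bash
#!/usr/bin/env bash
# Print "name = value" for each Terraform output, masking sensitive ones.
show_outputs() {
  python3 - "$1" <<'EOF'
import json, sys

with open(sys.argv[1]) as f:
    data = json.load(f)
for name, meta in sorted(data.items()):
    shown = "<sensitive>" if meta.get("sensitive") else meta.get("value")
    print(f"{name} = {shown}")
EOF
}

[ -f outputs.json ] && show_outputs outputs.json || true
```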
## Set Up Databases
After infrastructure deployment, set up the reference databases on EFS:

```bash
# Run database setup script
./tools/database_setup.sh --all

# Verify installation
./tools/database_setup.sh --check
```

This downloads and installs the Kraken2, RVDB, and BLAST databases, plus the PMDA pathogen sequences. Total download size: ~150 GB. Allow 2-4 hours for completion.
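Given the ~150 GB download, it can be worth checking free space on the EFS mount first. A sketch; the `check_space` helper, the `/mnt/efs` mount point, and the 200 GB headroom figure are all assumptions to adjust for your setup:

```bash
#!/usr/bin/env bash
# Warn if a filesystem has less free space than required (in GB).
check_space() {  # check_space PATH REQUIRED_GB
  local avail_kb required_kb
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')   # POSIX df, KB units
  required_kb=$(( $2 * 1024 * 1024 ))
  if [ "${avail_kb:-0}" -ge "$required_kb" ]; then
    echo "OK: $(( avail_kb / 1024 / 1024 )) GB free at $1"
  else
    echo "WARNING: only $(( ${avail_kb:-0} / 1024 / 1024 )) GB free at $1; need $2 GB" >&2
    return 1
  fi
}

MOUNT_POINT="${MOUNT_POINT:-/mnt/efs}"   # assumed EFS mount point
check_space "$MOUNT_POINT" 200 || true   # databases plus working headroom
```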
Verify Deployment
# Run deployment validation
./tools/deployment_script.sh validate
# Check all services status
./tools/deployment_script.sh statusDeployment Successful
If all checks pass, your MinION pipeline is ready to use!
Next steps: Run your first workflow or explore the API documentation.
## Run Your First Workflow

Upload test FAST5 files and start an analysis workflow:
```bash
# Upload FAST5 files to S3
aws s3 cp test-data/sample.fast5 s3://minion-data-production/runs/TEST-001/fast5/

# Start workflow via CLI
./tools/workflow_cli.py start \
  --run-id TEST-001 \
  --bucket minion-data-production \
  --input-prefix runs/TEST-001/fast5/
```
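If you want to block in a script until the run finishes rather than watch interactively, a generic polling helper can wrap the status command. A sketch; the `wait_for` helper and the `COMPLETED` marker string are assumptions about `workflow_cli.py`'s status output, not documented behavior:

```bash
#!/usr/bin/env bash
# Poll a command until its output contains a marker string, or give up.
wait_for() {  # wait_for MARKER MAX_TRIES DELAY_SECONDS CMD...
  local marker=$1 tries=$2 delay=$3 i out
  shift 3
  for i in $(seq 1 "$tries"); do
    out=$("$@" 2>&1)
    if printf '%s' "$out" | grep -q "$marker"; then
      echo "done after $i check(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "gave up after $tries checks" >&2
  return 1
}

# Example (hypothetical marker): poll every 30 s for up to 30 minutes.
# wait_for COMPLETED 60 30 ./tools/workflow_cli.py status --run-id TEST-001
```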
```bash
# Monitor progress
./tools/workflow_cli.py status --run-id TEST-001 --watch
```

## Next Steps
- Architecture Overview: learn about the system architecture and data flow
- API Reference: explore API endpoints and integration options