System Architecture
Understand the cloud-native architecture and component interactions of the MinION pipeline.
Architecture Overview
High-Level Architecture
Data Flow: MinION Sequencer → S3 Upload → Lambda Trigger → Step Functions → EC2 Processing (6 phases) → Results Storage → Report Generation
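The Step Functions stage of this flow could be expressed as a linear chain of task states, one per phase. The following is a minimal sketch, not the pipeline's actual definition: the phase names are placeholders (the page only states that there are six phases), the Lambda ARN is hypothetical, and the sketch does not model waiting for the EC2 work to finish.

```python
import json

# Hypothetical placeholder names; the page only states that there are six phases.
PHASES = ["phase_1", "phase_2", "phase_3", "phase_4", "phase_5", "phase_6"]


def build_state_machine(phases, launcher_arn):
    """Build an Amazon States Language definition that runs phases in sequence."""
    if not phases:
        raise ValueError("at least one phase is required")
    states = {}
    for index, phase in enumerate(phases):
        state = {
            "Type": "Task",
            "Resource": launcher_arn,
            # Pass the phase name and the run ID taken from the execution input.
            "Parameters": {"phase": phase, "run_id.$": "$.run_id"},
            # A null ResultPath discards the task result, so the original
            # input (including run_id) is passed on to the next state.
            "ResultPath": None,
        }
        if index + 1 < len(phases):
            state["Next"] = phases[index + 1]
        else:
            state["End"] = True
        states[phase] = state
    return {"StartAt": phases[0], "States": states}


# The ARN below is an illustrative value, not a real resource.
definition = build_state_machine(
    PHASES, "arn:aws:lambda:us-east-1:123456789012:function:instance-launcher"
)
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of value a state machine definition expects; a production definition would likely add retry and error-handling states as well.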
AWS Services
Compute
Scalable processing power
- Lambda: Orchestration
- EC2 (g4dn.xlarge): GPU Basecalling
- EC2 (r5.4xlarge): High-memory
- Step Functions: Workflow
Storage
Data persistence and databases
- S3: Object Storage
- EFS: Reference DBs
- RDS Aurora: Metadata
Integration
Event-driven communication
- EventBridge: Events
- SNS: Notifications
- API Gateway: REST API
Monitoring
Observability and logging
- CloudWatch: Metrics & Logs
- CloudWatch Alarms: Alerts
- X-Ray: Tracing
Component Details
Lambda Functions (16 functions)
Serverless orchestration and control
Orchestration
- Pipeline orchestrator
- Phase state manager
- Error handler
EC2 Management
- Instance launcher
- Instance monitor
- Instance terminator
Data Processing
- FASTQ validator
- Result aggregator
- Metric calculator
Monitoring
- Alert handler
- Status updater
- Cost tracker
EC2 Instances
Compute launched on demand for each analysis phase
Security & IAM
Access control and data protection
- VPC Isolation: Private subnets for EC2 instances
- IAM Roles: Least-privilege access for Lambda and EC2
- S3 Encryption: Server-side encryption (SSE-S3)
- RDS Encryption: At-rest encryption with KMS
- API Gateway: API key authentication
Data Flow
Data Upload
FAST5/POD5 files are uploaded from the MinION to the S3 bucket (runs/{run_id}/fast5/)
Event Trigger
An S3 ObjectCreated event triggers the Lambda orchestrator function
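As an illustration of this step, the sketch below pulls the bucket, key, and run ID out of an S3 event notification. It relies on the standard S3 notification record layout and on the runs/{run_id}/fast5/ key layout described above; the function name and the returned dictionary shape are assumptions made for the example, not the pipeline's actual code.

```python
import re
from urllib.parse import unquote_plus

# Matches the upload layout described above: runs/{run_id}/fast5/<file>
RUN_KEY = re.compile(r"^runs/(?P<run_id>[^/]+)/fast5/(?P<filename>[^/]+)$")


def extract_uploads(event):
    """Return one entry per ObjectCreated record that matches the run layout."""
    uploads = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        match = RUN_KEY.match(key)
        if match:
            uploads.append(
                {"bucket": bucket, "key": key, "run_id": match["run_id"]}
            )
    return uploads


# Illustrative event containing one matching and one non-matching key.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "runs/run-001/fast5/batch_0.pod5"},
            },
        },
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "other/readme.txt"},
            },
        },
    ]
}
uploads = extract_uploads(sample_event)
```

Keys outside the expected layout are ignored rather than rejected, so unrelated uploads to the same bucket would not start a workflow.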
Workflow Initialization
Lambda creates a workflow record in RDS and starts a Step Functions execution
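A minimal sketch of the second half of this step: building the arguments for the Step Functions StartExecution call. The input fields (run_id, bucket) and the execution naming scheme are assumptions for illustration, and the RDS record creation is left out because the page does not describe the schema.

```python
import json


def build_execution_request(state_machine_arn, run_id, bucket):
    """Build keyword arguments for a Step Functions StartExecution call."""
    return {
        "stateMachineArn": state_machine_arn,
        # Execution names must be unique per state machine; deriving the name
        # from the run ID is an assumed convention.
        "name": f"run-{run_id}",
        # The execution input must be a JSON string.
        "input": json.dumps({"run_id": run_id, "bucket": bucket}),
    }


# The ARN and bucket name are illustrative values.
request = build_execution_request(
    "arn:aws:states:us-east-1:123456789012:stateMachine:example-pipeline",
    "run-001",
    "example-bucket",
)
```

With boto3, the request could then be passed as `boto3.client("stepfunctions").start_execution(**request)`, whose response includes the `executionArn`.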
Phase Execution
Each phase triggers a Lambda function that launches an EC2 instance with the appropriate configuration
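One way to picture the "appropriate configuration" is a function that assembles the EC2 RunInstances parameters for a phase. This is a sketch under assumptions: the two instance types come from the Compute list on this page, but the profile names, tag keys, and the AMI and subnet IDs are hypothetical.

```python
# Instance types listed on this page; the profile names are assumptions.
INSTANCE_TYPES = {"gpu": "g4dn.xlarge", "high_memory": "r5.4xlarge"}


def build_launch_request(phase, profile, ami_id, subnet_id, run_id, use_spot=True):
    """Build keyword arguments for an EC2 RunInstances call for one phase."""
    request = {
        "ImageId": ami_id,
        "InstanceType": INSTANCE_TYPES[profile],
        "MinCount": 1,
        "MaxCount": 1,
        # A private subnet, matching the VPC isolation described on this page.
        "SubnetId": subnet_id,
        # If the instance shuts itself down, terminate it rather than stop it.
        "InstanceInitiatedShutdownBehavior": "terminate",
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [
                    {"Key": "run_id", "Value": run_id},
                    {"Key": "phase", "Value": phase},
                ],
            }
        ],
    }
    if use_spot:
        request["InstanceMarketOptions"] = {"MarketType": "spot"}
    return request


# Illustrative IDs only.
launch_request = build_launch_request(
    "basecalling", "gpu", "ami-0123456789abcdef0", "subnet-0123456789abcdef0", "run-001"
)
```

With boto3, such a request could be passed as `boto3.client("ec2").run_instances(**launch_request)`.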
Analysis Processing
The EC2 instance downloads data from S3, processes it with the analysis tools, and uploads the results back to S3
Phase Completion
The EC2 instance signals completion to Lambda, which updates RDS and terminates the instance
Critical Alerts
PERV detection triggers an immediate SNS notification to alert recipients
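The alert itself could be assembled as the arguments to an SNS Publish call. The sketch below is illustrative only: the message fields, the subject wording, and the topic ARN are assumptions, since the page does not specify the alert payload.

```python
import json


def build_perv_alert(topic_arn, run_id, detail):
    """Build keyword arguments for an SNS Publish call announcing a PERV detection."""
    message = {
        "alert": "PERV_DETECTED",  # assumed alert code
        "run_id": run_id,
        "detail": detail,
    }
    return {
        "TopicArn": topic_arn,
        # SNS subjects are limited to 100 characters, so truncate defensively.
        "Subject": f"PERV detected in run {run_id}"[:100],
        "Message": json.dumps(message),
    }


# Illustrative topic ARN and detail text.
alert = build_perv_alert(
    "arn:aws:sns:us-east-1:123456789012:example-critical-alerts",
    "run-001",
    "PERV sequence detected during analysis",
)
```

With boto3, the alert could be sent with `boto3.client("sns").publish(**alert)`.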
Report Generation
The final phase generates PMDA-compliant reports (PDF, JSON, HTML) and stores them in S3
Cost Optimization
Spot Instances
70% cost reduction relative to on-demand pricing for the basecalling and analysis phases
- Automatic fallback to on-demand
- Checkpoint/resume capability
- Priority-based instance selection
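The automatic fallback to on-demand can be sketched as a try-Spot-first wrapper. This is an assumption-laden illustration: `launch` is a hypothetical callable that wraps the actual instance launch, and `SpotUnavailable` is a hypothetical exception that such a wrapper would raise when a Spot request fails (with boto3, for example, after inspecting the error code of a `ClientError` such as `InsufficientInstanceCapacity`).

```python
class SpotUnavailable(Exception):
    """Hypothetical error raised by the launcher when Spot capacity is unavailable."""


def launch_with_fallback(launch, request):
    """Try a Spot launch first, then retry the same request as on-demand.

    Returns a tuple of (launch result, market type used).
    """
    spot_request = dict(request, InstanceMarketOptions={"MarketType": "spot"})
    try:
        return launch(spot_request), "spot"
    except SpotUnavailable:
        # Remove the market options so the request uses on-demand capacity.
        on_demand_request = {
            key: value
            for key, value in request.items()
            if key != "InstanceMarketOptions"
        }
        return launch(on_demand_request), "on-demand"
```

Returning the market type alongside the result lets the caller record which pricing model was used, which is the kind of information a cost tracker would need.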
Auto-termination
Instances automatically terminate after phase completion
- No idle instance costs
- Lambda-based monitoring
- Configurable timeout protection
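Timeout protection amounts to comparing an instance's age against a configured limit. A minimal sketch follows, assuming the monitor knows each instance's launch time as a timezone-aware datetime; the function name and the per-phase timeout parameter are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_timed_out(launch_time, timeout_minutes, now=None):
    """Return True if an instance has been running longer than its timeout."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - launch_time > timedelta(minutes=timeout_minutes)


# Example: an instance launched 61 minutes before the check, with a 60-minute limit.
launched = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
checked = launched + timedelta(minutes=61)
overdue = is_timed_out(launched, 60, now=checked)
```

A monitoring function could run a check like this on a schedule and terminate any instance for which it returns True.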
Serverless Services
Pay-per-use pricing for orchestration
- Lambda (millisecond billing)
- RDS Aurora Serverless
- S3 Intelligent-Tiering
Data Lifecycle
Automated data retention and archival
- Raw data: 30-day retention
- Results: 90-day retention
- Reports: 365-day retention
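These retention periods could be expressed as an S3 lifecycle configuration. The sketch below makes two assumptions: that objects expire at the end of their retention period (archival would use transition rules instead), and that objects carry a hypothetical `data-class` tag. Tags are used because raw data lives under runs/{run_id}/fast5/, which a single prefix filter cannot select across runs.

```python
# Retention periods from the list above, keyed by an assumed data-class tag value.
RETENTION_DAYS = {"raw": 30, "results": 90, "reports": 365}


def build_lifecycle_configuration(retention_days, tag_key="data-class"):
    """Build an S3 lifecycle configuration with one expiration rule per data class."""
    rules = []
    for data_class, days in retention_days.items():
        rules.append(
            {
                "ID": f"expire-{data_class}",
                "Status": "Enabled",
                # Select objects by tag; the tag key is a hypothetical convention.
                "Filter": {"Tag": {"Key": tag_key, "Value": data_class}},
                "Expiration": {"Days": days},
            }
        )
    return {"Rules": rules}


lifecycle = build_lifecycle_configuration(RETENTION_DAYS)
```

With boto3, the result could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle)`.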