Core Concepts

System Architecture

Understand the cloud-native architecture and component interactions of the MinION pipeline.

High-Level Architecture

  • MinION
    Sequencer
  • S3
    Upload
  • Lambda Orchestration
    Event-driven workflow management
  • Step Functions Workflow
    6-phase analysis pipeline
      1. Basecalling (GPU EC2)
      2. QC (EC2)
      3. Host Removal (EC2)
      4. Pathogen Detection (High-mem EC2)
      5. Quantification (EC2)
      6. Reporting (EC2)
  • Reports
    PDF, JSON, HTML

Data Flow: MinION Sequencer → S3 Upload → Lambda Trigger → Step Functions → EC2 Processing (6 phases) → Results Storage → Report Generation
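
To make the six-phase sequence concrete, the following sketch builds a linear Amazon States Language definition in Python. It is an illustration under assumptions, not the pipeline's actual state machine: the state names, the launcher ARN, and the payload fields (`phase`, `run_id`) are hypothetical, and a real definition would also need to wait for each EC2 job to finish and handle errors.

```python
import json

# Phase order taken from the architecture overview; the state names are assumptions.
PHASES = [
    "Basecalling",
    "QC",
    "HostRemoval",
    "PathogenDetection",
    "Quantification",
    "Reporting",
]


def build_state_machine(launcher_arn: str) -> dict:
    """Chain the six phases into a linear Amazon States Language definition."""
    states = {}
    for index, phase in enumerate(PHASES):
        state = {
            "Type": "Task",
            "Resource": launcher_arn,  # hypothetical phase-launcher Lambda ARN
            "Parameters": {"phase": phase, "run_id.$": "$.run_id"},
            # Serialized as null: discard the task result and pass the input on,
            # so run_id stays available to the next phase.
            "ResultPath": None,
        }
        if index + 1 < len(PHASES):
            state["Next"] = PHASES[index + 1]
        else:
            state["End"] = True
        states[phase] = state
    return {
        "Comment": "Illustrative 6-phase analysis workflow",
        "StartAt": PHASES[0],
        "States": states,
    }


if __name__ == "__main__":
    # Placeholder ARN for demonstration only.
    arn = "arn:aws:lambda:REGION:ACCOUNT_ID:function:launch-phase"
    print(json.dumps(build_state_machine(arn), indent=2))
```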

AWS Services

Compute

Scalable processing power

  • Lambda
    Orchestration
  • EC2 (g4dn.xlarge)
    GPU Basecalling
  • EC2 (r5.4xlarge)
    High-memory
  • Step Functions
    Workflow

Storage

Data persistence and databases

  • S3
    Object Storage
  • EFS
    Reference DBs
  • RDS Aurora
    Metadata

Integration

Event-driven communication

  • EventBridge
    Events
  • SNS
    Notifications
  • API Gateway
    REST API

Monitoring

Observability and logging

  • CloudWatch
    Metrics & Logs
  • CloudWatch Alarms
    Alerts
  • X-Ray
    Tracing

Component Details

Lambda Functions (16 functions)

Serverless orchestration and control

Orchestration

  • Pipeline orchestrator
  • Phase state manager
  • Error handler

EC2 Management

  • Instance launcher
  • Instance monitor
  • Instance terminator

Data Processing

  • FASTQ validator
  • Result aggregator
  • Metric calculator

Monitoring

  • Alert handler
  • Status updater
  • Cost tracker

EC2 Instances

On-demand compute for analysis phases

  • GPU: g4dn.xlarge
    Basecalling with Dorado (NVIDIA T4 GPU)
    4 vCPU, 16GB RAM, 1x T4 GPU
  • Memory: r5.4xlarge
    Pathogen detection (Kraken2, BLAST)
    16 vCPU, 128GB RAM
  • General: t3.large / r5.xlarge
    QC, Host Removal, Quantification, Reporting
    2-4 vCPU, 8-32GB RAM
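
The phase-to-instance assignment above could be expressed as a small lookup table. This is a minimal sketch: the phase keys and class names are hypothetical identifiers, and because the text does not say which general-purpose phase uses `t3.large` and which uses `r5.xlarge`, the function returns both candidates rather than guessing.

```python
# Instance types per class, as listed in the EC2 Instances section.
INSTANCE_CLASSES = {
    "gpu": ("g4dn.xlarge",),
    "memory": ("r5.4xlarge",),
    "general": ("t3.large", "r5.xlarge"),
}

# Phase identifiers are assumptions; the class assignment follows the text.
PHASE_CLASS = {
    "basecalling": "gpu",
    "qc": "general",
    "host_removal": "general",
    "pathogen_detection": "memory",
    "quantification": "general",
    "reporting": "general",
}


def instance_types_for(phase: str) -> tuple:
    """Return the candidate EC2 instance types for a pipeline phase."""
    try:
        instance_class = PHASE_CLASS[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None
    return INSTANCE_CLASSES[instance_class]


if __name__ == "__main__":
    for name in PHASE_CLASS:
        print(name, instance_types_for(name))
```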

Security & IAM

Access control and data protection

  • VPC Isolation: Private subnets for EC2 instances
  • IAM Roles: Least-privilege access for Lambda and EC2
  • S3 Encryption: Server-side encryption (SSE-S3)
  • RDS Encryption: At-rest encryption with KMS
  • API Gateway: API key authentication

Data Flow

1. Data Upload

FAST5/POD5 files are uploaded from the MinION to the S3 bucket (runs/{run_id}/fast5/)

2. Event Trigger

The S3 ObjectCreated event triggers the Lambda orchestrator function
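
As an illustration of the trigger step, the sketch below pulls the bucket, key, and run ID out of an S3 event notification. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields are part of the standard S3 event format, and the key layout follows the `runs/{run_id}/fast5/` prefix above; the function name and the returned dictionary shape are assumptions.

```python
import re
from urllib.parse import unquote_plus

# Matches keys under runs/{run_id}/fast5/ as described in the upload step.
KEY_PATTERN = re.compile(r"^runs/(?P<run_id>[^/]+)/fast5/(?P<filename>[^/]+)$")


def extract_uploads(event: dict) -> list:
    """Collect raw-data uploads from an S3 ObjectCreated event."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # ignore objects outside the raw-data prefix
        uploads.append({"bucket": bucket, "key": key, "run_id": match["run_id"]})
    return uploads
```

A Lambda handler would typically call `extract_uploads(event)` first and return early when the list is empty.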

3. Workflow Initialization

Lambda creates a workflow record in RDS and starts the Step Functions execution
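
Starting the workflow might look like the sketch below. It assumes a boto3 Step Functions client is passed in (its `start_execution` call takes `stateMachineArn`, `name`, and `input`); the payload fields and the execution naming scheme are hypothetical, and the RDS insert is left out.

```python
import json


def start_workflow(sfn_client, state_machine_arn: str, run_id: str, bucket: str) -> str:
    """Start one Step Functions execution for a sequencing run."""
    payload = {"run_id": run_id, "bucket": bucket}  # assumed input shape
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"run-{run_id}",  # assumed naming scheme, one execution per run
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

Deriving the execution name from the run ID is a design choice worth considering here: a run usually uploads many files, so the orchestrator receives many ObjectCreated events, and a fixed name per run gives Step Functions a way to reject or deduplicate repeated start requests.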

4. Phase Execution

Each phase triggers a Lambda function that launches an EC2 instance with the configuration for that phase
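
The per-phase launch configuration can be sketched as a function that assembles keyword arguments for the EC2 `run_instances` API. The parameter names in the returned dictionary are real `run_instances` parameters; the AMI ID, subnet, instance profile name, tag keys, and user-data contents are placeholders chosen for illustration.

```python
def build_launch_request(phase, run_id, instance_type, ami_id, subnet_id, profile_name):
    """Assemble keyword arguments for ec2_client.run_instances(**request)."""
    # Placeholder bootstrap script: it only exports the job context.
    user_data = (
        "#!/bin/bash\n"
        f"export RUN_ID='{run_id}'\n"
        f"export PHASE='{phase}'\n"
    )
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "SubnetId": subnet_id,  # a private subnet, per the security section
        "IamInstanceProfile": {"Name": profile_name},
        # A shutdown from inside the instance terminates it instead of stopping it.
        "InstanceInitiatedShutdownBehavior": "terminate",
        "UserData": user_data,
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [
                    {"Key": "run_id", "Value": run_id},
                    {"Key": "phase", "Value": phase},
                ],
            }
        ],
    }
```

With boto3, the result would be passed as `ec2_client.run_instances(**request)`.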

5. Analysis Processing

The EC2 instance downloads data from S3, processes it with the analysis tools, and uploads the results back to S3

6. Phase Completion

The EC2 instance signals completion to Lambda, which updates RDS and terminates the instance
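
A completion handler along these lines would cover the two actions named above. This is a sketch under assumptions: the event fields (`run_id`, `phase`, `instance_id`, `status`) are hypothetical, the database write is abstracted as an injected `update_status` callable, and `ec2_client` is expected to behave like a boto3 EC2 client, whose `terminate_instances` call takes `InstanceIds`.

```python
def handle_phase_completion(event: dict, ec2_client, update_status) -> dict:
    """Record the phase result, then terminate the instance that ran it."""
    run_id = event["run_id"]
    phase = event["phase"]
    instance_id = event["instance_id"]
    status = event.get("status", "COMPLETED")

    # Persist the outcome first so a failed terminate call cannot lose it.
    update_status(run_id, phase, status)
    ec2_client.terminate_instances(InstanceIds=[instance_id])
    return {"run_id": run_id, "phase": phase, "status": status, "terminated": instance_id}
```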

7. Critical Alerts

PERV detection triggers an immediate SNS notification to alert recipients
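
The alert path could be sketched as follows. The detection record shape (`pathogen`, `reads`) and the message wording are assumptions; `sns_client` is expected to behave like a boto3 SNS client, whose `publish` call accepts `TopicArn`, `Subject`, and `Message`.

```python
def publish_perv_alert(sns_client, topic_arn: str, run_id: str, detections: list):
    """Publish an SNS alert if any detection is a PERV hit; return the message ID or None."""
    perv_hits = [d for d in detections if d["pathogen"].upper().startswith("PERV")]
    if not perv_hits:
        return None  # nothing critical to report

    lines = [f"{hit['pathogen']}: {hit['reads']} reads" for hit in perv_hits]
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject=f"CRITICAL: PERV detected in run {run_id}",
        Message="\n".join(lines),
    )
    return response["MessageId"]
```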

8. Report Generation

The final phase generates PMDA-compliant reports (PDF, JSON, HTML) and stores them in S3

Cost Optimization

Spot Instances

About 70% cost reduction compared with on-demand pricing for basecalling and analysis phases

  • Automatic fallback to on-demand
  • Checkpoint/resume capability
  • Priority-based instance selection
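
The Spot-first strategy with on-demand fallback can be sketched as below. `InstanceMarketOptions={"MarketType": "spot"}` is the real `run_instances` option for requesting Spot capacity; the `SpotUnavailable` exception and the injected `launch` callable are hypothetical stand-ins for whatever wrapper translates EC2 capacity errors.

```python
class SpotUnavailable(Exception):
    """Hypothetical error raised by the launcher when Spot capacity cannot be obtained."""


def launch_with_fallback(launch, request: dict):
    """Try a Spot launch first; fall back to on-demand if Spot is unavailable.

    Returns a (launch_result, market) tuple where market is "spot" or "on-demand".
    """
    spot_request = dict(request)  # copy so the caller's request is not modified
    spot_request["InstanceMarketOptions"] = {"MarketType": "spot"}
    try:
        return launch(**spot_request), "spot"
    except SpotUnavailable:
        # Same request without market options means on-demand pricing.
        return launch(**request), "on-demand"
```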

Auto-termination

Instances automatically terminate after phase completion

  • No idle instance costs
  • Lambda-based monitoring
  • Configurable timeout protection
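
Timeout protection could be implemented by a monitoring function that compares each instance's launch time against a per-phase limit. In this sketch the instance record shape (`instance_id`, `phase`, `launch_time`) and any timeout values passed in are assumptions; the document does not state the actual limits.

```python
from datetime import datetime, timedelta, timezone


def find_overdue(instances, timeouts, default_timeout, now=None):
    """Return the IDs of instances that have run longer than their phase allows.

    instances: iterable of dicts with instance_id, phase, and a timezone-aware launch_time.
    timeouts: mapping of phase name to timedelta; default_timeout covers other phases.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    overdue = []
    for instance in instances:
        limit = timeouts.get(instance["phase"], default_timeout)
        if now - instance["launch_time"] > limit:
            overdue.append(instance["instance_id"])
    return overdue
```

The returned IDs could then be handed to the instance terminator so a stalled phase does not keep accruing cost.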

Serverless Services

Pay-per-use pricing for orchestration

  • Lambda (millisecond billing)
  • RDS Aurora Serverless
  • S3 Intelligent-Tiering

Data Lifecycle

Automated data retention and archival

  • Raw data: 30-day retention
  • Results: 90-day retention
  • Reports: 365-day retention
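
These retention periods map naturally onto an S3 lifecycle configuration. The sketch below builds one rule per data class. It assumes objects carry a hypothetical `data-class` tag, because raw data lives under per-run prefixes (`runs/{run_id}/fast5/`) that a single prefix filter cannot match, and it assumes that objects are deleted when retention ends; an archival policy would use `Transitions` instead of `Expiration`.

```python
# Retention periods in days, as listed above.
RETENTION_DAYS = {"raw": 30, "results": 90, "reports": 365}


def build_lifecycle_configuration(tag_key: str = "data-class") -> dict:
    """Build an S3 lifecycle configuration with one expiration rule per data class."""
    rules = []
    for data_class, days in RETENTION_DAYS.items():
        rules.append(
            {
                "ID": f"expire-{data_class}-after-{days}-days",
                "Status": "Enabled",
                # Tag-based filter; the tag key and values are assumptions.
                "Filter": {"Tag": {"Key": tag_key, "Value": data_class}},
                "Expiration": {"Days": days},
            }
        )
    return {"Rules": rules}
```

With boto3, the result would be applied through `s3_client.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=build_lifecycle_configuration())`.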