System Requirements
This document outlines the hardware, software, and infrastructure requirements for deploying and running IRIS OCR in various environments.
Hardware Requirements
Minimum Requirements (Development)
| Component | Specification | Notes |
|---|---|---|
| CPU | 4 cores @ 2.0GHz | Intel i5/AMD Ryzen 5 equivalent |
| RAM | 8GB | Minimum for basic development |
| Storage | 20GB free space | Including models and datasets |
| Network | Broadband internet | For model downloads |
Recommended Requirements (Production)
| Component | Specification | Notes |
|---|---|---|
| CPU | 8+ cores @ 2.4GHz | Intel i7/AMD Ryzen 7 or better |
| RAM | 16GB+ | 32GB recommended for high throughput |
| Storage | 50GB+ SSD | Fast storage for model loading |
| Network | Gigabit ethernet | For high-volume processing |
| GPU | NVIDIA GTX 1060+ | Optional but significantly improves performance |
Enterprise Requirements (High-Volume Production)
| Component | Specification | Notes |
|---|---|---|
| CPU | 16+ cores @ 2.8GHz | Server-grade processors recommended |
| RAM | 64GB+ | For concurrent processing |
| Storage | 100GB+ NVMe SSD | Ultra-fast storage for ML models |
| Network | 10Gbps+ | For distributed deployments |
| GPU | NVIDIA RTX 3080+ or Tesla V100+ | Dramatically improves ML processing |
Software Requirements
Operating System Support
Linux (Recommended)
- Ubuntu 20.04 LTS or newer
- CentOS 8+ / RHEL 8+
- Debian 11+
- Amazon Linux 2
Windows
- Windows 10 Pro/Enterprise (with WSL2 for development)
- Windows Server 2019+
macOS
- macOS 11 (Big Sur) or newer (development only)
- Apple Silicon (M1/M2) supported with Docker
Container Runtime
Docker
- Docker Engine 20.10+
- Docker Compose 2.0+
- Docker Desktop (for Windows/macOS development)
Kubernetes (Production)
- Kubernetes 1.21+
- Helm 3.7+
- Ingress controller (NGINX, Traefik, etc.)
Python Runtime
Core Python
- Python 3.8 - 3.11 (3.9 recommended)
- pip 21.0+
- virtualenv or conda
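A quick sanity check that the interpreter falls in the supported range can be done from Python itself; this is a minimal sketch, not part of the IRIS tooling:

```python
import sys

def python_supported(version_info=sys.version_info):
    """Return True if the interpreter is within the supported 3.8-3.11 range."""
    major, minor = version_info[:2]
    return major == 3 and 8 <= minor <= 11

if __name__ == "__main__":
    status = "supported" if python_supported() else "unsupported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```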
Key Dependencies
```
# Core ML/AI libraries
torch>=1.12.0
torchvision>=0.13.0
paddlepaddle>=2.4.0
paddleocr>=2.6.0
opencv-python>=4.6.0
pillow>=9.0.0
numpy>=1.21.0
scikit-learn>=1.1.0

# API framework
fastapi>=0.85.0
uvicorn>=0.18.0
pydantic>=1.10.0

# Utilities
requests>=2.28.0
aiofiles>=0.8.0
python-multipart>=0.0.5
```
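Before starting the services, it can help to confirm that the key packages are importable. The sketch below uses the standard import names for these distributions (e.g. `cv2` for `opencv-python`, `PIL` for `pillow`) and reports any that cannot be resolved:

```python
from importlib import util

# Distribution name -> import name for the dependencies listed above.
IMPORT_NAMES = {
    "torch": "torch",
    "paddleocr": "paddleocr",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "numpy": "numpy",
    "fastapi": "fastapi",
}

def missing_packages(import_names):
    """Return the distributions whose import name cannot be resolved."""
    return [dist for dist, mod in import_names.items()
            if util.find_spec(mod) is None]

if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    print("All key dependencies found" if not missing
          else f"Missing: {', '.join(missing)}")
```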
GPU Requirements (Optional but Recommended)
NVIDIA GPU Support
Minimum GPU Specifications
- Memory: 4GB VRAM minimum (8GB+ recommended)
- CUDA Compute Capability: 6.0+ (Pascal architecture or newer)
- Driver Version: 470+ (Linux), 472+ (Windows)
Supported GPU Models
| GPU Family | Recommended Models | VRAM | Performance Gain |
|---|---|---|---|
| GTX 16 Series | GTX 1660, 1660 Ti | 6GB | 2-3x faster |
| RTX 20 Series | RTX 2060, 2070, 2080 | 6-8GB | 3-4x faster |
| RTX 30 Series | RTX 3060, 3070, 3080 | 8-12GB | 4-6x faster |
| RTX 40 Series | RTX 4060, 4070, 4080 | 8-16GB | 5-8x faster |
| Tesla/A Series | V100, A100, A10 | 16-80GB | Enterprise-grade |
CUDA Toolkit Requirements
```bash
# CUDA 11.2 - 11.8 recommended
nvidia-smi      # Check driver version
nvcc --version  # Check CUDA toolkit
```
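To feed the GPU check into automation, the CSV query mode of `nvidia-smi` is convenient. The parser below assumes the `--query-gpu=name,memory.total --format=csv,noheader,nounits` output shape used by the system-check script later in this document (one `name, memory` line per GPU):

```python
def parse_gpu_csv(output):
    """Parse 'name, memory.total' CSV lines from nvidia-smi into dicts."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        # GPU names may themselves contain spaces, so split on the last comma.
        name, mem = line.rsplit(",", 1)
        gpus.append({"name": name.strip(), "vram_mib": int(mem)})
    return gpus
```

For example, an illustrative line such as `"NVIDIA GeForce RTX 3070, 8192"` parses to `[{"name": "NVIDIA GeForce RTX 3070", "vram_mib": 8192}]`.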
Performance Comparison
| Configuration | Processing Time per Document | Throughput (docs/hour) |
|---|---|---|
| CPU Only (8 cores) | 8-12 seconds | 300-450 |
| GTX 1660 Ti | 3-5 seconds | 720-1200 |
| RTX 3070 | 2-3 seconds | 1200-1800 |
| RTX 4080 | 1-2 seconds | 1800-3600 |
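The throughput column follows directly from the per-document latency (3600 seconds per hour divided by seconds per document), so the table can be extended for other hardware with a one-liner:

```python
def docs_per_hour(seconds_per_doc):
    """Convert per-document processing time into hourly throughput."""
    return 3600 / seconds_per_doc

# The CPU-only row: 8-12 s/doc maps to 300-450 docs/hour.
print(docs_per_hour(12), docs_per_hour(8))  # 300.0 450.0
```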
Network Requirements
Bandwidth Requirements
Development Environment
- Download: 25 Mbps minimum (for model downloads)
- Upload: 5 Mbps (for testing with sample images)
Production Environment
- Internal Network: 1 Gbps+ between services
- External Access: 100 Mbps+ per concurrent user
- Content Delivery: CDN recommended for global deployments
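As a rough feel for the development figure: downloading a ~3GB model bundle (the ML Models line under Storage Requirements) at 25 Mbps takes about 16 minutes. The arithmetic below uses decimal units and ignores protocol overhead, so real transfers will be somewhat slower:

```python
def download_minutes(size_gb, mbps):
    """Approximate transfer time in minutes (decimal GB, ignoring overhead)."""
    megabits = size_gb * 8 * 1000  # GB -> gigabits -> megabits
    return megabits / mbps / 60

print(round(download_minutes(3, 25)))  # 16
```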
Port Requirements
| Service | Port | Protocol | Access |
|---|---|---|---|
| API Gateway | 8000 | HTTP/HTTPS | External |
| Image Processor | 8001 | HTTP | Internal |
| ML Embeddings | 8002 | HTTP | Internal |
| ML Classifier | 8003 | HTTP | Internal |
| OCR Extractor | 8004 | HTTP | Internal |
| Health Monitoring | 9090 | HTTP | Internal |
| Metrics (Prometheus) | 9091 | HTTP | Internal |
Firewall Configuration
```bash
# Allow inbound traffic
sudo ufw allow 8000/tcp   # API Gateway
sudo ufw allow 22/tcp     # SSH (for management)
sudo ufw allow 443/tcp    # HTTPS (production)

# Block direct access to internal services
sudo ufw deny 8001:8004/tcp

# Allow internal network access (adjust CIDR as needed);
# ufw requires an explicit protocol when specifying a port range
sudo ufw allow from 10.0.0.0/8 to any port 8001:8004 proto tcp
```
Storage Requirements
Disk Space Breakdown
| Component | Development | Production | Notes |
|---|---|---|---|
| Base System | 5GB | 10GB | OS and core utilities |
| Docker Images | 8GB | 15GB | All service containers |
| ML Models | 3GB | 5GB | PaddleOCR and classification models |
| Training Data | 2GB | 10GB+ | Sample images and datasets |
| Logs & Monitoring | 1GB | 5GB+ | Application logs and metrics |
| User Data | 1GB | Variable | Processed documents (if stored) |
| **Total Minimum** | **20GB** | **45GB+** | |
Storage Performance
Recommended Storage Types
- Development: Standard SSD (500+ MB/s)
- Production: NVMe SSD (2000+ MB/s)
- Enterprise: NVMe RAID or distributed storage
I/O Requirements
- Random Read: 1000+ IOPS
- Sequential Read: 500+ MB/s
- Random Write: 500+ IOPS (for logging)
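A crude way to sanity-check sequential read speed without extra tooling is sketched below. Note that OS page caching will inflate the number on a freshly written file, so treat dedicated tools such as `fio` as the real measurement and this as a smoke test only:

```python
import os
import tempfile
import time

def sequential_read_mbps(size_mb=64, chunk_mb=4):
    """Write a scratch file, read it back, and report MB/s (cache-warm)."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        path = f.name
    try:
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(chunk_mb * 1024 * 1024):
                pass
        elapsed = time.perf_counter() - start
        return size_mb / elapsed
    finally:
        os.unlink(path)

if __name__ == "__main__":
    print(f"Sequential read: {sequential_read_mbps():.0f} MB/s")
```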
Database Requirements (Optional)
Metadata Storage
If using persistent storage for processing history and analytics:
SQLite (Development)
- File Size: 100MB - 1GB
- Concurrent Users: 1-5
- Performance: Basic analytics only
PostgreSQL (Recommended)
- Version: PostgreSQL 12+
- Memory: 2GB+ dedicated
- Storage: 10GB+ with regular backups
- Concurrent Connections: 100+
MongoDB (Document Storage)
- Version: MongoDB 5.0+
- Memory: 4GB+ dedicated
- Storage: 20GB+ for document metadata
- Replica Set: Recommended for production
Cloud Platform Requirements
Amazon Web Services (AWS)
EC2 Instance Types
| Use Case | Instance Type | vCPUs | RAM | Storage | Cost/Month |
|---|---|---|---|---|---|
| Development | t3.large | 2 | 8GB | 30GB EBS | ~$60 |
| Production | c5.2xlarge | 8 | 16GB | 100GB EBS | ~$300 |
| GPU-Enabled | g4dn.xlarge | 4 | 16GB | 125GB SSD | ~$400 |
| High-Volume | c5.4xlarge | 16 | 32GB | 200GB EBS | ~$600 |
Additional AWS Services
- ECS/EKS: Container orchestration
- ALB/NLB: Load balancing
- S3: Model and data storage
- CloudWatch: Monitoring and logging
- VPC: Network isolation
Google Cloud Platform (GCP)
Compute Engine Instance Types
| Use Case | Machine Type | vCPUs | RAM | Storage | Cost/Month |
|---|---|---|---|---|---|
| Development | e2-standard-2 | 2 | 8GB | 30GB SSD | ~$50 |
| Production | c2-standard-8 | 8 | 32GB | 100GB SSD | ~$350 |
| GPU-Enabled | n1-standard-4 + T4 | 4 | 15GB | 100GB SSD | ~$450 |
Additional GCP Services
- GKE: Kubernetes management
- Cloud Load Balancing: Traffic distribution
- Cloud Storage: Object storage
- Cloud Monitoring: Observability
- VPC: Network management
Microsoft Azure
Virtual Machine Sizes
| Use Case | VM Size | vCPUs | RAM | Storage | Cost/Month |
|---|---|---|---|---|---|
| Development | Standard_D2s_v3 | 2 | 8GB | 30GB SSD | ~$70 |
| Production | Standard_D8s_v3 | 8 | 32GB | 100GB SSD | ~$400 |
| GPU-Enabled | Standard_NC6 | 6 | 56GB | 340GB SSD | ~$900 |
Security Requirements
SSL/TLS Requirements
- TLS Version: 1.2 minimum (1.3 recommended)
- Certificate: Valid SSL certificate for production domains
- Cipher Suites: Modern cipher suites only
- HSTS: HTTP Strict Transport Security enabled
Authentication & Authorization
- API Keys: Secure API key management
- Rate Limiting: Request rate limiting per client
- Input Validation: Strict file type and size validation
- Network Security: Firewall and VPC configuration
Compliance (if applicable)
- GDPR: Data protection compliance for EU users
- HIPAA: Healthcare data compliance (if processing medical documents)
- SOC 2: Security controls for enterprise deployments
Monitoring & Observability Requirements
Metrics Collection
- Prometheus: Metrics aggregation
- Grafana: Visualization dashboards
- AlertManager: Alert routing and management
Logging
- Centralized Logging: ELK stack or cloud logging
- Log Retention: 30+ days minimum
- Log Analysis: Search and alerting capabilities
Health Checks
- Service Health: Individual service monitoring
- Dependency Checks: External service monitoring
- Performance Metrics: Response time and throughput tracking
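Per-service health polling can stay simple. The sketch below assumes each internal service exposes a `/health` endpoint on the ports from the Port Requirements table (an assumption — adapt paths and ports to your deployment) and takes the HTTP fetch as a parameter so it can be tested without a network:

```python
# Hypothetical port map mirroring the Port Requirements table.
SERVICES = {
    "image-processor": 8001,
    "ml-embeddings": 8002,
    "ml-classifier": 8003,
    "ocr-extractor": 8004,
}

def check_health(fetch, services=SERVICES, host="localhost"):
    """Return {service: bool} using fetch(url) -> HTTP status code."""
    results = {}
    for name, port in services.items():
        url = f"http://{host}:{port}/health"
        try:
            results[name] = fetch(url) == 200
        except Exception:
            # Connection errors count as unhealthy.
            results[name] = False
    return results
```

With `requests` installed, `fetch` would be `lambda url: requests.get(url, timeout=2).status_code`.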
Installation Verification
System Check Script
```bash
#!/bin/bash
# IRIS System Requirements Check

echo "=== IRIS OCR System Requirements Check ==="

# Check Python version
python_version=$(python3 --version 2>&1 | cut -d' ' -f2)
echo "Python version: $python_version"

# Check available memory
total_mem=$(free -h | awk '/^Mem:/ {print $2}')
echo "Total memory: $total_mem"

# Check available disk space
disk_space=$(df -h . | awk 'NR==2 {print $4}')
echo "Available disk space: $disk_space"

# Check Docker
if command -v docker &> /dev/null; then
    docker_version=$(docker --version | cut -d' ' -f3 | tr -d ',')
    echo "Docker version: $docker_version"
else
    echo "Docker: Not installed"
fi

# Check GPU (if available)
if command -v nvidia-smi &> /dev/null; then
    gpu_info=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | head -1)
    echo "GPU: $gpu_info"
else
    echo "GPU: Not available or NVIDIA drivers not installed"
fi

# Check network connectivity
if ping -c 1 google.com &> /dev/null; then
    echo "Network: Connected"
else
    echo "Network: No internet connection"
fi

echo "=== End System Check ==="
```
Performance Benchmark
```bash
# Run performance test
python scripts/benchmark/system_performance.py

# Expected output for minimum requirements:
# CPU Performance: 1000+ operations/second
# Memory Performance: 5GB/s+ bandwidth
# Disk Performance: 100MB/s+ sequential read
# Network Performance: 25Mbps+ download
```
Meeting these requirements ensures optimal performance and reliability for IRIS OCR in your target environment. Adjust specifications based on your expected document processing volume and performance requirements.