System Requirements

This document outlines the hardware, software, and infrastructure requirements for deploying and running IRIS OCR in various environments.

Hardware Requirements

Minimum Requirements (Development)

| Component | Specification | Notes |
|---|---|---|
| CPU | 4 cores @ 2.0GHz | Intel i5/AMD Ryzen 5 equivalent |
| RAM | 8GB | Minimum for basic development |
| Storage | 20GB free space | Including models and datasets |
| Network | Broadband internet | For model downloads |

Recommended Requirements (Production)

| Component | Specification | Notes |
|---|---|---|
| CPU | 8+ cores @ 2.4GHz | Intel i7/AMD Ryzen 7 or better |
| RAM | 16GB+ | 32GB recommended for high throughput |
| Storage | 50GB+ SSD | Fast storage for model loading |
| Network | Gigabit ethernet | For high-volume processing |
| GPU | NVIDIA GTX 1060+ | Optional but significantly improves performance |

Enterprise Requirements (High-Volume Production)

| Component | Specification | Notes |
|---|---|---|
| CPU | 16+ cores @ 2.8GHz | Server-grade processors recommended |
| RAM | 64GB+ | For concurrent processing |
| Storage | 100GB+ NVMe SSD | Ultra-fast storage for ML models |
| Network | 10Gbps+ | For distributed deployments |
| GPU | NVIDIA RTX 3080+ or Tesla V100+ | Dramatically improves ML processing |
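
The hardware tiers above can be turned into a quick self-check. The following is a minimal sketch, not part of IRIS OCR itself: the function names and the tier labels (`development`, `production`, `enterprise`) are chosen for illustration, and clock speed, network, and GPU are deliberately ignored.

```python
import os
import shutil

# (tier label, min CPU cores, min RAM in GB, min disk in GB), taken from the tables above.
# Ordered from most to least demanding so the first match is the highest tier met.
TIERS = [
    ("enterprise", 16, 64, 100),
    ("production", 8, 16, 50),
    ("development", 4, 8, 20),
]


def classify_tier(cores, ram_gb, free_disk_gb):
    """Return the highest tier whose CPU, RAM and disk minimums are all met, else None."""
    for name, min_cores, min_ram, min_disk in TIERS:
        if cores >= min_cores and ram_gb >= min_ram and free_disk_gb >= min_disk:
            return name
    return None


def local_cores_and_disk(path="."):
    """Return (logical core count, free disk space in GB) for this machine."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return os.cpu_count() or 0, free_gb
```

Note that `os.cpu_count()` reports logical cores, so a hyper-threaded 4-core CPU shows up as 8. RAM is passed in explicitly because the Python standard library has no portable way to read it.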

Software Requirements

Operating System Support

Linux

  • Ubuntu 20.04 LTS or newer
  • CentOS 8+ / RHEL 8+
  • Debian 11+
  • Amazon Linux 2

Windows

  • Windows 10 Pro/Enterprise (with WSL2 for development)
  • Windows Server 2019+

macOS

  • macOS Big Sur (11.0)+ (Development only)
  • Apple Silicon (M1/M2) supported with Docker

Container Runtime

Docker

  • Docker Engine 20.10+
  • Docker Compose 2.0+
  • Docker Desktop (for Windows/macOS development)

Kubernetes (Production)

  • Kubernetes 1.21+
  • Helm 3.7+
  • Ingress controller (NGINX, Traefik, etc.)

Python Runtime

Core Python

  • Python 3.8 - 3.11 (3.9 recommended)
  • pip 21.0+
  • virtualenv or conda
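
A startup guard for the supported interpreter range might look like the sketch below. The function name `python_supported` is hypothetical; only the 3.8 to 3.11 range comes from the list above.

```python
import sys

MIN_PYTHON = (3, 8)   # lowest supported version per the list above
MAX_PYTHON = (3, 11)  # highest supported version per the list above


def python_supported(version=None):
    """Return True if the (major, minor) version lies within the supported range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON
```

Calling `python_supported()` with no argument checks the running interpreter. Passing a tuple such as `(3, 12, 0)` checks an arbitrary version.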

Key Dependencies

# Core ML/AI libraries
torch>=1.12.0
torchvision>=0.13.0
paddlepaddle>=2.4.0
paddleocr>=2.6.0
opencv-python>=4.6.0
pillow>=9.0.0
numpy>=1.21.0
scikit-learn>=1.1.0

# API framework
fastapi>=0.85.0
uvicorn>=0.18.0
pydantic>=1.10.0

# Utilities
requests>=2.28.0
aiofiles>=0.8.0
python-multipart>=0.0.5
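
To confirm that an environment satisfies these minimums, installed versions can be read with the standard library's `importlib.metadata`. This is a rough sketch: the helper names are invented for illustration, only three of the packages are listed, and the comparison looks at leading numeric components only, so pre-release tags are not handled.

```python
import re
from importlib import metadata

# A few of the minimum versions listed above; extend as needed.
MINIMUMS = {"torch": "1.12.0", "paddleocr": "2.6.0", "fastapi": "0.85.0"}


def version_tuple(text):
    """Turn '1.12.0+cu116' into (1, 12, 0), keeping leading numeric components only."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (<version>)' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report
```

For strict PEP 440 comparison, the third-party `packaging` library is the more robust choice.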

NVIDIA GPU Support

Minimum GPU Specifications

  • Memory: 4GB VRAM minimum (8GB+ recommended)
  • CUDA Compute Capability: 6.0+ (Pascal architecture or newer)
  • Driver Version: 470+ (Linux), 472+ (Windows)
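
The VRAM and compute-capability minimums can be checked from Python when PyTorch is installed. The sketch below assumes the first GPU (index 0) is the one IRIS OCR would use; the function names are illustrative and the driver version is not checked.

```python
MIN_VRAM_GB = 4          # minimum VRAM from the list above
MIN_CAPABILITY = (6, 0)  # CUDA compute capability 6.0 (Pascal) or newer


def gpu_meets_minimum(vram_gb, capability):
    """Return True if both the VRAM and compute-capability minimums are met."""
    return vram_gb >= MIN_VRAM_GB and tuple(capability) >= MIN_CAPABILITY


def probe_first_gpu():
    """Return (name, vram_gb, capability) for GPU 0, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    # Cards often report slightly less than their nominal size, so round to 0.1 GB.
    vram_gb = round(props.total_memory / 1024**3, 1)
    return props.name, vram_gb, torch.cuda.get_device_capability(0)
```

A typical use is `info = probe_first_gpu()`, followed by `gpu_meets_minimum(info[1], info[2])` when `info` is not `None`.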

Supported GPU Models

| GPU Family | Recommended Models | VRAM | Performance Gain |
|---|---|---|---|
| GTX 16 Series | GTX 1660, 1660 Ti | 6GB | 2-3x faster |
| RTX 20 Series | RTX 2060, 2070, 2080 | 6-8GB | 3-4x faster |
| RTX 30 Series | RTX 3060, 3070, 3080 | 8-12GB | 4-6x faster |
| RTX 40 Series | RTX 4060, 4070, 4080 | 8-16GB | 5-8x faster |
| Tesla/A Series | V100, A100, A10 | 16-80GB | Enterprise-grade |

CUDA Toolkit Requirements

# CUDA 11.2 - 11.8 recommended
nvidia-smi # Check driver version
nvcc --version # Check CUDA toolkit

Performance Comparison

| Configuration | Processing Time per Document | Throughput (docs/hour) |
|---|---|---|
| CPU Only (8 cores) | 8-12 seconds | 300-450 |
| GTX 1660 Ti | 3-5 seconds | 720-1200 |
| RTX 3070 | 2-3 seconds | 1200-1800 |
| RTX 4080 | 1-2 seconds | 1800-3600 |
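
The throughput column follows directly from the processing time: 3600 seconds per hour divided by seconds per document. The sketch below reproduces that arithmetic and extends it to a rough capacity estimate. It assumes each worker processes one document at a time and that throughput scales linearly with worker count, which real deployments may not achieve.

```python
import math


def docs_per_hour(seconds_per_doc):
    """Documents one worker can process per hour at a fixed time per document."""
    return 3600 / seconds_per_doc


def workers_needed(target_docs_per_hour, seconds_per_doc):
    """Smallest number of workers that reaches the target, assuming linear scaling."""
    return math.ceil(target_docs_per_hour / docs_per_hour(seconds_per_doc))
```

For example, at 3 seconds per document a single worker handles 1200 documents per hour, so a target of 5000 documents per hour would need 5 workers under these assumptions.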

Network Requirements

Bandwidth Requirements

Development Environment

  • Download: 25 Mbps minimum (for model downloads)
  • Upload: 5 Mbps (for testing with sample images)
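
To see what the minimum download speed means in practice, transfer time can be estimated from payload size and link speed. This sketch uses decimal units (1 GB = 8000 megabits) and ignores protocol overhead, so real downloads will take somewhat longer.

```python
def transfer_seconds(size_gb, link_mbps):
    """Seconds to move size_gb gigabytes over a link of link_mbps megabits per second."""
    megabits = size_gb * 1000 * 8
    return megabits / link_mbps
```

At 25 Mbps, the roughly 3GB of ML models listed under Storage Requirements would take about 960 seconds (16 minutes) to download.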

Production Environment

  • Internal Network: 1 Gbps+ between services
  • External Access: 100 Mbps+ per concurrent user
  • Content Delivery: CDN recommended for global deployments

Port Requirements

| Service | Port | Protocol | Access |
|---|---|---|---|
| API Gateway | 8000 | HTTP/HTTPS | External |
| Image Processor | 8001 | HTTP | Internal |
| ML Embeddings | 8002 | HTTP | Internal |
| ML Classifier | 8003 | HTTP | Internal |
| OCR Extractor | 8004 | HTTP | Internal |
| Health Monitoring | 9090 | HTTP | Internal |
| Metrics (Prometheus) | 9091 | HTTP | Internal |
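
After deployment, a plain TCP connect test can confirm which of these ports are actually listening. The sketch below copies the port numbers from the table; the function names and the default host are assumptions for illustration.

```python
import socket

# Port assignments copied from the table above.
SERVICE_PORTS = {
    "API Gateway": 8000,
    "Image Processor": 8001,
    "ML Embeddings": 8002,
    "ML Classifier": 8003,
    "OCR Extractor": 8004,
    "Health Monitoring": 9090,
    "Metrics (Prometheus)": 9091,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan_services(host="127.0.0.1"):
    """Map each service name to whether its port accepts connections on host."""
    return {name: port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

A successful connection only shows that something is listening; it does not prove that the expected service is healthy.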

Firewall Configuration

# Allow inbound traffic
sudo ufw allow 8000/tcp # API Gateway
sudo ufw allow 22/tcp # SSH (for management)
sudo ufw allow 443/tcp # HTTPS (production)

# Allow internal network access first (adjust CIDR as needed).
# ufw applies the first matching rule, so this must come before the deny below.
# A port range requires an explicit protocol.
sudo ufw allow from 10.0.0.0/8 to any port 8001:8004 proto tcp

# Block all other direct access to internal services
sudo ufw deny 8001:8004/tcp # Internal services
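
The intended policy behind these rules can be expressed as a small decision function, which is handy for testing a CIDR change before touching the firewall. This is a sketch of the policy, not of ufw itself. It assumes the firewall's default is to deny all other inbound traffic, and the function name is made up for illustration.

```python
import ipaddress

INTERNAL_NETWORK = ipaddress.ip_network("10.0.0.0/8")  # CIDR from the rules above
PUBLIC_PORTS = {22, 443, 8000}                          # open to any source
INTERNAL_PORTS = range(8001, 8005)                      # 8001-8004 inclusive


def should_allow(source_ip, port):
    """Return True if the policy above would admit a TCP connection."""
    if port in PUBLIC_PORTS:
        return True
    if port in INTERNAL_PORTS:
        return ipaddress.ip_address(source_ip) in INTERNAL_NETWORK
    return False  # assumption: default-deny for everything else
```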

Storage Requirements

Disk Space Breakdown

| Component | Development | Production | Notes |
|---|---|---|---|
| Base System | 5GB | 10GB | OS and core utilities |
| Docker Images | 8GB | 15GB | All service containers |
| ML Models | 3GB | 5GB | PaddleOCR and classification models |
| Training Data | 2GB | 10GB+ | Sample images and datasets |
| Logs & Monitoring | 1GB | 5GB+ | Application logs and metrics |
| User Data | 1GB | Variable | Processed documents (if stored) |
| Total Minimum | 20GB | 45GB+ | |
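
The totals are simple column sums, which the sketch below reproduces and compares against free disk space. The names are illustrative, and the production "User Data" entry, listed as variable, is counted as 0 here, so the production figure is a floor rather than a budget.

```python
import shutil

# (development GB, production GB) per component, copied from the table above.
BREAKDOWN_GB = {
    "Base System": (5, 10),
    "Docker Images": (8, 15),
    "ML Models": (3, 5),
    "Training Data": (2, 10),
    "Logs & Monitoring": (1, 5),
    "User Data": (1, 0),  # production value is variable; counted as 0
}
PROFILES = {"development": 0, "production": 1}


def total_gb(profile):
    """Sum the chosen column of the breakdown table."""
    column = PROFILES[profile]
    return sum(sizes[column] for sizes in BREAKDOWN_GB.values())


def has_enough_space(path, profile):
    """Return True if the filesystem holding path has at least the profile's total free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= total_gb(profile)
```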

Storage Performance

  • Development: Standard SSD (500+ MB/s)
  • Production: NVMe SSD (2000+ MB/s)
  • Enterprise: NVMe RAID or distributed storage

I/O Requirements

  • Random Read: 1000+ IOPS
  • Sequential Read: 500+ MB/s
  • Random Write: 500+ IOPS (for logging)

Database Requirements (Optional)

Metadata Storage

If using persistent storage for processing history and analytics:

SQLite (Development)

  • File Size: 100MB - 1GB
  • Concurrent Users: 1-5
  • Performance: Basic analytics only

PostgreSQL (Production)

  • Version: PostgreSQL 12+
  • Memory: 2GB+ dedicated
  • Storage: 10GB+ with regular backups
  • Concurrent Connections: 100+

MongoDB (Document Storage)

  • Version: MongoDB 5.0+
  • Memory: 4GB+ dedicated
  • Storage: 20GB+ for document metadata
  • Replica Set: Recommended for production

Cloud Platform Requirements

Amazon Web Services (AWS)

EC2 Instance Types

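| Use Case | Instance Type | vCPUs | RAM | Storage | Cost/Month |
|---|---|---|---|---|---|
| Development | t3.large | 2 | 8GB | 30GB EBS | ~$60 |
| Production | c5.2xlarge | 8 | 16GB | 100GB EBS | ~$300 |
| GPU-Enabled | g4dn.xlarge | 4 | 16GB | 125GB SSD | ~$400 |
| High-Volume | c5.4xlarge | 16 | 32GB | 200GB EBS | ~$600 |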

Additional AWS Services

  • ECS/EKS: Container orchestration
  • ALB/NLB: Load balancing
  • S3: Model and data storage
  • CloudWatch: Monitoring and logging
  • VPC: Network isolation

Google Cloud Platform (GCP)

Compute Engine Instance Types

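| Use Case | Machine Type | vCPUs | RAM | Storage | Cost/Month |
|---|---|---|---|---|---|
| Development | e2-standard-2 | 2 | 8GB | 30GB SSD | ~$50 |
| Production | c2-standard-8 | 8 | 32GB | 100GB SSD | ~$350 |
| GPU-Enabled | n1-standard-4 + T4 | 4 | 15GB | 100GB SSD | ~$450 |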

Additional GCP Services

  • GKE: Kubernetes management
  • Cloud Load Balancing: Traffic distribution
  • Cloud Storage: Object storage
  • Cloud Monitoring: Observability
  • VPC: Network management

Microsoft Azure

Virtual Machine Sizes

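| Use Case | VM Size | vCPUs | RAM | Storage | Cost/Month |
|---|---|---|---|---|---|
| Development | Standard_D2s_v3 | 2 | 8GB | 30GB SSD | ~$70 |
| Production | Standard_D8s_v3 | 8 | 32GB | 100GB SSD | ~$400 |
| GPU-Enabled | Standard_NC6 | 6 | 56GB | 340GB SSD | ~$900 |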

Note: Standard_NC6 carries a Tesla K80 (CUDA compute capability 3.7), which is below the 6.0+ minimum listed under NVIDIA GPU Support. Verify GPU compatibility before choosing this size.

Security Requirements

SSL/TLS Requirements

  • TLS Version: 1.2 minimum (1.3 recommended)
  • Certificate: Valid SSL certificate for production domains
  • Cipher Suites: Modern cipher suites only
  • HSTS: HTTP Strict Transport Security enabled
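
If TLS is terminated in Python rather than at a load balancer or ingress controller, the minimum protocol version can be enforced with the standard `ssl` module. The sketch below is one possible setup, not the project's actual configuration. The function name is illustrative, and the HSTS `max-age` of one year is a commonly used value chosen here as an assumption.

```python
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build a server-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # Load the production certificate chain and private key.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


# Response header that enables HSTS; max-age is in seconds (assumed value: one year).
HSTS_HEADER = ("Strict-Transport-Security", "max-age=31536000; includeSubDomains")
```

TLS 1.3 is negotiated automatically when both sides support it, so setting only the minimum version satisfies the "1.2 minimum, 1.3 recommended" guidance.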

Authentication & Authorization

  • API Keys: Secure API key management
  • Rate Limiting: Request rate limiting per client
  • Input Validation: Strict file type and size validation
  • Network Security: Firewall and VPC configuration
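
Strict input validation usually means checking the extension, the size, and the leading bytes of the file. The sketch below shows one way to do that. The allowed formats and the 10 MiB limit are assumptions for illustration, not documented IRIS OCR limits.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: 10 MiB upload limit

# Assumed set of accepted formats, with the leading bytes that identify each one.
SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".tiff": (b"II*\x00", b"MM\x00*"),
    ".pdf": (b"%PDF-",),
}


def validate_upload(filename, data):
    """Return (accepted, reason) after checking extension, size and file signature."""
    ext = Path(filename).suffix.lower()
    if ext not in SIGNATURES:
        return False, "unsupported file type"
    if len(data) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    if not data.startswith(SIGNATURES[ext]):
        return False, "content does not match extension"
    return True, "ok"
```

Checking the signature as well as the extension rejects files that were simply renamed, such as an executable saved as `scan.png`.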

Compliance (if applicable)

  • GDPR: Data protection compliance for EU users
  • HIPAA: Healthcare data compliance (if processing medical documents)
  • SOC 2: Security controls for enterprise deployments

Monitoring & Observability Requirements

Metrics Collection

  • Prometheus: Metrics aggregation
  • Grafana: Visualization dashboards
  • AlertManager: Alert routing and management

Logging

  • Centralized Logging: ELK stack or cloud logging
  • Log Retention: 30+ days minimum
  • Log Analysis: Search and alerting capabilities
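
Where logs are written locally before being shipped to a central system, the 30-day retention floor can be approximated with daily rotation. This is a minimal sketch using the standard library. The logger name `iris` and the function name are hypothetical, and a centralized stack would apply its own retention policy.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def make_rotating_logger(path, retention_days=30):
    """Return a logger that rotates its file at midnight and keeps retention_days backups."""
    handler = TimedRotatingFileHandler(
        path,
        when="midnight",             # start a new file each day
        backupCount=retention_days,  # older rotated files are deleted
        delay=True,                  # do not create the file until the first record
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger("iris")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```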

Health Checks

  • Service Health: Individual service monitoring
  • Dependency Checks: External service monitoring
  • Performance Metrics: Response time and throughput tracking
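
A basic service health probe issues an HTTP request and records the status and response time. The sketch below assumes each service exposes a health endpoint over HTTP; the `/health` path mentioned afterwards is a guess, not a documented IRIS OCR route. Proxies are bypassed because internal services are normally reached directly.

```python
import time
import urllib.error
import urllib.request

# Opener with an empty proxy map so requests go straight to the service.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_health(url, timeout=2.0):
    """Return (healthy, detail, seconds); healthy means an HTTP 2xx reply arrived in time."""
    start = time.monotonic()
    try:
        with _DIRECT.open(url, timeout=timeout) as resp:
            healthy, detail = 200 <= resp.status < 300, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        healthy, detail = False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        healthy, detail = False, f"unreachable: {exc}"
    return healthy, detail, time.monotonic() - start
```

A call such as `check_health("http://localhost:8001/health")` would probe the Image Processor, provided that path exists in your deployment.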

Installation Verification

System Check Script

#!/bin/bash
# IRIS System Requirements Check

echo "=== IRIS OCR System Requirements Check ==="

# Check Python version
python_version=$(python3 --version 2>&1 | cut -d' ' -f2)
echo "Python version: $python_version"

# Check available memory
total_mem=$(free -h | awk '/^Mem:/ {print $2}')
echo "Total memory: $total_mem"

# Check available disk space
disk_space=$(df -h . | awk 'NR==2 {print $4}')
echo "Available disk space: $disk_space"

# Check Docker
if command -v docker &> /dev/null; then
    docker_version=$(docker --version | cut -d' ' -f3 | tr -d ',')
    echo "Docker version: $docker_version"
else
    echo "Docker: Not installed"
fi

# Check GPU (if available)
if command -v nvidia-smi &> /dev/null; then
    gpu_info=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | head -1)
    echo "GPU: $gpu_info"
else
    echo "GPU: Not available or NVIDIA drivers not installed"
fi

# Check network connectivity
if ping -c 1 google.com &> /dev/null; then
    echo "Network: Connected"
else
    echo "Network: No internet connection"
fi

echo "=== End System Check ==="

Performance Benchmark

# Run performance test
python scripts/benchmark/system_performance.py

# Expected output for minimum requirements:
# CPU Performance: 1000+ operations/second
# Memory Performance: 5GB/s+ bandwidth
# Disk Performance: 100MB/s+ sequential read
# Network Performance: 25Mbps+ download

Meeting these requirements gives IRIS OCR a sound baseline for performance and reliability in your target environment. Scale the specifications to match your expected document volume and latency targets.