
Development Environment Setup

This guide will help you set up a complete development environment for IRIS OCR, including all microservices, dependencies, and development tools.

Prerequisites

System Requirements

  • Operating System: Linux (Ubuntu 20.04+), macOS (10.15+), or Windows 10+ with WSL2
  • Memory: Minimum 8GB RAM (16GB recommended for ML models)
  • Storage: At least 10GB free space for models and dependencies
  • Network: Internet connection for downloading models and dependencies

Required Software

Essential Tools

# Python 3.8 or higher
python3 --version
# Should output: Python 3.8.x or higher

# Docker and Docker Compose
docker --version
docker-compose --version

# Git
git --version

# Node.js 16+ (for frontend development)
node --version
npm --version

Development Tools

# Visual Studio Code with extensions
code --list-extensions | grep -E "(python|docker|jupyter)"

# Postman or similar API testing tool
# curl or httpie for command-line testing
# Git GUI tool (optional)

Installation Methods

Choose one of the following installation methods based on your needs:

Method 1: Docker Compose Setup (Recommended)

This is the fastest way to get started with all services running.

1. Clone the Repository

git clone https://github.com/your-org/iris.git
cd iris

2. Configure Environment Variables

# Copy example environment file
cp .env.example .env

# Edit configuration (optional for development)
nano .env

Key Environment Variables:

# Development mode
ENVIRONMENT=development
DEBUG=true

# Service ports
API_GATEWAY_PORT=8000
IMAGE_PROCESSOR_PORT=8001
ML_EMBEDDINGS_PORT=8002
ML_CLASSIFIER_PORT=8003
OCR_EXTRACTOR_PORT=8004

# GPU support (if available)
ENABLE_GPU=false

# Model paths
MODELS_PATH=./data/models
TRAINING_DATA_PATH=./data/training
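A service can read these variables with `os.getenv`, falling back to the development defaults shown above. A minimal sketch (variable names mirror the `.env` example; real service code may read them differently):

```python
import os

# Mirror the .env example above, with the same development defaults.
DEBUG = os.getenv("DEBUG", "true").lower() == "true"
API_GATEWAY_PORT = int(os.getenv("API_GATEWAY_PORT", "8000"))
ENABLE_GPU = os.getenv("ENABLE_GPU", "false").lower() == "true"
MODELS_PATH = os.getenv("MODELS_PATH", "./data/models")
```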

3. Build and Start Services

# Build all services
docker-compose build

# Start all services in development mode
docker-compose up -d

# Check service status
docker-compose ps

4. Verify Installation

# Test API Gateway health
curl http://localhost:8000/health

# Check the status of all registered services
curl http://localhost:8000/services | jq .

# Expected output: All services should show "healthy" status
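The same check can be scripted. The sketch below polls each service's `/health` endpoint; the port map matches the `.env` example, and the assumption that `/health` returns JSON with a `"status"` field is hypothetical, so adjust it to the actual response shape:

```python
import json
import urllib.request

# Port map from the .env example above.
SERVICES = {
    "api-gateway": 8000,
    "image-processor": 8001,
    "ml-embeddings": 8002,
    "ml-classifier": 8003,
    "ocr-extractor": 8004,
}

def check_health(name: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the service answers /health with a healthy status."""
    url = f"http://localhost:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read())
            return resp.status == 200 and body.get("status") == "healthy"
    except (OSError, ValueError):
        # Connection refused, timeout, or non-JSON body.
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "healthy" if check_health(name, port) else "unreachable"
        print(f"{name:<18} {state}")
```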

Method 2: Local Development Setup

For development with code editing and debugging capabilities.

1. Python Environment Setup

# Create virtual environment
python3 -m venv iris-dev
source iris-dev/bin/activate # On Windows: iris-dev\Scripts\activate

# Upgrade pip
pip install --upgrade pip setuptools wheel

2. Install Dependencies for Each Service

# API Gateway
cd packages/api-gateway
pip install -r requirements.txt
cd ../..

# Image Processor
cd packages/image-processor
pip install -r requirements.txt
cd ../..

# ML Embeddings
cd packages/ml-embeddings
pip install -r requirements.txt
cd ../..

# ML Classifier
cd packages/ml-classifier
pip install -r requirements.txt
cd ../..

# OCR Extractor
cd packages/ocr-extractor
pip install -r requirements.txt
cd ../..

# Development tools
pip install pytest pytest-cov black flake8 isort jupyter

3. Download Required Models

# Create models directory
mkdir -p data/models

# Download PaddleOCR models (automatic on first run)
python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='es')"

# Download ML models
python scripts/setup/download_models.py

4. Start Services Individually

# Terminal 1: API Gateway
cd packages/api-gateway
python -m uvicorn app:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Image Processor
cd packages/image-processor
python -m uvicorn app:app --host 0.0.0.0 --port 8001 --reload

# Terminal 3: ML Embeddings
cd packages/ml-embeddings
python -m uvicorn app:app --host 0.0.0.0 --port 8002 --reload

# Terminal 4: ML Classifier
cd packages/ml-classifier
python -m uvicorn app:app --host 0.0.0.0 --port 8003 --reload

# Terminal 5: OCR Extractor
cd packages/ocr-extractor
python -m uvicorn app:app --host 0.0.0.0 --port 8004 --reload

5. Use Development Scripts

# Start all services with one command
python scripts/pipeline/start-dev.py

# Stop all services
python scripts/pipeline/stop-dev.py

# Restart specific service
python scripts/pipeline/restart-service.py --service ml-classifier
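As a rough idea of what a script like `scripts/pipeline/start-dev.py` could do (the real script may differ), the five terminals above boil down to launching one `uvicorn` process per service package:

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical sketch: one uvicorn process per service, mirroring the
# manual commands in the previous step.
SERVICES = [
    ("api-gateway", 8000),
    ("image-processor", 8001),
    ("ml-embeddings", 8002),
    ("ml-classifier", 8003),
    ("ocr-extractor", 8004),
]

def service_command(port: int) -> list[str]:
    """Build the uvicorn invocation used for each service."""
    return [
        sys.executable, "-m", "uvicorn", "app:app",
        "--host", "0.0.0.0", "--port", str(port), "--reload",
    ]

def start_all(repo_root: Path) -> list[subprocess.Popen]:
    """Launch every service from its own package directory."""
    return [
        subprocess.Popen(service_command(port), cwd=repo_root / "packages" / name)
        for name, port in SERVICES
    ]
```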

Development Workflow

Project Structure Understanding

iris/
├── packages/              # Microservices
│   ├── api-gateway/       # Main orchestrator
│   ├── image-processor/   # Phase 1: Image preprocessing
│   ├── ml-embeddings/     # Phase 2: Embeddings & clustering
│   ├── ml-classifier/     # Phase 3-4: Classification
│   └── ocr-extractor/     # Phase 5-6: OCR & JSON extraction
├── scripts/               # Development and deployment scripts
│   ├── pipeline/          # Service management
│   ├── training/          # Model training
│   ├── clustering/        # Data analysis
│   └── setup/             # Environment setup
├── data/                  # Data and models
│   ├── models/            # Trained ML models
│   ├── training/          # Training datasets
│   └── examples/          # Sample images
├── docs/                  # Documentation
└── tests/                 # Test suites

Code Style and Standards

Python Code Standards

# Format code with Black
black packages/*/

# Sort imports with isort
isort packages/*/ --profile black

# Lint with flake8
flake8 packages/*/ --max-line-length=88 --extend-ignore=E203,W503

# Type checking with mypy (optional)
mypy packages/api-gateway/app.py

Pre-commit Hooks Setup

# Install pre-commit
pip install pre-commit

# Install hooks
pre-commit install

# Configuration file: .pre-commit-config.yaml
cat > .pre-commit-config.yaml << EOF
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
        language_version: python3

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]

  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: ["--max-line-length=88", "--extend-ignore=E203,W503"]
EOF

Testing Setup

Unit Tests

# Run all tests
pytest

# Run tests for specific service
pytest tests/test_api_gateway.py -v

# Run tests with coverage
pytest --cov=packages --cov-report=html

# View coverage report
open htmlcov/index.html

Integration Tests

# Start services for testing
docker-compose -f docker-compose.test.yml up -d

# Run integration tests
pytest tests/integration/ -v

# Test complete pipeline
python tests/test_complete_pipeline.py

Test Data Setup

# Download test images
python scripts/setup/download_test_data.py

# Verify test data
ls data/test_images/
# Should contain: cedula_sample.jpg, ficha_sample.jpg, pasaporte_sample.jpg

Debugging and Development Tools

API Testing with Postman

  1. Import Collection: Import docs/postman/IRIS_API_Collection.json
  2. Set Environment: Configure base URL as http://localhost:8000
  3. Test Endpoints: Use provided examples for each service
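If you prefer scripted requests over Postman, a file upload can be sent with the standard library alone. The helper below builds the same multipart/form-data body Postman would send; the `"file"` field name is an assumption, so check the actual API reference for the real field and route:

```python
import mimetypes

def build_multipart(field: str, filename: str, payload: bytes,
                    boundary: str = "iris-dev-boundary") -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single-file upload.

    The field name is hypothetical; verify it against the API docs.
    """
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"
```

Pass the returned body and header to a `urllib.request.Request` against `http://localhost:8000` to exercise an endpoint from the command line.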

Jupyter Notebooks for Development

# Start Jupyter server
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser

# Open development notebooks
# - notebooks/development/API_Testing.ipynb
# - notebooks/development/Model_Analysis.ipynb
# - notebooks/development/Pipeline_Debugging.ipynb

Debugging Individual Services

# Add to service main file for debugging
import debugpy

# Enable debugging on port 5678
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger attach...")
debugpy.wait_for_client()

# VS Code launch.json configuration
{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "connect": {
        "host": "localhost",
        "port": 5678
    },
    "pathMappings": [
        {
            "localRoot": "${workspaceFolder}",
            "remoteRoot": "/app"
        }
    ]
}

Model Development

Training New Models

# Prepare training data
python scripts/training/prepare_training_data.py --input_dir data/raw_images

# Train classifier model
python scripts/training/train_classifier.py --epochs 50 --batch_size 32

# Evaluate model performance
python scripts/training/evaluate_model.py --model_path data/models/classifier_latest.pth

Data Analysis and Clustering

# Discover document classes automatically
python scripts/clustering/discover_classes.py --image_dir data/examples

# Analyze clustering results
python scripts/clustering/analyze_clusters.py --results_file clustering_results.json

# Visualize embeddings
python scripts/clustering/visualize_embeddings.py --embeddings_file embeddings.json

Configuration Management

Environment-Specific Configuration

Development Configuration

# config/development.py
DEBUG = True
LOG_LEVEL = "DEBUG"
ENABLE_PROFILING = True
MODEL_CACHE_SIZE = 3
OCR_CONFIDENCE_THRESHOLD = 0.2
ENABLE_MOCK_SERVICES = False

Production Configuration

# config/production.py
DEBUG = False
LOG_LEVEL = "INFO"
ENABLE_PROFILING = False
MODEL_CACHE_SIZE = 10
OCR_CONFIDENCE_THRESHOLD = 0.3
ENABLE_HEALTH_CHECKS = True

Service Configuration Files

Each service uses its own configuration:

# packages/ml-classifier/config.yaml
model:
  architecture: "efficientnet_b0"
  num_classes: 5
  pretrained: true

training:
  batch_size: 32
  learning_rate: 0.001
  epochs: 100

inference:
  confidence_threshold: 0.5
  batch_processing: false
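After loading such a file (e.g. with `yaml.safe_load`), it is worth validating the values before the service starts. A small sanity check, using the field names from the example above:

```python
def validate_classifier_config(cfg: dict) -> dict:
    """Raise ValueError if required fields are missing or out of range."""
    if cfg["model"]["num_classes"] <= 0:
        raise ValueError("model.num_classes must be positive")
    threshold = cfg["inference"]["confidence_threshold"]
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("inference.confidence_threshold must be in [0, 1]")
    if cfg["training"]["batch_size"] <= 0:
        raise ValueError("training.batch_size must be positive")
    return cfg
```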

Troubleshooting Common Issues

Service Startup Issues

# Check service logs
docker-compose logs api-gateway
docker-compose logs ml-classifier

# Check port conflicts
netstat -tulpn | grep :8000

# Restart specific service
docker-compose restart ml-classifier

Memory Issues

# Monitor memory usage
docker stats

# Increase Docker memory limits
# Docker Desktop: Settings > Resources > Advanced > Memory

# Clear model cache
curl -X POST "http://localhost:8003/admin/clear_cache"

Model Loading Issues

# Verify model files
ls -la data/models/

# Re-download models
python scripts/setup/download_models.py --force

# Test model loading
python -c "
import torch
model = torch.load('data/models/classifier_latest.pth')
print('Model loaded successfully')
"

GPU Setup (Optional)

NVIDIA GPU Support

# Install NVIDIA Docker support
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

Update Docker Compose for GPU

# docker-compose.gpu.yml
services:
  ml-classifier:
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

Test GPU Support

# Test NVIDIA Docker
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

# Start services with GPU support
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d

Next Steps

Once your development environment is set up:

  1. Explore the API: Start with the API Integration Guide
  2. Run Examples: Test with sample images in data/examples/
  3. Understand the Pipeline: Read the Architecture Overview
  4. Train Custom Models: Follow the Model Training Guide
  5. Deploy to Production: See Deployment Guide

Development Support

  • GitHub Issues: Report development issues and bugs
  • Development Discord: Join our development community
  • Documentation: This documentation is your primary resource
  • Code Examples: Check examples/ directory for working code samples

Your development environment is now ready! Start by testing the API with sample images to familiarize yourself with the IRIS pipeline.