Development Environment Setup
This guide will help you set up a complete development environment for IRIS OCR, including all microservices, dependencies, and development tools.
Prerequisites
System Requirements
- Operating System: Linux (Ubuntu 20.04+), macOS (10.15+), or Windows 10+ with WSL2
- Memory: Minimum 8GB RAM (16GB recommended for ML models)
- Storage: At least 10GB free space for models and dependencies
- Network: Internet connection for downloading models and dependencies
Required Software
Essential Tools
# Python 3.8 or higher
python3 --version
# Should output: Python 3.8.x or higher
# Docker and Docker Compose
docker --version
docker-compose --version
# Git
git --version
# Node.js 16+ (for frontend development)
node --version
npm --version
Development Tools (Recommended)
# Visual Studio Code with extensions
code --list-extensions | grep -E "(python|docker|jupyter)"
# Postman or similar API testing tool
# curl or httpie for command-line testing
# Git GUI tool (optional)
Installation Methods
Choose one of the following installation methods based on your needs:
Method 1: Docker Development (Recommended)
This is the fastest way to get started with all services running.
1. Clone the Repository
git clone https://github.com/your-org/iris.git
cd iris
2. Configure Environment Variables
# Copy example environment file
cp .env.example .env
# Edit configuration (optional for development)
nano .env
Key Environment Variables:
# Development mode
ENVIRONMENT=development
DEBUG=true
# Service ports
API_GATEWAY_PORT=8000
IMAGE_PROCESSOR_PORT=8001
ML_EMBEDDINGS_PORT=8002
ML_CLASSIFIER_PORT=8003
OCR_EXTRACTOR_PORT=8004
# GPU support (if available)
ENABLE_GPU=false
# Model paths
MODELS_PATH=./data/models
TRAINING_DATA_PATH=./data/training
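The services read these variables from the environment at startup. As an illustrative sketch (not part of the IRIS codebase), a helper like the following shows how the port variables above can be resolved with the development defaults as fallbacks:

```python
import os

# Resolve IRIS service ports from the environment, falling back to the
# development defaults listed in the .env example above.
def load_service_ports() -> dict:
    defaults = {
        "API_GATEWAY_PORT": 8000,
        "IMAGE_PROCESSOR_PORT": 8001,
        "ML_EMBEDDINGS_PORT": 8002,
        "ML_CLASSIFIER_PORT": 8003,
        "OCR_EXTRACTOR_PORT": 8004,
    }
    return {name: int(os.getenv(name, default)) for name, default in defaults.items()}

if __name__ == "__main__":
    for name, port in load_service_ports().items():
        print(f"{name}={port}")
```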
3. Build and Start Services
# Build all services
docker-compose build
# Start all services in development mode
docker-compose up -d
# Check service status
docker-compose ps
4. Verify Installation
# Test API Gateway health
curl http://localhost:8000/health
# Check the status of all services
curl -X POST "http://localhost:8000/services" | jq .
# Expected output: All services should show "healthy" status
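Instead of curling each port by hand, a small script can poll every service at once. This sketch assumes each service exposes a /health endpoint returning JSON with a "status" field (as the gateway check above suggests) and uses the default development ports:

```python
import json
import urllib.request

# Development-default ports from the .env example; adjust if you changed them.
SERVICES = {
    "api-gateway": 8000,
    "image-processor": 8001,
    "ml-embeddings": 8002,
    "ml-classifier": 8003,
    "ocr-extractor": 8004,
}

def check_health(port: int, timeout: float = 3.0) -> str:
    """Return the reported status of the service on `port`, or 'unreachable'."""
    url = f"http://localhost:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read()).get("status", "unknown")
    except OSError:
        return "unreachable"

if __name__ == "__main__":
    for name, port in SERVICES.items():
        print(f"{name:20s} {check_health(port)}")
```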
Method 2: Local Development Setup
For development with code editing and debugging capabilities.
1. Python Environment Setup
# Create virtual environment
python3 -m venv iris-dev
source iris-dev/bin/activate # On Windows: iris-dev\Scripts\activate
# Upgrade pip
pip install --upgrade pip setuptools wheel
2. Install Dependencies for Each Service
# API Gateway
cd packages/api-gateway
pip install -r requirements.txt
cd ../..
# Image Processor
cd packages/image-processor
pip install -r requirements.txt
cd ../..
# ML Embeddings
cd packages/ml-embeddings
pip install -r requirements.txt
cd ../..
# ML Classifier
cd packages/ml-classifier
pip install -r requirements.txt
cd ../..
# OCR Extractor
cd packages/ocr-extractor
pip install -r requirements.txt
cd ../..
# Development tools
pip install pytest pytest-cov black flake8 isort jupyter
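The per-service installs above repeat the same cd/pip pattern five times. As a convenience, a hypothetical helper (not shipped with the repo) can walk packages/ and install every requirements.txt in one pass, run from the repo root with the virtual environment activated:

```python
import subprocess
import sys
from pathlib import Path

# Install requirements for every service under packages/ in one pass.
# Assumes the repository layout described in this guide.
def install_all(packages_dir: str = "packages", dry_run: bool = False) -> list:
    commands = []
    for req in sorted(Path(packages_dir).glob("*/requirements.txt")):
        cmd = [sys.executable, "-m", "pip", "install", "-r", str(req)]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    # Print the commands that would run; drop dry_run to actually install.
    for cmd in install_all(dry_run=True):
        print(" ".join(cmd))
```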
3. Download Required Models
# Create models directory
mkdir -p data/models
# Download PaddleOCR models (automatic on first run)
python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='es')"
# Download ML models
python scripts/setup/download_models.py
4. Start Services Individually
# Terminal 1: API Gateway
cd packages/api-gateway
python -m uvicorn app:app --host 0.0.0.0 --port 8000 --reload
# Terminal 2: Image Processor
cd packages/image-processor
python -m uvicorn app:app --host 0.0.0.0 --port 8001 --reload
# Terminal 3: ML Embeddings
cd packages/ml-embeddings
python -m uvicorn app:app --host 0.0.0.0 --port 8002 --reload
# Terminal 4: ML Classifier
cd packages/ml-classifier
python -m uvicorn app:app --host 0.0.0.0 --port 8003 --reload
# Terminal 5: OCR Extractor
cd packages/ocr-extractor
python -m uvicorn app:app --host 0.0.0.0 --port 8004 --reload
5. Use Development Scripts
# Start all services with one command
python scripts/pipeline/start-dev.py
# Stop all services
python scripts/pipeline/stop-dev.py
# Restart specific service
python scripts/pipeline/restart-service.py --service ml-classifier
Development Workflow
Project Structure Understanding
iris/
├── packages/               # Microservices
│   ├── api-gateway/        # Main orchestrator
│   ├── image-processor/    # Phase 1: Image preprocessing
│   ├── ml-embeddings/      # Phase 2: Embeddings & clustering
│   ├── ml-classifier/      # Phase 3-4: Classification
│   └── ocr-extractor/      # Phase 5-6: OCR & JSON extraction
├── scripts/                # Development and deployment scripts
│   ├── pipeline/           # Service management
│   ├── training/           # Model training
│   ├── clustering/         # Data analysis
│   └── setup/              # Environment setup
├── data/                   # Data and models
│   ├── models/             # Trained ML models
│   ├── training/           # Training datasets
│   └── examples/           # Sample images
├── docs/                   # Documentation
└── tests/                  # Test suites
Code Style and Standards
Python Code Standards
# Format code with Black
black packages/*/
# Sort imports with isort
isort packages/*/ --profile black
# Lint with flake8
flake8 packages/*/ --max-line-length=88 --extend-ignore=E203,W503
# Type checking with mypy (optional)
mypy packages/api-gateway/app.py
Pre-commit Hooks Setup
# Install pre-commit
pip install pre-commit
# Install hooks
pre-commit install
# Configuration file: .pre-commit-config.yaml
cat > .pre-commit-config.yaml << EOF
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
        language_version: python3
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: ["--max-line-length=88", "--extend-ignore=E203,W503"]
EOF
Testing Setup
Unit Tests
# Run all tests
pytest
# Run tests for specific service
pytest tests/test_api_gateway.py -v
# Run tests with coverage
pytest --cov=packages --cov-report=html
# View coverage report
open htmlcov/index.html
Integration Tests
# Start services for testing
docker-compose -f docker-compose.test.yml up -d
# Run integration tests
pytest tests/integration/ -v
# Test complete pipeline
python tests/test_complete_pipeline.py
Test Data Setup
# Download test images
python scripts/setup/download_test_data.py
# Verify test data
ls data/test_images/
# Should contain: cedula_sample.jpg, ficha_sample.jpg, pasaporte_sample.jpg
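A quick sanity check can confirm the expected samples are actually in place before running tests. This sketch hard-codes the three file names listed above; extend the set if your test data grows:

```python
from pathlib import Path

# Sample images expected by the test suite, per the listing above.
EXPECTED = {"cedula_sample.jpg", "ficha_sample.jpg", "pasaporte_sample.jpg"}

def missing_test_images(test_dir: str = "data/test_images") -> set:
    """Return the set of expected sample images not present in `test_dir`."""
    present = {p.name for p in Path(test_dir).glob("*.jpg")}
    return EXPECTED - present

if __name__ == "__main__":
    missing = missing_test_images()
    if missing:
        print(f"Missing test images: {sorted(missing)}")
    else:
        print("All test images present")
```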
Debugging and Development Tools
API Testing with Postman
- Import Collection: Import docs/postman/IRIS_API_Collection.json
- Set Environment: Configure base URL as http://localhost:8000
- Test Endpoints: Use provided examples for each service
Jupyter Notebooks for Development
# Start Jupyter server
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser
# Open development notebooks
# - notebooks/development/API_Testing.ipynb
# - notebooks/development/Model_Analysis.ipynb
# - notebooks/development/Pipeline_Debugging.ipynb
Debugging Individual Services
# Add to service main file for debugging
import debugpy
# Enable debugging on port 5678
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger attach...")
debugpy.wait_for_client()
# VS Code launch.json configuration
{
  "name": "Python: Remote Attach",
  "type": "python",
  "request": "attach",
  "connect": {
    "host": "localhost",
    "port": 5678
  },
  "pathMappings": [
    {
      "localRoot": "${workspaceFolder}",
      "remoteRoot": "/app"
    }
  ]
}
Model Development
Training New Models
# Prepare training data
python scripts/training/prepare_training_data.py --input_dir data/raw_images
# Train classifier model
python scripts/training/train_classifier.py --epochs 50 --batch_size 32
# Evaluate model performance
python scripts/training/evaluate_model.py --model_path data/models/classifier_latest.pth
Data Analysis and Clustering
# Discover document classes automatically
python scripts/clustering/discover_classes.py --image_dir data/examples
# Analyze clustering results
python scripts/clustering/analyze_clusters.py --results_file clustering_results.json
# Visualize embeddings
python scripts/clustering/visualize_embeddings.py --embeddings_file embeddings.json
Configuration Management
Environment-Specific Configuration
Development Configuration
# config/development.py
DEBUG = True
LOG_LEVEL = "DEBUG"
ENABLE_PROFILING = True
MODEL_CACHE_SIZE = 3
OCR_CONFIDENCE_THRESHOLD = 0.2
ENABLE_MOCK_SERVICES = False
Production Configuration
# config/production.py
DEBUG = False
LOG_LEVEL = "INFO"
ENABLE_PROFILING = False
MODEL_CACHE_SIZE = 10
OCR_CONFIDENCE_THRESHOLD = 0.3
ENABLE_HEALTH_CHECKS = True
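In practice a service picks one of these configurations based on the ENVIRONMENT variable. The standalone sketch below inlines the values shown above for clarity; the real project would import config/development.py or config/production.py instead:

```python
import os
from typing import Optional

# Values copied from the development and production configurations above.
CONFIGS = {
    "development": {
        "DEBUG": True,
        "LOG_LEVEL": "DEBUG",
        "MODEL_CACHE_SIZE": 3,
        "OCR_CONFIDENCE_THRESHOLD": 0.2,
    },
    "production": {
        "DEBUG": False,
        "LOG_LEVEL": "INFO",
        "MODEL_CACHE_SIZE": 10,
        "OCR_CONFIDENCE_THRESHOLD": 0.3,
    },
}

def get_config(environment: Optional[str] = None) -> dict:
    """Select a configuration, defaulting to the ENVIRONMENT variable."""
    env = environment or os.getenv("ENVIRONMENT", "development")
    if env not in CONFIGS:
        raise ValueError(f"Unknown environment: {env}")
    return CONFIGS[env]
```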
Service Configuration Files
Each service uses its own configuration:
# packages/ml-classifier/config.yaml
model:
  architecture: "efficientnet_b0"
  num_classes: 5
  pretrained: true
training:
  batch_size: 32
  learning_rate: 0.001
  epochs: 100
inference:
  confidence_threshold: 0.5
  batch_processing: false
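Loading the YAML itself would use PyYAML (yaml.safe_load). As an illustrative pattern, the inference section can be mirrored as a typed structure that validates its values on construction; the field names and defaults are copied from the config above:

```python
from dataclasses import dataclass

@dataclass
class InferenceConfig:
    """Typed view of the inference section of config.yaml."""
    confidence_threshold: float = 0.5
    batch_processing: bool = False

    def __post_init__(self):
        # Reject thresholds outside the valid probability range.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in [0, 1]")

if __name__ == "__main__":
    print(InferenceConfig())
```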
Troubleshooting Common Issues
Service Startup Issues
# Check service logs
docker-compose logs api-gateway
docker-compose logs ml-classifier
# Check port conflicts
netstat -tulpn | grep :8000
# Restart specific service
docker-compose restart ml-classifier
Memory Issues
# Monitor memory usage
docker stats
# Increase Docker memory limits
# Docker Desktop: Settings > Resources > Advanced > Memory
# Clear model cache
curl -X POST "http://localhost:8003/admin/clear_cache"
Model Loading Issues
# Verify model files
ls -la data/models/
# Re-download models
python scripts/setup/download_models.py --force
# Test model loading (map_location allows the check to run on CPU-only machines)
python -c "
import torch
model = torch.load('data/models/classifier_latest.pth', map_location='cpu')
print('Model loaded successfully')
"
GPU Setup (Optional)
NVIDIA GPU Support
# Install NVIDIA Docker support
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
Update Docker Compose for GPU
# docker-compose.gpu.yml
services:
  ml-classifier:
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
Test GPU Support
# Test NVIDIA Docker
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
# Start services with GPU support
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
Next Steps
Once your development environment is set up:
- Explore the API: Start with the API Integration Guide
- Run Examples: Test with sample images in data/examples/
- Understand the Pipeline: Read the Architecture Overview
- Train Custom Models: Follow the Model Training Guide
- Deploy to Production: See Deployment Guide
Development Support
- GitHub Issues: Report development issues and bugs
- Development Discord: Join our development community
- Documentation: This documentation is your primary resource
- Code Examples: Check the examples/ directory for working code samples
Your development environment is now ready! Start by testing the API with sample images to familiarize yourself with the IRIS pipeline.