
Development Environment Setup

This guide will help you set up a complete development environment for IRIS OCR, including all microservices, dependencies, and development tools.

Prerequisites

System Requirements

  • Operating System: Linux (Ubuntu 20.04+), macOS (10.15+), or Windows 10+ with WSL2
  • Memory: Minimum 8GB RAM (16GB recommended for ML models)
  • Storage: At least 10GB free space for models and dependencies
  • Network: Internet connection for downloading models and dependencies

Required Software

Essential Tools

# Python 3.8 or higher
python3 --version
# Should output: Python 3.8.x or higher

# Docker and Docker Compose
docker --version
docker-compose --version

# Git
git --version

# Node.js 16+ (for frontend development)
node --version
npm --version

Development Tools

# Visual Studio Code with extensions
code --list-extensions | grep -E "(python|docker|jupyter)"

# Postman or similar API testing tool
# curl or httpie for command-line testing
# Git GUI tool (optional)

Installation Methods

Choose one of the following installation methods based on your needs:

Method 1: Docker Compose Setup (Recommended)

This is the fastest way to get started with all services running.

1. Clone the Repository

git clone https://github.com/your-org/iris.git
cd iris

2. Configure Environment Variables

# Copy example environment file
cp .env.example .env

# Edit configuration (optional for development)
nano .env

Key Environment Variables:

# Development mode
ENVIRONMENT=development
DEBUG=true

# Service ports
API_GATEWAY_PORT=8000
IMAGE_PROCESSOR_PORT=8001
ML_EMBEDDINGS_PORT=8002
ML_CLASSIFIER_PORT=8003
OCR_EXTRACTOR_PORT=8004

# GPU support (if available)
ENABLE_GPU=false

# Model paths
MODELS_PATH=./data/models
TRAINING_DATA_PATH=./data/training
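A service can read these variables with `os.getenv`, falling back to the development defaults shown above. A minimal sketch (variable names mirror the `.env` example; real service code may read them differently):

```python
import os

# Mirror the .env example above, with the same development defaults.
DEBUG = os.getenv("DEBUG", "true").lower() == "true"
API_GATEWAY_PORT = int(os.getenv("API_GATEWAY_PORT", "8000"))
ENABLE_GPU = os.getenv("ENABLE_GPU", "false").lower() == "true"
MODELS_PATH = os.getenv("MODELS_PATH", "./data/models")
```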

3. Build and Start Services

# Build all services
docker-compose build

# Start all services in development mode
docker-compose up -d

# Check service status
docker-compose ps

4. Verify Installation

# Test API Gateway health
curl http://localhost:8000/health

# Check the status of all registered services
curl http://localhost:8000/services | jq .

# Expected output: All services should show "healthy" status
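The same check can be scripted. The sketch below polls each service's `/health` endpoint; the port map matches the `.env` example, and the assumption that `/health` returns JSON with a `"status"` field is hypothetical, so adjust it to the actual response shape:

```python
import json
import urllib.request

# Port map from the .env example above.
SERVICES = {
    "api-gateway": 8000,
    "image-processor": 8001,
    "ml-embeddings": 8002,
    "ml-classifier": 8003,
    "ocr-extractor": 8004,
}

def check_health(name: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the service answers /health with a healthy status."""
    url = f"http://localhost:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read())
            return resp.status == 200 and body.get("status") == "healthy"
    except (OSError, ValueError):
        # Connection refused, timeout, or non-JSON body.
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "healthy" if check_health(name, port) else "unreachable"
        print(f"{name:<18} {state}")
```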

Method 2: Local Development Setup

For development with code editing and debugging capabilities.

1. Python Environment Setup

# Create virtual environment
python3 -m venv iris-dev
source iris-dev/bin/activate # On Windows: iris-dev\Scripts\activate

# Upgrade pip
pip install --upgrade pip setuptools wheel

2. Install Dependencies for Each Service

# API Gateway
cd packages/api-gateway
pip install -r requirements.txt
cd ../..

# Image Processor
cd packages/image-processor
pip install -r requirements.txt
cd ../..

# ML Embeddings
cd packages/ml-embeddings
pip install -r requirements.txt
cd ../..

# ML Classifier
cd packages/ml-classifier
pip install -r requirements.txt
cd ../..

# OCR Extractor
cd packages/ocr-extractor
pip install -r requirements.txt
cd ../..

# Development tools
pip install pytest pytest-cov black flake8 isort jupyter

3. Download Required Models

# Create models directory
mkdir -p data/models

# Download PaddleOCR models (automatic on first run)
python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='es')"

# Download ML models
python scripts/setup/download_models.py

4. Start Services Individually

# Terminal 1: API Gateway
cd packages/api-gateway
python -m uvicorn app:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Image Processor
cd packages/image-processor
python -m uvicorn app:app --host 0.0.0.0 --port 8001 --reload

# Terminal 3: ML Embeddings
cd packages/ml-embeddings
python -m uvicorn app:app --host 0.0.0.0 --port 8002 --reload

# Terminal 4: ML Classifier
cd packages/ml-classifier
python -m uvicorn app:app --host 0.0.0.0 --port 8003 --reload

# Terminal 5: OCR Extractor
cd packages/ocr-extractor
python -m uvicorn app:app --host 0.0.0.0 --port 8004 --reload

5. Use Development Scripts

# Start all services with one command
python scripts/pipeline/start-dev.py

# Stop all services
python scripts/pipeline/stop-dev.py

# Restart specific service
python scripts/pipeline/restart-service.py --service ml-classifier
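As a rough idea of what a script like `scripts/pipeline/start-dev.py` could do (the real script may differ), the five terminals above boil down to launching one `uvicorn` process per service package:

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical sketch: one uvicorn process per service, mirroring the
# manual commands in the previous step.
SERVICES = [
    ("api-gateway", 8000),
    ("image-processor", 8001),
    ("ml-embeddings", 8002),
    ("ml-classifier", 8003),
    ("ocr-extractor", 8004),
]

def service_command(port: int) -> list[str]:
    """Build the uvicorn invocation used for each service."""
    return [
        sys.executable, "-m", "uvicorn", "app:app",
        "--host", "0.0.0.0", "--port", str(port), "--reload",
    ]

def start_all(repo_root: Path) -> list[subprocess.Popen]:
    """Launch every service from its own package directory."""
    return [
        subprocess.Popen(service_command(port), cwd=repo_root / "packages" / name)
        for name, port in SERVICES
    ]
```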

Development Workflow

Project Structure Understanding

iris/
├── packages/              # Microservices
│   ├── api-gateway/       # Main orchestrator
│   ├── image-processor/   # Phase 1: Image preprocessing
│   ├── ml-embeddings/     # Phase 2: Embeddings & clustering
│   ├── ml-classifier/     # Phase 3-4: Classification
│   └── ocr-extractor/     # Phase 5-6: OCR & JSON extraction
├── scripts/               # Development and deployment scripts
│   ├── pipeline/          # Service management
│   ├── training/          # Model training
│   ├── clustering/        # Data analysis
│   └── setup/             # Environment setup
├── data/                  # Data and models
│   ├── models/            # Trained ML models
│   ├── training/          # Training datasets
│   └── examples/          # Sample images
├── docs/                  # Documentation
└── tests/                 # Test suites

Code Style and Standards

Python Code Standards

# Format code with Black
black packages/*/

# Sort imports with isort
isort packages/*/ --profile black

# Lint with flake8
flake8 packages/*/ --max-line-length=88 --extend-ignore=E203,W503

# Type checking with mypy (optional)
mypy packages/api-gateway/app.py

Pre-commit Hooks Setup

# Install pre-commit
pip install pre-commit

# Install hooks
pre-commit install

# Configuration file: .pre-commit-config.yaml
cat > .pre-commit-config.yaml << EOF
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
        language_version: python3

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]

  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: ["--max-line-length=88", "--extend-ignore=E203,W503"]
EOF

Testing Setup

Unit Tests

# Run all tests
pytest

# Run tests for specific service
pytest tests/test_api_gateway.py -v

# Run tests with coverage
pytest --cov=packages --cov-report=html

# View coverage report
open htmlcov/index.html

Integration Tests

# Start services for testing
docker-compose -f docker-compose.test.yml up -d

# Run integration tests
pytest tests/integration/ -v

# Test complete pipeline
python tests/test_complete_pipeline.py

Test Data Setup

# Download test images
python scripts/setup/download_test_data.py

# Verify test data
ls data/test_images/
# Should contain: cedula_sample.jpg, ficha_sample.jpg, pasaporte_sample.jpg

Debugging and Development Tools

API Testing with Postman

  1. Import Collection: Import docs/postman/IRIS_API_Collection.json
  2. Set Environment: Configure base URL as http://localhost:8000
  3. Test Endpoints: Use provided examples for each service
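If you prefer scripted requests over Postman, a file upload can be sent with the standard library alone. The helper below builds the same multipart/form-data body Postman would send; the `"file"` field name is an assumption, so check the actual API reference for the real field and route:

```python
import mimetypes

def build_multipart(field: str, filename: str, payload: bytes,
                    boundary: str = "iris-dev-boundary") -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single-file upload.

    The field name is hypothetical; verify it against the API docs.
    """
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"
```

Pass the returned body and header to a `urllib.request.Request` against `http://localhost:8000` to exercise an endpoint from the command line.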

Jupyter Notebooks for Development

# Start Jupyter server
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser

# Open development notebooks
# - notebooks/development/API_Testing.ipynb
# - notebooks/development/Model_Analysis.ipynb
# - notebooks/development/Pipeline_Debugging.ipynb

Debugging Individual Services

# Add to service main file for debugging
import debugpy

# Enable debugging on port 5678
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger attach...")
debugpy.wait_for_client()

# VS Code launch.json configuration
{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "connect": {
        "host": "localhost",
        "port": 5678
    },
    "pathMappings": [
        {
            "localRoot": "${workspaceFolder}",
            "remoteRoot": "/app"
        }
    ]
}

Model Development

Training New Models

# Prepare training data
python scripts/training/prepare_training_data.py --input_dir data/raw_images

# Train classifier model
python scripts/training/train_classifier.py --epochs 50 --batch_size 32

# Evaluate model performance
python scripts/training/evaluate_model.py --model_path data/models/classifier_latest.pth

Data Analysis and Clustering

# Discover document classes automatically
python scripts/clustering/discover_classes.py --image_dir data/examples

# Analyze clustering results
python scripts/clustering/analyze_clusters.py --results_file clustering_results.json

# Visualize embeddings
python scripts/clustering/visualize_embeddings.py --embeddings_file embeddings.json

Configuration Management

Environment-Specific Configuration

Development Configuration

# config/development.py
DEBUG = True
LOG_LEVEL = "DEBUG"
ENABLE_PROFILING = True
MODEL_CACHE_SIZE = 3
OCR_CONFIDENCE_THRESHOLD = 0.2
ENABLE_MOCK_SERVICES = False

Production Configuration

# config/production.py
DEBUG = False
LOG_LEVEL = "INFO"
ENABLE_PROFILING = False
MODEL_CACHE_SIZE = 10
OCR_CONFIDENCE_THRESHOLD = 0.3
ENABLE_HEALTH_CHECKS = True

Service Configuration Files

Each service uses its own configuration:

# packages/ml-classifier/config.yaml
model:
  architecture: "efficientnet_b0"
  num_classes: 5
  pretrained: true

training:
  batch_size: 32
  learning_rate: 0.001
  epochs: 100

inference:
  confidence_threshold: 0.5
  batch_processing: false
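After loading such a file (e.g. with `yaml.safe_load`), it is worth validating the values before the service starts. A small sanity check, using the field names from the example above:

```python
def validate_classifier_config(cfg: dict) -> dict:
    """Raise ValueError if required fields are missing or out of range."""
    if cfg["model"]["num_classes"] <= 0:
        raise ValueError("model.num_classes must be positive")
    threshold = cfg["inference"]["confidence_threshold"]
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("inference.confidence_threshold must be in [0, 1]")
    if cfg["training"]["batch_size"] <= 0:
        raise ValueError("training.batch_size must be positive")
    return cfg
```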

Troubleshooting Common Issues

Service Startup Issues

# Check service logs
docker-compose logs api-gateway
docker-compose logs ml-classifier

# Check port conflicts
netstat -tulpn | grep :8000

# Restart specific service
docker-compose restart ml-classifier

Memory Issues

# Monitor memory usage
docker stats

# Increase Docker memory limits
# Docker Desktop: Settings > Resources > Advanced > Memory

# Clear model cache
curl -X POST "http://localhost:8003/admin/clear_cache"

Model Loading Issues

# Verify model files
ls -la data/models/

# Re-download models
python scripts/setup/download_models.py --force

# Test model loading
python -c "
import torch
model = torch.load('data/models/classifier_latest.pth')
print('Model loaded successfully')
"

GPU Setup (Optional)

NVIDIA GPU Support

# Install NVIDIA Docker support
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

Update Docker Compose for GPU

# docker-compose.gpu.yml
services:
  ml-classifier:
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

Test GPU Support

# Test NVIDIA Docker
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

# Start services with GPU support
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d

Next Steps

Once your development environment is set up:

  1. Explore the API: Start with the API Integration Guide
  2. Run Examples: Test with sample images in data/examples/
  3. Understand the Pipeline: Read the Architecture Overview
  4. Train Custom Models: Follow the Model Training Guide
  5. Deploy to Production: See Deployment Guide

Development Support

  • GitHub Issues: Report development issues and bugs
  • Development Discord: Join our development community
  • Documentation: This documentation is your primary resource
  • Code Examples: Check examples/ directory for working code samples

Your development environment is now ready! Start by testing the API with sample images to familiarize yourself with the IRIS pipeline.