Monitoring & Metrics

The Transfer Learning Video Processing Pipeline includes monitoring and metrics tracking to help you understand performance, resource usage, and processing status.

Real-time Dashboard

The pipeline provides a real-time dashboard, rendered with Rich, that you launch from the CLI:

transfer-learning monitor

Dashboard Options

- video_id (string): Filter metrics for a specific video ID
- refresh_rate (number, default: 2.0): Dashboard refresh rate in seconds
- show_history (boolean, default: false): Show historical metrics instead of real-time data

Dashboard Sections

The dashboard is divided into several panels:

Header Panel
- Current video being processed
- Processing duration
- Frame progress
Performance Metrics
- Mean and P95 latency
- Throughput statistics
- Total processing time
System Resources
- CPU usage (color-coded)
- Memory usage (color-coded)
- Disk usage
- Network I/O
- GPU metrics (if available)
API Statistics
- API call counts
- Mean latency per API
- Success/failure rates
Custom Metrics
- User-defined metrics
- Application-specific KPIs
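
The System Resources panel color-codes CPU and memory usage, but the thresholds are not stated here. The following is only a sketch of how such a mapping could work; the function name usage_color and the 60% / 85% cut-offs are assumptions for illustration, not the pipeline's actual values.

```python
def usage_color(percent: float, warn: float = 60.0, critical: float = 85.0) -> str:
    """Map a usage percentage to a color name (thresholds are assumed)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"percent must be within 0-100, got {percent}")
    if percent >= critical:
        return "red"
    if percent >= warn:
        return "yellow"
    return "green"


# Sample values taken from the example metrics file in this document.
for label, value in [("CPU", 45.2), ("Memory", 62.8)]:
    print(f"{label}: {value}% -> {usage_color(value)}")
```

The returned strings are standard Rich color names, so they could be used directly as a style when rendering a value.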

Metrics Collection

System Metrics

The pipeline automatically tracks system resource usage:

from transfer_learning.monitoring.metrics import MetricsTracker

metrics = MetricsTracker()
metrics.start_processing("video_id")

# System metrics are collected automatically
# Access current metrics
system_metrics = metrics.current_metrics.system
print(f"CPU Usage: {system_metrics.cpu_percent}%")

Performance Metrics

Track performance-related metrics:

# Record API call latency
metrics.record_api_call("openai_vision", 150.5)  # 150.5ms

# Add custom processing time
metrics.current_metrics.performance.add_processing_time(200.5)

# Record throughput
metrics.current_metrics.performance.add_throughput(10.5)  # 10.5 items/second
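
Recorded latencies are later summarized as mean, median, P95, and P99. How the pipeline computes these is not documented here; the sketch below shows one way to derive the same four figures with Python's standard library. The function name summarize_latencies and the choice of the "inclusive" quantile method are assumptions.

```python
import statistics


def summarize_latencies(latencies_ms):
    """Return mean, median, p95 and p99 for a list of latencies in ms."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


samples = [110.0, 120.0, 125.0, 130.0, 180.0]  # made-up latencies in ms
stats = summarize_latencies(samples)
print(stats)
```

With only a handful of samples the tail percentiles are interpolated between the two largest values, so P95 and P99 become meaningful only once many calls have been recorded.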

Custom Metrics

Add your own custom metrics:

# Add a custom metric
metrics.add_custom_metric("quality_score", 0.95)
metrics.add_custom_metric("detected_objects", ["person", "car", "dog"])

Using the Timer

The pipeline provides a convenient Timer context manager for timing operations:

from transfer_learning.monitoring.metrics import Timer

# Time a processing operation
# (metrics is the MetricsTracker instance created earlier)
with Timer(metrics, "frame_processing", operation_type="processing"):
    process_frame(frame)

# Time an API call
with Timer(metrics, "openai_api", operation_type="api"):
    response = openai.ChatCompletion.create(...)
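
To make the Timer's behavior concrete, here is a minimal sketch of how a timing context manager of this kind could work. It is not the pipeline's implementation: SketchTimer, StubTracker, and record_processing_time are hypothetical names, and routing "api" timings to record_api_call is an assumption based on the operation_type parameter.

```python
import time


class StubTracker:
    """Hypothetical stand-in for MetricsTracker; it only stores durations."""

    def __init__(self):
        self.api_calls = []         # (api_name, latency_ms) pairs
        self.processing_times = []  # (metric_name, elapsed_ms) pairs

    def record_api_call(self, api_name, latency_ms):
        self.api_calls.append((api_name, latency_ms))

    def record_processing_time(self, metric_name, elapsed_ms):
        # Hypothetical method, used here only for illustration.
        self.processing_times.append((metric_name, elapsed_ms))


class SketchTimer:
    """Illustrative timer that reports elapsed milliseconds on exit."""

    def __init__(self, tracker, metric_name, operation_type="processing"):
        if operation_type not in ("processing", "api"):
            raise ValueError(f"unknown operation_type: {operation_type!r}")
        self.tracker = tracker
        self.metric_name = metric_name
        self.operation_type = operation_type
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed_ms = (time.perf_counter() - self._start) * 1000.0
        if self.operation_type == "api":
            self.tracker.record_api_call(self.metric_name, elapsed_ms)
        else:
            self.tracker.record_processing_time(self.metric_name, elapsed_ms)
        return False  # never swallow exceptions from the timed block


tracker = StubTracker()
with SketchTimer(tracker, "frame_processing"):
    time.sleep(0.01)
with SketchTimer(tracker, "openai_api", operation_type="api"):
    time.sleep(0.01)
print(tracker.processing_times, tracker.api_calls)
```

Recording in __exit__ means the duration is captured even when the timed block raises.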

Metrics Storage

Metrics are automatically saved to JSON files in dated directories:

metrics/
  └── 2024-03-05/
      └── video_123_20240305_143022.json
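
The layout above suggests a path of the form metrics/<date>/<video_id>_<timestamp>.json. The sketch below reproduces that naming with the standard library; the function name metrics_path is hypothetical, and using the run's start time for both the directory and the file stamp is an assumption inferred from the example.

```python
from datetime import datetime
from pathlib import Path


def metrics_path(video_id, started, root=Path("metrics")):
    """Build <root>/<YYYY-MM-DD>/<video_id>_<YYYYMMDD_HHMMSS>.json."""
    day_dir = started.strftime("%Y-%m-%d")
    stamp = started.strftime("%Y%m%d_%H%M%S")
    return root / day_dir / f"{video_id}_{stamp}.json"


path = metrics_path("video_123", datetime(2024, 3, 5, 14, 30, 22))
print(path.as_posix())  # metrics/2024-03-05/video_123_20240305_143022.json
```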

Metrics File Format

{
  "video_id": "video_123",
  "start_time": "2024-03-05T14:30:22.123456",
  "end_time": "2024-03-05T14:35:42.987654",
  "frame_count": 150,
  "processed_frames": 150,
  "performance_metrics": {
    "latency_stats": {
      "mean": 125.5,
      "median": 120.0,
      "p95": 180.5,
      "p99": 200.0
    },
    "throughput_stats": {
      "mean": 10.5,
      "max": 15.2
    }
  },
  "system_metrics": {
    "cpu_percent": 45.2,
    "memory_percent": 62.8,
    "disk_usage_percent": 78.5,
    "network_bytes_sent": 1024,
    "network_bytes_recv": 2048,
    "gpu_utilization": 85.2,
    "gpu_memory_used": 4096
  },
  "custom_metrics": {
    "quality_score": 0.95,
    "detected_objects": ["person", "car", "dog"]
  },
  "system_info": {
    "platform": "Darwin-24.3.0-x86_64",
    "python_version": "3.10.0",
    "processor": "i386",
    "memory_total": 17179869184,
    "disk_total": 994662584320
  }
}
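
Because the file is plain JSON with ISO 8601 timestamps, it can be post-processed without the pipeline installed. The sketch below derives the run duration and an overall frame rate from the fields shown above; the function name summarize_run and the derived field names are assumptions, not part of the file format.

```python
import json
from datetime import datetime

# Abbreviated copy of the example record shown above.
raw = """
{
  "video_id": "video_123",
  "start_time": "2024-03-05T14:30:22.123456",
  "end_time": "2024-03-05T14:35:42.987654",
  "frame_count": 150,
  "processed_frames": 150
}
"""


def summarize_run(record):
    """Derive duration and frame rate from a metrics record."""
    start = datetime.fromisoformat(record["start_time"])
    end = datetime.fromisoformat(record["end_time"])
    duration_s = (end - start).total_seconds()
    frames = record["processed_frames"]
    return {
        "video_id": record["video_id"],
        "duration_s": duration_s,
        "frames_per_second": frames / duration_s if duration_s > 0 else 0.0,
        "complete": frames == record["frame_count"],
    }


summary = summarize_run(json.loads(raw))
print(summary)
```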

Historical Analysis

View historical metrics for a specific video:

transfer-learning monitor --video-id=video_123 --show-history

This loads and displays the most recent metrics file for the specified video ID.
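
If you need the same lookup in your own scripts, the dated directory layout makes it straightforward. The sketch below is one possible approach, not the CLI's actual logic: latest_metrics_file is a hypothetical helper, and picking the newest file by the timestamp in its name (rather than by modification time) is an assumption.

```python
import re
import tempfile
from pathlib import Path


def latest_metrics_file(video_id, root):
    """Return the newest metrics file for video_id under root, or None."""
    # Require the exact ID followed by a YYYYMMDD_HHMMSS stamp, so that
    # "video_123" does not also match files belonging to "video_1234".
    pattern = re.compile(re.escape(video_id) + r"_\d{8}_\d{6}\.json")
    candidates = [p for p in root.glob("*/*.json") if pattern.fullmatch(p.name)]
    # Fixed-width timestamps sort chronologically as plain strings.
    return max(candidates, key=lambda p: p.name, default=None)


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for day, name in [
        ("2024-03-05", "video_123_20240305_143022.json"),
        ("2024-03-06", "video_123_20240306_091500.json"),
        ("2024-03-06", "video_1234_20240306_120000.json"),
    ]:
        (root / day).mkdir(exist_ok=True)
        (root / day / name).write_text("{}")
    latest = latest_metrics_file("video_123", root)
    latest_name = latest.name if latest else None
    missing = latest_metrics_file("video_999", root)

print(latest_name, missing)
```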

Best Practices

Start/End Tracking

metrics.start_processing("video_id")
try:
    pass  # Your processing code goes here
finally:
    # Runs even if processing raises, so the results are still saved
    metrics.end_processing()
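
If this start/end pairing is repeated in many places, it can be wrapped once in a context manager. This is a sketch of that idea rather than a feature of the pipeline: tracked_processing and RecordingTracker are hypothetical names, and the stub only mimics the two documented method names.

```python
from contextlib import contextmanager


@contextmanager
def tracked_processing(metrics, video_id):
    """Call start_processing on entry and end_processing on exit."""
    metrics.start_processing(video_id)
    try:
        yield metrics
    finally:
        metrics.end_processing()


class RecordingTracker:
    """Hypothetical stub that records which methods were called."""

    def __init__(self):
        self.calls = []

    def start_processing(self, video_id):
        self.calls.append(("start", video_id))

    def end_processing(self):
        self.calls.append(("end",))


tracker = RecordingTracker()
try:
    with tracked_processing(tracker, "video_123"):
        raise RuntimeError("simulated processing failure")
except RuntimeError:
    pass
print(tracker.calls)
```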

Use Timers for Operations

with Timer(metrics, "operation_name"):
    pass  # Your operation code goes here

Add Custom Metrics

metrics.add_custom_metric("key", value)

Monitor Resource Usage
- Keep an eye on system resource usage
- Set up alerts for high resource utilization
- Use color-coded indicators in the dashboard

Regular Cleanup

# Clean up old metrics files
transfer-learning cleanup --max-age-hours=168  # 1 week
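
For environments where you manage the metrics directory yourself, age-based cleanup can also be scripted. The sketch below illustrates the general idea only and is not the cleanup command's implementation; cleanup_old_metrics is a hypothetical helper, and judging age by file modification time is an assumption.

```python
import os
import tempfile
import time
from pathlib import Path


def cleanup_old_metrics(root, max_age_hours, now=None):
    """Delete *.json files under root older than max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(root.rglob("*.json")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Demo: one file aged 200 hours, one fresh file, one-week retention.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    day = root / "2024-03-05"
    day.mkdir()
    old_file = day / "video_1_20240305_143022.json"
    new_file = day / "video_2_20240305_150000.json"
    old_file.write_text("{}")
    new_file.write_text("{}")
    stale = time.time() - 200 * 3600
    os.utime(old_file, (stale, stale))
    removed_names = [p.name for p in cleanup_old_metrics(root, max_age_hours=168)]
    remaining_names = sorted(p.name for p in root.rglob("*.json"))

print(removed_names, remaining_names)
```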

Configuration

Configure monitoring settings in your .env file:

ENABLE_MONITORING=true
LOG_LEVEL=INFO
METRICS_ENABLED=true

Or use the config command:

transfer-learning config
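
When your own code needs to honor the same settings, the three variables can be read from the environment. The sketch below shows one way to do that; it is not how the pipeline parses its configuration. The helper names, the accepted boolean spellings, and the default values are all assumptions.

```python
import logging
import os

# Standard logging level names mapped to their numeric values.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean (spellings assumed)."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_monitoring_settings(environ=os.environ):
    """Collect the monitoring-related settings into a plain dict."""
    level_name = environ.get("LOG_LEVEL", "INFO").strip().upper()
    if level_name not in _LEVELS:
        raise ValueError(f"unsupported LOG_LEVEL: {level_name!r}")
    return {
        "enable_monitoring": env_flag("ENABLE_MONITORING", environ=environ),
        "metrics_enabled": env_flag("METRICS_ENABLED", environ=environ),
        "log_level": _LEVELS[level_name],
    }


settings = load_monitoring_settings(
    {"ENABLE_MONITORING": "true", "LOG_LEVEL": "INFO", "METRICS_ENABLED": "true"}
)
print(settings)
```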

Troubleshooting

Common Issues

High Resource Usage
- Check system metrics panel for bottlenecks
- Adjust batch sizes and concurrent processing
- Consider GPU offloading if available
Slow Processing
- Monitor API latencies in the dashboard
- Check network I/O metrics
- Optimize batch sizes and caching
Missing Metrics
- Ensure metrics tracking is enabled
- Check file permissions in metrics directory
- Verify proper start/end processing calls
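
The permission check for missing metrics can be automated. The sketch below is a generic directory check, not a pipeline feature; check_metrics_dir is a hypothetical helper, and the metrics directory location is whatever your deployment uses.

```python
import os
import tempfile
from pathlib import Path


def check_metrics_dir(root):
    """Return a list of human-readable problems with the metrics directory."""
    problems = []
    if not root.exists():
        problems.append(f"{root} does not exist")
    elif not root.is_dir():
        problems.append(f"{root} is not a directory")
    elif not os.access(root, os.W_OK | os.X_OK):
        problems.append(f"{root} is not writable by the current user")
    return problems


# Demo: an existing temp directory passes, a missing path is reported.
with tempfile.TemporaryDirectory() as tmp:
    ok_problems = check_metrics_dir(Path(tmp))
    missing_problems = check_metrics_dir(Path(tmp) / "does-not-exist")

print(ok_problems, missing_problems)
```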

Debug Mode

Enable debug logging for more detailed information:

LOG_LEVEL=DEBUG transfer-learning monitor

API Reference

MetricsTracker

start_processing (method)
Start tracking metrics for a video processing run.
Parameters:
- video_id (string): Unique identifier for the video being processed

end_processing (method)
End tracking metrics and save results.

record_api_call (method)
Record an API call with its latency.
Parameters:
- api_name (string): Name of the API being called
- latency_ms (float): Latency of the API call in milliseconds

add_custom_metric (method)
Add a custom metric.
Parameters:
- name (string): Name of the custom metric
- value (any): Value of the custom metric

Timer

__init__ (constructor)
Initialize a timer for operation tracking.
Parameters:
- metrics_tracker (MetricsTracker): Instance of MetricsTracker
- metric_name (string): Name of the metric to track
- operation_type (string): Type of operation ('processing' or 'api')
