Monitoring & Metrics
The Transfer Learning Video Processing Pipeline includes comprehensive monitoring and metrics tracking capabilities to help you understand performance, resource usage, and processing status.
Real-time Dashboard
The pipeline provides a real-time terminal dashboard built with Rich, accessible through the CLI:
transfer-learning monitor
Dashboard Options
--video-id: Filter metrics for a specific video ID
Dashboard refresh rate in seconds
--show-history: Show historical metrics instead of real-time data
Dashboard Sections
The dashboard is divided into several panels:
Header Panel
Current video being processed
Processing duration
Frame progress
Performance Metrics
Mean and P95 latency
Throughput statistics
Total processing time
System Resources
CPU usage (color-coded)
Memory usage (color-coded)
Disk usage
Network I/O
GPU metrics (if available)
API Statistics
API call counts
Mean latency per API
Success/failure rates
Custom Metrics
User-defined metrics
Application-specific KPIs
Metrics Collection
System Metrics
The pipeline automatically tracks system resource usage:
from transfer_learning.monitoring.metrics import MetricsTracker
metrics = MetricsTracker()
metrics.start_processing("video_id")
# System metrics are collected automatically
# Access current metrics
system_metrics = metrics.current_metrics.system
print(f"CPU Usage: {system_metrics.cpu_percent}%")
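The tracker's internals aren't shown here. To illustrate where fields such as `system_info` and `disk_usage_percent` in the saved JSON could come from, the sketch below gathers them using only the standard library. The function name `collect_system_info` is hypothetical. CPU and memory percentages are omitted because they require a third-party package such as psutil.

```python
import platform
import shutil


def collect_system_info(path="/"):
    """Hypothetical helper: platform details plus disk usage, stdlib only."""
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "platform": platform.platform(),
        "python_version": platform.python_version(),
        "processor": platform.processor(),  # may be "" on some Linux systems
        "disk_total": disk.total,
        # Share of the filesystem in use, as a percentage
        "disk_usage_percent": round(disk.used / disk.total * 100, 1),
    }


print(collect_system_info())
```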
Performance Metrics
Track performance-related metrics:
# Record API call latency
metrics.record_api_call("openai_vision", 150.5)  # 150.5 ms
# Add custom processing time
metrics.current_metrics.performance.add_processing_time(200.5)
# Record throughput
metrics.current_metrics.performance.add_throughput(10.5)  # 10.5 items/second
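The saved files report `mean`, `median`, `p95` and `p99` under `latency_stats`. Here is a minimal sketch of deriving these figures from recorded latencies. It assumes linear interpolation between ranks, which is NumPy's default rule; the pipeline's own percentile method isn't specified here. `percentile` and `latency_stats` are hypothetical helper names.

```python
import statistics


def percentile(samples, pct):
    """Linear-interpolation percentile (assumed rule, same as NumPy's default)."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    pos = (len(ordered) - 1) * pct / 100  # fractional rank
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def latency_stats(latencies_ms):
    """Summarize a list of latencies (ms) into the keys used in the metrics file."""
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


print(latency_stats([10, 20, 30, 40, 50]))
```

With five samples, p95 falls 80% of the way between the 4th and 5th values (about 48), which shows how sensitive tail percentiles are to small sample counts.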
Custom Metrics
Add your own custom metrics:
# Add a custom metric
metrics.add_custom_metric("quality_score", 0.95)
metrics.add_custom_metric("detected_objects", ["person", "car", "dog"])
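Metrics end up in JSON files, so it seems reasonable to assume that custom metric values should be JSON-serializable (numbers, strings, lists, dicts). Whether the pipeline converts other types itself isn't stated. The hypothetical guard `check_custom_metric` below catches such problems before saving.

```python
import json


def check_custom_metric(name, value):
    """Hypothetical guard: reject values that json.dumps cannot serialize."""
    try:
        json.dumps({name: value})
    except TypeError as exc:
        raise ValueError(f"custom metric {name!r} is not JSON-serializable: {exc}") from exc
    return value


check_custom_metric("quality_score", 0.95)
check_custom_metric("detected_objects", ["person", "car", "dog"])
# A set is not a JSON type, so check_custom_metric("tags", {"person", "car"}) raises ValueError;
# convert it first, e.g. sorted({"person", "car"}).
```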
Using the Timer
The pipeline provides a convenient Timer context manager for timing operations:
from transfer_learning.monitoring.metrics import Timer
# Time a processing operation (metrics is the MetricsTracker created above)
with Timer(metrics, "frame_processing", operation_type="processing"):
    process_frame(frame)
# Time an API call
with Timer(metrics, "openai_api", operation_type="api"):
    response = openai.ChatCompletion.create(...)
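The following self-contained sketch shows what a timing context manager like this typically does. `RecordingTracker` and `SketchTimer` are hypothetical stand-ins, not the pipeline's classes, and the default `operation_type="processing"` is an assumption. The timer uses `time.perf_counter()`, a monotonic clock suited to measuring intervals. It records the duration even when the timed block raises.

```python
import time
from collections import defaultdict


class RecordingTracker:
    """Hypothetical stand-in for MetricsTracker that just stores durations in ms."""

    def __init__(self):
        self.timings = defaultdict(list)  # (operation_type, name) -> [ms, ...]

    def record(self, name, duration_ms, operation_type):
        self.timings[(operation_type, name)].append(duration_ms)


class SketchTimer:
    """Context manager that times the wrapped block and reports it to a tracker."""

    def __init__(self, tracker, name, operation_type="processing"):
        self.tracker = tracker
        self.name = name
        self.operation_type = operation_type

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed_ms = (time.perf_counter() - self._start) * 1000
        # Record even if the block raised, so failed operations still show up
        self.tracker.record(self.name, elapsed_ms, self.operation_type)
        return False  # never swallow the exception


tracker = RecordingTracker()
with SketchTimer(tracker, "frame_processing"):
    sum(range(100_000))  # placeholder workload
print(dict(tracker.timings))
```

Returning `False` from `__exit__` lets errors propagate to the caller, so timing never hides a failure.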
Metrics Storage
Metrics are automatically saved to JSON files in dated directories:
metrics/
└── 2024-03-05/
└── video_123_20240305_143022.json
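The example path suggests a `<date>/<video_id>_<YYYYMMDD>_<HHMMSS>.json` layout, with the stamp matching the run's start time. If that inference holds, the hypothetical helper below builds the path.

```python
from datetime import datetime
from pathlib import Path


def metrics_path(root, video_id, started_at):
    """Build <root>/<YYYY-MM-DD>/<video_id>_<YYYYMMDD>_<HHMMSS>.json (layout inferred from the example)."""
    day_dir = Path(root) / started_at.strftime("%Y-%m-%d")
    return day_dir / f"{video_id}_{started_at:%Y%m%d_%H%M%S}.json"


print(metrics_path("metrics", "video_123", datetime(2024, 3, 5, 14, 30, 22)).as_posix())
# metrics/2024-03-05/video_123_20240305_143022.json
```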
A metrics file contains a record like this:
{
"video_id" : "video_123" ,
"start_time" : "2024-03-05T14:30:22.123456" ,
"end_time" : "2024-03-05T14:35:42.987654" ,
"frame_count" : 150 ,
"processed_frames" : 150 ,
"performance_metrics" : {
"latency_stats" : {
"mean" : 125.5 ,
"median" : 120.0 ,
"p95" : 180.5 ,
"p99" : 200.0
},
"throughput_stats" : {
"mean" : 10.5 ,
"max" : 15.2
}
},
"system_metrics" : {
"cpu_percent" : 45.2 ,
"memory_percent" : 62.8 ,
"disk_usage_percent" : 78.5 ,
"network_bytes_sent" : 1024 ,
"network_bytes_recv" : 2048 ,
"gpu_utilization" : 85.2 ,
"gpu_memory_used" : 4096
},
"custom_metrics" : {
"quality_score" : 0.95 ,
"detected_objects" : [ "person" , "car" , "dog" ]
},
"system_info" : {
"platform" : "Darwin-24.3.0-x86_64" ,
"python_version" : "3.10.0" ,
"processor" : "i386" ,
"memory_total" : 17179869184 ,
"disk_total" : 994662584320
}
}
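Here is a sketch for working with a saved record, assuming the field names shown in the example file. `summarize` is a hypothetical helper. It derives the run duration from `start_time` and `end_time`, which are ISO 8601 strings that `datetime.fromisoformat` parses directly.

```python
import json
from datetime import datetime


def summarize(record):
    """Derive a few figures from a loaded metrics record (field names as in the example)."""
    start = datetime.fromisoformat(record["start_time"])
    end = datetime.fromisoformat(record["end_time"])
    frames = record["frame_count"]
    return {
        "video_id": record["video_id"],
        "duration_s": (end - start).total_seconds(),
        # Fraction of frames actually processed (guard against an empty video)
        "completion": record["processed_frames"] / frames if frames else 0.0,
        "p95_ms": record["performance_metrics"]["latency_stats"]["p95"],
    }


sample = json.loads("""{
  "video_id": "video_123",
  "start_time": "2024-03-05T14:30:22.123456",
  "end_time": "2024-03-05T14:35:42.987654",
  "frame_count": 150,
  "processed_frames": 150,
  "performance_metrics": {"latency_stats": {"p95": 180.5}}
}""")
print(summarize(sample))
```

To apply it to a file on disk, open the file and pass `json.load(f)` to `summarize`.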
Historical Analysis
View historical metrics for a specific video:
transfer-learning monitor --video-id=video_123 --show-history
This will load and display the most recent metrics file for the specified video ID.
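To do the same lookup from Python, here is a minimal sketch that assumes the directory layout and file naming shown under Metrics Storage. `latest_metrics_file` is hypothetical. It relies on the `YYYYMMDD_HHMMSS` stamp sorting chronologically as a string. It also matches the full file name so that `video_123` does not pick up files for `video_1234`.

```python
import re
from pathlib import Path


def latest_metrics_file(root, video_id):
    """Return the newest <root>/<date>/<video_id>_<YYYYMMDD>_<HHMMSS>.json, or None."""
    pattern = re.compile(rf"{re.escape(video_id)}_(\d{{8}}_\d{{6}})\.json")
    candidates = []
    for path in Path(root).glob("*/*.json"):
        match = pattern.fullmatch(path.name)
        if match:
            # The YYYYMMDD_HHMMSS stamp sorts chronologically as a plain string
            candidates.append((match.group(1), path))
    return max(candidates)[1] if candidates else None


print(latest_metrics_file("metrics", "video_123"))
```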
Best Practices
Start/End Tracking
metrics.start_processing("video_id")
try:
    ...  # Your processing code
finally:
    # Always close out the run so results are saved, even on errors
    metrics.end_processing()
Use Timers for Operations
with Timer(metrics, "operation_name"):
    ...  # Your operation code
Add Custom Metrics
metrics.add_custom_metric("key", value)
Monitor Resource Usage
Watch the System Resources panel for sustained high CPU, memory, or disk usage
Set up alerts for high resource utilization
Use color-coded indicators in the dashboard
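One way to act on the alerting advice is a check over the `system_metrics` fields from the saved JSON. The sketch below is hypothetical, and its threshold values are placeholders, not recommendations from the pipeline.

```python
# Hypothetical thresholds (percent); tune them for your hardware.
THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 85.0, "disk_usage_percent": 90.0}


def resource_alerts(system_metrics, thresholds=THRESHOLDS):
    """Return one message for every metric at or above its threshold."""
    alerts = []
    for key, limit in thresholds.items():
        value = system_metrics.get(key)
        if value is not None and value >= limit:
            alerts.append(f"{key} at {value:.1f}% (threshold {limit:.0f}%)")
    return alerts


print(resource_alerts({"cpu_percent": 45.2, "memory_percent": 62.8, "disk_usage_percent": 92.0}))
# ['disk_usage_percent at 92.0% (threshold 90%)']
```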
Regular Cleanup
# Clean up old metrics files
transfer-learning cleanup --max-age-hours=168 # 1 week
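The CLI's cleanup rules aren't documented here. As an illustration of age-based cleanup only, the hypothetical `cleanup_metrics` below deletes JSON files whose modification time is older than the cutoff. It then removes any dated directories left empty. It assumes the `metrics/<date>/` layout shown earlier.

```python
import time
from pathlib import Path


def cleanup_metrics(root, max_age_hours=168):
    """Delete metrics JSON files older than max_age_hours (by modification time).

    Hypothetical illustration; not the implementation behind `transfer-learning cleanup`.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    # Materialize the listing first so deleting files does not disturb the walk
    for path in list(root.rglob("*.json")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    # Remove dated directories (metrics/<date>/) that are now empty
    for day_dir in root.iterdir():
        if day_dir.is_dir() and not any(day_dir.iterdir()):
            day_dir.rmdir()
    return removed
```

Calling `cleanup_metrics("metrics", max_age_hours=168)` uses the same one-week window as the CLI example. Because it deletes files, try it on a copy first.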
Configuration
Configure monitoring settings in your .env file:
ENABLE_MONITORING=true
LOG_LEVEL=INFO
METRICS_ENABLED=true
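Values in a `.env` file are plain strings. The sketch below interprets them, assuming the file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`). It also assumes the truthy spellings and defaults shown. The pipeline's own parsing rules and defaults may differ, and `env_flag` is a hypothetical helper.

```python
import os

# Assumed spellings that count as "on"
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=os.environ):
    """Interpret a variable such as ENABLE_MONITORING=true as a boolean."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


settings = {
    "enable_monitoring": env_flag("ENABLE_MONITORING", default=True),  # default assumed
    "metrics_enabled": env_flag("METRICS_ENABLED", default=True),      # default assumed
    "log_level": os.environ.get("LOG_LEVEL", "INFO").upper(),
}
print(settings)
```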
Or use the config command:
Troubleshooting
Common Issues
High Resource Usage
Check system metrics panel for bottlenecks
Adjust batch sizes and concurrent processing
Consider GPU offloading if available
Slow Processing
Monitor API latencies in the dashboard
Check network I/O metrics
Optimize batch sizes and caching
Missing Metrics
Ensure metrics tracking is enabled
Check file permissions in metrics directory
Verify proper start/end processing calls
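To check the permissions item, here is a quick hypothetical helper. It assumes metrics are written under a `metrics/` directory, as in the storage example. `os.access` reflects the current user's permissions, so the result can differ from what the pipeline sees if it runs as another user.

```python
import os
from pathlib import Path


def metrics_dir_problems(root="metrics"):
    """Return reasons the metrics directory may be unusable (empty list = looks fine)."""
    path = Path(root)
    if not path.exists():
        return [f"{path} does not exist"]
    if not path.is_dir():
        return [f"{path} is not a directory"]
    problems = []
    # Creating files needs write permission; entering the directory needs execute
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable/searchable by this user")
    return problems


print(metrics_dir_problems("metrics"))
```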
Debug Mode
Enable debug logging for more detailed information:
LOG_LEVEL=DEBUG transfer-learning monitor
API Reference
MetricsTracker
start_processing: Start tracking metrics for a video processing run. Takes a unique identifier for the video being processed.
end_processing: End tracking metrics and save results.
record_api_call: Record an API call with its latency. Takes the name of the API being called and the latency of the call in milliseconds.
add_custom_metric: Add a custom metric. Takes the name of the custom metric and its value.
Timer
Initialize a timer for operation tracking. Takes the MetricsTracker instance and the name of the metric to track; operation_type sets the type of operation ('processing' or 'api').