Work
Anonymized case studies showcasing real-world optimizations and deployments on constrained systems.
Low-Latency Edge Audio Enhancement for Multi-Mic Device
Smart Devices
Challenge
The client needed real-time noise suppression and beamforming on a battery-powered device with less than 256MB of RAM and a strict end-to-end latency budget (<20ms).
Solution
Implemented a fixed-point adaptive beamformer with an INT16 processing pipeline, optimized with ARM NEON intrinsics. Deployed on an RTOS with a dedicated audio thread running at the highest priority.
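As an illustration of the fixed-point arithmetic behind an INT16 pipeline, the sketch below shows Q15 multiplication with rounding and saturation, plus a fixed-weight sum across microphone channels. It is a simplified stand-in, not the deployed beamformer: the weights here are static rather than adaptive, and the names `saturate16`, `to_q15`, `q15_mul`, and `weighted_sum` are hypothetical. On the target, the same operations would typically be done with saturating NEON instructions rather than scalar code.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(x: int) -> int:
    """Clamp an integer to the INT16 range instead of letting it wrap."""
    return max(INT16_MIN, min(INT16_MAX, x))


def to_q15(value: float) -> int:
    """Convert a float in [-1, 1) to Q15 (1 sign bit, 15 fractional bits)."""
    return saturate16(int(round(value * 32768)))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: the product is Q30, so round and shift back."""
    return saturate16((a * b + (1 << 14)) >> 15)


def weighted_sum(frames, weights):
    """Combine per-microphone INT16 frames using Q15 weights.

    frames:  list of equal-length lists of INT16 samples, one per microphone
    weights: list of Q15 weights, one per microphone (static in this sketch)
    """
    out = []
    for i in range(len(frames[0])):
        acc = 0  # wide accumulator; holds Q30 products before the final shift
        for mic, w in zip(frames, weights):
            acc += mic[i] * w
        out.append(saturate16((acc + (1 << 14)) >> 15))
    return out
```

Accumulating in a wide register and saturating only once at the end avoids the intermediate clipping that would occur if each product were narrowed to INT16 before summing.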
Impact
- Latency: 12ms end-to-end (40% below target)
- Memory: 180MB peak (30% headroom)
- Power: 2.5x battery life improvement vs. baseline
- SNR improvement: 15dB in noisy environments
On-Device Vision Model Optimization for UAV Payload
Defence & UAV
Challenge
An autonomous UAV needed real-time object detection under tight compute, power, and thermal constraints. Intermittent connectivity required offline-first inference.
Solution
Quantized YOLOv8 to INT8 using quantization-aware training (QAT). Deployed on the NPU through a custom TFLite delegate. Implemented on-device model versioning and A/B testing.
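QAT works by simulating INT8 rounding during training so the model learns to tolerate it. The sketch below shows the generic affine quantize/dequantize round trip ("fake quantization") that such training inserts around tensors. It is a minimal scalar illustration under assumed conventions (signed INT8, per-tensor scale and zero point); the function names are hypothetical and it is not the training code used on this project.

```python
def quant_params(x_min: float, x_max: float, qmin: int = -128, qmax: int = 127):
    """Derive a per-tensor scale and zero point from an observed value range."""
    # The representable range must include 0.0 so that zero maps exactly.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor; any scale works
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a float to INT8, clamping values outside the calibrated range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an INT8 value back to the float it represents."""
    return (q - zero_point) * scale


def fake_quant(x: float, scale: float, zero_point: int) -> float:
    """Quantize then dequantize: the value the INT8 model would actually see."""
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)
```

For in-range inputs the round-trip error is bounded by roughly half the scale, which is the per-value precision loss that shows up in aggregate as the accuracy gap between a quantized model and its FP32 baseline.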
Impact
- Inference time: 35ms per frame (1920x1080)
- Accuracy: 92% mAP (vs. 94% FP32 baseline)
- Model size: 4.2MB (16x compression)
- Power draw: 1.8W during inference (within thermal envelope)
Quantized Inference Pipeline for Constrained Embedded Platform
Industrial IoT
Challenge
An industrial sensor platform needed predictive-maintenance ML on a Cortex-M4 MCU with 512KB of flash and 128KB of RAM.
Solution
Trained a compact anomaly detection model. Applied aggressive INT8 quantization and pruning (70% sparsity). Used the TFLite Micro runtime with custom ops.
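To make the sparsity figure concrete, the sketch below shows one common way to reach a target sparsity: global magnitude pruning, which zeroes the smallest-magnitude weights. This is an assumption for illustration only; the source does not state which pruning criterion or schedule was used, and `magnitude_prune` is a hypothetical helper operating on a flat list of weights.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    weights:  flat list of float weights
    sparsity: fraction of weights to set to zero, e.g. 0.7 for 70%
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(round(len(weights) * sparsity))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def measured_sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

In practice pruning is usually applied gradually during fine-tuning so the remaining weights can compensate, and the flash savings depend on storing the sparse weights in a compressed form.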
Impact
- Model fits in 48KB of flash
- Inference: 80ms at 168MHz
- Anomaly detection: 88% F1 score
- Deployed to 10,000+ devices via OTA