2026 CS 191 Capstone

SPECTRE Spatial Positioning

4
RTSP Cameras
Axis cameras for multi-angle CV
3
BLE Anchors
ESP32-WROOM-32 triangulation nodes
Week Timeline
Full design-to-demo sprint
Hybrid
Fusion System
Computer Vision + BLE positioning
01

Project Overview

Cameras know what they see. BLE beacons know who is nearby. SPECTRE combines both to know who is where — even when walls, furniture, or crowds get in the way.

The vision side runs four Axis RTSP cameras through YOLOv8 for person detection, ByteTrack for cross-frame identity, and MediaPipe Pose for skeletal keypoints. Homography matrices transform pixel detections into floor-plane coordinates. The BLE side uses three ESP32-WROOM-32 anchors broadcasting RSSI through Mosquitto MQTT, with FilterPy Kalman filters smoothing the noisy signal before trilateration produces a position estimate.

The hard part was occlusion. When a person walks behind an obstacle and re-enters frame, pure vision trackers give them a new identity. SPECTRE uses BLE proximity as a persistent identity anchor across gaps in visual coverage, keeping named assignments correct even through extended out-of-view periods.

02

System Architecture

Vision Pipeline

Four Axis RTSP streams feed YOLOv8 for person detection. ByteTrack sustains cross-frame identity across camera boundaries. MediaPipe Pose extracts skeletal keypoints for posture analysis.

BLE Subsystem

Three ESP32-WROOM-32 anchors broadcast RSSI readings via Mosquitto MQTT. FilterPy Kalman filters suppress multipath noise before the signal feeds into trilateration.
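The per-anchor smoothing step can be sketched with a hand-rolled scalar Kalman filter standing in for the FilterPy filters; the noise constants below are illustrative, not the project's tuned values:

```python
# Minimal 1D Kalman filter smoothing a noisy RSSI series.
# A simplified stand-in for the FilterPy filters run per anchor.

class RssiKalman:
    def __init__(self, q=0.05, r=4.0):
        self.q = q      # process noise: how fast true RSSI can drift
        self.r = r      # measurement noise: RSSI sample variance (dB^2)
        self.x = None   # state estimate (smoothed RSSI, dBm)
        self.p = 1.0    # estimate covariance

    def update(self, z):
        if self.x is None:               # initialize on first sample
            self.x = z
            return self.x
        self.p += self.q                 # predict: covariance grows over time
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x

kf = RssiKalman()
noisy = [-60, -72, -58, -65, -61, -70, -59, -63]
smoothed = [kf.update(z) for z in noisy]
```

Because each output is a convex combination of past measurements, the smoothed series swings far less than the raw one while staying inside its range.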

Coordinate Fusion

Per-camera homography matrices map pixel detections to floor-plane coordinates. CV and BLE position estimates are weighted by confidence and fused into a single location output.
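The weighting step above can be sketched as a confidence-weighted average of the two floor-plane estimates; the function name, fallback behavior, and weights here are illustrative assumptions, not the project's exact scheme:

```python
def fuse_positions(cv_pos, cv_conf, ble_pos, ble_conf):
    """Confidence-weighted fusion of CV and BLE floor-plane estimates.

    cv_pos / ble_pos are (x, y) in metres; confidences are in [0, 1].
    Falls back to whichever source is available.
    """
    if cv_pos is None and ble_pos is None:
        return None
    if cv_pos is None:      # person out of view: BLE carries the position
        return ble_pos
    if ble_pos is None:
        return cv_pos
    w = cv_conf / (cv_conf + ble_conf)   # normalized trust in the CV estimate
    return (w * cv_pos[0] + (1 - w) * ble_pos[0],
            w * cv_pos[1] + (1 - w) * ble_pos[1])

# High vision confidence pulls the fused point toward the CV estimate.
fused = fuse_positions((2.0, 3.0), 0.9, (2.6, 3.4), 0.3)
```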

Identity Persistence

BLE proximity acts as a persistent identity signal through visual occlusions. ByteTrack's appearance model handles re-identification when the person re-enters camera coverage.

03

Technical Stack

Computer Vision

YOLOv8: Person detection
ByteTrack: Multi-object tracking
MediaPipe Pose: Skeletal keypoints
OpenCV: Frame processing

BLE / Embedded

ESP32-WROOM-32: BLE anchor nodes (×3)
Mosquitto MQTT: BLE telemetry broker
FilterPy: Kalman filter smoothing
Axis RTSP: Camera stream protocol

Fusion & Backend

Python: Core pipeline
Homography: Pixel-to-floor mapping
Trilateration: BLE position solving
Sensor Fusion: CV + BLE integration
04

Design Challenges

Occlusion and Re-identification

A person walks behind a pillar. For a pure vision tracker, that person ceases to exist and reappears as someone new. The longer the occlusion, the worse the ID churn gets.

Fix. BLE anchor proximity runs independently of the camera feed and maintains a continuous identity signal. When the person re-enters frame, ByteTrack's appearance model and the BLE proximity vote together, restoring the correct assignment.

Multi-Camera Coordinate Alignment

Four cameras, four different fields of view, angles, and distortion profiles. A detection at pixel (340, 280) in camera 1 and at (170, 420) in camera 3 may refer to the exact same person standing in the same spot.

Fix. Pre-calibrated homography matrices for each camera, derived from a known floor grid. Every detection gets projected into a single unified top-down coordinate space before fusion runs.
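The projection step is a standard homogeneous transform; the matrix below is a deliberately trivial stand-in (pure 0.01 m/pixel scale, no perspective), whereas the real per-camera matrices come out of the floor-grid calibration:

```python
import numpy as np

def pixel_to_floor(H, px, py):
    """Project a pixel detection into floor-plane coordinates with a
    3x3 homography, dividing out the homogeneous scale factor."""
    x, y, w = H @ np.array([px, py, 1.0])
    return x / w, y / w

# Illustrative matrix only: real matrices encode each camera's angle
# and perspective, so the third row is generally not (0, 0, 1).
H_cam1 = np.array([[0.01, 0.0, 0.0],
                   [0.0, 0.01, 0.0],
                   [0.0, 0.0, 1.0]])
fx, fy = pixel_to_floor(H_cam1, 340, 280)
```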

RSSI Noise and Multipath Interference

BLE RSSI is not a clean distance measurement. Walls reflect signals, furniture scatters them, and a person's body absorbs them differently depending on orientation. Raw RSSI can vary by meters across samples taken milliseconds apart.

Fix. FilterPy Kalman filters run per-anchor, smoothing the RSSI time series before it enters trilateration. BLE output is treated as a soft constraint, not ground truth. When vision confidence is high, it takes precedence. When the camera sees nothing, BLE carries the position.
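Once smoothed, RSSI becomes a distance via a log-distance path-loss model and the three distances feed a least-squares trilateration. The sketch below uses illustrative path-loss constants and subtracts the first sphere equation from the others to linearize the system:

```python
import numpy as np

def rssi_to_distance(rssi, tx_power=-59.0, n=2.0):
    """Log-distance path-loss model: tx_power is the RSSI at 1 m and n
    the path-loss exponent. Both constants are illustrative."""
    return 10 ** ((tx_power - rssi) / (10 * n))

def trilaterate(anchors, dists):
    """Least-squares position from anchor (x, y) positions and distances.

    Subtracting the first sphere equation from the rest yields the
    linear system A @ pos = b, solved with numpy's lstsq."""
    (x0, y0), d0 = anchors[0], dists[0]
    A, b = [], []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        A.append([2 * (xi - x0), 2 * (yi - y0)])
        b.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)
    pos, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return pos  # (x, y) in the anchors' coordinate frame

anchors = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
true = np.array([2.0, 3.0])
dists = [np.hypot(*(true - a)) for a in anchors]   # ideal, noise-free case
est = trilaterate(anchors, dists)
```

With three anchors the linearized system is exactly determined; lstsq also tolerates extra anchors or noisy, inconsistent distances.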

05

Key Learnings

When two sensor systems disagree, that's not a failure — that's the most useful data point you have. Disagreement tells you exactly where each model breaks down.

01

Sensor fusion isn't averaging. It's knowing which signal to trust more at each moment, and designing the system so the weighting adapts automatically.

02

Homography calibration seems like a one-time setup step. It isn't. Camera drift, floor warping, and temperature changes mean it needs re-verification regularly.

03

ByteTrack's appearance model is powerful but fragile in dense crowds. The identity anchoring from BLE was what made the system usable in real-world conditions.

04

MQTT is a surprisingly good glue layer for heterogeneous embedded systems. It decouples the BLE firmware from the CV pipeline cleanly enough that each can be debugged in isolation.
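That decoupling can be sketched at the message boundary: the firmware only has to publish a small JSON payload per anchor topic, and the fusion side parses it with no knowledge of the firmware. The topic layout and payload fields here are illustrative assumptions:

```python
import json

def parse_anchor_message(topic, payload):
    """Turn one MQTT message into (anchor_id, tag_id, rssi).

    Assumes a topic layout like "spectre/ble/<anchor_id>" and a JSON
    payload with "tag" and "rssi" fields -- both hypothetical.
    """
    anchor_id = topic.split("/")[-1]
    data = json.loads(payload)
    return anchor_id, data["tag"], data["rssi"]

msg = parse_anchor_message("spectre/ble/anchor2",
                           '{"tag": "badge-17", "rssi": -63}')
```

Because the contract is just a topic string and a JSON schema, either side can be replayed from recorded messages and debugged without the other running.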