The future of computing lies in allowing our devices to see what we see; envisioned wearable systems will continuously interpret vision data for real-time analytics. However, today’s system software and imaging hardware are ill-suited for such “continuous mobile vision.” Current systems, highly optimized for photography, fail to achieve sufficient energy efficiency or privacy preservation. This talk rethinks the vision system stack, spanning application frameworks, the operating system, and sensor hardware, to improve energy efficiency by two orders of magnitude. This cross-layer rethinking contributes: (1) a split-process application framework that eliminates redundant data movement and processing across multiple concurrent applications, (2) operating system optimizations for energy-proportional image capture, and (3) a mixed-signal image sensor architecture that processes data in the analog domain to eliminate the efficiency bottleneck of analog-to-digital conversion. The talk will also briefly share future plans to advance continuous mobile vision by exploiting the hardware/software boundary for improved energy efficiency and effective privacy preservation, opening the door to integrating our devices with our real-world environments and, ultimately, our own lives.