Real-time vehicle sensing at urban scale is essential to various urban services. To date, most existing approaches rely on static infrastructure (e.g., traffic cameras) or mobile services (e.g., smartphone apps). However, these approaches are often inadequate for urban-scale vehicle sensing at the individual level because of their static nature or low penetration rates. In this paper, we design a sensing system called coSense that utilizes commercial vehicular fleets (e.g., taxis, buses, and trucks) for real-time vehicle sensing at urban scale. The design is motivated by (i) the availability of well-equipped commercial fleets that can sense other vehicles through onboard cameras or peer-to-peer communication, and (ii) the growing number of connected and autonomous vehicles that periodically broadcast their status for safety applications. Compared to existing solutions based on cameras and smartphones, the key features of coSense are its high penetration rates and its transparent sensing for participating drivers. The key technical challenge we address is how to recover spatiotemporal sensing gaps by using deep learning to account for the various mobility patterns of commercial vehicles. We evaluate coSense with a preliminary road test and a large-scale trace-driven evaluation based on vehicular fleets in Shenzhen, China, including 14 thousand taxis, 13 thousand buses, 13 thousand trucks, and 10 thousand regular vehicles. We compare coSense to infrastructure-based and cellphone-based approaches, and the results show that coSense increases the sensing accuracy by 10.1% and 16.6% on average, respectively.