Desheng Zhang
Assistant Professor

Department of Computer Science
Rutgers University

(848) 445-8307
dz220 AT

Research Overview

Under the Smart Cities Initiative from the White House, my research on cyber-physical-systems is aimed to address emerging urban mobility challenges by big-data-driven analytics. My work is uniquely built upon large-scale urban infrastructures including various transportation systems, along with cellphone and smartcard systems, across different cities in the world. To improve these systems, I have been working on 10 TB vehicle GPS data from 400 thousand vehicles and 2 TB smartcard/cellphone data from 16 million urban residents. Technically, I am interested in (i) designing data fusion techniques (e.g., context-aware tensor decomposition) to improve quality of uncertain data by complementary features of heterogeneous-system data; (ii) formulating probabilistic/graphical model-integration techniques (e.g., Multi-view Learning) to characterize urban phenomena by capturing real-time urban-system interactions; (iii) proposing novel applications to improve mobility efficiency by optimizing data-driven models with application-specific algorithms such as approximation and online algorithms.

Research Topics

My research spans three components of Cyber-Physical Systems. (i) For the communication part, I design networking protocols for accelerated and efficient data collection from heterogeneous mobile systems. (ii) For the computation part, I analyze these collected data to formulate generic models to capture urban phenomena (such as travel pattern, traffic speed, human mobility, passenger demand, and transit supply) by advanced data analytics (such as data fusion, model integration, and interactive visualization). (iii) For the control part, I utilize these formulated models to design advanced applications such as last-mile transit, ridesharing, dispatching and navigation to provide positive feedback to urban systems by robust control (such as receding horizon optimization). As a result, my work formulates a closed-loop data-driven CPS to improve urban mobility by optimizing large-scale urban infrastructures. Some of my projects are as follows.

Modeling (1/5): Travel Pattern

Real-time human travel pattern modeling is essential to various applications. To model such patterns, numerous data-driven techniques have been proposed. However, existing techniques are mostly driven by data from a single view, e.g., a transportation view or a cellphone view, which leads to over-fitting of these single-view models. To address this issue, we propose a human mobility modeling technique based on a generic multi-view learning framework called coMobile. In coMobile, we first improve the performance of single-view models based on tensor decomposition with correlated contexts, and then we integrate these improved single-view models together for multi-view learning to iteratively obtain mutually-reinforced knowledge for real-time human mobility at urban scale. We implement coMobile based on an extremely large dataset in the Chinese city Shenzhen, including data about taxi, bus and subway passengers along with cellphone users, capturing 27 thousand vehicles and 10 million urban residents. The results show that our approach outperforms a single-view model by 51% on average. This paper is published in ACM SIGSPATIAL 2015. [PDF][Data]

Modeling (2/5): Traffic Speed

Data-driven modeling usually suffers from data sparsity, especially for large-scale modeling for urban phenomena based on single-source urban infrastructure data under fine-grained spatial-temporal contexts. To address this challenge, we motivate, design and implement UrbanCPS, a cyber-physical system with heterogeneous model integration, based on extremely-large multi-source infrastructures in a Chinese city Shenzhen, involving 42 thousand vehicles, 10 million residents, and 16 million smartcards. Based on temporal, spatial and contextual contexts, we formulate an optimization problem about how to optimally integrate models based on highly-diverse datasets, under three practical issues, i.e., heterogeneity of models, input data sparsity or unknown ground truth. Based on an integration of five models, we propose a real-world application called Speedometer, inferring real-time traffic speeds in urban areas. The evaluation results show that compared to a real-world system, Speedometer increases the inference accuracy by 21% on average. This work is published in ACM ICCPS 2015. [PDF]

Modeling (3/5): Human Mobility

Expanding our knowledge about human mobility is essential for building mobile applications. Previous mobility studies have typically been built upon empirical single-source data (e.g., cellphone or transit data), which inevitably introduces a bias against residents not contributing this type of data. To address this issue, we propose a novel architecture mPat to explore human mobility using multi-source data. A reference implementation of mPat was developed at an unprecedented scale upon urban infrastructures of Shenzhen, China. The novelty and uniqueness of mPat lie in its three layers: (i) a data feed layer consisting of real-time data feeds from 24 thousand vehicles, 16 million smart cards and 10 million cellphones; (ii) a mobility abstraction layer exploring the correlation and divergence among the multi-source data to analyze and infer human mobility; and (iii) an application layer to improve urban efficiency based on the human mobility findings of the study. The shows that mPat achieves a 75% accuracy, and its real-world application reduces passenger travel time by 36%. This work is published in ACM MobiCom 2014.  [PDF]

Modeling (4/5): Passenger Demand

Investigating passenger demand is essential for the taxicab business. Existing solutions are typically based on offline data collected by manual investigations, which are often dated and inaccurate for real-time analysis. To address this issue, we propose Dmodel, employing roving taxicabs as real-time mobile sensors to (i) infer passenger arriving moments by interactions of vacant taxicabs, and then (ii) infer passenger demand by customized online training with both historical and real-time data. Dmodel utilizes a novel parameter called pickup pattern based on an entropy of pickup events (accounts for various real-world logical information, e.g., bad weather) to reduce the size of big historical taxicab data to be processed. We evaluate Dmodel with a real-world 450 GB dataset of 14, 000 taxicabs for a half year, and results show that compared to the ground truth, Dmodel achieves 83% accuracy and outperforms a statistical model by 42%. We further present an application where Dmodel is used to dispatch vacant taxicabs to achieve an equilibrium between passenger demand and taxicab supply across urban regions. This work has been reported in IEEE BIGDATA 2014.  [PDF]

Modeling (5/5): Transit Supply

Carpooling taxicab services hold the promise of providing additional transit supply, especially in extreme weather or the rush hour when regular taxicabs are insufficient. Although many recommendation systems about regular taxicab services have been proposed recently, little research, if any, has been done to assist passengers to find a successful taxicab ride with carpooling. In this paper, we present the first systematic work to design a unified recommendation system for both the regular and carpooling services, called CallCab, based on a data driven approach. In response to a passenger’s real-time request, CallCab aims to recommend either (i) a vacant taxicab for a regular service with no detour, or (ii) an occupied taxicab heading to the similar direction for a carpooling service with the minimum detour, yet without assuming any knowledge of destinations of passengers already in taxicabs. We evaluate CallCab with a real-world dataset of 14,000 taxicabs, and results show that compared to the ground truth, CallCab reduces 60% of the total mileage to deliver all passengers and 41% of passenger’s waiting time. This work is published in IEEE BIGDATA 2013. [PDF]

Application (1/4): Last-Mile Transit

In this work, we propose a transit service Feeder to tackle the last-mile problem, i.e., passengers’ destinations lay beyond a walking distance from a public transit station. Feeder utilizes ridesharing-based vehicles to deliver passengers from existing transit stations to selected stops closer to their destinations. We infer real-time passenger demand (e.g., exiting stations and times) for Feeder design by utilizing extreme-scale urban infrastructures, which consist of 10 million cellphones, 27 thousand vehicles, and 17 thousand smartcard readers for 16 million smartcards in a Chinese city Shenzhen. Regarding these numerous devices as pervasive sensors, we mine both online and offline data for a two-end Feeder service: a back-end Feeder server to calculate schedules; front-end customized Feeder devices in vehicles for real-time schedule downloading. The results show that compared to the ground truth, Feeder reduces last-mile distances by 68% and time by 52% on average. This work is published in ACM IPSN 2015. [PDF]

Application (2/4): Carpooling

Carpooling has long held the promise of reducing gas consumption by decreasing mileage to deliver co-riders. Although ad hoc carpools already exist in the real world through private arrangements, little research on the topic has been done. In this paper, we present the first systematic work to design, implement, and evaluate a carpool service, called coRide, in a large-scale taxicab network intended to reduce total mileage for less gas consumption. Our coRide system consists of three components, a dispatching cloud server, passenger clients, and an onboard customized device, called TaxiBox. To improve coRide’s efficiency in mileage reduction, we formulate a NP-hard route calculation problem under different practical constraints. We then provide (i) an optimal algorithm using Linear Programming, (ii) a 2 approximation algorithm with a polynomial complexity, and (iii) its corresponding online version. We evaluate coRide with more than 14,000 taxicabs, and the results show that compared with the ground truth, our service reduces 33% of total mileage. This work is published in ACM SenSys 2013. [PDF]

Application (3/4): Dispatching

In the taxicab industry, a long-standing challenge is how to reduce taxicabs’ miles spent without fares, i.e., cruising miles. The current solutions for this challenge usually depend on passengers to actively provide their locations in advance for pickups. To address this challenge without the burden on passengers, in this paper, we propose a cruising system, pCruise, for taxicab drivers to find efficient routes to pick up passengers to reduce cruising miles. According to the real-time pick-up events from nearby taxicabs, pCruise characterizes a cruising process with a cruising graph, and assigns weights on edges of the cruising graph to indicate the utility of cruising corresponding road segments. Our weighting process considers the number of nearby passengers and taxicabs together in real-time, aiming at two scenarios where taxicabs are explicitly or implicitly coordinated with each other. Based on a weighted cruising graph, when a taxicab becomes vacant, pCruise provides a distributed online scheduling strategy to obtain and update an efficient cruising route with the minimum length and at least one arriving passenger. We evaluate pCruise based on a GPS dataset from a Chinese city Shenzhen with 14,000 taxicabs. The evaluation results show that pCruise assists taxicab drivers to reduce cruising miles by 42% on average. This work is published in IEEE RTSS 2012. [PDF]

Application (4/4): Advertising

Nowadays most metro advertising systems schedule advertising slots on digital advertising screens to achieve the maximum exposure to passengers by exploring passenger demand models. However, our empirical results show that these passenger demand models experience uncertainty at fine temporal granularity (e.g., per min). As a result, for fine-grained advertisements (shorter than one minute), a scheduling based on these demand models cannot achieve the maximum advertisement exposure. To address this issue, we propose an online advertising approach, called EveryoneCounts, based on an uncertain passenger demand model. It combines coarse-grained statistical demand modeling and fine-grained bayesian demand modeling by leveraging realtime card-swiping records along with both passenger mobility patterns and travel periods within metro systems. Based on this uncertain demand model, it schedules advertising time online based on robust receding horizon control to maximize the advertisement exposure. We evaluate the proposed approach based on an one-month sample from our 530 GB real-world metro fare dataset with 16 million cards. The results show that our approach provides a 61.5% lower traffic prediction error and 20% improvement on advertising efficiency on average. This work is published in IEEE BIGDATA 2015. [PDF]

Networking (1/2): Neighbor Discovery

As a supporting primitive of many mobile applications, neighbor discovery identifies nearby devices so that they can exchange information and collaborate in a peer-to-peer manner. To date, discovery schemes trade a long latency for energy efficiency and require a collaborative duty cycle pattern, and thus they are not suitable for interactive mobile applications where a user is unable to configure others’ devices. In this paper, we propose Acc, which serves as an on-demand generic discovery accelerating middleware for many deterministic neighbor discovery schemes. Acc leverages the discovery capabilities of neighbor devices, supporting both direct and indirect neighbor discoveries. Our evaluations show that Acc-assisted discovery schemes reduce latency by up to 51.8%, compared with the schemes consuming the same amount of energy. More importantly, to prove the real-world value of Acc, we further present and evaluate a Crowd-Alert application where Acc is employed by taxi drivers to accelerate selection of a direction with fewer competing taxis and more potential passengers, based on a 280 GB dataset of more than 14,000 taxis in Shenzhen, the most crowded city in China. This work is published in ACM SenSys 2012. [PDF]

Networking(2/2): Rendezvous Scheduling

In many mobile sensing applications devices need to discover new neighbors and maintain the rendezvous with known neighbors continuously. Due to the limited energy supply, these devices have to duty cycle their radios to conserve the energy and bandwidth, making neighbor discovery and rendezvous maintenance even more challenging. To date, the main mechanism for device discover and rendezvous maintenance in existing solutions is pairwise, direct one-hop communication. We argue that such pairwise direct communication is sufficient but not necessary: there exist unnecessary active slots that can be eliminated, without affecting discovery and rendezvous. In this work, we propose a novel concept of extended quorum system, which leverages indirect discovery to further conserve energy. Specifically, we use quorum graph to capture all possible information flow paths where knowledge about known-neighbors can propagate among devices. We comprehensively evaluate EQS in three different scales of networks, and the results show that EQS reduces as much as 55% energy consumption with a maximal 5% increase in latency for existing solutions. To test the real-world values of EQS, we further propose a taxicab application called EQS-dispatch to navigate taxicab drivers to the area with less competition based on the discovery results of nearby taxicabs. This work is published in IEEE ICDCS 2012. [PDF]