Human detection has various applications, e.g., autonomous driving, surveillance systems, and retail. In this dissertation, we first exploit multispectral images (i.e., RGB and thermal images) for human detection. We extensively analyze Faster R-CNN for the detection task and then formulate multispectral human detection as a fusion problem of convolutional networks (ConvNets). We design four distinct ConvNet fusion architectures that integrate two-branch ConvNets at different stages of the network, all of which yield better performance than the baseline detector. In the second part of this dissertation, we leverage instance-level contextual information in crowded scenes to boost the performance of human detection. Based on a context graph that incorporates both geometric and social contextual patterns from crowds, we apply a progressive potential propagation algorithm to discover weak detections that are contextually compatible with true detections while suppressing irrelevant false alarms. The method significantly improves the performance of any shallow human detector, obtaining results comparable to those of deep-learning-based methods.