Human detection has many applications, e.g., autonomous driving, surveillance systems, and retail. First, we study human detection with multispectral images (i.e., RGB and thermal images). We analyze Faster R-CNN in depth for this detection task and then formulate multispectral human detection as a fusion problem of convolutional networks (ConvNets). We design four distinct ConvNet fusion architectures that integrate two-branch ConvNets at different stages of the network, all of which outperform the baseline detector. Second, we exploit instance-level contextual information in crowded scenes to boost human detection. Based on a context graph that incorporates both geometric and social contextual patterns from crowds, we apply label propagation to discover weak detections that are contextually compatible with true detections while suppressing irrelevant false alarms. The method significantly improves the performance of any shallow human detector, obtaining results comparable to those of deep-learning-based methods.
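
To make the label propagation step concrete, the following is a minimal sketch and not the method itself. The abstract does not specify how the context graph is built or which propagation rule is used, so everything here is an assumption for illustration: the function names `build_affinity` and `propagate`, the Gaussian affinity on height-normalized center distance, the distance cutoff `max_dist`, the damping factor `alpha`, and the row-normalized update f <- alpha * P f + (1 - alpha) * y. Only a geometric cue is modeled; the social contextual patterns are omitted.

```python
import math


def build_affinity(boxes, sigma=1.0, max_dist=3.0):
    """Hypothetical geometric context graph.

    boxes: list of (center_x, center_y, height) tuples, one per detection.
    Two detections are linked when their center distance, divided by their
    mean height, is below max_dist; the edge weight is a Gaussian kernel.
    """
    n = len(boxes)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xi, yi, hi = boxes[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj, hj = boxes[j]
            dist = math.hypot(xi - xj, yi - yj) / (0.5 * (hi + hj))
            if dist < max_dist:
                weights[i][j] = math.exp(-dist * dist / (2.0 * sigma * sigma))
    return weights


def propagate(weights, seeds, alpha=0.5, iters=50):
    """Iterate f <- alpha * P f + (1 - alpha) * seeds, with P row-normalized."""
    n = len(weights)
    trans = []
    for row in weights:
        total = sum(row)
        # Nodes without neighbors keep an all-zero row and receive no support.
        trans.append([w / total if total > 0 else 0.0 for w in row])
    scores = list(seeds)
    for _ in range(iters):
        scores = [
            alpha * sum(trans[i][j] * scores[j] for j in range(n))
            + (1.0 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores


# Toy scene: three confident detections in a row, one weak detection standing
# among them, and one weak detection far away from the crowd.
boxes = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0),
         (1.5, 0.2, 2.0), (20.0, 20.0, 2.0)]
seeds = [1.0, 1.0, 1.0, 0.0, 0.0]  # 1.0 marks detections treated as true
scores = propagate(build_affinity(boxes), seeds)
# The isolated detection has no edges, so its score stays 0.0, while the weak
# detection inside the crowd is pulled up by its confident neighbors.
```

In this toy setup, thresholding `scores` would keep the weak detection inside the crowd and discard the isolated one, which mirrors the intended effect of recovering contextually compatible detections while suppressing irrelevant false alarms. The cutoff matters for this sketch: without it, row normalization would rescale even negligible edge weights and let the isolated detection inherit support from the distant crowd.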