We consider the problem of robots following natural language commands through previously unknown outdoor environments. The robot receives a command such as “Navigate around the building to the car left of the fire hydrant and near the tree.” It must first classify the objects around it into categories, using images obtained from its sensors. The result of this classification is a map of the environment in which each object carries a list of semantic labels, such as “tree” or “car”, each with its own degree of confidence. The robot must then ground the nouns in the command, i.e., map each noun to a physical object in the environment. It must also ground the specified navigation mode, such as “navigate quickly” or “navigate covertly”, as a cost map. In this work, we show how to ground nouns and navigation modes by learning from examples demonstrated by humans.
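
To make the map representation concrete, the following is a minimal sketch of a semantic map whose objects carry confidence-weighted labels, together with a naive noun-grounding baseline that picks the object with the highest confidence for each noun. It is an illustration only, not the learned grounding method of this work: the names `MapObject` and `ground_noun`, the object identifiers, the positions, and the confidence values are all hypothetical, and the baseline ignores spatial relations such as “left of” and “near”.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MapObject:
    """One object in the semantic map (hypothetical structure)."""
    object_id: str
    position: Tuple[float, float]
    # Semantic label -> classifier confidence in [0, 1].
    label_confidence: Dict[str, float] = field(default_factory=dict)


def ground_noun(noun: str, semantic_map: List[MapObject]) -> Optional[MapObject]:
    """Return the object with the highest confidence for `noun`, or None.

    Naive argmax baseline; it does not use spatial relations or learning.
    """
    best, best_conf = None, 0.0
    for obj in semantic_map:
        conf = obj.label_confidence.get(noun, 0.0)
        if conf > best_conf:
            best, best_conf = obj, conf
    return best


# Illustrative values only; not data from the paper.
semantic_map = [
    MapObject("obj1", (2.0, 5.0), {"car": 0.8, "building": 0.1}),
    MapObject("obj2", (4.0, 1.0), {"tree": 0.7, "fire hydrant": 0.2}),
    MapObject("obj3", (6.0, 5.5), {"fire hydrant": 0.9}),
]

command_nouns = ["car", "fire hydrant", "tree"]
grounding = {noun: ground_noun(noun, semantic_map) for noun in command_nouns}
```

Here `grounding` maps each noun in the command to one map object, or to `None` when no object carries that label. A learned approach would replace the argmax with a model trained on human demonstrations and would also account for the spatial relations stated in the command.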