Learning to predict how an environment will evolve and the consequences of one's actions is an important ability for autonomous agents: it can enable planning with relatively few interactions with the environment, which may be slow or costly. However, learning an accurate predictive model is difficult due to several challenges, such as partial observability, long-term dependencies, and inherent uncertainty in the environment. In this talk, I will present my work on architectures designed to address some of these challenges, as well as work focused on better understanding recurrent network memory over long timescales. I will then present recent work applying learned environment models to planning, using a simple gradient-based approach that can be used in both discrete and continuous action spaces. This approach matches or outperforms model-free methods while requiring fewer environment interactions and still enabling real-time performance.
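To make the gradient-based planning idea concrete, here is a minimal sketch of the general technique: unroll a learned model over a short horizon and descend the predicted cost with respect to the action sequence. This is an illustrative toy, not the talk's implementation: it assumes a linear dynamics model (standing in for a trained neural network), continuous actions, and hand-written gradients (in practice one would use an autodiff framework, and discrete actions would need some relaxation).

```python
import numpy as np

# Hypothetical "learned" linear model s' = A s + B a; in practice A, B
# would be a neural network fit to environment transitions.
rng = np.random.default_rng(0)
A = np.eye(2) * 0.9
B = rng.normal(size=(2, 1)) * 0.5

def rollout(s0, actions):
    """Unroll the model for len(actions) steps, returning all states."""
    states = [s0]
    for a in actions:
        states.append(A @ states[-1] + B @ a)
    return states

def plan(s0, goal, horizon=10, iters=500, lr=0.5):
    """Gradient-based planning: gradient descent on the final-state cost
    ||s_T - goal||^2 with respect to the action sequence."""
    actions = [np.zeros(1) for _ in range(horizon)]
    for _ in range(iters):
        states = rollout(s0, actions)
        grad_s = 2 * (states[-1] - goal)  # d(cost)/d(s_T)
        # Backpropagate through the linear model: ds_{t+1}/da_t = B,
        # ds_{t+1}/ds_t = A, so walk the chain rule backwards in time.
        for t in reversed(range(horizon)):
            actions[t] -= lr * (B.T @ grad_s)  # descend d(cost)/d(a_t)
            grad_s = A.T @ grad_s              # propagate to d(cost)/d(s_t)
    return actions

s0 = np.array([1.0, -1.0])
goal = np.zeros(2)
acts = plan(s0, goal)
final = rollout(s0, acts)[-1]
```

Because the whole rollout is differentiable, the same loop applies to any differentiable learned model; only the backward pass changes, and an autodiff library computes it automatically.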