The talk will focus on selected challenges in modern large-scale machine learning in two settings: i) the large-model (deep learning) setting and ii) the large-data setting. Despite the success of convex methods, deep learning methods, whose objectives are inherently highly non-convex, have enjoyed a resurgence of interest in the last few years and achieve state-of-the-art performance on a number of tasks.

In the first part of the talk we present recent results on deep learning models. We provide empirical evidence that for deep networks i) most local minima recovered by SGD-type techniques are equivalent and yield similar train/test performance, and ii) the local minima recovered by these techniques are flat and generalize well. We present Entropy-SGD, an algorithm tailored to explore the flat, well-generalizing parts of the energy landscape using local entropy regularization, and we show how the flatness of the landscape can also be exploited in the parallel optimization setting. The resulting algorithms empirically outperform state-of-the-art tools in both running time and generalization.

Continuing the general theme of the talk (accelerating deep network models), we next consider the case when the learning algorithm needs to be scaled to large data. We address the multi-class classification problem where the number of classes k is extremely large, with the goal of obtaining train and test time complexity logarithmic in the number of classes. We discuss a reduction of this problem to a set of binary classification problems organized in a tree structure, and demonstrate a top-down online approach, based on a new objective function, for constructing trees of logarithmic depth.
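The tree-based reduction just described can be illustrated with a minimal sketch (this is not the talk's actual algorithm or objective; the names and the routing rule below are illustrative assumptions): a balanced binary tree over the k classes in which each internal node holds a binary classifier that routes an example left or right, so a prediction costs one binary decision per level, i.e. O(log k) at test time.

```python
import math

class Node:
    """Internal node: routes an example left/right via a binary classifier.
    Leaf node: stores a single class label."""
    def __init__(self, classes, router=None, left=None, right=None):
        self.classes = classes
        self.router = router      # callable x -> 0 (go left) or 1 (go right)
        self.left, self.right = left, right

def build_tree(classes, make_router):
    # Recursively halve the class set -> tree depth = ceil(log2 k).
    if len(classes) == 1:
        return Node(classes)
    mid = len(classes) // 2
    left_cls, right_cls = classes[:mid], classes[mid:]
    return Node(classes, make_router(left_cls, right_cls),
                build_tree(left_cls, make_router),
                build_tree(right_cls, make_router))

def predict(node, x):
    # O(log k) test time: one binary decision per level of the tree.
    steps = 0
    while node.left is not None:
        node = node.right if node.router(x) else node.left
        steps += 1
    return node.classes[0], steps

# Toy router: in a real system each node's binary classifier is *learned*
# (in the talk, jointly with the tree via a splitting objective); here we
# cheat and route by class membership just to show the mechanics and depth.
k = 1024
tree = build_tree(list(range(k)),
                  lambda L, R: (lambda x, s=set(R): int(x in s)))
label, depth = predict(tree, 700)
assert label == 700 and depth == math.ceil(math.log2(k))  # 10 decisions
```

The key design point is that the online construction must learn both the routers and the class partition at each node, which is what the new objective function mentioned above addresses; the sketch fixes the partition in advance only for clarity.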
The approach extends easily to the deep learning setting and, together with the efficient optimization tools presented in the first part of the talk, can be used to build efficient large-scale deep learning systems. Finally, the talk will touch on some of the most recent work of Anna Choromanska and her lab, in particular the orthogonality of deep network parametrizations in the context of domain adaptation and network compression, as well as deep learning architectures for source separation problems.
Anna Choromanska is an Assistant Professor in the Department of Electrical and Computer Engineering at the NYU Tandon School of Engineering. Before joining NYU Tandon, she was a post-doctoral researcher in the Computer Science Department at the Courant Institute of Mathematical Sciences, New York University, under the supervision of Prof. Yann LeCun. She received her PhD from the Department of Electrical Engineering at Columbia University, where she held The Fu Foundation School of Engineering and Applied Science Presidential Fellowship and was advised by Prof. Tony Jebara. She completed her MSc with distinction in the Department of Electronics and Information Technology at the Warsaw University of Technology, with a double specialization in Electronics and Computer Engineering and Electronics and Informatics in Medicine. She has worked with various industrial institutions, including AT&T Research Laboratories, IBM T.J. Watson Research Center, and Microsoft Research New York. Her research interests are in machine learning, optimization, and statistics, with applications in biomedicine and neurobiology. She also holds a music degree in piano from the Mieczyslaw Karlowicz Music School in Warsaw. She is an avid salsa dancer performing with the Ache Performance Group; her other hobbies are painting and photography.