BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//jEvents 2.0 for Joomla//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:STANDARD
DTSTART:20181105T110000
RDATE:20190310T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20191103T010000
RDATE:20200308T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20201101T010000
RDATE:20210314T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20211107T010000
RDATE:20220313T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20221106T010000
RDATE:20230312T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20231105T010000
RDATE:20240310T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20241103T010000
RDATE:20250309T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20251102T010000
RDATE:20260308T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20261101T010000
RDATE:20270314T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20271107T010000
RDATE:20280312T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20281105T010000
RDATE:20290311T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20291104T010000
RDATE:20300310T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20301103T010000
RDATE:20310309T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20311102T010000
RDATE:20320314T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20321107T010000
RDATE:20330313T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20331106T010000
RDATE:20340312T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20341105T010000
RDATE:20350311T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20190310T030000
RDATE:20191103T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20200308T030000
RDATE:20201101T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20210314T030000
RDATE:20211107T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20220313T030000
RDATE:20221106T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20230312T030000
RDATE:20231105T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20240310T030000
RDATE:20241103T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20250309T030000
RDATE:20251102T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20260308T030000
RDATE:20261101T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20270314T030000
RDATE:20271107T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20280312T030000
RDATE:20281105T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20290311T030000
RDATE:20291104T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20300310T030000
RDATE:20301103T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20310309T030000
RDATE:20311102T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20320314T030000
RDATE:20321107T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20330313T030000
RDATE:20331106T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20340312T030000
RDATE:20341105T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:80859faa2bc1b2c3bba832920258e8bb
CATEGORIES:Seminar
CREATED:20191119T115113
SUMMARY:Recent Advances in Stochastic Gradient Methods: From Convex to Non-convex Optimization and Deep Learning
LOCATION:CoRE A 301
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:
For many large-scale optimization and machine learning problems, first-order methods and their accelerated variants based on momentum have been a leading approach for computing low-to-medium accuracy solutions because of their cheap iterations and mild dependence on the problem dimension and data size. Even though the momentum-based accelerated gradient (AG) methods proposed by Nesterov for convex optimization converge provably faster than gradient descent (GD) in the absence of noise, the comparison is no longer clear when the gradients are stochastic, i.e., contain random gradient errors.
In the first part of the talk, we focus on stochastic gradient (SGD) and accelerated stochastic gradient (ASG) methods for convex optimization when the gradient has random errors. We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm and provide a systematic way of trading off these two in an optimal fashion. Our results show that stochastic momentum methods can achieve acceleration while being more robust to random gradient errors. Our framework also leads to "optimal" algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise. We also discuss extensions of our results and algorithms to distributed convex optimization problems.
In the second part of the talk, we focus on SGD for non-convex optimization and deep learning. The gradient noise (GN) in the SGD algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed α-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur ‘jumps’, which force the SDE to transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the α-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and admits heavy tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
DTSTAMP:20240328T154414Z
DTSTART;TZID=America/New_York:20191106T110000
SEQUENCE:0
TRANSP:OPAQUE
END:VEVENT
END:VCALENDAR