
      When does gradient descent with logistic loss find interpolating two-layer networks?

      Preprint

          Abstract

          We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfy certain cluster and separation conditions and the network is wide enough, we show that one step of gradient descent reduces the loss enough for the first result to apply. In contrast, all past analyses of fixed-width networks that we are aware of do not guarantee that the training loss goes to zero.
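
          To make the setting concrete, here is a minimal sketch of the kind of training the abstract describes: a finite-width two-layer network with a smoothed ReLU activation, trained for binary classification by full-batch gradient descent on the logistic loss. This is not the paper's construction; the softplus smoothing, the toy data, the fixed output weights, and all hyperparameters below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative, not the paper's clustered setup).
n, d, m = 200, 10, 512                     # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(n))   # labels in {-1, +1}

W = rng.standard_normal((m, d)) / np.sqrt(d)       # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # output weights (held fixed here)

def smoothed_relu(z):
    # Softplus, one smooth surrogate for the ReLU: log(1 + e^z).
    return np.logaddexp(0.0, z)

def smoothed_relu_grad(z):
    # The derivative of softplus is the sigmoid.
    return 1.0 / (1.0 + np.exp(-z))

def margins(W):
    # y_i * f(x_i), where f(x) = sum_j a_j * sigma(w_j . x).
    return y * (smoothed_relu(X @ W.T) @ a)

def logistic_loss(W):
    # mean_i log(1 + exp(-y_i * f(x_i)))
    return np.mean(np.logaddexp(0.0, -margins(W)))

lr = 0.5
for step in range(2001):
    Z = X @ W.T                                # pre-activations, shape (n, m)
    g = -y / (1.0 + np.exp(margins(W)))        # dL/df_i, up to the 1/n factor
    # Chain rule: dL/dW[j] = (1/n) * sum_i g_i * a_j * sigma'(z_ij) * x_i
    grad_W = (g[:, None] * smoothed_relu_grad(Z) * a[None, :]).T @ X / n
    W -= lr * grad_W
    if step % 500 == 0:
        print(f"step {step:4d}  training loss {logistic_loss(W):.6f}")

          On data this simple the printed training loss decreases toward zero, which is the interpolation behavior the paper analyzes under its cluster and separation conditions.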


          Author and article information

          Journal
          Date: 04 December 2020
          Type: Article (preprint)
          arXiv: 2012.02409
          ID: a417fb8f-2355-4b58-902b-bb365dfe0cd0
          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Custom metadata
          43 pages, 4 figures
          Subject classes: stat.ML, cs.LG, math.OC

          Keywords: Numerical methods, Machine learning, Artificial intelligence
