0
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Distributed \(k\)-Clustering for Data with Heavy Noise

      Preprint
      ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          In this paper, we consider the \(k\)-center/median/means clustering with outliers problems (or the \((k, z)\)-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly depending on \(z\), the number of outliers. Recently Guha et al. overcame this dependence issue by considering bi-criteria approximation algorithms that output solutions with \(2z\) outliers. For the case where \(z\) is large, the extra \(z\) outliers discarded by the algorithms might be too large, considering that the data gathering process might be costly. In this paper, we improve the number of outliers to the best possible \((1+\epsilon)z\), while maintaining the \(O(1)\)-approximation ratio and independence of communication cost on \(z\). The problems we consider include the \((k, z)\)-center problem, and \((k, z)\)-median/means problems in Euclidean metrics. Implementation of the our algorithm for \((k, z)\)-center shows that it outperforms many previous algorithms, both in terms of the communication cost and quality of the output solution.

          Related collections

          Most cited references7

          • Record: found
          • Abstract: not found
          • Article: not found

          Clustering data streams: theory and practice

            Bookmark
            • Record: found
            • Abstract: not found
            • Conference Proceedings: not found

            k-means–: A unified approach to clustering and outlier detection

              Bookmark
              • Record: found
              • Abstract: not found
              • Conference Proceedings: not found

              Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

                Bookmark

                Author and article information

                Journal
                17 October 2018
                Article
                1810.07852
                f4b8075d-bff8-4447-9b81-25624aecf595

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                accepted into NIPS'18
                cs.DC cs.DS cs.LG

                Data structures & Algorithms,Artificial intelligence,Networking & Internet architecture

                Comments

                Comment on this article