15
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Tera-SLASH: A Distributed Energy-Efficient MPI based LSH System for Tera-Scale Similarity Search

      Preprint
      ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We present Tera-SLASH, an MPI (Message Passing Interface) based distributed system for approximate similarity search over tera-scale datasets. SLASH provides a multi-node implementation of the popular LSH (locality sensitive hashing) algorithm, which is generally implemented on a single machine. We offer a novel design and sketching solution to reduce the inter-machine communication overheads exponentially. In a direct comparison on comparable hardware, SLASH is more than 10000x faster than the popular LSH package in PySpark. PySpark is a widely-adopted distributed implementation of the LSH algorithm for large datasets and is deployed in commercial platforms. In the end, we show how our system scale to Tera-scale Criteo dataset with more than 4 billion samples. SLASH can index this 2.3 terabyte data over 20 nodes (on a shared cluster at Rice) in under an hour, with a query time in a fraction of milliseconds. To the best of our knowledge, there is no open-source system that can index and perform a similarity search on Criteo with a commodity cluster.

          Related collections

          Author and article information

          Journal
          05 August 2020
          Article
          2008.03260
          36b52a16-0da3-4c55-ad7c-044c452a1009

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          History
          Custom metadata
          cs.DB cs.DC cs.IR cs.LG

          Databases,Information & Library science,Artificial intelligence,Networking & Internet architecture

          Comments

          Comment on this article