      Is Open Access

      Characterizing Data Analysis Workloads in Data Centers

      Preprint


          Abstract

          As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has become increasingly important to understand their behavior in order to further improve the performance of data center computer systems. In this paper, after investigating the three most important application domains in terms of page views and daily visitors, we choose eleven representative data analysis workloads and characterize their micro-architectural characteristics by using hardware performance counters, in order to understand the impacts and implications of data analysis workloads on systems equipped with modern superscalar out-of-order processors. Our study of the workloads reveals that data analysis applications share many inherent characteristics, which place them in a different class from desktop (SPEC CPU2006), HPC (HPCC), and service workloads, including traditional server workloads (SPECweb2005) and scale-out service workloads (four of the six benchmarks in CloudSuite), and accordingly we give several recommendations for architecture and system optimizations. On the basis of our workload characterization work, we released a benchmark suite named DCBench for typical datacenter workloads, including data analysis and service workloads, with an open-source license on our project home page at http://prof.ict.ac.cn/DCBench. We hope that DCBench is helpful for architecture and small-to-medium-scale system research for datacenter computing.
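          As a rough illustration of what characterizing a workload with hardware performance counters involves, the sketch below turns raw event counts into two commonly used derived metrics: instructions per cycle (IPC) and misses per thousand instructions (MPKI). It is not taken from the paper or from DCBench; the class `CounterSample`, its field names, and the sample numbers are hypothetical, and the choice of events is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts for one workload run (hypothetical field names)."""
    instructions: int
    cycles: int
    l2_misses: int
    branch_mispredictions: int


def ipc(sample: CounterSample) -> float:
    """Instructions retired per CPU cycle."""
    if sample.cycles <= 0:
        raise ValueError("cycle count must be positive")
    return sample.instructions / sample.cycles


def mpki(events: int, instructions: int) -> float:
    """Occurrences of an event per thousand retired instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * events / instructions


def summarize(sample: CounterSample) -> dict:
    """Collect the derived metrics for one run into a dictionary."""
    return {
        "ipc": ipc(sample),
        "l2_mpki": mpki(sample.l2_misses, sample.instructions),
        "branch_mpki": mpki(sample.branch_mispredictions, sample.instructions),
    }


# Made-up counts, used only to show the arithmetic.
sample = CounterSample(
    instructions=2_000_000,
    cycles=4_000_000,
    l2_misses=10_000,
    branch_mispredictions=6_000,
)
metrics = summarize(sample)
```

          Normalizing by instruction count rather than by time makes runs of different lengths comparable, which is one reason per-kilo-instruction rates are a common choice when workloads from different suites are set side by side.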


                Author and article information

                Journal
                30 July 2013
                Article
                1307.8013

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                11 pages, 12 figures, IISWC2013
                cs.PF
