# On the Computational Complexity of MapReduce

@inproceedings{Fish2015OnTC, title={On the Computational Complexity of MapReduce}, author={Benjamin Fish and Jeremy Kun and {\'A}d{\'a}m D{\'a}niel Lelkes and L. Reyzin and Gy{\"o}rgy Tur{\'a}n}, booktitle={DISC}, year={2015} }

In this paper we study the MapReduce Class MRC defined by Karloffi?źeti?źal., which is a formal complexity-theoretic model of MapReduce. We show that constant-round MRC computations can decide regular languages and simulate sublogarithmic space-bounded Turing machines. In addition, we prove hierarchy theorems for MRC under certain complexity-theoretic assumptions. These theorems show that sufficiently increasing the number of rounds or the amount of time per processor strictly increases the… Expand

#### Topics from this paper

#### 20 Citations

Algorithms and Complexity Results for Learning and Big Data

- Computer Science
- 2017

The complexity-theoretic properties of MapReduce, one of the most ubiquitous distributed computing frameworks for big data, are explored, new algorithms are given and computational hardness results for a model of clustering are given, and the issue of fairness in machine learning is addressed. Expand

Efficient Circuit Simulation in MapReduce

- Mathematics, Computer Science
- ISAAC
- 2019

This work closely relates the standard, uniform NC hierarchy modeling parallel computations to the deterministic MapReduce hierarchy DMRC by proving that NC^(i+1) is contained in DMRC^i for all natural i, including 0. Expand

Shuffles and Circuits (On Lower Bounds for Modern Parallel Computation)

- Computer Science
- J. ACM
- 2018

An abstract model of massively parallel computation, where essentially the only restrictions are that the “fan-in” of each machine is limited to s bits, where s is smaller than the input size n, and that computation proceeds in synchronized rounds. Expand

Shuffles and Circuits: (On Lower Bounds for Modern Parallel Computation)

- Computer Science
- SPAA
- 2016

An abstract model of massively parallel computation, where essentially the only restrictions are that the "fan-in" of each machine is limited to s bits, and that computation proceeds in synchronized rounds, which is proved to be the best one could hope for. Expand

A Conditional Lower Bound on Graph Connectivity in MapReduce

- Mathematics, Computer Science
- ArXiv
- 2019

Some of the first lower bounds for a natural graph problem, albeit for a restricted class of algorithms, are introduced for the basic question of determining if a graph is connected or not. Expand

On the Hardness of Massively Parallel Computation

- Computer Science, Mathematics
- SPAA
- 2020

The existence of hard functions that are essentially not parallelizable in the MPC model is shown by adapting the so-called compression argument from the data structure lower bounds and cryptography literature to the context of massively parallel computation. Expand

DFA Minimization Algorithms in Map-Reduce

- Mathematics
- 2016

Map-Reduce has been a highly popular parallel-distributed programming model. In this thesis, we study the problem of minimizing Deterministic Finite State Automata (DFA). We focus our attention on… Expand

A scalable and composable map-reduce system

- Computer Science
- 2016 IEEE International Conference on Big Data (Big Data)
- 2016

This paper presents a novel map-reduce runtime system that is designed for scalability and for composition with other parallel software, built over the Cilk programming language and outperforms Phoenix++, by 1.5x–4x for 5 out of 7 map- reduce benchmarks on 48 threads. Expand

Security and privacy aspects in MapReduce on clouds: A survey

- Computer Science
- Comput. Sci. Rev.
- 2016

This paper investigates and discusses security and privacy challenges and requirements, considering a variety of adversarial capabilities, and characteristics in the scope of MapReduce, and provides a review of existingSecurity and privacy protocols for MapReduced and discusses their overhead issues. Expand

Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances

- Mathematics, Computer Science
- ICML
- 2018

Efficiency of implementation of the algorithms in Apache Spark is demonstrated and constant-factor inapproximability results for $o(\log n)$-round algorithms under standard MPC hardness assumptions are shown. Expand

#### References

SHOWING 1-10 OF 32 REFERENCES

A model of computation for MapReduce

- Computer Science
- SODA '10
- 2010

A simulation lemma is proved showing that a large class of PRAM algorithms can be efficiently simulated via MapReduce, and it is demonstrated how algorithms can take advantage of this fact to compute an MST of a dense graph in only two rounds. Expand

Sorting, Searching, and Simulation in the MapReduce Framework

- Computer Science
- ISAAC
- 2011

This work designs optimal simulations of the the well-established PRAM and BSP models in MapReduce, immediately resulting in optimal solutions to the problems of computing fixed-dimensional linear programming and 2-D and 3-D convex hulls. Expand

Upper and Lower Bounds on the Cost of a Map-Reduce Computation

- Computer Science
- Proc. VLDB Endow.
- 2013

A model of problems that can be solved in a single round of map-reduce computation is introduced and enables a generic recipe for discovering lower bounds on communication cost as a function of the maximum number of inputs that could be assigned to one reducer. Expand

Time-Space Tradeoffs for Satisfiability

- Computer Science, Mathematics
- J. Comput. Syst. Sci.
- 2000

It is shown that SAT cannot be solved in n1+o(1) time and n1?? space for any general random-access nondeterministic Turing machines, and lower bounds for log-space uniform NC1 circuits and branching programs are given. Expand

BSP vs MapReduce

- Computer Science
- ICCS
- 2012

This work links MapReduce to the BSP model of computation, underlining the relevance of BSP to modern parallel algorithm design and deﬁning a subclass of B SP algorithms that can be efficiently implemented in Map Reduce. Expand

MapReduce: Simplified Data Processing on Large Clusters

- Computer Science
- OSDI
- 2004

This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable. Expand

On distributing symmetric streaming computations

- Computer Science
- SODA '08
- 2008

A simple algorithmic model for massive, unordered, distributed (mud) computation, as implemented by Google's MapReduce and Apache's Hadoop, and it is shown that in principle, mud algorithms are equivalent in power to symmetric streaming algorithms. Expand

Which problems have strongly exponential complexity?

- Computer Science, Mathematics
- Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)
- 1998

For several NP-complete problems, there have been a progression of better but still exponential algorithms. In this paper we address the relative likelihood of sub-exponential algorithms for these… Expand

Turing Machines with Sublogarithmic Space

- Mathematics, Computer Science
- Lecture Notes in Computer Science
- 1994

Lower bounds for accepting non-regular languages and space constructible functions are lowered and deterministic versus nondeterministic Turing machines are considered. Expand

Time-Space Tradeoffs for Counting NP Solutions Modulo Integers

- Computer Science, Mathematics
- Twenty-Second Annual IEEE Conference on Computational Complexity (CCC'07)
- 2007

It is proved that for all primes p except for possibly one of them, and for all c < 2 cos(π/7) ≈ 1.801, there is a d > 0 such that MODp-Sat is not solvable in nc time and nd space by general algorithms, which is the first time-space tradeoffs for counting the number of solutions to an NP problem modulo small integers. Expand