Light Seminar: Mapreduce and Mobile Algorithms.
Course number is 16:198:500:03 index number 46134

S. Muthukrishnan, x2379, Core 319.
Graham Cormode,

Meeting:  Thurs 4 -- 6 pm, Room 305 (Core B) for Mapreduce Algorithms. Thurs 6:30 -- 7:30 PM, Room 305 (Core B) for Mobile.

Schedule (Mapreduce Algorithms).

This course covers the fundamentals of the MapReduce framework and the Hadoop system for scaling huge computations to distributed clusters.  It surveys recent research papers on the topic to address problems on large data aggregation and analysis, such as for massive data logs, social network graphs, and distributed databases.

Jan 20
Intro to Mapreduce/Hadoop
Jan 27

Feb 3
Design mapreduce algorithms for examples, PRAM M+G
Feb 10
Finish mapreduce examples
Feb 17
Guest Lecture
Howard Karloff (AT&T Research)
Feb 24
Bill Katsak
Mar 3
Graph twiddling and pegasus
Matt Muscari
Mar 10
Vanchi Anathanarayanan
Mar 17
Spring Break
Attend this workshop 14--16
Mar 24
Set cover
Alex Nikolov
Mar 31
Counting triangles
Priya Govindan
Apr 7
Behavioral Targeting
Jinyun Yan
Apr 14
Sequence alignment
Daniela Vianna
Apr 21
Parallel GO
Sergiu Goschin
Apr 28

May 5

MapReduce Algorithms:

Introductory slides:

Talk videos:

Other tutorials:

UMD Class from Spring 2010:
Accompanying text:

Papers to read/discuss in class:


MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
OSDI 2004
Communications of the ACM, 2010

On the Complexity of Processing Massive, Unordered, Distributed Data.
J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein and Z. Svitkina,
SODA 2008

A Model of Computation for MapReduce
H. Karloff, S. Suri, and S. Vassilvitskii
SODA 2010

Sorting, Searching, and Simulation in the MapReduce Framework
Michael T. Goodrich, Nodari Sitchinava, Qin Zhang
Under submission, 2011

Systems on top of MR:

Interpreting the Data: Parallel Analysis with Sawzall
Scientific Programming Journal 2005
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan

Pig latin: a not-so-foreign language for data processing
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins

Hive: a warehousing solution over a map-reduce framework
VLDB 2009
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy

Hive - A Petabyte Scale Data Warehouse Using Hadoop
ICDE 2010


Dryad: Distributed data-parallel programs from sequential building blocks.
Eurosys 2007
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly

MapReduce Online
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy and Russell Sears
NSDI 2010, SIGMOD 2010


Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry
Michael T. Goodrich
Massive 2010

A New Computation Model for Cluster Computing
Foto Afrati, Jeff Ullman

Max-cover in map-reduce
Flavio Chierichetti, Ravi Kumar, Andrew Tomkins
WWW 2010   

Graphs and Matrices

Graph Twiddling in a MapReduce World
Jonathan Cohen
Computing in Science and Engineering, vol. 11, no. 4, pp. 29-41, July/August, 2009.
Parallelizing Random Walk with Restart for large-scale query recommendation
Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng
2010 Workshop on Massive Data Analytics on the Cloud

Distributed non-negative matrix factorization for dyadic data analysis on mapreduce

Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, Yi-Min Wang WWW 2010

DOULION: Counting Triangles in Massive Graphs with a Coin
Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, Christos Faloutsos
KDD 2009
Fast counting of triangles in real-world networks: proofs, algorithms and observations

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations
U Kang, Charalampos E. Tsourakakis, Christos Faloutsos
IEEE International Conference on Data Mining (ICDM 2009)


Optimizing Joins in a Map-Reduce Environment
Foto Afrati, Jeff Ullman

Efficient parallel set-similarity joins using MapReduce
Rares Vernica, Michael J. Carey, Chen Li


Scaling Up Classifiers to Cloud Computers
Christopher Moretti, Karsten Steinhaeuser, Douglas Thain, Nitesh V. Chawla

Large-Scale Behavioral Targeting
KDD 09
Ye Chen, Dmitriy Pavlov, John Canny

MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
BMC Bioinformatics 2010
Suzanne J Matthews email and Tiffani L Williams email

A novel approach to multiple sequence alignment using hadoop data grids
2010 Workshop on Massive Data Analytics on the Cloud
G. Sudha Sadasivam, G. Baktavatchalam

Experiences on Processing Spatial Data with MapReduce
Ariel Cary, Zhengguo Sun, Vagelis Hristidis, Naphtali Rishe
21st International Conference on Scientific and Statistical Database Management

Web-Scale Distributional Similarity and Entity Set Expansion
Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, Vishnu Vyas
2009 Conference on Empirical Methods in Natural Language Processing

Other lists of papers:


Schedule (Mobile Apps)

Jan 20: First meeting and planning projects.