MapReduce is a programming model and software framework, first developed by Google, that simplifies the parallel processing of vast amounts of data on large clusters of commodity hardware. Hadoop is an Apache Software Foundation project that provides two main components: a distributed filesystem called HDFS (Hadoop Distributed File System), and a framework and API for building and running MapReduce jobs. In this lecture, I will discuss MapReduce in Hadoop and then present the MapReduce workflow that I developed with our big data workflow system, DATAVIEW (www.dataview.org). DATAVIEW is a big data workflow system for managing data analysis processes. A big data workflow is the computerized modeling and automation of a process consisting of a set of computational tasks and their data interdependencies, designed to process and analyze data of ever-increasing scale, complexity, and rate of acquisition.
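
To make the programming model concrete, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) applied to word counting. It is an illustration only: it uses neither the Hadoop API nor DATAVIEW, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence, in memory (illustrative only)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster, the map and reduce calls would run in parallel on different machines and the shuffle would move data across the network; this sketch only shows the data flow between the phases.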
Dr. James Abello
Faculty Candidate Talk