Hadoop Study Notes (5): The MapReduce Computing Model

Contents

Two roles execute a MapReduce job:
Hadoop Streaming
Constraints
MapReduce workflow
Other
PigPen

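Two roles execute a MapReduce job:

One is the JobTracker, which schedules the work.
The other is the TaskTracker, which executes it.

Every MapReduce job is initialized as a Job. Each Job is divided into two phases,
the Map phase and the Reduce phase, which correspond to the Map function and the Reduce function.

The Map function takes an input of the form <key, value> and produces intermediate output in the same <key, value> form. Hadoop then gathers all values that share the same intermediate key and passes them together to the Reduce function.
The Reduce function takes an input of the form <key, (list of values)>, processes that collection of values, and emits its result, again in <key, value> form.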
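The data flow described above can be sketched in plain Python without a cluster. This is a minimal in-memory simulation, not Hadoop's implementation: `run_job`, `map_fn`, `reduce_fn`, and the sample "year,temperature" records are hypothetical names and data chosen for illustration.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (key, value) record and emit intermediate (key, value) pairs.

    The input key is a line offset (ignored here) and the value is a
    hypothetical "year,temperature" line.
    """
    year, temp = value.split(",")
    yield year, int(temp)


def reduce_fn(key, values):
    """Reduce: take (key, list of values) and emit a final (key, value) pair."""
    yield key, max(values)


def run_job(records, map_fn, reduce_fn):
    """Simulate the Map phase, the grouping by key, and the Reduce phase."""
    # Map phase: apply map_fn to every input record.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Grouping step: collect all values that share an intermediate key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one reduce_fn call per distinct key, in sorted key order.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


records = [(0, "1949,111"), (9, "1950,22"), (17, "1949,78"), (25, "1950,0")]
result = run_job(records, map_fn, reduce_fn)
print(result)  # [('1949', 111), ('1950', 22)]
```

The user supplies only `map_fn` and `reduce_fn`; the grouping in the middle is the part that Hadoop takes care of.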

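Hadoop Streaming

Hadoop Streaming provides an API that lets users write the Map function or the Reduce function in any scripting language.

Under Streaming, the mapper and the reducer are simply executables on the Linux host. The key point is that both read their input from standard input (stdin) and write their output to standard output (stdout); that is the whole mechanism.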
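As a sketch of that stdin/stdout contract, here is a word count written as two Python functions that consume and produce lines of text. It assumes Streaming's usual conventions: one record per line, key and value separated by a tab, and reducer input arriving sorted by key. `mapper` and `reducer` are hypothetical names; the in-memory list stands in for real input.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one "word<TAB>1" line per word of the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Reduce step: sum the counts of each word.

    Assumes the lines arrive sorted by key, so equal keys are adjacent
    and itertools.groupby can collect them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local stand-in for the pipeline: mapper | sort | reducer
text = ["the fox jumped", "the cow jumped"]
mapped = sorted(mapper(text))
counts = list(reducer(mapped))
print(counts)  # ['cow\t1', 'fox\t1', 'jumped\t2', 'the\t2']
```

In an actual Streaming job each function would live in its own script, iterating over `sys.stdin` and printing each result line to stdout.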

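Constraints

A dataset (or task) processed with MapReduce must have the following properties:
- The dataset can be broken down into many small datasets.
- Each small dataset can be processed fully in parallel.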
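The two properties can be illustrated with a small sketch: a sum of squares is split into chunks, each chunk is processed without looking at any other, and the partial results are combined at the end. Threads stand in for cluster nodes here, and `split` and `process` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, chunk_size):
    """Break the dataset into small, non-overlapping chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process(chunk):
    """Work on one chunk using only that chunk's data."""
    return sum(x * x for x in chunk)


data = list(range(10))
chunks = split(data, chunk_size=3)

# Each chunk is independent, so the order in which workers finish does not matter.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process, chunks))

total = sum(partials)
print(partials, total)  # [5, 50, 149, 81] 285
```

A computation in which each step needs the result of the previous one, such as a running balance, does not decompose this way and is a poor fit for the model.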

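MapReduce workflow

User-defined Map and Reduce classes extend the Mapper and Reducer base classes that MapReduce provides. The framework identifies the user's Map and Reduce processing stages from those subclasses and from the core methods they override.

Other methods in the Mapper and Reducer base classes:

The setup function
setup can be viewed as processing that is global to the task, whereas work done in the Map or Reduce function applies only to the data currently being processed in the input split.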

/**
 * Called once at the start of the task.
 */
protected void setup(Context context
                     ) throws IOException, InterruptedException {
  // NOTHING
}

The cleanup function
cleanup is similar to setup, except that it runs just before the task is torn down.

/**
 * Called once at the end of the task.
 */
protected void cleanup(Context context
                       ) throws IOException, InterruptedException {
  // NOTHING
}

The run function
run first calls the task's setup, then calls map for each key/value pair in the input, and finally calls cleanup.

/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}
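To show why task-level hooks are useful, the setup, map, cleanup sequence can be mimicked in a few lines of Python. This is a stand-in written for illustration, not the Hadoop class: the `Mapper` and `CountingMapper` classes and the `emit` callback (which plays the role of the context's output) are assumptions of this sketch.

```python
class Mapper:
    """Python stand-in for the task lifecycle (not the Hadoop class)."""

    def setup(self, emit):
        pass  # called once at the start of the task

    def map(self, key, value, emit):
        emit(key, value)  # pass records through unchanged in this sketch

    def cleanup(self, emit):
        pass  # called once at the end of the task

    def run(self, records, emit):
        self.setup(emit)
        for key, value in records:
            self.map(key, value, emit)
        self.cleanup(emit)


class CountingMapper(Mapper):
    """Keeps task-level state: counts words locally, emits once in cleanup."""

    def setup(self, emit):
        self.counts = {}

    def map(self, key, value, emit):
        for word in value.split():
            self.counts[word] = self.counts.get(word, 0) + 1

    def cleanup(self, emit):
        for word in sorted(self.counts):
            emit(word, self.counts[word])


emitted = []
records = [(0, "a b a"), (6, "b c")]
CountingMapper().run(records, lambda k, v: emitted.append((k, v)))
print(emitted)  # [('a', 2), ('b', 2), ('c', 1)]
```

State created in `setup` survives across every `map` call of the task, so the subclass can aggregate locally and write its output once in `cleanup` instead of once per record.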

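Other

An earlier note in this series mentioned that MapReduce is modeled on the Map/Reduce functions found in Lisp.

Here is a MapReduce library from the Lisp family.

PigPen

PigPen is a MapReduce library for Clojure, billed with the line "If you know Clojure, you already know PigPen."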
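The functional idea carries over to any language with higher-order functions. A minimal sketch in Python rather than Lisp, using the built-in `map` and `functools.reduce`; the sample list is made up:

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: apply a function to every element independently.
squares = list(map(lambda x: x * x, numbers))

# reduce: fold the mapped values into a single result.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares, total)  # [1, 4, 9, 16] 30
```

Because the mapped function touches one element at a time, the map step can be spread across machines, which is the property the distributed model relies on.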

;Map-Reduce
(require '[pigpen.core :as pig])

(defn word-count [lines]
  (->> lines
    (pig/mapcat #(-> % first
                   (clojure.string/lower-case)
                   (clojure.string/replace #"[^\w\s]" "")
                   (clojure.string/split #"\s+")))
    (pig/group-by identity)
    (pig/map (fn [[word occurrences]] [word (count occurrences)]))))

;Test
=> (def data (pig/return [["The fox jumped over the dog."]
                          ["The cow jumped over the moon."]]))

#'pigpen-demo/data

=> (pig/dump (word-count data))
[["moon" 1] ["jumped" 2] ["dog" 1] ["over" 2] ["cow" 1] ["fox" 1] ["the" 4]]

Original post: Hadoop Study Notes (5): The MapReduce Computing Model. Please credit the source when reposting.
