Google Cloud Computing Technology: MapReduce (Overseas Course Slides).ppt

Uploaded by: tia****nde | Document ID: 14142270 | Upload date: 2020-07-06 | Format: PPT | Pages: 48 | Size: [?].03MB

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean

    reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
            result += ParseInt(v);
        Emit(AsString(result));

More Examples

Distributed grep:
- Map: (key, whole doc / a line) -> (the matched line, key)
- Reduce: identity function

Count of URL Access Frequency:
- Map: logs of web page requests -> (URL, 1)
- Reduce: (URL, total count)

Reverse Web-Link Graph:
- Map: (source, target) -> (target, source)
- Reduce: (target, list(source)) -> (target, list(source))

MapReduce: Execution overview [figure]

Architecture
Master Data Structure
- Task state: idle, in-progress, completed
- Identity of worker machine: for in-progress tasks
- Location of intermediate file regions of map tasks: received from map tasks, pushed to reduce tasks
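The word-count reduce function and the "More Examples" pairs above can be tried out in a single process. The following is a minimal sketch, not Google's implementation: `run_mapreduce` is a hypothetical in-memory driver, and `wc_map` is an assumed map function (the slide text only preserves the reduce side). The reverse web-link example follows the (source, target) -> (target, source) description directly.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    # Reduce phase: one call per intermediate key, visited in sorted order.
    return {k: reduce_fn(k, iter(groups[k])) for k in sorted(groups)}


def wc_map(doc_name, contents):
    # Assumed map side of word count: emit (word, "1") for every word.
    for word in contents.split():
        yield word, "1"


def wc_reduce(word, counts):
    # Mirrors the slide's pseudocode: parse each count, sum, emit as a string.
    return str(sum(int(c) for c in counts))


def links_map(source, target):
    # Reverse web-link graph: (source, target) -> (target, source).
    yield target, source


def links_reduce(target, sources):
    # (target, list(source)); sorted only to make the output deterministic.
    return sorted(sources)


docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
word_counts = run_mapreduce(docs, wc_map, wc_reduce)

links = [("a.html", "c.html"), ("b.html", "c.html"), ("c.html", "a.html")]
reverse_graph = run_mapreduce(links, links_map, links_reduce)

print(word_counts)
print(reverse_graph)
```

Keeping the driver separate from the two user functions reflects the point of the model: only `map_fn` and `reduce_fn` change from one application to the next.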

Execution overview
- (1) Split input files
- (2) Master and workers
- (3) Map task workers
- (4) Buffering of results
- (5) Copying and sorting
- (6) Reduce workers
- (7) Return to user code

MapReduce: Execution overview

MapReduce: Example [figure]

MapReduce in Parallel: Example [figure]

MapReduce: Runtime Environment

Fault Management
- Fault tolerance in a word: redo.
- The master pings workers and re-schedules failed tasks.
- Note: completed map tasks are re-executed on failure because their output is stored on the local disk.
- Master failure: redo.
- Semantics in the presence of failures: with deterministic map/reduce functions, the system produces the same output as would have been produced by a non-faulting sequential execution of the entire program.
- It relies on atomic commits of map and reduce task outputs to achieve this property.

MapReduce: Fault Tolerance
- Handled via re-execution of tasks.
- Task completion is committed through the master.
- What happens if a mapper fails? Re-execute completed and in-progress map tasks.
- What happens if a reducer fails? Re-execute in-progress reduce tasks.
- What happens if the master fails? Potential trouble!

MapReduce: Refinements - Locality Optimization
- Leverage GFS to schedule a map task on a machine that contains a replica of the corresponding input data.
- Thousands of machines read input at local disk speed.
- Without this, rack switches limit the read rate.

MapReduce: Refinements - Redundant Execution
- Slow workers are a source of bottlenecks and may delay completion.
- Near the end of a phase, spawn backup tasks; whichever copy finishes first wins.
- This puts spare computing power to use and can significantly reduce job completion time.

MapReduce: Refinements - Skipping Bad Records
- Map/Reduce functions sometimes fail on particular inputs.
- Fixing the bug might not be possible, for example when it lives in a third-party library.
- On error, the worker sends a signal to the master.
- If multiple errors occur on the same record, skip the record.
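The bad-record rule above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Master` class, `run_map_task`, and the threshold `MAX_FAILURES = 2` are hypothetical (the slides say "multiple" without giving a number), and a Python exception stands in for the worker's error signal.

```python
from collections import Counter

MAX_FAILURES = 2  # assumed threshold; the slides only say "multiple"


class Master:
    """Counts how often each record has crashed the user code."""

    def __init__(self):
        self.failures = Counter()

    def report_failure(self, record_id):
        self.failures[record_id] += 1

    def should_skip(self, record_id):
        return self.failures[record_id] >= MAX_FAILURES


def run_map_task(records, map_fn, master):
    """Run one attempt of a map task; return its output, or None on a crash."""
    output = []
    for record_id, record in records:
        if master.should_skip(record_id):
            continue  # the master has seen this record fail repeatedly
        try:
            output.extend(map_fn(record))
        except Exception:
            master.report_failure(record_id)  # stands in for the error signal
            return None  # the attempt is abandoned and will be re-executed
    return output


def parse_int_map(record):
    yield "n", int(record)  # raises ValueError on malformed input


master = Master()
records = [(0, "1"), (1, "oops"), (2, "3")]
attempts = 0
result = None
while result is None:
    attempts += 1
    result = run_map_task(records, parse_int_map, master)

print(attempts, result)
```

The task fails twice on the malformed record and succeeds on the third attempt, once the master has decided that record should be skipped.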

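The re-execution rules from the fault-tolerance slides above (redo completed and in-progress map tasks, but only in-progress reduce tasks) can be written against a task table like the master's. This is a hedged sketch: `Task`, `State`, and `handle_worker_failure` are hypothetical names, and it assumes the master also remembers the worker of completed tasks so that lost map output can be traced back to the failed machine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None  # assumed: kept for completed tasks as well


def handle_worker_failure(tasks, failed_worker):
    """Reset every task that must be redone once a worker stops answering pings."""
    rescheduled = []
    for task_id, task in tasks.items():
        if task.worker != failed_worker:
            continue
        # Completed map output sits on the failed machine's local disk, so it is lost.
        lost_map_output = task.kind == "map" and task.state is State.COMPLETED
        if task.state is State.IN_PROGRESS or lost_map_output:
            task.state, task.worker = State.IDLE, None
            rescheduled.append(task_id)
    return rescheduled


tasks = {
    "m0": Task("map", State.COMPLETED, "w1"),
    "m1": Task("map", State.IN_PROGRESS, "w2"),
    "r0": Task("reduce", State.COMPLETED, "w1"),
    "r1": Task("reduce", State.IN_PROGRESS, "w1"),
}
redo = handle_worker_failure(tasks, "w1")
print(redo)
```

The completed reduce task `r0` is left alone, matching the slide's rule that only in-progress reduce tasks are re-executed.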
MapReduce: Refinements - Miscellaneous
- Combiner function at the mapper
- Sorting guarantees within each reduce partition
- Local execution for debugging/testing
- User-defined counters

MapReduce: Walk-through of One More Application

MapReduce: PageRank
- PageRank models the behavior of a "random surfer".
- C(t) is the out-degree of t, and (1-d) is the damping (random jump) term.
- The "random surfer" keeps clicking on successive links at random, not taking content into consideration.
- Each page distributes its PageRank equally among all pages it links to.
- The damping factor models the surfer "getting bored" and typing an arbitrary URL.

Computing PageRank

PageRank: Key Insights
- The effect at each iteration is local.
- The (i+1)th iteration depends only on the ith iteration.
- At iteration i, PageRank for individual nodes can be computed independently.

PageRank using MapReduce
- Use a sparse matrix representation (M).
- Map each row of M to a list of PageRank "credit" to assign to out-link neighbours.
- These prestige scores are reduced to a single PageRank value for a page by aggregating over them.

PageRank using MapReduce [figure]
Source of image: Lin 2008

Phase 1: Process HTML
- The map task takes (URL, page-content) pairs and maps them to (URL, (PRinit, list-of-urls)).
- PRinit is the "seed" PageRank for URL.
- list-of-urls contains all pages pointed to by URL.
- The reduce task is just the identity function.

Phase 2: PageRank Distribution
- The reduce task gets (URL, url_list) and many (URL, val) values.
- Sum the vals and fix up with d to get the new PR.
- Emit (URL, (new_rank, url_list)).
- Check for convergence using a non-parallel component.

MapReduce: Some More Apps
- Distributed grep
- Count of URL access frequency
- Clustering (K-means)
- Graph algorithms
- Indexing systems

MapReduce Programs In Google Source Tree

MapReduce Jobs run in Aug, 2004
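Returning to the two PageRank phases described above, one iteration of Phase 2 can be sketched as a map step followed by a reduce step. Several parts are assumptions: the slide's formula did not survive extraction, so the update `new_rank = (1 - d) + d * sum(vals)` is the commonly used unnormalized form rather than necessarily the author's; the Phase 2 map step is not described in the text, so `pr_map` is a plausible reconstruction; and `D = 0.85` and `PRinit = 1.0` are illustrative values.

```python
from collections import defaultdict

D = 0.85  # assumed damping value; the slides do not give a number


def pr_map(url, value):
    """Pass the link list through and hand out equal shares of the rank."""
    rank, out_links = value
    yield url, ("links", out_links)
    for target in out_links:
        yield target, ("val", rank / len(out_links))


def pr_reduce(url, values):
    """Sum the incoming vals, fix up with d, and re-attach the link list."""
    out_links, total = [], 0.0
    for tag, payload in values:
        if tag == "links":
            out_links = payload
        else:
            total += payload
    return (1 - D) + D * total, out_links


def pagerank_iteration(graph):
    """One MapReduce pass, run in a single process for illustration."""
    groups = defaultdict(list)
    for url, value in graph.items():
        for key, out in pr_map(url, value):
            groups[key].append(out)
    return {url: pr_reduce(url, vals) for url, vals in groups.items()}


def pagerank(graph, tol=1e-6, max_iters=100):
    for _ in range(max_iters):
        new_graph = pagerank_iteration(graph)
        # Non-parallel convergence check between MapReduce passes.
        delta = max(abs(new_graph[u][0] - graph[u][0]) for u in graph)
        graph = new_graph
        if delta < tol:
            break
    return graph


# Phase 1 output: (URL, (PRinit, list-of-urls)).
seed = {"a": (1.0, ["b", "c"]), "b": (1.0, ["c"]), "c": (1.0, ["a"])}
ranks = pagerank(seed)
print({u: round(r, 3) for u, (r, _) in ranks.items()})
```

Because each reduce call only needs the values for its own URL, the passes parallelize per node, which is the "effect at each iteration is local" insight. The sketch assumes every page has at least one out-link; dangling pages would need separate handling.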

MapReduce: Extensions and similar apps
- PIG (Yahoo)
- Hadoop (Apache)
- DryadLinq (Microsoft)

Large Scale Systems Architecture using MapReduce

Take Home Messages
- Although restrictive, MapReduce provides a good fit for many problems encountered in the practice of processing large data sets.
- The functional programming paradigm can be applied to large-scale computation.
- It is easy to use and hides the messy details of parallelization, fault tolerance, data distribution and load balancing from programmers.
- And finally, if it works for Google, it should be handy!
