Information about MapReduce & Hadoop.

What is MapReduce?

1) MapReduce was devised at Google for writing distributed applications that efficiently process large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

2) MapReduce is a parallel programming model.

3) The MapReduce program runs on Hadoop, which is an Apache open-source framework.

The data and the desired processing are divided into multiple tasks.

4) Applying the desired code to the divided data on each local machine is called Map.

5) To produce the desired output, all these individual outputs have to be merged, or reduced, into a single output. This reduction of multiple outputs into one is performed by the Reducer. In Hadoop, one output file is generated per reducer: as many reducers as there are, that many output files are produced.
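The relationship between reducers and output files can be sketched in plain Python. This is a simplified simulation, not the Hadoop API; partitioning intermediate keys by their hash mirrors the spirit of Hadoop's default hash partitioner.

```python
# Simplified simulation: intermediate (key, value) pairs are partitioned
# among reducers; each reducer produces one output "file" (here, a dict).
NUM_REDUCERS = 3

intermediate = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]

# Assign each pair to a reducer, mimicking hash-based partitioning.
partitions = [[] for _ in range(NUM_REDUCERS)]
for key, value in intermediate:
    partitions[hash(key) % NUM_REDUCERS].append((key, value))

# Each reducer sums the values per key and emits its own output file.
output_files = []
for part in partitions:
    counts = {}
    for key, value in part:
        counts[key] = counts.get(key, 0) + value
    output_files.append(counts)

print(len(output_files))  # one output per reducer, i.e. 3
```

Merging the per-reducer outputs recovers the full result, which is why adding reducers changes only how the output is split across files, not its content.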



What is Hadoop?

Hadoop is an open-source software framework for storing large amounts of data and performing computation on it. The framework is based on Java, with some native code in C and shell scripts. In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.


Components of Hadoop:

1) HDFS (Hadoop Distributed File System): Google published its GFS paper, and HDFS was developed on that basis. Files are broken into blocks and stored on nodes across the distributed architecture.
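The block idea can be illustrated with a short sketch. This is plain Python, not HDFS itself; the default HDFS block size is 128 MB, scaled down here to 8 bytes for demonstration.

```python
# Illustration only: split data into fixed-size blocks, as HDFS does
# with files (default block size is 128 MB; scaled to 8 bytes here).
BLOCK_SIZE = 8

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return a list of blocks; in a real HDFS cluster each block would be
    stored on a DataNode and replicated (3x by default)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"hello hadoop distributed fs")
print(len(blocks))        # 27 bytes -> 4 blocks of up to 8 bytes
print(b"".join(blocks))   # reassembling the blocks recovers the file
```

Because each block can live on a different node, the Map tasks described above can run on the nodes that already hold the data, instead of moving the data to the code.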

2) YARN (Yet Another Resource Negotiator): used for job scheduling and cluster management.

3) MapReduce: a framework that helps Java programs perform parallel computation on data using key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.
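The key-value flow can be sketched with a word count, the canonical MapReduce example. This is a conceptual sketch in plain Python, not the Hadoop Java API: map emits (key, value) pairs, which are grouped by key before reduce runs.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) key-value pair for every word in the input split.
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    # Reduce: combine all values seen for one key into a single result.
    return (key, sum(values))

splits = ["hadoop stores data", "hadoop processes data"]

# Shuffle: group every intermediate value under its key.
grouped = defaultdict(list)
for line in splits:
    for key, value in map_phase(line):
        grouped[key].append(value)

result = dict(reduce_phase(k, v) for k, v in grouped.items())
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In Hadoop, the grouping step (the shuffle) happens between the Map and Reduce tasks across the cluster; here it is simulated with a single dictionary.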

4) Hadoop Common: the Java libraries that are used to start Hadoop and are shared by the other Hadoop modules.




