Hadoop history

The genesis of Hadoop came from the Google File System paper,[11] published in October 2003. This paper spawned another research paper from Google: MapReduce: Simplified Data Processing on Large Clusters.[12] Development started in the Apache Nutch project but was moved to the new Hadoop subproject in January 2006.[13] Doug Cutting, who was working at Yahoo! at the time,[14] named it after his son's toy elephant.[15] The initial code factored out of Nutch consisted of about 5,000 lines of code for HDFS and about 6,000 lines of code for MapReduce.

The first committer added to the Hadoop project was Owen O'Malley, in March 2006.[16] Hadoop 0.1.0 was released in April 2006,[17] and the project continues to evolve through the work of its many contributors[18] to the Apache Hadoop project.

Timeline
2003 October - Google File System paper released [19]
2004 December - MapReduce: Simplified Data Processing on Large Clusters paper released [20]
2006 January - Hadoop subproject created with mailing lists, JIRA, and wiki [21]
2006 January - Hadoop is born from Nutch (NUTCH-197) [22]
2006 February - NDFS and MapReduce moved out of Apache Nutch to create Hadoop [23]
2006 February - Owen O'Malley's first patch goes into Hadoop [24]
2006 February - Hadoop is named after Cutting's son's yellow plush toy elephant [25]
2006 April - Hadoop 0.1.0 released [26]
2006 April - Hadoop sorts 1.8 TB on 188 nodes in 47.9 hours [23]
2006 May - Yahoo deploys a 300-machine Hadoop cluster [23]
2006 October - Yahoo Hadoop cluster reaches 600 machines [23]
2007 April - Yahoo runs two clusters of 1,000 machines [23]
2007 June - Only three companies listed on the "Powered by Hadoop" page [27]
2007 October - First release of Hadoop that includes HBase [28]
2007 October - Yahoo Labs creates Pig and donates it to the ASF [29]
2008 January - YARN JIRA opened (MAPREDUCE-279)
2008 January - 20 companies listed on the "Powered by Hadoop" page [27]
2008 February - Yahoo moves its web index onto Hadoop [30]
2008 February - Yahoo! production search index generated by a 10,000-core Hadoop cluster [23]
2008 March - First Hadoop Summit [31]
2008 April - Hadoop sets the world record as the fastest system to sort a terabyte of data: running on a 910-node cluster, Hadoop sorted one terabyte in 209 seconds [23]
2008 May - Hadoop wins the TeraByte Sort benchmark (world record, sortbenchmark.org) [32]
2008 July - Hadoop wins the Terabyte Sort Benchmark [33]
2008 October - Loading 10 TB/day into Yahoo clusters [23]
2008 October - Cloudera, a Hadoop distributor, is founded [34]
2008 November - Google's MapReduce implementation sorts one terabyte in 68 seconds [23]
2009 March - Yahoo runs 17 clusters with 24,000 machines [23]
2009 April - Hadoop sorts a petabyte [35]
2009 May - Yahoo! uses Hadoop to sort one terabyte in 62 seconds [23]
2009 June - Second Hadoop Summit [36]
2009 July - Hadoop Core is renamed Hadoop Common [37]
2009 July - MapR, a Hadoop distributor, is founded [38]
2009 July - HDFS becomes a separate subproject [37]
2009 July - MapReduce becomes a separate subproject [37]
2010 January - Kerberos support added to Hadoop [39]
2010 May - Apache HBase graduates to a top-level project [40]
2010 June - Third Hadoop Summit [41]
2010 June - Yahoo: 4,000 nodes / 70 petabytes [42]
2010 June - Facebook: 2,300 nodes / 40 petabytes [42]
2010 September - Apache Hive graduates to a top-level project [43]
2010 September - Apache Pig graduates to a top-level project [44]
2011 January - Apache ZooKeeper graduates to a top-level project [45]
2011 January - Facebook, LinkedIn, eBay, and IBM collectively contribute 200,000 lines of code [46]
2011 March - Apache Hadoop takes the top prize at the Media Guardian Innovation Awards [47]
2011 June - Rob Bearden and Eric Baldeschwieler spin Hortonworks out of Yahoo [48]
2011 June - Yahoo has 42,000 Hadoop nodes and hundreds of petabytes of storage [48]
2011 June - Fourth Annual Hadoop Summit (1,700 attendees) [49]
2011 October - Debate over which company has contributed more to Hadoop [46]
2012 January - Hadoop community moves to separate out MapReduce and replace it with YARN [25]
2012 June - San Jose Hadoop Summit (2,100 attendees) [50]
2012 November - Apache Hadoop 1.0 available [37]
2013 March - Hadoop Summit Amsterdam (500 attendees) [51]
2013 March - YARN deployed in production at Yahoo [52]
2013 June - San Jose Hadoop Summit (2,700 attendees) [53]
2013 October - Apache Hadoop 2.2 available [37]
2014 February - Apache Hadoop 2.3 available [37]
2014 February - Apache Spark becomes a top-level Apache project [54]
2014 April - Hadoop Summit Amsterdam (750 attendees) [55]
2014 June - Apache Hadoop 2.4 available [37]
2014 June - San Jose Hadoop Summit (3,200 attendees) [56]
2014 August - Apache Hadoop 2.5 available [37]
2014 November - Apache Hadoop 2.6 available [37]
2015 April - Hadoop Summit Europe [57]
2015 June - Apache Hadoop 2.7 available [37]
