How to Set Up a Hadoop Cluster (3.0.0-alpha1)

Over the past decade Apache Hadoop has grown from nothing into the foundation of some of the largest production clusters in the world. The current release, Hadoop 3.0, is built on JDK 1.8 (Hadoop 2.0 was built on JDK 1.7, which stopped receiving public updates in April 2015). It introduces a number of important features and optimizations, including HDFS erasure coding, support for multiple NameNodes, MapReduce native task optimization, cgroup-based memory and disk I/O isolation in YARN, YARN container resizing, and classpath isolation. The members of the Hadoop family are listed below; HDFS and MapReduce form the core, and the other components extend and integrate around that core.


Apache Hadoop

A distributed computing framework open-sourced by the Apache Software Foundation. It provides a distributed file system subproject (HDFS) and a software architecture that supports MapReduce distributed computing.

Apache Hive

A data warehouse tool built on Hadoop. It maps structured data files to database tables and lets simple MapReduce statistics be produced quickly through SQL-like statements, without writing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.

Apache Pig

A large-scale data analysis tool built on Hadoop. It provides an SQL-like language called Pig Latin, whose compiler turns SQL-like analysis requests into a series of optimized MapReduce jobs.

Apache HBase

A highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large structured storage clusters can be built on inexpensive commodity PC servers.

Apache Sqoop

A tool for moving data between Hadoop and relational databases. It can import data from a relational database (MySQL, Oracle, PostgreSQL, and so on) into HDFS, and export data from HDFS back into a relational database.

Apache Zookeeper

A distributed, open-source coordination service designed for distributed applications. It addresses the data management problems that distributed applications commonly run into, simplifies their coordination and management, and provides high-performance distributed services.

Apache Mahout

A distributed machine learning and data mining framework built on Hadoop. Mahout implements a number of data mining algorithms on MapReduce, solving the problem of mining in parallel.

Apache Cassandra

An open-source distributed NoSQL database system. Originally developed by Facebook to store simple-format data, it combines Google BigTable's data model with Amazon Dynamo's fully distributed architecture.

Apache Avro

A data serialization system designed for data-intensive applications that exchange data in large volumes. Avro is a new serialization format and transport tool that is gradually replacing Hadoop's original IPC mechanism.

Apache Ambari

A web-based tool for provisioning, managing, and monitoring Hadoop clusters.

Apache Chukwa

An open-source data collection system for monitoring large distributed systems. It gathers data of many kinds into files suitable for Hadoop processing, stores them in HDFS, and makes them available for MapReduce jobs.

Apache Hama

A BSP (Bulk Synchronous Parallel) computing framework built on HDFS. Hama can be used for large-scale, big-data computation, including graph, matrix, and network algorithms.

Apache Flume

A distributed, reliable, highly available system for aggregating large volumes of logs. It can be used for log data collection, processing, and transport.

Apache Giraph

A scalable, distributed, iterative graph processing system built on the Hadoop platform, inspired by BSP (bulk synchronous parallel) and Google's Pregel.

Apache Oozie

A workflow engine server for managing and coordinating jobs that run on the Hadoop platform (HDFS, Pig, and MapReduce).

Apache Crunch

A Java library, modeled on Google's FlumeJava library, for writing MapReduce programs. Like Hive and Pig, Crunch provides a pattern library for common tasks such as joining data, performing aggregations, and sorting records.

Apache Whirr

A set of libraries for running cloud services (including Hadoop) that complements those services well. Whirr supports Amazon EC2 and Rackspace.

Apache Bigtop

A tool for packaging, distributing, and testing Hadoop and its surrounding ecosystem.

Apache HCatalog

A table and storage management layer for Hadoop. It provides centralized metadata and schema management spanning Hadoop and RDBMSs, and exposes relational views through Pig and Hive.

Cloudera Hue

A web-based monitoring and management system that provides web interfaces for operating and managing HDFS, MapReduce/YARN, HBase, Hive, and Pig.


The Hadoop cluster built in this article has three nodes: one master and two slaves, connected over a LAN and able to ping each other. Each node is configured with two network interfaces: one NAT interface (for Internet access) and one Host-Only interface (used by Hadoop).
10.0.10.80 hdpm.ohsdba.cn hdpm
10.0.10.81 hdps1.ohsdba.cn hdps1
10.0.10.82 hdps2.ohsdba.cn hdps2
The files that need to be configured for Hadoop 3.0 are core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, hadoop-env.sh, and workers.

Configure the hosts file (run on every node)
[root@hdpm ~]# cat /etc/hosts
127.0.0.1    localhost4.localdomain4 localhost
::1          localhost.localdomain   localhost
10.0.10.80 hdpm.ohsdba.cn hdpm
10.0.10.81 hdps1.ohsdba.cn hdps1
10.0.10.82 hdps2.ohsdba.cn hdps2
[root@hdpm ~]#

Create the user and group (run on every node)
[root@hdpm ~]# groupadd hadoop
[root@hdpm ~]# useradd -g hadoop hdp
[root@hdpm ~]# passwd hdp
Changing password for user hdp.
New password:
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hdpm ~]#

Set up SSH trust among the three nodes
[hdp@hdpm ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdp/.ssh/id_rsa):
Created directory '/home/hdp/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hdp/.ssh/id_rsa.
Your public key has been saved in /home/hdp/.ssh/id_rsa.pub.
The key fingerprint is:
63:9c:ba:e4:e8:9c:e8:2d:46:96:85:cd:df:d3:89:2b hdp@hdpm.ohsdba.cn
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|   +             |
|  . +  . .       |
|   o . .So .     |
|  +   .o+.o      |
| o    o  o       |
|  o+ =E..        |
| ooo* o.         |
+-----------------+
[hdp@hdpm ~]$
[root@hdps1 ~]# su - hdp
[hdp@hdps1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdp/.ssh/id_rsa):
Created directory '/home/hdp/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hdp/.ssh/id_rsa.
Your public key has been saved in /home/hdp/.ssh/id_rsa.pub.
The key fingerprint is:
bf:20:43:8a:54:ee:60:bc:e9:02:35:87:bf:0c:05:b9 hdp@hdps1.ohsdba.cn
The key's randomart image is:
+--[ RSA 2048]----+
|   .             |
|  o              |
|   +.            |
| .Eoo            |
| .==. . S        |
|.o.*.o   .       |
|. +oo.o . .      |
|..  o  o . .     |
| ..       .      |
+-----------------+
[hdp@hdps1 ~]$
[hdp@hdps1 ~]$
[root@hdps2 ~]# su - hdp
[hdp@hdps2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdp/.ssh/id_rsa):
Created directory '/home/hdp/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hdp/.ssh/id_rsa.
Your public key has been saved in /home/hdp/.ssh/id_rsa.pub.
The key fingerprint is:
af:77:90:6a:5f:50:8f:8c:a4:e0:aa:a9:f4:32:31:59 hdp@hdps2.ohsdba.cn
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|      .   . .    |
|   E . . o + o   |
|  o   . S o.o .  |
| +   .   .o.     |
| .o .    ....    |
|.o.o    o....    |
|..=.   ..o..     |
+-----------------+
[hdp@hdps2 ~]$

[hdp@hdpm ~]$ cat .ssh/id_rsa.pub >>.ssh/authorized_keys
[hdp@hdpm ~]$ scp hdp@hdps1:~/.ssh/id_rsa.pub .ssh/id_rsa.pub.hdps1
The authenticity of host 'hdps1 (10.0.10.81)' can't be established.
RSA key fingerprint is 4f:68:99:eb:54:4b:61:fb:aa:f3:d9:fa:cd:09:f2:f4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdps1,10.0.10.81' (RSA) to the list of known hosts.
hdp@hdps1's password:
id_rsa.pub                                    100%  401     0.4KB/s   00:00    
[hdp@hdpm ~]$ scp hdp@hdps2:~/.ssh/id_rsa.pub .ssh/id_rsa.pub.hdps2
The authenticity of host 'hdps2 (10.0.10.82)' can't be established.
RSA key fingerprint is 4f:68:99:eb:54:4b:61:fb:aa:f3:d9:fa:cd:09:f2:f4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdps2,10.0.10.82' (RSA) to the list of known hosts.
hdp@hdps2's password:
id_rsa.pub                                    100%  401     0.4KB/s   00:00    
[hdp@hdpm ~]$
[hdp@hdpm .ssh]$ cat id_rsa.pub.hdps1 id_rsa.pub.hdps2 >>authorized_keys
[hdp@hdpm .ssh]$ scp authorized_keys hdps1:`pwd`
hdp@hdps1's password:
authorized_keys                               100% 1202     1.2KB/s   00:00    
[hdp@hdpm .ssh]$ scp authorized_keys hdps2:`pwd`
hdp@hdps2's password:
authorized_keys                               100% 1202     1.2KB/s   00:00    
[hdp@hdpm .ssh]$
[hdp@hdpm .ssh]$ chmod 600 authorized_keys
[hdp@hdpm .ssh]$ ssh hdps1.ohsdba.cn
[hdp@hdps1 ~]$ chmod 600 .ssh/authorized_keys
[hdp@hdps1 ~]$ exit
Connection to hdps1 closed.
[hdp@hdpm .ssh]$ ssh hdps2.ohsdba.cn
[hdp@hdps2 ~]$ chmod 600 .ssh/authorized_keys
[hdp@hdps2 ~]$ 
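
Before continuing, it is worth confirming that passwordless SSH now works; the one-liner below is a minimal sketch, run as hdp on hdpm, and each ssh should print the node's hostname without asking for a password (the same loop can be run from the slaves, though the first connection from a node will still ask to confirm the remote host key):

[hdp@hdpm ~]$ for h in hdpm hdps1 hdps2; do ssh "$h" hostname; done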

Remove Java 1.7 and 1.6, if installed (run on every node)
[root@hdps2 ~]# rpm -qa|grep jdk
java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64
java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64
[root@hdps2 ~]# rpm -e java-1.7.0-openjdk
[root@hdps2 ~]# rpm -e java-1.6.0-openjdk
[root@hdps2 ~]#

Install JDK 1.8 (run on every node)
[root@hdpm ~]# rpm -ivh /home/hdp/jdk-8u112-linux-x64.rpm
Preparing...                ########################################### [100%]
   1:jdk1.8.0_112           ########################################### [100%]
Unpacking JAR files...
        tools.jar...
        plugin.jar...
        javaws.jar...
        deploy.jar...
        rt.jar...
        jsse.jar...
        charsets.jar...
        localedata.jar...
[root@hdpm ~]#
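
A quick check that the new JDK is now the default on each node; the output below is what the jdk-8u112 RPM typically reports:

[root@hdpm ~]# java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
[root@hdpm ~]#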

Create directories (in preparation for installing Hadoop)
[hdp@hdpm ~]$ mkdir -p /pohs/tmp /pohs/hdfs/data /pohs/hdfs/name
[hdp@hdpm ~]$
[hdp@hdps1 ~]$ mkdir -p /pohs/tmp /pohs/hdfs/data /pohs/hdfs/name
[hdp@hdps1 ~]$
[hdp@hdps2 ~]$ mkdir -p /pohs/tmp /pohs/hdfs/data /pohs/hdfs/name
[hdp@hdps2 ~]$ 

Download the Hadoop installation file (on the hdpm node)
[root@hdpm ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha1/hadoop-3.0.0-alpha1.tar.gz
--2016-10-25 18:09:29--  http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha1/hadoop-3.0.0-alpha1.tar.gz
Resolving mirrors.tuna.tsinghua.edu.cn... 166.111.206.63, 2402:f000:1:416:166:111:206:63
Connecting to mirrors.tuna.tsinghua.edu.cn|166.111.206.63|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 331219821 (316M) [application/octet-stream]
Saving to: “hadoop-3.0.0-alpha1.tar.gz”

100%[=============================>] 331,219,821 2.45M/s   in 2m 4s
2016-10-25 18:11:33 (2.55 MB/s) - “hadoop-3.0.0-alpha1.tar.gz” saved   [331219821/331219821]
[root@hdpm ~]# ls -l hadoop-3.0.0-alpha1.tar.gz
-rw-r--r--. 1 root root 331219821 Sep  7 00:48 hadoop-3.0.0-alpha1.tar.gz
[root@hdpm ~]# su - hdp
[hdp@hdpm pohs]$ tar zxvf hadoop-3.0.0-alpha1.tar.gz
[hdp@hdpm pohs]$ mv hadoop-3.0.0-alpha1 hadoop3
[hdp@hdpm pohs]$ pwd
/pohs
[hdp@hdpm pohs]$
[hdp@hdpm pohs]$ ls -ltr
total 323480
drwxr-xr-x. 9 hdp hadoop      4096 Aug 30 15:18 hadoop3
drwx------. 2 hdp hadoop     16384 Oct 28 13:56 lost+found
-rwxr-xr-x. 1 hdp hadoop 331219821 Oct 29 12:47 hadoop-3.0.0-alpha1.tar.gz
[hdp@hdpm pohs]$ 

Set environment variables (all nodes)
Edit .bash_profile and add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_112
export HADOOP_HOME=/pohs/hadoop3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export CLASSPATH=:$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
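
After saving .bash_profile, the new variables can be loaded into the current shell and sanity-checked. A minimal sketch, assuming the Hadoop tarball has already been extracted to /pohs/hadoop3 (done above on hdpm; on the slaves this check only works once the directory has been copied over):

[hdp@hdpm ~]$ source ~/.bash_profile
[hdp@hdpm ~]$ echo $HADOOP_HOME
/pohs/hadoop3
[hdp@hdpm ~]$ hadoop version | head -1
Hadoop 3.0.0-alpha1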

Edit the configuration files (on the master)
The configuration files live under $HADOOP_HOME/etc/hadoop/. Once they have been edited on the master node, the whole HADOOP_HOME directory can be copied to the other nodes.
hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_112
core-site.xml
<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdpm.ohsdba.cn:9000</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/pohs/tmp</value>
    <description>Abase for other temporary directories.</description>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hdpm.ohsdba.cn:9001</value>
</property>
<property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
</property>
<property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
</property>
<property>  
    <name>dfs.namenode.name.dir</name>
    <value>file:/pohs/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/pohs/hdfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.admin.user.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hdpm.ohsdba.cn:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hdpm.ohsdba.cn:19888</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>hdpm.ohsdba.cn:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hdpm.ohsdba.cn:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hdpm.ohsdba.cn:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hdpm.ohsdba.cn:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hdpm.ohsdba.cn:8088</value>
</property>
</configuration>
workers
hdps1.ohsdba.cn
hdps2.ohsdba.cn

Note: workers is the file name in 3.0; in 2.x.x the same file is named slaves.
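
One way to create the workers file on the master is with a here-document; a minimal sketch using the FQDNs listed above:

[hdp@hdpm ~]$ cat > $HADOOP_HOME/etc/hadoop/workers <<EOF
hdps1.ohsdba.cn
hdps2.ohsdba.cn
EOF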

Copy the Hadoop files to the other nodes

[hdp@hdpm ~]$ cd /pohs
[hdp@hdpm pohs]$ scp -rp hadoop3 hdps1:/pohs
[hdp@hdpm pohs]$ scp -rp hadoop3 hdps2:/pohs

Format the NameNode (run on the master node; this must be done before starting the cluster)
[hdp@hdpm ~]$ hdfs namenode -format
WARNING: /pohs/hadoop3/logs does not exist. Creating.
2016-11-02 11:56:34,520 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdp
STARTUP_MSG:   host = hdpm.ohsdba.cn/10.0.10.80
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.0.0-alpha1
STARTUP_MSG:   classpath =
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r a990d2ebcd6de5d7dc2d3684930759b0f0ea4dc3; compiled by 'andrew' on 2016-08-30T07:02Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
2016-11-02 11:56:34,579 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2016-11-02 11:56:34,596 INFO namenode.NameNode: createNameNode [-format]
2016-11-02 11:56:35,730 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-f76c6f99-04f2-4b4d-bffc-e6e0006b36b6
2016-11-02 11:56:36,531 INFO namenode.FSEditLog: Edit logging is async:false
2016-11-02 11:56:36,538 INFO namenode.FSNamesystem: KeyProvider: null
2016-11-02 11:56:36,538 INFO namenode.FSNamesystem: fsLock is fair:true
2016-11-02 11:56:36,599 INFO namenode.FSNamesystem: fsOwner             = hdp (auth:SIMPLE)
2016-11-02 11:56:36,605 INFO namenode.FSNamesystem: supergroup          = supergroup
2016-11-02 11:56:36,605 INFO namenode.FSNamesystem: isPermissionEnabled = true
2016-11-02 11:56:36,606 INFO namenode.FSNamesystem: HA Enabled: false
2016-11-02 11:56:36,824 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
2016-11-02 11:56:36,825 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2016-11-02 11:56:36,834 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2016-11-02 11:56:36,852 INFO blockmanagement.BlockManager: The block deletion will start around 2016 Nov 02 11:56:36
2016-11-02 11:56:36,854 INFO util.GSet: Computing capacity for map BlocksMap
2016-11-02 11:56:36,855 INFO util.GSet: VM type       = 64-bit
2016-11-02 11:56:36,861 INFO util.GSet: 2.0% max memory 421.5 MB = 8.4 MB
2016-11-02 11:56:36,861 INFO util.GSet: capacity      = 2^20 = 1048576 entries
2016-11-02 11:56:36,963 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManager: defaultReplication         = 3
2016-11-02 11:56:36,969 INFO blockmanagement.BlockManager: maxReplication             = 512
2016-11-02 11:56:36,969 INFO blockmanagement.BlockManager: minReplication             = 1
2016-11-02 11:56:36,969 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
2016-11-02 11:56:36,970 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
2016-11-02 11:56:36,970 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
2016-11-02 11:56:36,970 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2016-11-02 11:56:37,676 INFO util.GSet: Computing capacity for map INodeMap
2016-11-02 11:56:37,677 INFO util.GSet: VM type       = 64-bit
2016-11-02 11:56:37,677 INFO util.GSet: 1.0% max memory 421.5 MB = 4.2 MB
2016-11-02 11:56:37,677 INFO util.GSet: capacity      = 2^19 = 524288 entries
2016-11-02 11:56:37,678 INFO namenode.FSDirectory: ACLs enabled? false
2016-11-02 11:56:37,678 INFO namenode.FSDirectory: XAttrs enabled? true
2016-11-02 11:56:37,680 INFO namenode.NameNode: Caching file names occuring more than 10 times
2016-11-02 11:56:37,696 INFO util.GSet: Computing capacity for map cachedBlocks
2016-11-02 11:56:37,697 INFO util.GSet: VM type       = 64-bit
2016-11-02 11:56:37,697 INFO util.GSet: 0.25% max memory 421.5 MB = 1.1 MB
2016-11-02 11:56:37,697 INFO util.GSet: capacity      = 2^17 = 131072 entries
2016-11-02 11:56:37,709 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2016-11-02 11:56:37,710 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2016-11-02 11:56:37,710 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2016-11-02 11:56:37,714 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2016-11-02 11:56:37,714 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2016-11-02 11:56:37,718 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2016-11-02 11:56:37,718 INFO util.GSet: VM type       = 64-bit
2016-11-02 11:56:37,719 INFO util.GSet: 0.029999999329447746% max memory 421.5 MB = 129.5 KB
2016-11-02 11:56:37,719 INFO util.GSet: capacity      = 2^14 = 16384 entries
2016-11-02 11:56:37,831 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1576596739-10.0.10.80-1478058997799
2016-11-02 11:56:37,896 INFO common.Storage: Storage directory /pohs/hdfs/name has been successfully formatted.
2016-11-02 11:56:37,977 INFO namenode.FSImageFormatProtobuf: Saving image file /pohs/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2016-11-02 11:56:38,155 INFO namenode.FSImageFormatProtobuf: Image file /pohs/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 331 bytes saved in 0 seconds.
2016-11-02 11:56:38,223 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2016-11-02 11:56:38,232 INFO util.ExitUtil: Exiting with status 0
2016-11-02 11:56:38,244 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hdpm.ohsdba.cn/10.0.10.80
************************************************************/

Start Hadoop (running it on the master node is enough, since SSH trust was configured earlier)
[hdp@hdpm name]$ start-dfs.sh
Starting namenodes on [hdpm.ohsdba.cn]
hdpm.ohsdba.cn: Warning: Permanently added 'hdpm.ohsdba.cn,10.0.10.80' (RSA) to the list of known hosts.
Starting datanodes
hdps2.ohsdba.cn: WARNING: /pohs/hadoop3/logs does not exist. Creating.
hdps1.ohsdba.cn: WARNING: /pohs/hadoop3/logs does not exist. Creating.
Starting secondary namenodes [hdpm.ohsdba.cn]
2016-11-02 11:59:46,439 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hdp@hdpm name]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hdp@hdpm ~]$ mapred --daemon start historyserver
[hdp@hdpm ~]$
Note: start-all.sh is deprecated in 3.0; start the cluster with start-dfs.sh and start-yarn.sh instead. The mr-jobhistory-daemon.sh start historyserver script has been replaced by mapred --daemon start historyserver.
[hdp@hdpm name]$ start-all.sh
This script is deprecated. Use start-dfs.sh and start-yarn.sh instead.

Check the processes
[hdp@hdpm name]$ jps
5104 Jps
4498 SecondaryNameNode
4341 NameNode
4826 ResourceManager
[hdp@hdpm name]$

[hdp@hdps1 ~]$ jps
15236 NodeManager
15942 Jps
15112 DataNode
[hdp@hdps1 ~]$

[hdp@hdps2 ~]$ jps
15393 Jps
14694 NodeManager
14570 DataNode
[hdp@hdps2 ~]$ 
[hdp@hdpm ~]$ hdfs dfsadmin -report
2016-11-02 15:29:04,259 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 96546439168 (89.92 GB)
Present Capacity: 88428929024 (82.36 GB)
DFS Remaining: 88428871680 (82.36 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.0.10.81:9866 (hdps1.ohsdba.cn)
Hostname: hdps1.ohsdba.cn
Decommission Status : Normal
Configured Capacity: 48273219584 (44.96 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4058755072 (3.78 GB)
DFS Remaining: 44214435840 (41.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.59%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Nov 02 15:29:05 CST 2016

Name: 10.0.10.82:9866 (hdps2.ohsdba.cn)
Hostname: hdps2.ohsdba.cn
Decommission Status : Normal
Configured Capacity: 48273219584 (44.96 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4058755072 (3.78 GB)
DFS Remaining: 44214435840 (41.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.59%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Nov 02 15:29:04 CST 2016
[hdp@hdpm ~]$ 


Check through the web interfaces

Daemon                       Web Interface              Notes
NameNode                     http://10.0.10.80:9870     Default HTTP port is 9870.
ResourceManager              http://10.0.10.80:8088     Default HTTP port is 8088.
MapReduce JobHistory Server  http://10.0.10.80:19888    Default HTTP port is 19888.

If the pages above open normally, the Hadoop cluster has been installed successfully.
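
As a further smoke test, a small MapReduce job can be submitted to YARN. This is a minimal sketch; the examples jar ships inside the Hadoop tarball, and the exact file name below is assumed from the 3.0.0-alpha1 layout:

[hdp@hdpm ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar pi 2 10

If the job completes and prints an estimated value of Pi, HDFS, YARN, and the MapReduce framework are working together end to end.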


Stop Hadoop

$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/bin/mapred --daemon stop historyserver


Integrating HBase, Hive, and other components

These two components are fairly commonly used. In testing, Hive 2.1.0 does not yet support Hadoop 3.0.0; if you are interested, integrate them with a Hadoop 2.x release instead.


Reference

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

https://hadoopecosystemtable.github.io/

http://blog.fens.me/series-hadoop-family/


