
Distributed Big Data Platform Setup: Hadoop Cluster Configuration

Date: 2024-06-09 22:11:51


Section 1: File List

hadoop-2.8.4.tar.gz
jdk-8u181-linux-x64.tar
Xshell 7 Home Edition
Xftp 7 Home Edition
Section 2: Download Links

[JDK download]: /technetwork/java/javase/downloads/java-archive-javase8u211-later-5573849.html

[Hadoop download]: /dist/hadoop/core/

[Xshell/Xftp download]: /zh/free-for-home-school/


Section 3: Installation and Deployment

- Part I - JDK Installation

Step 1: Extract the JDK to /opt/cluster, where cluster is a folder created specifically for this cluster deployment

[root@localhost opt]# mkdir cluster
[root@localhost ~]# tar -zxvf jdk-8u181-linux-x64.tar.gz -C /opt/cluster/

After extraction, a JDK folder should exist under /opt/cluster/, as follows:

[root@localhost cluster]# ll
total 4
drwxr-xr-x. 7 10 143 4096 Jul 7 jdk1.8.0_181

Step 2: Configure the JAVA_HOME environment variable

[root@localhost ~]# cd /etc
[root@localhost etc]# vi profile

Once inside, append the JAVA_HOME environment variable at the end of the file:

export JAVA_HOME=/opt/cluster/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin

After saving and exiting, run the following command to apply the changes.

[root@localhost etc]# source profile

To verify that Java is configured correctly, run the following; if a version number appears, the configuration succeeded.

[root@localhost etc]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
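The same check can also be scripted. The sketch below is an addition to this guide: the default path matches the install location above, and the "ok"/"broken" messages are illustrative, not Hadoop output.

```shell
# Verify that JAVA_HOME points at an executable java binary.
# Defaults to the install path used in this guide; override to test elsewhere.
JAVA_HOME="${JAVA_HOME:-/opt/cluster/jdk1.8.0_181}"
if [ -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME ok: $JAVA_HOME"
else
    echo "JAVA_HOME broken: $JAVA_HOME"
fi
```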

- Part II - Hadoop Cluster

Step 1: Overview of cluster IP information

The cluster in this guide consists of three nodes, using the IP-hostname mapping that appears throughout:

192.168.137.128 BlogMaster (master, NameNode)
192.168.137.129 BlogSlave1 (slave, DataNode + SecondaryNameNode)
192.168.137.130 BlogSlave2 (slave, DataNode)

Step 2: Configure Hadoop on the master node

Step 2.1: Extract the Hadoop archive to /opt/cluster/

[root@localhost ~]# tar -zxvf hadoop-2.8.4.tar.gz -C /opt/cluster/

Step 2.2: Configure core-site.xml (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)

Open the file and add the following:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.137.128:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/cluster/hadoop-2.8.4/tmp</value>
  </property>
</configuration>

Note: afterwards, create a tmp folder under the Hadoop installation directory, i.e. under /opt/cluster/hadoop-2.8.4.

Step 2.3: Configure hadoop-env.sh (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)

Open the file and change JAVA_HOME to the root directory of the Java installation configured earlier.

# The java implementation to use.
export JAVA_HOME=/opt/cluster/jdk1.8.0_181

Step 2.4: Configure hdfs-site.xml (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)

Open the file and add the following:

<configuration>
  <!-- namenode metadata -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/cluster/hadoop-2.8.4/dfs/name</value>
  </property>
  <!-- hdfs data -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/cluster/hadoop-2.8.4/dfs/data</value>
  </property>
  <!-- replication (backup) factor -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- secondary namenode -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>BlogSlave1:50090</value>
  </property>
</configuration>

Note: afterwards, under the Hadoop installation directory /opt/cluster/hadoop-2.8.4, create a dfs folder, and inside it create name and data folders:

[root@localhost hadoop-2.8.4]# mkdir dfs
[root@localhost hadoop-2.8.4]# cd dfs
[root@localhost dfs]# mkdir name
[root@localhost dfs]# mkdir data
[root@localhost dfs]# ll
total 0
drwxr-xr-x. 2 root root 6 Nov 11 15:08 data
drwxr-xr-x. 2 root root 6 Nov 11 15:08 name
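The same directory tree can be created in one command with mkdir -p, which creates any missing parent directories. This sketch is an addition; the BASE variable is a convenience so the path can be overridden, and the guard skips the step if Hadoop is not installed at that path.

```shell
# Create dfs/name and dfs/data in one command; -p creates missing parents.
BASE="${BASE:-/opt/cluster/hadoop-2.8.4}"
if [ -d "$BASE" ]; then
    mkdir -p "$BASE/dfs/name" "$BASE/dfs/data"
fi
```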

Step 2.5: Configure the slaves file (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)

Open the file and add the following:

BlogSlave1
BlogSlave2
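The slaves file can also be written non-interactively. This is a sketch added to the guide; the SLAVES variable is a convenience, and the guard skips the step when the configuration directory does not exist.

```shell
# Overwrite the slaves file with one hostname per line.
SLAVES="${SLAVES:-/opt/cluster/hadoop-2.8.4/etc/hadoop/slaves}"
if [ -d "$(dirname "$SLAVES")" ]; then
    printf '%s\n' BlogSlave1 BlogSlave2 > "$SLAVES"
fi
```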

Step 2.6: Configure the yarn-site.xml file (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>BlogMaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Step 2.7: Configure the yarn-env.sh file (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)

# some Java parameters
export JAVA_HOME=/opt/cluster/jdk1.8.0_181

Step 2.8: Configure the mapred-site.xml file (located in /opt/cluster/hadoop-2.8.4/etc/hadoop)

Inside that directory there is no mapred-site.xml; instead there is a mapred-site.xml.template file. Create mapred-site.xml by copying mapred-site.xml.template and renaming the copy, then add the following to it:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
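The copy-and-rename step described above can be done with a single cp. This sketch is an addition; CONF_DIR is a convenience variable defaulting to the configuration path above, and the guard skips the step if the template is absent.

```shell
# Create mapred-site.xml from the shipped template, if the template exists.
CONF_DIR="${CONF_DIR:-/opt/cluster/hadoop-2.8.4/etc/hadoop}"
if [ -f "$CONF_DIR/mapred-site.xml.template" ]; then
    cp "$CONF_DIR/mapred-site.xml.template" "$CONF_DIR/mapred-site.xml"
fi
```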

Step 2.9: Configure the yarn-site.xml file (located in /opt/cluster/hadoop-2.8.4/etc/hadoop); this is the complete configuration and supersedes the minimal version from Step 2.6

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>BlogMaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>BlogMaster:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>BlogMaster:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>BlogMaster:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>BlogMaster:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>BlogMaster:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>

Step 2.10: Configure environment variables, adding HADOOP_HOME to PATH

Building on JAVA_HOME, further configure HADOOP_HOME, YARN_HOME, and related environment variables in /etc/profile.

export JAVA_HOME=/opt/cluster/jdk1.8.0_181
export HADOOP_HOME=/opt/cluster/hadoop-2.8.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=/opt/cluster/hadoop-2.8.4
export YARN_CONF_DIR=$YARN_HOME/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Afterwards, remember to run source profile so the changes take effect.
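One quick way to confirm the PATH additions took effect is to test membership of each expected bin directory. The loop and its messages are illustrative additions, not part of the original steps.

```shell
# Report whether each expected bin directory is actually on PATH.
for d in "$JAVA_HOME/bin" "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    case ":$PATH:" in
        *":$d:"*) echo "on PATH: $d" ;;
        *)        echo "missing from PATH: $d" ;;
    esac
done
```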

Step 2.11: Turn off the firewall to simplify the configuration that follows

[root@localhost etc]# systemctl stop firewalld
[root@localhost etc]# systemctl disable firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.

Step 3: Use VMware Workstation to make two full clones of the master node, to serve as the Slave nodes

Before cloning, shut down the master node first, and make absolutely sure you choose a full clone. This takes a while; you can listen to Zella Day's 1965 in the meantime, enjoy…

Step 4: Set the IPs of the Master and Slave nodes

Step 4.1: Configure the BlogMaster node

First, overwrite the hostname file under /etc so that it contains:

BlogMaster

Next, add the following entries to the hosts file under /etc:

192.168.137.128 BlogMaster
192.168.137.129 BlogSlave1
192.168.137.130 BlogSlave2
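As a sketch, the three entries can be appended idempotently, so re-running the snippet never duplicates lines. The HOSTS variable and the writability guard are additions for safe testing; in practice HOSTS would simply be /etc/hosts.

```shell
HOSTS="${HOSTS:-/etc/hosts}"
if [ -w "$HOSTS" ]; then
    for entry in "192.168.137.128 BlogMaster" \
                 "192.168.137.129 BlogSlave1" \
                 "192.168.137.130 BlogSlave2"; do
        # Append only if the exact line is not already present.
        grep -qxF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
    done
fi
```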

Finally, reboot the virtual machine so the configuration takes effect.

Step 4.2: Configure the hostname of the BlogSlave1 node (file located under /etc)

BlogSlave1

Then reboot the virtual machine so the configuration takes effect.

Step 4.3: Configure the hostname of the BlogSlave2 node (file located under /etc)

BlogSlave2

Then reboot the virtual machine so the configuration takes effect.

Step 4.4: Set up passwordless SSH login between the Master and Slave nodes

Before configuring passwordless login, use ip addr to confirm that the IP addresses of the three machines are as expected.

Important: before configuring passwordless login, first make sure the firewall is completely disabled on every node. The full commands are:

[root@BlogMaster ~]# systemctl stop firewalld
[root@BlogMaster ~]# systemctl disable firewalld

[root@BlogSlave1 ~]# systemctl stop firewalld
[root@BlogSlave1 ~]# systemctl disable firewalld

[root@BlogSlave2 ~]# systemctl stop firewalld
[root@BlogSlave2 ~]# systemctl disable firewalld

Then run the following four commands on each node in turn.

[root@BlogMaster ~]# ssh-keygen
[root@BlogMaster ~]# ssh-copy-id 192.168.137.128
[root@BlogMaster ~]# ssh-copy-id 192.168.137.129
[root@BlogMaster ~]# ssh-copy-id 192.168.137.130

Step 4.4.1: Run the following command on all three machines to generate a key pair on each (press Enter at every prompt until the key's randomart image appears)

[root@BlogMaster .ssh]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
aa:3f:dc:a1:8b:20:70:48:c0:ce:f4:43:32:93:0c:3c root@BlogMaster
The key's randomart image is:
+--[ RSA 2048]----+
|B .              |
|.E .             |
|+.B              |
|.+ o             |
|o . . S          |
|....             |
|. . ..o .        |
| . . o+ .        |
|  o.oo           |
+-----------------+

Step 4.4.2: Run the following commands on all three machines so that each machine trusts the others' keys

The fuller output is as follows:

[root@BlogMaster .ssh]# ssh-copy-id 192.168.137.128
The authenticity of host '192.168.137.128 (192.168.137.128)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.137.128's password: 
Number of key(s) added: 1
Now try logging into the machine, with: "ssh '192.168.137.128'"
and check to make sure that only the key(s) you wanted were added.
[root@BlogMaster .ssh]# ssh-copy-id 192.168.137.129
The authenticity of host '192.168.137.129 (192.168.137.129)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.137.129's password: 
Number of key(s) added: 1
Now try logging into the machine, with: "ssh '192.168.137.129'"
and check to make sure that only the key(s) you wanted were added.
[root@BlogMaster .ssh]# ssh-copy-id 192.168.137.130
The authenticity of host '192.168.137.130 (192.168.137.130)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.137.130's password: 
Number of key(s) added: 1
Now try logging into the machine, with: "ssh '192.168.137.130'"
and check to make sure that only the key(s) you wanted were added.

Step 4.5: Distribute the master node's hosts file to the corresponding directory on the two Slave nodes

**Overview:** run the following commands to distribute the hosts file with the updated IP-hostname mappings to the other two Slave nodes in the cluster.

[root@BlogMaster etc]# scp -r hosts BlogSlave1:$PWD
[root@BlogMaster etc]# scp -r hosts BlogSlave2:$PWD

In detail:

First, for BlogSlave1, run the following on the BlogMaster node:

[root@BlogMaster etc]# scp -r hosts BlogSlave1:$PWD
The authenticity of host 'blogslave1 (192.168.137.129)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'blogslave1,192.168.137.129' (ECDSA) to the list of known hosts.
root@blogslave1's password: 
hosts                                   100%  240     0.2KB/s   00:00

Next, for BlogSlave2, run the following on the BlogMaster node:

[root@BlogMaster etc]# scp -r hosts BlogSlave2:$PWD
The authenticity of host 'blogslave2 (192.168.137.130)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'blogslave2,192.168.137.130' (ECDSA) to the list of known hosts.
root@blogslave2's password: 
hosts                                   100%  240     0.2KB/s   00:00

Step 4.6: Run the format command on the master node BlogMaster

[root@BlogMaster etc]# hadoop namenode -format

If output like the following appears, the configuration succeeded.

[root@BlogMaster etc]# hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
19/11/11 16:27:31 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = root
STARTUP_MSG:   host = BlogMaster/192.168.137.128
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.8.4
...
(the output is long; only the beginning is shown)

Part III - Hadoop Cluster Startup and Status Checks

Step 1: Start the Hadoop cluster (start-dfs.sh starts HDFS only; to also start YARN, run start-yarn.sh afterwards)

[root@BlogMaster ~]# start-dfs.sh

After execution, the output is:

Starting namenodes on [BlogMaster]
The authenticity of host 'blogmaster (192.168.137.128)' can't be established.
ECDSA key fingerprint is fa:31:6b:64:bf:13:26:27:44:47:40:a7:f0:4a:39:e9.
Are you sure you want to continue connecting (yes/no)? yes
BlogMaster: Warning: Permanently added 'blogmaster' (ECDSA) to the list of known hosts.
BlogMaster: starting namenode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-namenode-BlogMaster.out
BlogSlave1: starting datanode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-datanode-BlogSlave1.out
BlogSlave2: starting datanode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-datanode-BlogSlave2.out
Starting secondary namenodes [BlogSlave1]
BlogSlave1: starting secondarynamenode, logging to /opt/cluster/hadoop-2.8.4/logs/hadoop-root-secondarynamenode-BlogSlave1.out

Step 2: Check the running state of the Hadoop cluster

Step 2.1: Check with jps; when each node shows the following, the startup succeeded

After entering the jps command on each node:

Master node BlogMaster:

[root@BlogMaster name]# jps
4469 Jps
4235 NameNode

Slave node BlogSlave1:

[root@BlogSlave1 data]# jps
3057 DataNode
3153 SecondaryNameNode
3199 Jps

Slave node BlogSlave2:

[root@BlogSlave2 data]# jps
2709 Jps
2631 DataNode

Step 2.2: Check via the web UI

First, add the following entries to the hosts file under C:\Windows\System32\drivers\etc on your local PC:

192.168.137.128 BlogMaster
192.168.137.129 BlogSlave1
192.168.137.130 BlogSlave2

Next, open a browser and go to http://192.168.137.128:50070/ to view the cluster information.

Part IV - Possible Errors and Solutions

Problem 1:

If the cluster has been formatted more than once: before formatting again, first delete everything inside the name and data folders under /opt/cluster/hadoop-2.8.4/dfs on every node in the cluster.
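A sketch of that cleanup, to run on every node before reformatting. The DFS_DIR variable is an added convenience defaulting to the path configured in hdfs-site.xml, and the guard skips the step if that path does not exist.

```shell
DFS_DIR="${DFS_DIR:-/opt/cluster/hadoop-2.8.4/dfs}"
# Remove the contents of name/ and data/ but keep the directories themselves.
if [ -d "$DFS_DIR" ]; then
    rm -rf "$DFS_DIR/name/"* "$DFS_DIR/data/"*
fi
```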
