Building a Fully Distributed Hadoop Cluster with Docker on Ubuntu
1. System environment
Mac OS X 10.13
Parallels Desktop (virtual machine software)
Ubuntu 14.04 virtual machine
The Ubuntu 14.04 VM shares its network and folders with Mac OS X.
2. Install Docker
uname -r
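Docker's installation notes for Ubuntu 14.04 asked for a kernel of at least 3.10, which is presumably why the kernel release is printed first. The sketch below turns that check into a comparison. The helper name version_ge is hypothetical, and the comparison assumes GNU sort with the -V option.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes GNU sort, whose -V flag orders version strings numerically.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Drop the distribution suffix, e.g. 3.13.0-24-generic becomes 3.13.0
kernel="$(uname -r | cut -d- -f1)"

if version_ge "$kernel" 3.10; then
    echo "kernel $kernel meets the 3.10 minimum"
else
    echo "kernel $kernel is older than 3.10"
fi
```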
The final step, the Docker install command, failed with the following error:
Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
docker-engine : Depends: init-system-helpers (>= 1.18~) but 1.14 is to be installed
Depends: lsb-base (>= 4.1+Debian11ubuntu7) but 4.1+Debian11ubuntu6 is to be installed
Depends: libdevmapper1.02.1 (>= 2:1.02.99) but 2:1.02.77-6ubuntu2 is to be installed
Recommends: aufs-tools but it is not going to be installed
Recommends: cgroupfs-mount but it is not installable or cgroup-lite but it is not going to be installed
Recommends: git
E: Unable to correct problems, you have held broken packages.
In other words, Docker requires newer versions of several system packages, and the versions currently installed are too old to satisfy its dependencies.
Fix: install Docker with the command below instead (reference: the GitHub issue on this Docker installation problem).
sudo wget -qO- https://get.docker.com/ | sh
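3. Build the Hadoop image with Docker
Get the CentOS 7 image
Pull the centos image with docker pull centos. The download is roughly 70+ MB and can be slow.
Using the Alibaba Cloud Docker registry mirror is recommended; see "Using the Alibaba Cloud Docker mirror accelerator".
Then run docker images to list the local images; the centos image that was just pulled should appear.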
Build the CentOS-SSH image on top of CentOS 7
Use a Dockerfile to build the CentOS-SSH image
# Create the Dockerfile in an empty directory, not in the filesystem root
mkdir ~/centos-ssh
cd ~/centos-ssh
vi Dockerfile
The Dockerfile contents:
FROM centos
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum install -y openssh-clients
# Set the root password to 111111 and add root to sudoers
RUN echo "root:111111" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
# Generate the SSH host keys; -N "" sets an empty passphrase so the build never prompts
RUN ssh-keygen -t dsa -N "" -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -N "" -f /etc/ssh/ssh_host_rsa_key
RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
For more Dockerfile instructions for building images, see the [official Dockerfile reference](https://docs.docker.com/engine/reference/builder/).
Build the CentOS-SSH image
docker build -t="centos7-ssh" .
# List the images; the centos7-ssh image that was just built should appear
docker images
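Build the Hadoop+JDK image on top of CentOS-SSH
Use a Dockerfile to build the Hadoop image
# Create the Dockerfile in an empty directory, not in the filesystem root
mkdir ~/hadoop
cd ~/hadoop
vi Dockerfile
Place the JDK and Hadoop archives in the ~/hadoop directory.
I used jdk-8u131-linux-x64.tar.gz and hadoop-2.7.2.tar.gz.
For other JDK or Hadoop versions, adjust the file names and paths in the Dockerfile accordingly.
The Dockerfile contents: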
FROM centos7-ssh
# ADD unpacks a local tar archive into the target directory
ADD jdk-8u131-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_131 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH
ADD hadoop-2.7.2.tar.gz /usr/local
RUN mv /usr/local/hadoop-2.7.2 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
RUN yum install -y which sudo
For more Dockerfile instructions for customizing images, see the [official Dockerfile reference](https://docs.docker.com/engine/reference/builder/).
Build the Hadoop image
docker build -t="hadoop" .
# List the images; the hadoop image that was just built should appear
docker images
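Instead of reading the image list by eye, the check can be scripted. This is a minimal sketch: image_exists is a hypothetical helper name, and it relies on docker images -q printing an image ID only when the named repository exists locally.

```shell
# Hypothetical helper: succeeds when a local image with the given name exists.
image_exists() {
    # `docker images -q NAME` prints one image ID per match and nothing otherwise.
    [ -n "$(docker images -q "$1" 2>/dev/null)" ]
}

# Report on the two images built in this section
for image in centos7-ssh hadoop; do
    if image_exists "$image"; then
        echo "$image: present"
    else
        echo "$image: missing"
    fi
done
```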
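4. Build the fully distributed Hadoop cluster with Docker
Two concepts to keep apart (see Docker-从入门到实践, "Docker: From Beginner to Practice"):
- Docker image (analogous to a class)
- Docker container (analogous to an object, that is, an instance of the image)
Basic plan: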
172.18.0.2 hadoop1 namenode datanode
172.18.0.3 hadoop2 datanode
172.18.0.4 hadoop3 secondarynamenode datanode
A container's IP address can change after it restarts, so each container is given a fixed IP:
# Create a user-defined network with the subnet 172.18.0.0/16
docker network create --subnet=172.18.0.0/16 mynetwork
# Check that mynetwork was created
docker network ls
# Start three containers as hadoop1, hadoop2 and hadoop3
docker run --privileged --name hadoop1 --hostname hadoop1 --net mynetwork --ip 172.18.0.2 -d -P -p 50070:50070 -p 8088:8088 -p 9000:9000 -p 50020:50020 hadoop /usr/sbin/init
docker run --name hadoop2 --hostname hadoop2 --net mynetwork --ip 172.18.0.3 -d -P hadoop
docker run --name hadoop3 --hostname hadoop3 --net mynetwork --ip 172.18.0.4 -d -P hadoop
Notes on how the hadoop1 container is started
# The CentOS 7 image cannot use the systemctl command out of the box
# Adding --privileged and ending the command with /usr/sbin/init makes systemctl usable
docker run --privileged .... /usr/sbin/init
- The specific error, seen when starting mysql after installing it:
systemctl start mysqld
Failed to get D-Bus connection: Operation not permitted
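The three docker run commands follow one pattern: a fixed IP and a matching hostname per node, with hadoop1 additionally publishing ports and running init. The sketch below generates them from the node plan as a dry run, so the commands can be reviewed before anything is started. The function name print_run_cmds and the NETWORK and IMAGE variables are illustrative choices, not part of the original setup.

```shell
# Dry-run sketch: prints one docker run command per node in the plan.
NETWORK=mynetwork
IMAGE=hadoop

print_run_cmds() {
    while read -r ip name; do
        if [ "$name" = hadoop1 ]; then
            # hadoop1 publishes the web and RPC ports and runs init so systemctl works
            echo "docker run --privileged --name $name --hostname $name --net $NETWORK --ip $ip -d -P -p 50070:50070 -p 8088:8088 -p 9000:9000 -p 50020:50020 $IMAGE /usr/sbin/init"
        else
            echo "docker run --name $name --hostname $name --net $NETWORK --ip $ip -d -P $IMAGE"
        fi
    done <<'EOF'
172.18.0.2 hadoop1
172.18.0.3 hadoop2
172.18.0.4 hadoop3
EOF
}

print_run_cmds
```

Once the printed commands look right, piping the output into sh (print_run_cmds | sh) would run them.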
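Set up passwordless SSH login
Open a terminal in each of the three containers
# Open three terminals, one each for hadoop1, hadoop2 and hadoop3
docker exec -it hadoop1 /bin/bash
docker exec -it hadoop2 /bin/bash
docker exec -it hadoop3 /bin/bash
Configure the host mappings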
# Add the new host mappings to this file, one "internal IP hostname" pair per line
vi /etc/hosts
172.18.0.2 hadoop1
172.18.0.3 hadoop2
172.18.0.4 hadoop3
Configure passwordless SSH login
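ssh-keygen
# Copy the public key to every server that should accept passwordless logins
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3
# Log in to a target server without a password using `ssh <hostname>`
ssh hadoop2   # or hadoop1 / hadoop3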
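The host mappings have to be added in all three containers, and edits to /etc/hosts inside a container may be lost when the container restarts, so a repeatable version of that step is convenient. The sketch below appends only the entries that are missing. The function name add_cluster_hosts is hypothetical, and the demo writes to a temporary file rather than to /etc/hosts.

```shell
# Sketch: append the cluster's host mappings to a hosts file,
# skipping any entry that is already present.
add_cluster_hosts() {
    hosts_file="${1:-/etc/hosts}"
    while read -r ip name; do
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF "$ip $name" "$hosts_file" 2>/dev/null || echo "$ip $name" >> "$hosts_file"
    done <<'EOF'
172.18.0.2 hadoop1
172.18.0.3 hadoop2
172.18.0.4 hadoop3
EOF
}

# Demo against a scratch file; the second call adds nothing
demo_hosts="$(mktemp)"
add_cluster_hosts "$demo_hosts"
add_cluster_hosts "$demo_hosts"
cat "$demo_hosts"
```

Inside a container, calling add_cluster_hosts with no argument targets /etc/hosts. The key distribution can then be a loop such as: for h in hadoop1 hadoop2 hadoop3; do ssh-copy-id "$h"; done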
5. Configure the Hadoop and JDK environment variables
Add the environment variables to the profile file
vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.8
export PATH=$JAVA_HOME/bin:$PATH
#HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# Apply the updated profile
source /etc/profile
# Verify that both commands are found
java -version
hadoop
6. Edit the Hadoop configuration files
Files involved
cd /usr/local/hadoop/etc/hadoop
vi core-site.xml
vi hdfs-site.xml
vi yarn-site.xml
vi slaves
vi yarn-env.sh
vi mapred-env.sh
core-site.xml
Add these properties inside the <configuration> element:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/data/tmp</value>
</property>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop3:50090</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
yarn-site.xml
Add these properties inside the <configuration> element:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop2</value>
</property>
slaves
hadoop1
hadoop2
hadoop3
yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8
mapred-env.sh
export JAVA_HOME=/usr/local/jdk1.8
hadoop-env.sh
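# Hadoop fails at startup with "Error: JAVA_HOME is not set and could not be found" when the file keeps this default line:
export JAVA_HOME=${JAVA_HOME}
# Replace it with an absolute path:
export JAVA_HOME=/usr/local/jdk1.8
Distribute the files to hadoop2 and hadoop3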
scp -r /usr/local/hadoop/etc/hadoop hadoop2:/usr/local/hadoop/etc/
scp -r /usr/local/hadoop/etc/hadoop hadoop3:/usr/local/hadoop/etc/
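Before formatting the NameNode it can help to read a few values back out of the XML files on each node and confirm that the copies match. The following is a line-based sketch, not a real XML parser: it assumes each name element appears before its value element, as in the files above. The helper name get_prop is hypothetical, and the demo uses a scratch copy of the core-site.xml settings.

```shell
# Hypothetical helper: print the value of a property from a Hadoop *-site.xml file.
# $1 = path to the XML file, $2 = property name
get_prop() {
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}

# Demo against a scratch file holding the core-site.xml properties
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/data/tmp</value>
  </property>
</configuration>
EOF

get_prop "$conf" fs.defaultFS   # prints hdfs://hadoop1:9000
```

On a node where Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which makes a useful cross-check.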
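7. Start the Hadoop cluster
The first time the cluster is started, the NameNode must be formatted
# Run from /usr/local/hadoop
bin/hdfs namenode -format
Start HDFS on hadoop1
sbin/start-dfs.sh
At this point, open a browser on the local Ubuntu machine and go to localhost:50070 to see the web management UI.
Stop HDFS
sbin/stop-dfs.sh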
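Start and stop YARN
# yarn.resourcemanager.hostname is set to hadoop2, so run these on hadoop2
sbin/start-yarn.sh
sbin/stop-yarn.sh
Start and stop everything
start-all.sh
stop-all.sh
Note: the containers are attached to the user-defined network mynetwork, which the local Ubuntu host can reach directly, so ssh from the host to any of the three container IPs works.
Ports behave differently: only ports published with -p (or -P) are mapped onto the host. For example, hadoop1 publishes 50070, which is why localhost:50070 on the host reaches the web UI. A web service listening on 8080 inside a container is reachable from the host at the container's IP (such as 172.18.0.2:8080), but not at localhost:8080 unless 8080 is published as well.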
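A quick way to confirm that each node runs the daemons from the plan in section 4 is to compare the output of jps (the JDK tool that lists running Java processes) with the expected names. The sketch below reads jps-style lines on standard input so it can be tried without a live cluster. The function name missing_daemons is hypothetical.

```shell
# Sketch: report which expected Hadoop daemons are absent from jps output.
# Reads "<pid> <ClassName>" lines on stdin; the expected names are the arguments.
missing_daemons() {
    running="$(awk '{ print $2 }')"
    for daemon in "$@"; do
        printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

# Example with captured text standing in for a live jps call on hadoop1
printf '101 NameNode\n202 Jps\n' | missing_daemons NameNode DataNode   # prints DataNode
```

Inside the containers this would be run as jps | missing_daemons NameNode DataNode on hadoop1, and jps | missing_daemons SecondaryNameNode DataNode on hadoop3. No output means every expected daemon is up.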
8. References
[Official Dockerfile reference](https://docs.docker.com/engine/reference/builder/)
Fix for the Hadoop startup error "Error: JAVA_HOME is not set and could not be found"