###################################
CS435 Cluster Installation Tutorial 2
###################################
1. Copy the VM from the previous tutorial into a new VM. You may call this file master.
#We will make changes to this VM so that it becomes part of a cluster consisting of 1 master node and 2 slave nodes.
#########################################################################################################

2. Rename the machine. Use gedit to do the following

sudo gedit /etc/hostname

#This should open the file containing the name of your machine. Type the machine name (hadoop1 for the master) and save the file.
#You may log out and log in for the change to take effect.
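
#Alternatively (assuming your Ubuntu release uses systemd, as recent versions do), the hostname can also be set in one command, e.g. for the master node:
sudo hostnamectl set-hostname hadoop1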
#########################################################################################################

3. Re-configure Hadoop
#configure hadoop so it knows which hosts are workers
gedit /usr/local/hadoop/etc/hadoop/workers

#add the following to this file
hadoop1
hadoop2
hadoop3


#save the file.
<---------------------------<---------------------------<--------------------------->

#we have already configured the hadoop xml files. You now need to edit the core-site.xml file and change the value of the fs.defaultFS property so it points at the master:

<value>hdfs://hadoop1:9000</value>
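
#for reference, the full property block in core-site.xml should look roughly like this (a sketch; fs.defaultFS is the standard Hadoop 3.x property name, and hadoop1 is the master hostname used in this tutorial):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:9000</value>
</property>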
<---------------------------<---------------------------<--------------------------->

Save the file. 
<---------------------------<---------------------------<--------------------------->
#make sure the tmp directories are clean. Run the following in a terminal
cd /usr/local/hadoop_tmp
rm -rf *

#recreate the namenode (n) and datanode (d) storage directories
mkdir n
mkdir d
ls -al
chmod 755 n
chmod 755 d

#This will clean the tmp folders so we can work on the cluster.
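
#the n and d directories are the namenode and datanode storage locations. Assuming the previous tutorial pointed hdfs-site.xml at them (check your own file), the relevant properties look roughly like this:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///usr/local/hadoop_tmp/n</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///usr/local/hadoop_tmp/d</value>
</property>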

#########################################################################################################
4. Let's prepare the network tools. Use apt to install the net-tools package.
sudo apt install net-tools

#check the ip address of the host
ifconfig

#look for your ethernet interface; it is usually named eth0, ens33, or similar
#we will set up our nodes with these ip addresses
#192.168.5.131	hadoop1 which is the master
#192.168.5.132	hadoop2 which is a slave
#192.168.5.133	hadoop3 which is a slave

#WARNING# The above is an Example only. You need to check the IP addresses of your network.

#########################################################################################################
##########################################  I M P O R T A N T  ##########################################
#########################################################################################################
5. Setting up the static IP address of your host
sudo ifconfig ens33 192.168.5.131

#WARNING# The above is an Example only. You need to check the IP addresses of your network.
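
#to confirm the address was applied, you can check the interface (a quick sanity check; ifconfig works as well):
ip addr show ens33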

<---------------------------<---------------------------<--------------------------->
#we can edit the interfaces file so the change becomes permanent
sudo gedit /etc/network/interfaces

#type the following
auto lo
iface lo inet loopback

auto ens33
iface ens33 inet static
  address 192.168.5.131
  netmask 255.255.255.0
<---------------------------<---------------------------<--------------------------->
# we will now reset the hosts file 
sudo gedit /etc/hosts

#type in the following entries. Keep the existing 127.0.0.1 localhost line, and remove any 127.0.1.1 line that maps your hostname
192.168.5.131	hadoop1
192.168.5.132	hadoop2
192.168.5.133	hadoop3
<---------------------------<---------------------------<--------------------------->

#WARNING# The above is an Example only. You need to check the IP addresses of your network.
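
#optionally, verify that the hostnames resolve to the addresses you entered:
getent hosts hadoop1 hadoop2 hadoop3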

#########################################################################################################
6. Reboot the machine so the changes take effect

sudo reboot

#ssh to the machine once so the host key is accepted and passwordless ssh works
ssh hadoop1
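
#if passwordless ssh was not already set up in the previous tutorial, a minimal sketch (assumes the same user account on every node; accept the defaults and an empty passphrase, and repeat ssh-copy-id for hadoop2 and hadoop3 once those VMs exist):
ssh-keygen -t rsa -b 4096
ssh-copy-id hadoop1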


#########################################################################################################
7. Clean the temporary datanode directory and shut down the VM

cd /usr/local/hadoop_tmp
rm -rf d
mkdir d

#Now shut down your VM. This VM is an Ubuntu host that serves as a node in the hadoop cluster.

#In your host machine, make a copy of this node/VM for each worker (hadoop2 and hadoop3). Change the name of each of these VMs appropriately.


#########################################################################################################
#########################################################################################################
#########################################################################################################
8. The following are instructions to prepare a worker node. Repeat the same instructions for hadoop2, hadoop3, and so on.

Start the VM. Login as before, and make the following changes:

#change the machine hostname to hadoop2, where 2 is the slave number
sudo nano /etc/hostname

#The nano editor opens. Change the name to hadoop2. Save and quit with:
Ctrl+O, Enter, then Ctrl+X

# check the name of your machine
hostname

#It should show hadoop2

###################################
9. For this VM, we will change the network settings:
sudo ifconfig ens33 192.168.5.132

#note, we changed the IP address to 192.168.5.132. Also change the address in /etc/network/interfaces (as in step 5) so the setting survives a reboot.

###################################
10. Test if you can ssh to this machine
ssh hadoop2

#check the IP address
ifconfig

#The IP should be  192.168.5.132

#########################################################################################################
##########   Repeat steps 8-9-10   for VMs with hostname hadoop3 and so on             ##################
#########################################################################################################

11. We assume that all of the VMs (hadoop1, hadoop2, and hadoop3) are running on your host machine. We will now enter hadoop1, which serves as the master, and connect to the other machines using ssh.

ssh hadoop2
#This allows you to connect to hadoop2. To go back ->

exit

#Test this for all machines: hadoop1, hadoop2, and hadoop3.

#########################################################################################################
12. Start up the cluster

#go to the hadoop1 master machine and format the namenode
hdfs namenode -format

#make sure there are no errors. If all is well, start the cluster

#start hdfs and yarn
start-all.sh

#Once the prompt becomes available do:
jps

# You should see NameNode, SecondaryNameNode, DataNode, ResourceManager, and NodeManager listed on hadoop1 (the master)
# Switch to any worker VM; jps should list a DataNode and a NodeManager on each of hadoop2 and hadoop3.
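
#a quick way to check every node from the master (assumes passwordless ssh; if jps is not found over ssh, run it on each VM directly):
for h in hadoop1 hadoop2 hadoop3; do echo "== $h =="; ssh $h jps; done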
#########################################################################################################
13. You can see the web UIs here:
#for hdfs open browser and type
http://hadoop1:9870/

#for yarn
http://hadoop1:8088/

###################################
14. You are familiar with the pi program run in the first tutorial; here we run the MapReduce wordcount program
#Let's make some folders and files in hdfs
hdfs dfs -mkdir -p books
hdfs dfs -ls -R /

#This will make the books directory and list everything in hdfs
<---------------------------<---------------------------<--------------------------->
#download some books as plain text from the Project Gutenberg website (https://www.gutenberg.org)
#assuming that you downloaded these files: alice.txt holmes.txt frankenstein.txt

hdfs dfs -put alice.txt holmes.txt frankenstein.txt books

#this will copy the 3 files into the books folder in hdfs

#lets see the directory
hdfs dfs -ls books
<---------------------------<---------------------------<--------------------------->
#run the wordcount program. This will read all the files in the books/ folder in hdfs and write the results to the output folder
 
hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount "books/*" output
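
#if the examples jar is not in your current directory, it normally lives inside the Hadoop install; assuming Hadoop is in /usr/local/hadoop as in the earlier steps:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount "books/*" output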

<---------------------------<---------------------------<--------------------------->
#download the output folder from hdfs. This will create the folder /home/hadoop1/output (adjust the path to your own home directory)
hdfs dfs -get output /home/hadoop1/output

#use gedit to open the resulting file from the output folder
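
#alternatively, view the result directly in hdfs (MapReduce writes its output as part-r-* files):
hdfs dfs -cat "output/part-r-*" | head -n 20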

#########################################################################################################
15. Cluster status reports

#get a report on your hdfs
hdfs dfsadmin -report

#check yarn cluster details
yarn node -list
#########################################################################################################
16. Closing the cluster.
#stop the cluster safely
stop-all.sh


###################################