Accomplished Tasks

  • Installed and configured Spark on the new servers
  • Moved new servers to the server room
  • Decided on a new topology for the MPTCP machines
  • Compiled notes on modeling
  • Started working through the modeling book
  • Found additional resources for studying performance modeling

Weekly Summary

Snow Day!

We had a snow day this week: over 12 inches of snow fell and the university closed, which meant I was stuck at home with all of my distractions. Little work got done that day, although some relaxation is arguably necessary for working hard.

Modeling Work

I finally started making my way through the modeling book. I am currently working through the probability review, which is slow going but will be useful. In parallel, I spent some time compiling all of my notes from our collaborator on possible modeling approaches. I don’t fully understand them all yet, but I eventually will! As I went through the notes, I paused to find tutorials on the unfamiliar concepts so that I can eventually figure everything out.

New Servers

We moved the new servers to the server room after they had spent several months in the lab. It ended up being ironic: the very next day we lost heat in the lab, and we could have used the warmth from the servers!

Topology

I spent some time this week designing a topology for our MPTCP experiments. I used a modified fat tree, since it seems to be the recommended topology and several of its properties (most importantly, the many equal-cost paths between host pairs that MPTCP subflows can spread across) are good for MPTCP. A brief drawing of the topology is below:
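
For a rough sense of scale, the canonical k-ary fat tree has closed-form element counts, and the short bash sketch below computes them. The formulas are for the standard design, so our modified variant will deviate from these numbers, and the default k is purely illustrative:

     # Element counts for a canonical k-ary fat tree; a modified design may differ
     k=${1:-4}                                   # port-count parameter (illustrative default)
     echo "pods:          $k"
     echo "core switches: $(( (k/2) * (k/2) ))"
     echo "aggregation:   $(( k * k / 2 ))"      # k pods x k/2 aggregation switches each
     echo "edge switches: $(( k * k / 2 ))"      # k pods x k/2 edge switches each
     echo "hosts:         $(( k * k * k / 4 ))"  # k/2 hosts per edge switch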

Spark

I configured our new servers to run Hadoop’s HDFS and Spark. It was a fairly easy process, as I had written a script to do most of the heavy lifting. A generalized install process, followed by a few post-install sanity checks, is below:

  1. Modify /etc/hosts to remove the hostname from the 127.0.0.1 line; if the hostname resolves to loopback, the Hadoop daemons bind there and the other nodes cannot reach them
     127.0.0.1 localhost HOSTNAME
    
  2. Clone git repo
  3. Install pre-requisites
     sudo apt update && sudo apt install -y openssh-server openjdk-8-jdk wget sshpass
    
  4. Fetch install files
     wget http://URL-TO-APPS/hadoop-2.7.2.tar.gz
     wget http://URL-TO-APPS/spark-1.6.2-bin-hadoop2.6.tgz
    
  5. Untar install files
     tar xzf hadoop-2.7.2.tar.gz
     tar xzf spark-1.6.2-bin-hadoop2.6.tgz
    
  6. Move files to correct locations
     mv hadoop-2.7.2 /usr/local/hadoop
     mv spark-1.6.2-bin-hadoop2.6 /usr/local/spark
    
  7. Set up environment variables (the single quotes on the PATH line defer expansion until login)
     echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> ~/.bashrc
     echo "export HADOOP_HOME=/usr/local/hadoop" >> ~/.bashrc
     echo "export SPARK_HOME=/usr/local/spark" >> ~/.bashrc
     echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.bashrc
     source ~/.bashrc
    
  8. Set up SSH keys (OPTIONAL: skip this if you have already done it or want to reuse the same key on all nodes)
     ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
     ssh-copy-id [email protected]
     ssh-copy-id [email protected]
     ssh-copy-id [email protected]
     .
     .
     .
     ssh-copy-id [email protected]
    
  9. Set up HDFS directories
     sudo mkdir -p /mnt/hdfs/namenode
     sudo mkdir -p /mnt/hdfs/datanode
     sudo mkdir $HADOOP_HOME/logs
    
  10. Modify Hadoop Configuration Files (./config/*)
  11. Modify core-site.xml, replacing MASTER with the hostname of the master node
        <value>hdfs://MASTER:9000/</value>
    
  12. Modify slaves, adding the hostname of each slave, one per line
  13. Modify hadoop-env.sh to specify the correct Java version (the openjdk-8 package installed above)
        export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    
  14. Modify Spark Configuration Files (./spark_config/*)
  15. Modify slaves, adding the hostname of each slave, one per line
  16. Modify spark-defaults.conf to point Spark at the master, replacing MASTER with the hostname of the master node
        spark.master                     spark://MASTER:7077
    
  17. Make the Spark history server directory. Its location is specified in spark-defaults.conf and can be changed as you see fit.
      sudo mkdir /history
  18. Copy the configuration files
    sudo cp config/ssh_config ~/.ssh/config
    sudo cp config/hadoop-env.sh $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    sudo cp config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    sudo cp config/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
    sudo cp config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
    sudo cp config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml
    sudo cp config/slaves $HADOOP_HOME/etc/hadoop/slaves
    cp config/start-hadoop.sh ~/start-hadoop.sh
    cp config/run-wordcount.sh ~/run-wordcount.sh
    cp spark_config/slaves $SPARK_HOME/conf/slaves
    cp spark_config/spark-defaults.conf $SPARK_HOME/conf/spark-defaults.conf
    
  19. Set permissions, replacing USER:USER with your user and group
    chmod +x ~/start-hadoop.sh
    chmod +x ~/run-wordcount.sh
    chmod +x $HADOOP_HOME/sbin/*
    sudo chmod -R +rwx $HADOOP_HOME
    sudo chmod -R +rwx $SPARK_HOME
    sudo chown -R USER:USER /mnt/hdfs
    sudo chown -R USER:USER /history
    sudo chown -R USER:USER $HADOOP_HOME/logs
    
  20. Format the namenode
    sudo /usr/local/hadoop/bin/hdfs namenode -format
    
  21. Start HDFS
    start-dfs.sh
    
  22. Start Spark Master
    start-master.sh
    
  23. Start Spark Slaves
    start-slaves.sh
    
  24. FIN
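
Once the daemons are up, a few quick sanity checks are worth running. These are standard Hadoop and Spark commands; the examples jar path in the SparkPi smoke test is the one the spark-1.6.2-bin-hadoop2.6 distribution ships under $SPARK_HOME/lib, so adjust it if your layout differs, and replace MASTER as before:

     # On the master, list the running JVM daemons
     jps                  # expect NameNode, SecondaryNameNode, and the Spark Master

     # HDFS health: every datanode should show up as a live node
     hdfs dfsadmin -report

     # Smoke-test Spark against the standalone master
     spark-submit --master spark://MASTER:7077 \
       --class org.apache.spark.examples.SparkPi \
       $SPARK_HOME/lib/spark-examples-1.6.2-hadoop2.6.0.jar 10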

Misc Notes

I have officially accepted my internship offer at the IBM Thomas J. Watson Research Center. It looks like I will be in Yorktown Heights for the summer, and I can’t wait to see what exciting projects I’ll be working on.

What’s up for next week?

  • Discuss with Don and Prashant the experiments to run for MPTCP datacenter workloads
  • Rack the servers in the server room and attach networking
  • Continue making my way through the modeling textbook