ML4T Software Setup

From Quantitative Analysis Software Courses

Revision as of 10:54, 12 January 2017

Overview

This class has implemented an automated test suite which enables students to test their code submissions using servers provided by Georgia Tech. Since these servers have already been configured with all necessary libraries, setup has been reduced to simply checking out a single git repository, which will be covered below. For students with sporadic internet access who would like a local installation of the software, the instructions from previous semesters are available here: ML4T_Software_Installation.

Important Notes

  • Your code MUST run properly on the servers provided by Georgia Tech, and it must be submitted to T-square. If you do not test your code on the provided machines, it may not run correctly in our autograder. If your code fails to run on the provided servers, you will not get credit for the assignment. It is therefore very important that you have access to these machines and that your code runs correctly on them.
  • We use a specific, static dataset for this course, which is provided as part of the repository detailed below. If you download your own data from Yahoo (or elsewhere), you will get wrong answers on assignments.

Access to machines at Georgia Tech

Four machines are accessible via ssh to students enrolled in the ML4T class. These machines may not be available until the second week of class; we will make an announcement once they are ready. If you are still unable to log in at that time, please contact us. If you are using a Unix-based operating system, such as Ubuntu or Mac OS X, you already have an ssh client, and you can connect to one of the servers by opening a terminal and typing:

xhost +
ssh -X gtname@buffet0X.cc.gatech.edu

replacing the X in buffet0X with a digit from 1 to 4, as detailed below. You will then be asked for your password and logged in. Windows users may have to install an ssh client such as PuTTY. To distribute the workload across the machines, please use the machine assigned to your last name:

  • buffet01.cc.gatech.edu if your last name begins with A-G
  • buffet02.cc.gatech.edu if your last name begins with H-N
  • buffet03.cc.gatech.edu if your last name begins with O-U
  • buffet04.cc.gatech.edu if your last name begins with V-Z

These machines use your GT login credentials.
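
The assignment of machines by last name can be sketched as a small shell function. This is only an illustration: buffet_host is a hypothetical helper, not part of the course tooling, and it assumes the last name starts with an unaccented letter.

```shell
# Hypothetical helper (not part of the course tooling): print the buffet
# host assigned to a last name, following the A-G / H-N / O-U / V-Z split.
buffet_host() {
    # Take the first character of the last name and upper-case it.
    initial=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
    case "$initial" in
        A|B|C|D|E|F|G) echo "buffet01.cc.gatech.edu" ;;
        H|I|J|K|L|M|N) echo "buffet02.cc.gatech.edu" ;;
        O|P|Q|R|S|T|U) echo "buffet03.cc.gatech.edu" ;;
        V|W|X|Y|Z)     echo "buffet04.cc.gatech.edu" ;;
        *)             echo "unrecognized last name: $1" >&2; return 1 ;;
    esac
}
```

With this defined, the connection command could be written as ssh -X gtname@"$(buffet_host Smith)", where gtname stands for your GT login name as before.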

NOTE: We reserve the right to limit login access or terminate processes to avoid resource contention with the autograder after assignment due dates, although we will endeavor to limit such interruptions.

After you've successfully logged in, you will need to clone the following git repository, which contains all of the template code and data, into your home directory. You can do this with the following command:

git clone https://github.gatech.edu/tb34/ML4T_2016Fall.git

again providing your GT login credentials when asked. Make sure you clone the repository into your home directory (not a sub-directory), and that you do not change the name of the folder.

NOTE: If you change the directory structure or rename any directories, the autograder will not be able to find your assignments and you will not get any feedback.
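
A quick way to confirm that the clone ended up where the autograder expects it is to look for the repository directly under your home directory. The function below is a hypothetical convenience, not something the course provides; it only checks the location and folder name described above.

```shell
# Hypothetical helper: succeed only if the repository sits directly in the
# home directory under its original name.
check_repo_location() {
    if [ -d "$HOME/ML4T_2016Fall/.git" ]; then
        echo "repository found at $HOME/ML4T_2016Fall"
    else
        echo "no repository at $HOME/ML4T_2016Fall" >&2
        return 1
    fi
}
```

Running check_repo_location after the clone should print the repository path; an error message suggests the clone landed in a sub-directory or the folder was renamed.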

Getting feedback from the auto grader

The repository you've just cloned contains the data and template code for all assignments. To complete the assignments you'll need to modify the templates according to the assignment description. You can do this on the buffet0X machines directly using a text editor such as gedit, nano, or vim. Alternatively, you can copy the files to your local machine, edit them in your favorite text editor or IDE, and upload them back to the server. Make sure to test your code on the server after making changes to catch any typos or other bugs.

After you are satisfied that your program contains no obvious errors, you can have it tested by our auto grading script to make sure it passes all of our test cases. To do this, simply create an empty text file named 'GRADEME.txt' (case sensitive) in the directory of the assignment you would like graded.

The auto grader runs periodically. If it finds the GRADEME.txt file in a student's assignment directory, it will grade that assignment, write a score.txt file and a comments.txt file with more detailed information to the feedback/ sub-directory of the assignment, and remove the GRADEME.txt file.

NOTE: The autograder will only remove the GRADEME.txt file once it has completed its run and copied its files into the feedback/ directory, so if you modify your code before this happens, the feedback may correspond to an earlier version of your code.
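
The request-and-wait cycle described above can be sketched with two shell functions. Both names are hypothetical and not part of the course tooling, and the 30-second polling interval is an arbitrary assumption, since the autograder's schedule is not stated here.

```shell
# Hypothetical helpers, not part of the course tooling.

# Create the empty GRADEME.txt marker in an assignment directory ($1).
request_grade() {
    touch "$1/GRADEME.txt"
}

# Poll until the autograder has removed GRADEME.txt, then print the score.
# $2 is the polling interval in seconds (default 30, an arbitrary choice).
wait_for_feedback() {
    while [ -e "$1/GRADEME.txt" ]; do
        sleep "${2:-30}"
    done
    cat "$1/feedback/score.txt"
}
```

For example, request_grade ~/ML4T_2016Fall/mc1_p1 followed by wait_for_feedback ~/ML4T_2016Fall/mc1_p1 would block until the marker disappears. Leaving the code untouched while the second function is waiting avoids the mismatch described in the note above.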

Updating the repository

Note: these instructions are for students who have not committed to their repository, added a different origin, or used any other advanced git techniques. If you have done any of these, some quick googling should resolve any questions you have.

From here on, we'll assume you've checked out the repository, and may have made some modifications you'd like to keep. First things first, figure out what you have changed since you originally pulled the repo. From the ML4T_2016Fall/ directory, run the following command:

git status

Look at the list of files that have changed and make sure it makes sense. For example, if you've only modified the python files for the first two assignments, the output may look something like this:

bhrolenok3@buffet04:~/ML4T_2016Fall$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Changes not staged for commit:
 (use "git add <file>..." to update what will be committed)
 (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   mc1_p1/analysis.py
	modified:   mc1_p2/optimization.py

You may see a few lines after this under the heading "Untracked files"; these are safe to ignore. They are just files that aren't part of the repository (temporary backups, .pyc files, notes, etc.). If you see any modified files that you don't remember editing, you can look at the exact differences by using the following git command:

git diff <filename>

replacing <filename> with the name of the file that's been marked as modified. Following the example earlier, here's what running that command looks like for the optimization.py changes I made:

bhrolenok3@buffet04:~/ML4T_2016Fall$ git diff mc1_p2/optimization.py
diff --git a/mc1_p2/optimization.py b/mc1_p2/optimization.py
index 716de34..965d8d9 100644
--- a/mc1_p2/optimization.py
+++ b/mc1_p2/optimization.py
@@ -19,7 +19,8 @@ def optimize_portfolio(sd=dt.datetime(2008,1,1), ed=dt.datetim
 
     # find the allocations for the optimal portfolio
     # note that the values here ARE NOT meant to be correct for a test case
-    allocs = np.asarray([0.2, 0.2, 0.3, 0.3, 0.0]) # add code here to find the 
+    #allocs = np.asarray([0.2, 0.2, 0.3, 0.3, 0.0]) # add code here to find the
+    allocs = np.asarray([0.1,0.1,0.3,0.4,0.1]) #Surely this is the right one!
     cr, adr, sddr, sr = [0.25, 0.001, 0.0005, 2.1] # add code here to compute s
 
     # Get daily portfolio value
(END)

Lines beginning with - have been removed and lines beginning with + have been added, so this output means I changed one line in the file: the one that sets the allocs variable. You can scroll up and down through the changes using your arrow keys, and you'll need to hit the q key to get back to the command line. Once you've identified all the changed files, use scp (or WinSCP or the ssh client of your choice) to copy the files you'd like to keep to your local computer. Now you can stash all the changes you've made to your copy of the repo on buffet0X using git stash, which, following our example, will look something like this:

bhrolenok3@buffet04:~/ML4T_2016Fall$ git stash
Saved working directory and index state WIP on master: 228f9ec Added validate_env.py, removed mc1_hw1
HEAD is now at 228f9ec Added validate_env.py, removed mc1_hw1
bhrolenok3@buffet04:~/ML4T_2016Fall$ 
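
The backup that precedes git stash can be reduced to copying a single file if the modified files are first bundled into an archive. The following is a minimal sketch, not a course-provided tool: backup_modified is a hypothetical name, and the --null and -T options are assumed to behave as in GNU tar.

```shell
# Hypothetical helper: archive every tracked file that git reports as
# modified, so that a single file can be copied off the server.
# $1: repository root, $2: absolute path of the archive to create
backup_modified() {
    # Run in a subshell so the caller's working directory is unchanged.
    # -z and --null pass file names NUL-separated, which is safe for
    # names containing spaces.
    ( cd "$1" && git diff --name-only -z | tar --null -czf "$2" -T - )
}
```

Run on the server before stashing, for example as backup_modified ~/ML4T_2016Fall ~/ml4t_backup.tar.gz (the archive name is arbitrary). The archive can then be fetched from your local machine with scp gtname@buffet0X.cc.gatech.edu:ml4t_backup.tar.gz .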

Now you can safely pull down all the changes that have been made to the repo since the last time. Do that using git pull:

bhrolenok3@buffet04:~/ML4T_2016Fall$ git pull
Enter passphrase for key '/home/bhrolenok3/.ssh/id_rsa': 
DISPLAY "(null)" invalid; disabling X11 forwarding
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 7 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), done.
From github.gatech.edu:tb34/ML4T_2016Fall
   228f9ec..803f0be  master     -> origin/master
Updating 228f9ec..803f0be
Fast-forward
 mc3_p1/Data/winequality-red.csv   | 1599 ++++++++++++
 mc3_p1/Data/winequality-white.csv | 4898 +++++++++++++++++++++++++++++++++++++
 mc3_p1/Data/winequality.names.txt |   72 +
 3 files changed, 6569 insertions(+)
 create mode 100644 mc3_p1/Data/winequality-red.csv
 create mode 100644 mc3_p1/Data/winequality-white.csv
 create mode 100644 mc3_p1/Data/winequality.names.txt

This should be similar for everyone, since the remote repository is only updated when we (the TAs and Professor Balch) make changes. At this point, you'll have all the new changes to the repository. From here you can 1) start working from scratch on the current assignment (the safest option), 2) copy back the modified files using scp (verifying by hand), or 3) use git stash to apply your changes to the updated repository. Option 2 should be safe and quick in most instances and, if you're not comfortable with git and the command line, may be the easiest. You'll have to check the differences in any files you overwrite when you copy them back to buffet0X, which you can do easily with the git diff command described earlier. Option 3 handles all of this using git's own tools. To apply your stashed changes from earlier, you can simply call git stash pop:

bhrolenok3@buffet04:~/ML4T_2016Fall$ git stash pop
On branch master
Your branch is up-to-date with 'origin/master'.
 
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
 
	modified:   mc1_p1/analysis.py
	modified:   mc1_p2/optimization.py

This output tells you the status of the repo after applying all your changes; double-check that it makes sense using git diff as before. If you see any "conflicts" or error messages when applying your stashed changes, you'll need to go back over them by hand. Since you have a backup of your files, you can always wipe out the repo and start from a clean slate.
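
For reference, the stash, pull, and pop sequence walked through in this section can be collected into one function. This is a sketch under the same assumptions as the note at the top of the section (no local commits, unchanged origin); update_repo is a hypothetical name, and it does not replace the backup copy recommended earlier.

```shell
# Hypothetical wrapper around the commands described in this section.
# $1: path to the repository, e.g. "$HOME/ML4T_2016Fall"
update_repo() (
    # The body runs in a subshell, so the caller's directory is unchanged.
    cd "$1" || exit 1
    if git diff --quiet && git diff --cached --quiet; then
        # No tracked files are modified: a plain pull is enough.
        git pull
    else
        # Set modifications aside, update, then re-apply them.
        # Each step runs only if the previous one succeeded.
        git stash && git pull && git stash pop
    fi
)
```

If git stash pop reports conflicts, the function returns a non-zero status and the affected files need to be reviewed by hand, as described above.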