Undergrad ML4T Software Setup

From Quantitative Analysis Software Courses

Notice

Due to issues with distributing code directly from a git repo, projects and data will be distributed as zip files from this wiki. A zip file containing the grading script and any template code or data will be linked from each assignment's individual wiki page. A zip file containing the grading and util modules, as well as the data and the first project, is available here: Media:CS4646 Spr18.zip. The instructions below on running the test scripts still apply.
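
The zip file has to be unpacked before any of the scripts can run. A command-line tool such as unzip works, and so does Python's standard zipfile module. The sketch below is an illustration, not course-provided code: extract_course_zip is a hypothetical helper, and the local filename you pass it is whatever you saved the download as.

```python
import os
import zipfile


def extract_course_zip(zip_path, dest_dir):
    """Extract the course zip into dest_dir and return the sorted top-level names it contained.

    Hypothetical helper for illustration; zip_path is wherever you saved the download.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # creates dest_dir and any subfolders as needed
        # Zip member names always use "/" separators, so the first component is the top level.
        top_level = {name.split("/")[0] for name in archive.namelist()}
    return sorted(top_level)
```

Listing the top-level names is a quick way to confirm that the grading and util modules, the data, and the project folders were all extracted.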

Overview

Most of the projects in this class will be graded automatically. As of the summer 2017 semester, we are providing the grading scripts with the template code for each of the projects, so that students can test their code to make sure they are API compatible. Georgia Tech also provides access to four servers that have been configured to be identical to the grading environment, specifically in terms of operating system and library versions. Since these servers have already been configured with all necessary libraries, setup has been greatly simplified.

Important Notes

  • Your code MUST run properly on the Georgia Tech-provided servers, and your code must be submitted to T-square. If you do not test your code on the provided machines, it may not run correctly when we test it. If your code fails to run on the provided servers, you will not get credit for the assignment, so it is very important that you have access to these machines and that your code runs correctly on them. If you would like to develop on your personal machine and are comfortable installing libraries by hand, you can follow the instructions here: ML4T_Software_Installation. Note that these instructions were written for an earlier version of the graduate class, but they should work reasonably well.
  • We use a specific, static dataset for this course, which is provided in the zip file described above. If you download your own data from Yahoo (or elsewhere), you will get wrong answers on assignments.
  • We reserve the right to modify the grading script while maintaining API compatibility with what is described on the project pages. This includes modifying or withholding test cases, changing point values to match the given rubric, and changing timeout limits to accommodate grading deadlines. The scripts are provided as a convenience to help students avoid common pitfalls or mistakes, and they are intended to be used as a sanity check. Passing all tests does not guarantee full credit on the assignment; it is a necessary but not sufficient condition for completing an assignment.
  • Using github.gatech.edu to back up your work is a very good idea, and we encourage it; however, make sure that you do not make your solutions to the assignments public. It's easy to do this accidentally, so please be careful:
    • Do not put your solutions in a public repository. Repositories on github.com are public by default. The Georgia Tech github, github.gatech.edu, provides the same interface and allows for free private repos for students.

Access to machines at Georgia Tech

Four machines will be accessible via ssh to students enrolled in the ML4T class. These machines may not be available until the second week of class; we will make an announcement once they are ready, and if at that time you are still unable to log in, please contact us. If you are using a Unix-based operating system, such as Ubuntu or Mac OS X, you already have an ssh client, and you can connect to one of the servers by opening a terminal and typing:

xhost +
ssh -X gtname@buffet0X.cc.gatech.edu

replacing the X in buffet0X with a number from 1 to 4, as detailed below. You will then be asked for your password and logged in. Windows users may need to install an ssh client such as PuTTY. To distribute the workload across the machines, please use the specific machines as follows:

  • buffet01.cc.gatech.edu if your last name begins with A-G
  • buffet02.cc.gatech.edu if your last name begins with H-N
  • buffet03.cc.gatech.edu if your last name begins with O-U
  • buffet04.cc.gatech.edu if your last name begins with V-Z

These machines use your GT login credentials.
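
The last-name rule above is a simple lookup on the first letter. The sketch below encodes it; buffet_host and ssh_command are hypothetical helper names used only for illustration, and the generated command mirrors the ssh line shown earlier (with -X only when you want remote plot windows).

```python
# Last-name initial ranges from the list above, mapped to buffet hosts.
_RANGES = [("A", "G", "buffet01"), ("H", "N", "buffet02"),
           ("O", "U", "buffet03"), ("V", "Z", "buffet04")]


def buffet_host(last_name):
    """Return the buffet server assigned to a last name (only the first letter matters)."""
    initial = last_name.strip()[:1].upper()
    for first, last, host in _RANGES:
        if first <= initial <= last:
            return host + ".cc.gatech.edu"
    raise ValueError("last name must start with a letter A-Z: %r" % last_name)


def ssh_command(gtname, last_name, forward_x=False):
    """Build the ssh command line for this student; -X is added only for remote plotting."""
    flags = " -X" if forward_x else ""
    return "ssh%s %s@%s" % (flags, gtname, buffet_host(last_name))
```

For a hypothetical student with GT name gburdell3 and last name Burdell, ssh_command("gburdell3", "Burdell", forward_x=True) returns "ssh -X gburdell3@buffet01.cc.gatech.edu".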

The xhost command and the -X argument to ssh are only necessary if you want to interactively draw plots directly to your screen while running code remotely on buffet. If you have any problems doing this, skip xhost and the -X argument, and instead plot to a file using matplotlib's Agg backend and the savefig() function. Neither requires access to a display.
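
A minimal sketch of plotting to a file without a display, assuming matplotlib is installed on the server as part of the configured environment. The backend has to be selected before pyplot is imported; plot_to_file is a hypothetical helper name, not part of the course code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no X display
import matplotlib.pyplot as plt


def plot_to_file(values, filename, title="Example"):
    """Plot a sequence of values and write the figure to filename instead of the screen."""
    plt.figure()
    plt.plot(values)
    plt.title(title)
    plt.savefig(filename)  # output format is inferred from the extension, e.g. .png
    plt.close()            # free the figure; matters when generating many plots
    return filename
```

You can then copy the saved image back to your own machine to view it, with no xhost or -X needed.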

NOTE: We reserve the right to limit login access or terminate processes to avoid resource contention during grading, although we will endeavor to limit such interruptions.


Running the grading scripts

The directory structure you receive contains the grading scripts, data, and template code for all assignments (eventually). To complete the assignments, you'll need to modify the templates according to each assignment's description. You can do this directly on the buffet0X machines using a text editor such as gedit, nano, or vim. Alternatively, you can copy the files to your local machine, edit them in your favorite text editor or IDE, and upload them back to the server. Make sure to test-run your code on the server after making changes to catch any typos or other bugs.

To test your code, you'll need to set up your PYTHONPATH to include the grading module and the utility module util.py, which are both one directory up from the project directories. Here's an example of how to run the grading script for the first assignment:

PYTHONPATH=../:. python grade_analysis.py

which assumes you are running it from the folder <ZIP_DIRECTORY>/assess_portfolio/. This prints a lot of information and also produces two text files, points.txt and comments.txt, which summarize the output, including any errors or failed test cases.
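
Because the relative entries in PYTHONPATH=../:. are resolved against the directory you run the command from, running it from any other folder leaves util and grading off the import path. The sketch below is a hypothetical wrapper, not part of the course code: it checks that util and grading can be found one level up (or in the project folder), then runs the grading script with the same PYTHONPATH the shell command sets. It assumes the grading module may be either grading.py or a grading/ package, and it runs the script with whichever Python interpreter launches the wrapper (sys.executable).

```python
import os
import subprocess
import sys


def find_module_dir(module, search_dirs):
    """Return the first directory holding module.py or a module/ package, else None."""
    for d in search_dirs:
        if (os.path.isfile(os.path.join(d, module + ".py"))
                or os.path.isdir(os.path.join(d, module))):
            return d
    return None


def run_grader(project_dir, script="grade_analysis.py", required=("util", "grading")):
    """Run a grading script with the equivalent of PYTHONPATH=../:. and return its exit code."""
    project_dir = os.path.abspath(project_dir)
    # "../" and "." from the shell command, resolved against the project folder.
    search_dirs = [os.path.dirname(project_dir), project_dir]
    missing = [m for m in required if find_module_dir(m, search_dirs) is None]
    if missing:
        raise RuntimeError("not found in ../ or .: " + ", ".join(missing))
    env = dict(os.environ)
    # Replaces any existing PYTHONPATH, just as the one-line shell form does.
    env["PYTHONPATH"] = os.pathsep.join(search_dirs)
    # Run from inside the project folder, as the command above assumes.
    return subprocess.call([sys.executable, script], cwd=project_dir, env=env)
```

For example, calling run_grader("assess_portfolio") from <ZIP_DIRECTORY> should behave like the shell command above and return the grading script's exit code; check points.txt and comments.txt afterwards as usual.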