Difference between revisions of "ML4T Software Installation"

From Quantitative Analysis Software Courses
(Added matplotlib to install list)

Revision as of 14:11, 8 January 2016

Overview

There are two main environments available to you to develop and test your code for this class:

  1. An Ubuntu Linux image we have created that you can run in a VM on your machine
  2. One of several high performance machines at Georgia Tech

Both of these have been set up with the same, correct software libraries. Your code MUST run properly in one of these environments; otherwise, it may not run correctly in our auto grader. If your code fails to run in the auto grader environment, you might not get credit for the assignment, so it is very important to make sure you have access to one of these environments.

You may, for convenience, also install the software manually on your personal machine. Keep in mind, however, that this is not officially supported and is at your own risk. See: ML4T_Software_Manual_Installation

Important note: We use a specific, static dataset for this course, which we will provide. If you download your own data from Yahoo (or elsewhere), you will get wrong answers on assignments.

Access to machines at Georgia Tech

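We will configure machines at Georgia Tech so that you can connect to them remotely using your GT login credentials. To connect to one of these machines, open a terminal window (or a command prompt on Windows) and type the following, replacing gtname with your GT username and buffet0X with the machine assigned to you in the list below: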

xhost +
ssh -X gtname@buffet0X.cc.gatech.edu

You will then be asked for your password and be logged in. In order to distribute workload across the machines, please use the specific machines as follows:

  • buffet01.cc.gatech.edu if your last name begins with A-F
  • buffet02.cc.gatech.edu if your last name begins with G-L
  • buffet03.cc.gatech.edu if your last name begins with M-R
  • buffet04.cc.gatech.edu if your last name begins with S-Z
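
The assignment above can be sketched as a small lookup. This is only an illustration: the function names buffet_host and ssh_command and the username gburdell3 are hypothetical and are not part of the course materials. The host names and letter ranges are taken from the list above.

```python
# Letter ranges and hosts copied from the list above.
RANGES = [
    ("A", "F", "buffet01.cc.gatech.edu"),
    ("G", "L", "buffet02.cc.gatech.edu"),
    ("M", "R", "buffet03.cc.gatech.edu"),
    ("S", "Z", "buffet04.cc.gatech.edu"),
]


def buffet_host(last_name):
    """Return the machine assigned to a last name (hypothetical helper)."""
    initial = last_name.strip()[:1].upper()
    for first, last, host in RANGES:
        if first <= initial <= last:
            return host
    raise ValueError("last name must start with a letter A-Z: %r" % last_name)


def ssh_command(gt_username, last_name):
    """Build the ssh command line shown above for a given user."""
    return "ssh -X %s@%s" % (gt_username, buffet_host(last_name))


if __name__ == "__main__":
    # gburdell3 is a placeholder username, not a real account.
    print(ssh_command("gburdell3", "Burdell"))
```

Names that do not start with a letter from A to Z raise a ValueError rather than silently picking a machine.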

Install, set up, and test an image

If you don't want to connect remotely to GT machines, you can download and install a bootable image that can run in a virtual machine.

Required software

Install Python 2.7 (NOT Python 3) and the necessary libraries as instructed below for your favorite platform.
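
As a quick sanity check before installing libraries, you can confirm which interpreter is actually running. The sketch below is an illustration only; the helper name is_python_27 is hypothetical and is not part of the course materials.

```python
from __future__ import print_function

import sys


def is_python_27(version_info=None):
    """Return True if the given (or current) interpreter is Python 2.7."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    if is_python_27():
        print("OK: running Python 2.7")
    else:
        print("Wrong interpreter: Python %d.%d" % tuple(sys.version_info[:2]))
```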

Linux

  • Install Python 2.7 [link]
  • Install pip (in case your Python doesn't come with it) [link]
  • Install virtualenv, virtualenvwrapper (highly recommended) [link]
    • Create a virtual environment to use for this course:
   $ mkvirtualenv ml4t
   $ workon ml4t
    • And then pip install the following within it.
  • NumPy 1.9+, SciPy 0.14+, Matplotlib 1.1+, Pandas 0.16+ [link]

Mac OS X

  • Install Python 2.7 via Homebrew
    • If you don't have it already, first get Homebrew
    • Then: brew install python
  • Install virtualenv, virtualenvwrapper (highly recommended) [link]
    • Create a virtual environment to use for this course:
   $ mkvirtualenv ml4t
   $ workon ml4t
    • And then pip install the following within it.
  • NumPy 1.9+, SciPy 0.14+, Matplotlib 1.1+, Pandas 0.16+ [link]

Windows

  • Install Python 2.7 [link]
  • Install pip (in case your Python doesn't come with it) [link]
  • Install virtualenv, virtualenvwrapper (highly recommended) [link]
    • Create a virtual environment to use for this course:
   C:\Users\Monty> mkvirtualenv ml4t
   C:\Users\Monty> workon ml4t
    • And then pip install the following within it.
  • NumPy 1.9+, SciPy 0.14+, Matplotlib 1.1+, Pandas 0.16+ [link]

Optional software

Data

  • Download: ml4t.zip
    Note: If you downloaded this prior to Aug 21, 2015, please download again. Some missing files have been included and minor issues fixed.
  • Unzip it. That should create a ml4t/ directory with the following contents:
   ml4t
   ├── data
   │   ├── $DJI.csv
   │   ├── $SPX.csv
   │   ├── $VIX.csv
   │   ├── A.csv
   │   ├── AA.csv
   │   ├── AAPL.csv
   │   ├── ...
   │   ├── YHOO.csv
   │   ├── YUM.csv
   │   ├── ZION.csv
   │   └── ZMH.csv
   └── validate_env.py

Whenever you need to work on assignments for this class, run your program from within ml4t/ so that you can access data/*.csv using a relative path.
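
A minimal sketch of accessing the data with a relative path, assuming the program is started from within ml4t/. The helper names symbol_path and read_header are hypothetical. The sketch reads only the header row, since the column layout of the CSV files is not described here.

```python
from __future__ import print_function

import csv
import os

# Relative to the current working directory, which should be ml4t/.
DATA_DIR = "data"


def symbol_path(symbol, data_dir=DATA_DIR):
    """Return the relative path to a symbol's CSV file, e.g. data/AAPL.csv."""
    return os.path.join(data_dir, symbol + ".csv")


def read_header(symbol, data_dir=DATA_DIR):
    """Return the first row (column names) of a symbol's CSV file."""
    with open(symbol_path(symbol, data_dir)) as handle:
        return next(csv.reader(handle))


if __name__ == "__main__":
    path = symbol_path("AAPL")
    if os.path.exists(path):
        print(path, "columns:", read_header("AAPL"))
    else:
        print("Not found:", path, "(are you running from within ml4t/?)")
```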

Test installation

Test your environment by running the script validate_env.py from the ml4t/ directory:

   python validate_env.py

If it complains, or if any of the installed library versions are older than the desired versions, fix the problems, and then repeat.
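
If you want to compare installed versions against the minimums by hand, something like the following may help. This is a hedged sketch and not the contents of validate_env.py. The helper names are hypothetical, the minimum versions are copied from the Required software lists above, and it assumes each library exposes a __version__ attribute.

```python
from __future__ import print_function

import importlib
import re

# Minimum versions from the Required software section.
MINIMUMS = {"numpy": "1.9", "scipy": "0.14", "matplotlib": "1.1", "pandas": "0.16"}


def parse_version(text):
    """Turn '1.9.2' into (1, 9, 2); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    have, want = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(want))
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


def check_libraries(minimums):
    """Map each library name to 'ok', 'too old', 'unknown', or 'missing'."""
    results = {}
    for name, minimum in sorted(minimums.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = "missing"
            continue
        installed = getattr(module, "__version__", "")
        if not installed:
            results[name] = "unknown"
        elif meets_minimum(installed, minimum):
            results[name] = "ok"
        else:
            results[name] = "too old"
    return results


if __name__ == "__main__":
    for name, status in sorted(check_libraries(MINIMUMS).items()):
        print("%-12s %s" % (name, status))
```

Comparing versions as tuples of integers avoids the string-comparison pitfall where "1.10" would sort before "1.9".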

A clean output from validate_env.py is required for MC1-Homework-2.