Difference between revisions of "ML4T Software Installation"

From Quantitative Analysis Software Courses
(Initial description of VM image)
m
(18 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
== Draft ==
 +
This page is being updated for the Fall 2017 semester, and is currently in draft mode. This notice will be removed once this page has been finalized.
  
== Overview ==
+
==Attention==
  
There are two main environments available to you to develop and test your code for this class:
+
The information on this page is for those who want a Python development environment on their own machine.  Keep in mind that even if you set up your own environment, your code still MUST run correctly on the GT servers, so make sure that you have access to them.  Please see [[ML4T_Software_Setup]] for information on how to use those servers and how to check out the code scaffolding for the projects.
  
# An Ubuntu Linux image we have created that you can run in a VM on your machine
+
==Overview==
# One of several high performance machines at Georgia Tech
 
  
Both of these have been set up with the same, correct software libraries.  Your code MUST run properly in one of these environments, otherwise it may not run correctly in our auto grader.  If your code fails to run in the auto grader environment, you might not get credit for the assignment.  So it is very important that you ensure that you have access to one of these environments.
+
===Important notes===
  
You may, for convenience, choose to also manually install the software on your personal machine.  Keep in mind, however, that this is not officially supported and it <b>is at your own risk</b>: [[ML4T_Software_Manual_Installation]]
+
* We use a specific, static dataset for this course, which we will provide. If you download your own data from Yahoo (or elsewhere), you will get wrong answers on assignments.
 +
* While these instructions should work on Windows, macOS, or Linux, we '''strongly recommend''' developing on a recent version of Ubuntu LTS, as there may be significant differences on Windows. You can easily create an Ubuntu-based virtual machine image to develop on using any freely available VM software (VirtualBox is probably the easiest).
 +
* Regardless of your OS and install method, you should still ensure that your code runs on the provided servers. If your code fails to run on the servers, "it works on my machine" is not a valid excuse, and '''you will receive no credit'''.
  
'''Important note''': We use a specific, static dataset for this course, which we will provide. If you download your own data from Yahoo (or elsewhere), you will get wrong answers on assignments.
+
The assignments in this class are in Python (version 2.7), and rely heavily on a few important libraries. These libraries are under active development, which unfortunately means there can be some compatibility issues between versions. This isn't an issue if you use the provided servers, but if you want to work from your local machine it is very important to make sure you have exactly the same library versions. To that end, here is a list of each library and its version number, provided in the pip freeze format:
  
==Access to machines at Georgia Tech==
+
  cycler==0.10.0
 +
  functools32==3.2.3.post2
 +
  matplotlib==2.0.2
 +
  numpy==1.13.1
 +
  pandas==0.20.3
 +
  py==1.4.34
 +
  pyparsing==2.2.0
 +
  pytest==3.2.1
 +
  python-dateutil==2.6.1
 +
  pytz==2017.2
 +
  scipy==0.19.1
 +
  six==1.10.0
 +
  subprocess32==3.2.7
  
We will configure machines at Georgia Tech so that you can connect to them remotely using your GT login credentials. To connect to one of these machines, open a terminal window (or DOS window) and type:
+
If you are familiar with <code>pip</code> and <code>virtualenv</code>, you can use them to create a virtual environment for this class that matches those version numbers. Here is an outline:
  
xhost +
+
# Install virtualenv (if it is not already installed): <code>python -m pip install --user virtualenv</code>
ssh -X gtname@buffet0X.cc.gatech.edu
+
# Create a virtual environment for this class: <code>virtualenv ml4t-venv</code>
 +
# Activate the new virtual environment: <code>source ml4t-venv/bin/activate</code> on Linux or macOS, or <code>ml4t-venv\Scripts\activate.bat</code> on Windows.
 +
# Save the above list as a text file, say <tt>ml4t-libraries.txt</tt>
 +
# Install the libraries using the requirements file you just saved: <code>python -m pip install --requirement ml4t-libraries.txt</code>
  
You will then be asked for your password and be logged in.  In order to distribute workload across the machines, please use the specific machines as follows:
+
This will install <code>virtualenv</code> using <code>pip</code>, create a virtual environment named <tt>ml4t-venv</tt> in the current directory, and use <code>pip</code> to install the library versions listed above into that virtual environment. It requires [https://pypi.python.org/pypi/pip pip], which is provided by default on both macOS and Ubuntu and comes packaged with the standard Python install for Windows. Certain backends for matplotlib may require additional libraries to be installed in a platform-specific way (on Ubuntu, <code>sudo apt install python-tk</code> should do the trick). More information on each of the tools mentioned on this page can be found here:
 
+
* [https://virtualenv.pypa.io/en/stable/userguide/#usage virtualenv]
* buffet01.cc.gatech.edu if your last name begins with A-F
+
* [https://pip.pypa.io/en/stable/ pip]
* buffet02.cc.gatech.edu if your last name begins with G-L
 
* buffet03.cc.gatech.edu if your last name begins with M-R
 
* buffet04.cc.gatech.edu if your last name begins with S-Z
 
 
 
==Install, set up and test a virtual machine==
 
 
 
If you don't want to connect remotely to GT machines, we have created a VM image with the same operating system and software libraries that you can download (''<span style="color:red">link coming soon</span>'') and run using [https://www.virtualbox.org/wiki/Downloads VirtualBox], Oracle's open source VMM. The credentials for the main account on this image use '''ml4t''' (case sensitive) as both the username and password, should you need to make changes. This image is configured without any optimizations enabled to be as platform agnostic as possible, but we encourage you to enable [https://www.virtualbox.org/manual/ch03.html#idp46608643755984 hardware acceleration], [https://www.virtualbox.org/manual/ch04.html#guestadd-video graphics acceleration], and [https://www.virtualbox.org/manual/ch04.html#idp46608642326848 guest additions] to improve performance ([https://brainwreckedtech.wordpress.com/2012/01/08/howto-convert-vdis-between-fixed-sized-and-dynamic-in-virtualbox/ changing] the virtual disk image from dynamically allocated to fixed may also improve performance).
 
  
 
== Optional software ==
 
Line 35: Line 46:
 
* IPython [[http://docs.python-guide.org/en/latest/scenarios/scientific/ link]]
 
 
* A Python IDE, such as [https://www.jetbrains.com/pycharm/ PyCharm] or [https://pythonhosted.org/spyder/ Spyder]
 
 
+
* [https://www.virtualbox.org/ VirtualBox]
== Data ==
 
 
 
* Download: [https://s3.amazonaws.com/content.udacity-data.com/courses/ud501/code/ml4t.zip ml4t.zip]<br/>Note: If you downloaded this prior to Aug 21, 2015, please download again. Some missing files have been included and minor issues fixed.
 
* Unzip it. That should create a <tt>ml4t/</tt> directory with the following contents:
 
    ml4t
 
    ├── data
 
    │   ├── $DJI.csv
 
    │   ├── $SPX.csv
 
    │   ├── $VIX.csv
 
    │   ├── A.csv
 
    │   ├── AA.csv
 
    │   ├── AAPL.csv
 
    │   ├── ...
 
    │   ├── YHOO.csv
 
    │   ├── YUM.csv
 
    │   ├── ZION.csv
 
    │   └── ZMH.csv
 
    └── validate_env.py
 
 
 
Whenever you need to work on assignments for this class, run your program from within <tt>ml4t/</tt> so that you can access <tt>data/*.csv</tt> using a relative path.
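A minimal sketch of what that relative path looks like in code, assuming the current working directory is <tt>ml4t/</tt>. The helper names <tt>symbol_to_path</tt> and <tt>missing_symbols</tt>, and the <tt>base_dir</tt> parameter, are illustrative only and are not defined on this page.

```python
import os


def symbol_to_path(symbol, base_dir="data"):
    """Build the relative path to a symbol's CSV file, e.g. data/AAPL.csv."""
    return os.path.join(base_dir, "{}.csv".format(symbol))


def missing_symbols(symbols, base_dir="data"):
    """Return the symbols whose CSV files cannot be found under base_dir."""
    return [s for s in symbols
            if not os.path.isfile(symbol_to_path(s, base_dir))]
```

Calling <tt>missing_symbols(["AAPL", "$SPX"])</tt> from inside <tt>ml4t/</tt> should return an empty list if the archive was unzipped as shown above. A non-empty result usually means the program was started from a different directory.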
 
 
 
=== Test installation ===
 
 
 
Test your environment by running the script <tt>validate_env.py</tt> from the <tt>ml4t/</tt> directory:
 
    python validate_env.py
 
 
 
If it reports errors, or if any installed library version is older than the required version, fix the problems and run the script again.
 
 
 
A clean output from <tt>validate_env.py</tt> is required for MC1-Homework-2.
 

Revision as of 19:10, 22 August 2017

Draft

This page is being updated for the Fall 2017 semester, and is currently in draft mode. This notice will be removed once this page has been finalized.

Attention

The information on this page is for those who want a Python development environment on their own machine. Keep in mind that even if you set up your own environment, your code still MUST run correctly on the GT servers, so make sure that you have access to them. Please see ML4T_Software_Setup for information on how to use those servers and how to check out the code scaffolding for the projects.

Overview

Important notes

  • We use a specific, static dataset for this course, which we will provide. If you download your own data from Yahoo (or elsewhere), you will get wrong answers on assignments.
  • While these instructions should work on Windows, macOS, or Linux, we strongly recommend developing on a recent version of Ubuntu LTS, as there may be significant differences on Windows. You can easily create an Ubuntu-based virtual machine image to develop on using any freely available VM software (VirtualBox is probably the easiest).
  • Regardless of your OS and install method, you should still ensure that your code runs on the provided servers. If your code fails to run on the servers, "it works on my machine" is not a valid excuse, and you will receive no credit.

The assignments in this class are in Python (version 2.7), and rely heavily on a few important libraries. These libraries are under active development, which unfortunately means there can be some compatibility issues between versions. This isn't an issue if you use the provided servers, but if you want to work from your local machine it is very important to make sure you have exactly the same library versions. To that end, here is a list of each library and its version number, provided in the pip freeze format:

 cycler==0.10.0
 functools32==3.2.3.post2
 matplotlib==2.0.2
 numpy==1.13.1
 pandas==0.20.3
 py==1.4.34
 pyparsing==2.2.0
 pytest==3.2.1
 python-dateutil==2.6.1
 pytz==2017.2
 scipy==0.19.1
 six==1.10.0
 subprocess32==3.2.7

If you are familiar with pip and virtualenv, you can use them to create a virtual environment for this class that matches those version numbers. Here is an outline:

  1. Install virtualenv (if it is not already installed): python -m pip install --user virtualenv
  2. Create a virtual environment for this class: virtualenv ml4t-venv
  3. Activate the new virtual environment: source ml4t-venv/bin/activate on Linux or macOS, or ml4t-venv\Scripts\activate.bat on Windows.
  4. Save the above list as a text file, say ml4t-libraries.txt
  5. Install the libraries using the requirements file you just saved: python -m pip install --requirement ml4t-libraries.txt
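
One way to confirm that step 5 produced the pinned versions is a short script like the sketch below, run from inside the activated environment. The file name ml4t-libraries.txt comes from step 4. The function names (parse_requirements, installed_version, in_virtualenv, find_mismatches, check_file) are illustrative and not part of the course code. The sketch assumes every line of the file uses the name==version form shown above.

```python
from __future__ import print_function

import sys


def parse_requirements(text):
    """Parse 'name==version' lines (pip freeze format) into a dict."""
    required = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        required[name.strip()] = version.strip()
    return required


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        # Python 3.8+: standard library lookup.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Python 2.7 (the course version): fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def in_virtualenv():
    """Best-effort check that a virtual environment is active."""
    # Older virtualenv releases set sys.real_prefix; newer releases (and
    # the venv module) make sys.base_prefix differ from sys.prefix.
    return (hasattr(sys, "real_prefix")
            or getattr(sys, "base_prefix", sys.prefix) != sys.prefix)


def find_mismatches(required, lookup=installed_version):
    """Return {name: (wanted, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in sorted(required.items()):
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def check_file(path):
    """Print a report for the requirements file; return True if all match."""
    with open(path) as handle:
        problems = find_mismatches(parse_requirements(handle.read()))
    if not in_virtualenv():
        print("Warning: no virtual environment appears to be active.")
    for name, (wanted, found) in sorted(problems.items()):
        print("{}: expected {}, found {}".format(name, wanted, found))
    if not problems:
        print("All library versions match.")
    return not problems
```

Calling check_file("ml4t-libraries.txt") lists each package whose installed version differs from the pinned one. A found value of None means the package is not installed at all. This checks only your local machine; your code must still run on the provided servers.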

This will install virtualenv using pip, create a virtual environment named ml4t-venv in the current directory, and use pip to install the library versions listed above into that virtual environment. It requires pip, which is provided by default on both macOS and Ubuntu and comes packaged with the standard Python install for Windows. Certain backends for matplotlib may require additional libraries to be installed in a platform-specific way (on Ubuntu, sudo apt install python-tk should do the trick). More information on each of the tools mentioned on this page can be found here:

Optional software