This project was made as a final project for Udacity.com for there Machine Learning
Engineer program. This project uses a variety of techniques to create many well
known indicators for trading on the stock market, and utilize these within a deep
learning setup that uses real time recurrent learning. We use this neural network
to predict stock returns for a companies of our choice.
Due to limitations on acquiring free intraday data, we're currently only able to
predict on roughly 50 stocks but if you chose to acquire paid data elsewhere, you
could theoretically predict for any amount of companies you wanted. This setup is
more based for those who don't have the money to buy lots of data though.
The way this package works is first, you create your indicator data using some
precalculated dataframes containing high/low/close/typical values for the past
several years that were adjusted for dividends and splits and such. Once this is
done, you can start training your neural network using this indicator data. This
project allows you to choose the indicator attributes of your choice. For each
indicator, which there are 12, you also have the ability to use 32 different
period lengths. The period length is how much data each indicator uses to
calculate itself. So for example, you might only choose to use the smallest
period length of 6 minutes, or choose the longest period length of 125 trading
days. And you can choose many different options together to get both
short term and long term data combined. Each indicator has preferred period
lengths that work best generally but I encourage you to try to test as much as
you can out yourself.
You probably want to keep the amount of indicators you're using total less than
about 250, as more than that amount of attributes can quickly run up memory and
cpu usage. For my machine, I ran on a generally small amount of resources using
12GB of ram, with an Intel i7-4510U cpu. Much more than than amount of attributes
tended to kill my memory, and in the process, kill my cpu time. I tended to train
on 230 attributes, using roughly 25000 records at a time. I found this to be useful
and provided a good accuracy/limitation ratio.
Once you've set up all of your indicator and neural network data and everything is
up to date, you can start using the real-time portion of the project. In particular,
the RealTimeCalculator notebook allows for calculating indicators in real-time using
data from a Google Finance server that provides us real-time stock data for the
stocks of our choice. This allowed us to then calculate highs/lows/closes/typical
values which then allowed us to calculate our indicators. Once the indicators had
been calculated, our neural network function would be called to make its prediction
for us based on our indicators.
This package is set up to train the neural network on the trading off-hours so that
during the day we could test only, rather than train and test which would use up a
lot more resources. However, eventually, an update will be provided that will give
you the option to train on the data in real-time so you could get more accurate
predictions.
Contact me at [email protected]
1. Ensure you have an IPython Notebook solution such as with
Anaconda which installs a variety of applications including
python, and Ipython solution, and several other helpful
modules such as Numpy, Pandas, and Pip.
2. Retrieve the 3 python files and 2 directories in the
PythonFiles/ directory from this github repository and
store them in the directory with all of your other
python packages.
For example, with Anaconda, the directory is stored in:
C:/path/to/user/Anaconda/Lib/site_packages/
RealTimeNN.py, CalcInd.py, and NyseDatesPrds.py are stored as
standalone files, while the googlefinance and googlefinance-0.7.dist-info
directories are stored as directories. The googlefinance is a slightly
altered version of the googlefinance package written by hongtaocai
3. Install the other modules using pip. For example, cd to the
scripts directory at C:/path/to/user/Anaconda/scripts/ , then
type:
pip install yahoo_finance
pip install quandl
pip install matplotlib
pip install scipy
pip install pickle
pip install pandas
pip install numpy
4. Once all of your modules have been set up, place the other ipynb
files from this repository into a directory called Trading/ and
place these 6 ipython files into this directory.
5. Retrieve the the Pickles/ directory and store this directory within
the same directory as your other ipython files. This directory contains
4 files. The columnnames.pickle file contains a file that ensures all
of our different dataframe columns containing our indicators are in
the same order for every dataframe. The final_lst222AAPL.pickle file
contains an example indicator list for predicting Apple stock returns.
It is a list with 222 indicators that are used to train/test our
neural network. The net222attrs.pickle file contains an example
trained neural network for Apple stock. This file allows you to
test how this package performs without taking the time to train
yourself. Keep in mind, it might not be completely up to date training
wise but I'll do my best to update it as I continue training. The
newsdict.pickle file contains a dictionary containing url's and dates
for stories about each company we're following dating back from up to
2 years ago to present.
6. Retrieve the HLC directory from this repository, within this is
20 pickled files containing premade dataframes with intraday
highs, lows, closes, and typical values at a minute by minute
level. These are provided so you don't have to create them yourself
using the online repository that was used to create them as it made
it so you would have to download individually one at a time and adjust
them yourself.
This repository contains 14 stocks as well as 6 Exchange Traded Funds,
as well as Indexes that are used to help in the prediction process.
If you choose, you can retrieve more stocks/funds/indexes directly from
the online server that was used to create these data sources. The site
is called http://thebonnotgang.com/tbg/historical-data/
Store this repository within your Trading/ directory.
ProgramInfoAndCreation.ipynb
IndicatorCreation.ipynb
CalculateAndAddMissingData.ipynb
DividendAndSplitUpdate.ipynb
RealTimeCalculator.ipynb
RealTimeNeuralNetwork.ipynb
Pickles/
columnnames.pickle
final_lst222AAPL.pickle
net222attrsAAPL.pickle
newsdict.pickle
PythonFiles/
RealTimeNN.py
CalcInd.py
NyseDatesPrds.py
googlefinance package
googlefinance-0.7.dist-info package
HLC directory/
GOOG.pickle
...
yahoo_finance package
quandl package
matplotlib package
scipy package
pickle package
pandas package
numpy package
1. First cd to your Trading/ directory, then first call:
ipython notebook ProgramInfoAndCreation.ipynb
Further instructions on how to view information and create
your initial files are within this file.
2. After everything is setup using that file, open:
DividendAndSplitUpdate.ipynb
This file will allow you to add the data since the intraday data
was put up, and then adjust for any splits and/or dividends that
have happened since the hlc dictionary was put online. Follow the
instructions within this file for more information.
3. After your hlc dictionary is updated for added data and dividend/
stock adjustments, you'll now create the indicators for each
company. To do this, open:
IndicatorCreation.ipynb
For more information, look within the file.
4. Now that your indicators and intraday data are up to date, you
can open:
RealTimeCalculor.ipynb
This file has everything you need to acquire real-time stock data
so as to calculate real-time indicator values as well as use these
values to create you're predictions. For more information, look
within the file.
5. Finally, you'll open:
RealTimeNeuralNetwork.ipynb
This file is used to calculate the neural networks for any companies
that you choose. This file is typically run on trading off-hours because
it is processor and memory intensive, and it allows you to create/update
a neural network that once created/updated can be saved and then will be
used throughout the next trading day to make our predictions. Eventually
this will be set up to continuously update throughout the training day for
more accurate results, but for now, it's set up to give you a neural netwrk
that you will just use to predict and only update after the trading day is
done.
If for any reason, you miss some trading day's data by not using the
RealTimeCalculator.ipynb file, you can update everything back to normal by
opening:
CalculateAndAddMissingData.ipynb
This file allows you to retrieve missed intraday data from our intraday
data source and update up to present.
Make sure there have been no dividends or stock splits on any of your stocks
since you last updated everything as if this did happen, rather than call the
above file, you'll instead open:
DividendAndSplitUpdate.ipynb
where you'll update your intraday data dataframes for the div/splits. This will
leave you with a new pickle file called onlyupdatedintra.pickle, that contains
only the company highs/lows/closes/typicals that have been adjusted. Following
this adjustment, all of the indicators for these companies need to be reupdated,
rather than soley adding the indicator values since you last updated(which you'll
do for the other companies that don't need to be fully updated). Now open:
IndicatorCreation.ipynb
And update the companies that recieved the dividend/split to reupdate all of the
indicators for those stocks. Then open:
CalculateAndAddMissingData.ipynb
Where you'll add the missing indicator data for the companies that don't need to
be fully updated. This will just append the new data to your already created
indicator dataframes.
1. Following functions thanks to https://gist.github.com/jckantor/d100a028027c5a6b8340:
- NYSE_tradingdays()
- NYSE_holidays()
- NYSE_tradingdays2()
- NYSE_holidays2()
This set of functions is used to calculate past and future trading days, making sure
to adjust for trading holidays and such. The first set of functions, NYSE_tradingdays()
and NYSE_holidays() is the original functions as was created entirely by jckantor,
while the second set was altered by me to adjust for a specific requirement I needed,
but was based entirely off the original so thanks to jckantor for providing this set
of extremley helpful trading functions!!
2. Following package thanks to https://github.com/hongtaocai/googlefinance:
- googlefinance
This package allowed for retrieving real-time stock data from Google's finance servers
that allowed this project to calculate real-time indicators. It also allowed for
getting year's worth of news data for those companies up to present. Thanks again,
this project wouldn't exist if it weren't for them!
3. Following package thanks to https://github.com/lukaszbanasiak/yahoo-finance:
- yahoofinance
This package allowed for providing information for each company when deciding which
companies to track. Thanks to lukaszbanasiak for making this package public!
4. Following package thanks to https://github.com/quandl/quandl-python:
- quandl
Quandl provides all kinds of data that is both free and pay. They are one of the
leading trading data providers out there, so check them out if interested. For my
portion of the project, I utilize there adjusted closing prices for use if a
company has a dividend or stock split you need to adjust for. Thanks to them for
providing that useful free data!
5. Following functions thanks to Dennis Atabay at https://pyrenn.readthedocs.io/en/latest/:
- create_neural_network()
- create_weight_vector()
- convert_matrices_to_vector()
- convert_vector_to_matrices()
- get_network_output()
- RTRL()
- prepare_data()
- train_LM()
- calc_error()
- NNOut()
These functions allowed for utilizing real-time recurrent learning of neural nets.
These were out of a package called Pyrenn. There is a lot more functionality from
that package so check it out. These functions have been altered to use more
descriptive naming of variables and functions, for both learning purposes and for
anyone checking out this package who wanted a little easier to follow group of
functions.
One major change was in RTRL(), the dictionaries
deriv_layer_outputs_respect_weight_vect, and sensitivity_matrix. These
dictionaries started to cause issues when training on more than 10,000 data
records at a time. Due to the way they were set up, these functions would
continuously add key/values to there dictionaries after calculating those
particular values. The issue arose because the function only needed to keep enough
to calculate up to the largest delay. What this meant was that rather than keeping
tens of thousands of data keys/values in a dictionary these dictionaries, we could
just keep, for example, the previous 5 key/value pairs that were needed for
calculating with delays but no more. What resulted was massive performance rewards
in terms of memory consumption, and as a result of significantly less memory
consumption, computing time as well. That being said, this project wouldn't have
been possible if not for them, so thanks again!!
6. Finally, thanks to everyone at http://thebonnotgang.com/tbg/ for providing FREE
INTRADAY DATA!!!! This was probably the most amazing find of this project. Prior
to this, I was working on daily stock values, rather than intraday, so when I
found these guys, I couldn't have been happier. Please Please Please check out
there site, they provide apps, and all sorts of help. These guys were a lifesaver
so show them some love! They are the only guys I was able to find on here who
provided free intraday data, so if you're poor like me and ever need intraday
data, check them out!