Data for the competition was obtained from the Edgar database, which
includes tens of daily variables and over 200 fundamental variables.
The database goes back to 1998 and covers over 30,000 stocks. In
addition to the available parameters, students implemented 12 technical
indicators such as MACD, RSI, and ROC, and created extra database
fields for each of these indicators for each security and date. We have
restricted ourselves to the NASDAQ and NYSE stock exchanges; overall we
have data covering 10,000 stocks over the last ten years.
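As a rough illustration of what such indicator fields contain, the sketch below computes ROC and the MACD line from a list of closing prices. The periods (12 for ROC, 12/26 for MACD) are common textbook defaults and an assumption here; the students' actual parameters and implementation are not given on this page.

```python
from typing import List, Optional


def roc(closes: List[float], period: int = 12) -> List[Optional[float]]:
    """Rate of change in percent; None until `period` earlier closes exist."""
    out: List[Optional[float]] = []
    for i, price in enumerate(closes):
        if i < period or closes[i - period] == 0:
            out.append(None)
        else:
            prev = closes[i - period]
            out.append(100.0 * (price - prev) / prev)
    return out


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes: List[float], fast: int = 12, slow: int = 26) -> List[float]:
    """MACD line: fast EMA minus slow EMA of the closing prices."""
    return [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
```

Each function returns one value per input date, which is the shape needed to store an indicator as an extra field per security and date.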
Competition Rules
Your job is to define the strongest possible predictors of the growth
of a stock over all the past stock histories available in our database.
In other words, your predictors have to have had the best average
"return" on the data available in the database. Of course, if history
is a predictor of the future, your predictors may be useful for future
data as well. But we are not making any such claims. It is strictly
about the best performance until now. The rules you come up with will
of course suffer from overfitting the data. The rules will be "as they
are", holding for the data available so far without any guarantees that
they will hold in the future. This is just the beginning!
Predictors will be CONJUNCTIONS of conditions expressed as (Attribute
Operator Value), where Attribute is one of the technical or fundamental
variables, Operator is “equality” (=), “greater than” (>), or “less
than” (<), and Value is an element of the domain of the attribute.
Attributes should not include TIME (neither as quarter nor as week,
day, etc.). Attributes should also not include names of securities or
their derivatives (e.g. all stocks beginning with “AK”).
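One way to picture a predictor is as a list of (attribute, operator, value) triples that is checked against the rules and then joined into a WHERE clause. The sketch below is illustrative only: the forbidden column names are hypothetical stand-ins for time and security-name fields, since the actual schema is not listed here.

```python
from typing import List, Tuple

# One condition of the conjunction: (attribute, operator, value).
Condition = Tuple[str, str, float]

ALLOWED_OPERATORS = {"=", ">", "<"}

# Hypothetical column names for time and security identity; the real
# database schema may use different names.
FORBIDDEN_ATTRIBUTES = {"date", "day", "week", "quarter", "year", "ticker", "name"}


def validate(conditions: List[Condition]) -> None:
    """Raise ValueError if the predictor breaks the competition rules."""
    if not conditions:
        raise ValueError("a predictor needs at least one condition")
    for attribute, operator, _ in conditions:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"operator not allowed: {operator}")
        if attribute.lower() in FORBIDDEN_ATTRIBUTES:
            raise ValueError(f"attribute not allowed: {attribute}")


def to_where_clause(conditions: List[Condition]) -> str:
    """Join the validated conditions with AND (a pure conjunction)."""
    validate(conditions)
    return " AND ".join(f"{a} {op} {v}" for a, op, v in conditions)
```

For example, `to_where_clause([("RSI", ">", 16.6412), ("RSI", "<", 35.7178)])` yields `RSI > 16.6412 AND RSI < 35.7178`, while a condition on `ticker` is rejected.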
There are 4 categories of competition:
Sprint
Predict stock prices within exactly 5 trading days. Your predictor
should trigger at least 1000 times over all possible (security, date)
pairs (around 10 million such data points in our database). This means
that your predictor will trigger on average twice a week (the data
covers around 500 weeks).
Uphill Sprint
Predict stock prices within exactly 5 trading days. Your predictor
should trigger at least 100,000 times over all possible (security,
date) pairs (around 10 million such data points in our database). This
means that your predictor will trigger on average 40 times a day (the
data covers around 2000 trading days).
Long Distance
Predict stock prices within exactly 20 trading days. Your predictor
should trigger at least 1000 times over all possible (security, date)
pairs.
Uphill Long Distance
Predict stock prices within exactly 20 trading days. Your predictor
should trigger at least 100,000 times over all possible (security,
date) pairs (around 10 million such data points in our database).
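The four categories differ only in prediction horizon and minimum support, so they can be summarized in a small table. The sketch below encodes the thresholds stated above; the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    horizon_days: int  # return is measured this many trading days ahead
    min_support: int   # minimum number of (security, date) triggers


CATEGORIES = [
    Category("Sprint", 5, 1_000),
    Category("Uphill Sprint", 5, 100_000),
    Category("Long Distance", 20, 1_000),
    Category("Uphill Long Distance", 20, 100_000),
]


def qualifies(category: Category, support: int) -> bool:
    """A predictor is eligible only if it triggers often enough."""
    return support >= category.min_support


def triggers_per_period(support: int, periods: int) -> float:
    """Average triggers per week or per day, given the number of periods."""
    return support / periods
```

With these definitions, `triggers_per_period(1000, 500)` reproduces the "twice a week" figure quoted for Sprint.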
The list of additional technical indicators is available here:
Indicator List. You can use any machine learning methods from Weka or
otherwise. You may also use a hybrid method and tinker with the
predictions manually using our database interface, since every
predictor is a simple SQL query against the Edgar competition database.
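To make the "predictor as SQL query" idea concrete, here is a minimal sketch that scores a predictor by its support and average return. It runs against an in-memory SQLite table with three made-up rows; the table name `competition`, the column `ret5d`, and the data are assumptions for illustration, not the competition database.

```python
import sqlite3


def evaluate(conn, where_clause, return_column="ret5d", table="competition"):
    """Return (support, average return) for a predictor's WHERE clause.

    The clause is interpolated directly into the query, so this is only
    suitable for trusted input such as your own hand-written predictors.
    """
    query = (
        f"SELECT count(*), avg({return_column}) "
        f"FROM {table} WHERE {where_clause}"
    )
    support, avg_return = conn.execute(query).fetchone()
    return support, avg_return


# Toy stand-in for the real database: three (security, date) data points.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE competition (RSI REAL, MACD REAL, ret5d REAL)")
conn.executemany(
    "INSERT INTO competition VALUES (?, ?, ?)",
    [(20.0, -0.5, 4.0), (30.0, -0.4, 2.0), (70.0, 0.3, -1.0)],
)

support, avg_return = evaluate(conn, "RSI < 35 AND MACD < 0")
print(support, avg_return)
```

The two numbers returned are the same pair reported for each medal in the results: the average return and the support.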
RESULTS
Summary: we had 12 students enter 4 predictors each, and we had a live
competition in class where each of the predictors was run in real time
against the database. It felt a bit like a
Below we list medals, names, average returns (5-day or 20-day,
depending on the competition), plus support: the number of (security,
date) pairs for which the predictor triggered.
Sprint
Gold: Bobby, 16.5884251135487, 1004
Silver: Zhiyuan, 16.4994591534424, 1040
Bronze: Michael, 16.1467334367557, 1006
SELECT count(*), avg(ret20d)
FROM edgar.competition_20d p
WHERE StochK < 23.2932
  AND RSI > 16.6412 AND RSI < 35.7178
  AND WilliamsR < -88.961
  AND CCI > -297.677 AND CCI < -108.664
  AND StochRSI < 0.4028
  AND StdDev > 0.19351
  AND BgrBands_Lower > 0.55275 AND BgrBands_Lower < 1.3425
  AND BgrBands_Upper > 1.8106
  AND MACD < -0.31276
  AND EMA_MACD < -0.28368
  AND PVO > -13.2484 AND PVO < 40.218
  AND ROC < -42.7578
  AND PPO < -18.2597
  AND PPO_EMA > -48.9628 AND PPO_EMA < -15.7338
  AND PPO_HIST > -2.8042 AND PPO_HIST < 8.7226
Given the past, which rules performed the best? Financial writers
often make statements about bullish or bearish interpretations of
various indicators such as MACD, Bollinger Bands, etc. These statements
start with the word "usually". Given the data we have, we can
substantiate/validate or refute such statements. For example, is MACD
> 0 a bullish indicator? We can look at the past data and see if a
security which satisfied MACD > 0 had a positive or negative return
later. In other words, we can substantiate claims.
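A check of this kind amounts to comparing the average later return of data points where the condition held against those where it did not. The sketch below does this over a list of dictionaries; the field names `MACD` and `ret` and the three rows are invented for illustration, not taken from the database.

```python
from statistics import mean


def compare_condition(rows, attribute, threshold, return_key="ret"):
    """Compare later returns when attribute > threshold versus otherwise."""
    when_true = [r[return_key] for r in rows if r[attribute] > threshold]
    when_false = [r[return_key] for r in rows if not r[attribute] > threshold]
    return {
        "support_when_true": len(when_true),
        "avg_return_when_true": mean(when_true) if when_true else None,
        "support_when_false": len(when_false),
        "avg_return_when_false": mean(when_false) if when_false else None,
    }


# Made-up data points: indicator value on a date and the return afterwards.
rows = [
    {"MACD": 0.5, "ret": 2.0},
    {"MACD": 1.0, "ret": 4.0},
    {"MACD": -0.2, "ret": -1.0},
]
result = compare_condition(rows, "MACD", 0.0)
print(result)
```

Reporting the support alongside each average matters: a large difference in average return backed by only a handful of data points says little about the claim.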
We will monitor rules generated in class against the incoming data. We
will also address the overfitting concerns using newly proposed notions
of rule sensitivity and limited cross-validation, which we believe are
appropriate for a temporal data feed. Stay tuned.