STEP 4: Data


In Step 4, four different types of data are provided for the research and development process: Random-Walk Data, Simulated Quote Data, Curated Historical Data, and Real Time Trading Data.


The market changes

Testing the algorithm with various sets of data gives an indication of how profitable the trading robot will be in different market conditions. If the trading robot can profit in different market conditions, it is said to be “robust”. Testing the strategy over various types of data helps show that the strategy is robust: even as the market changes, the trading robot should still be able to profit. It also stress tests the strategy to check that the developer programmed it correctly during Step 3.

Prices may be unpredictable

Humans have an innate ability to recognize patterns, even where they don't exist. While this has been a pivotal skill in helping us evolve and progress as a species, there are times when relying on perceived patterns can result in a negative outcome. Many traders look for patterns in historical prices and trade solely on the faith that the patterns will repeat themselves in the future.

In 1973, Burton Malkiel (an economics professor at Princeton University) wrote a book about random-walk theory called “A Random Walk Down Wall Street”. Random-walk theory is the concept that stock prices evolve according to a random walk and are therefore unpredictable. In short, it is the basic idea that no one can predict the future prices of stocks.

Several studies show that stock price movement is unpredictable. One reason prices can be unpredictable is the law of supply and demand. As human and robotic traders buy a stock, the price increases. As they sell, the price decreases. Each trader buys or sells for different reasons.

Here are common events that affect stock prices:

  1. An unrelated large cap stock has an earnings report. An investor makes a large investment in the related sector ETF.
  2. Other traders recognize the same price pattern and place their trades before you.
  3. A high speed computer sees your trade and executes orders before you.
  4. High speed computers and traders place orders and cancel them to manipulate the price.
  5. Unexpected negative news (like an accident or scandal) is released about the company.
  6. Another unexpected event happens on the following day and more news is released.
  7. Paid advertisers post in online forums to manipulate the price for their employers.
  8. High frequency trading systems and scalp traders buy and sell in under one minute.
  9. Day traders buy and sell during the day and close their positions at the end of the day.
  10. Swing traders hold their positions for as long as it takes to reach an exit price.
  11. Position traders buy and sell at random times from weeks to months.
  12. The price moves while trading is halted due to the price moving too quickly.
  13. News about the stock is released at random and unpredictable times.
  14. Regulations prevent traders from making enough trades to follow their strategy closely.
  15. The Federal Reserve chairman says something that upsets investors.
  16. An ETF or mutual fund that includes the stock is created or dissolved.
  17. A commentator or analyst changes their sentiment about the stock.

All of these events can happen simultaneously, causing the price to fluctuate in an unpredictable fashion.


Random Walk Data

StepsTeam generates pseudorandom data and tests the trading strategy against it. A starting price is set for each symbol at the beginning of the simulation. At each time step, a virtual coin is flipped: heads means the price increases, and tails means it decreases. A virtual die is also rolled, and the number that comes up (between 1 and 6) is how many cents the price moves. The direction and amount of movement are then applied to the previous price. This process is repeated for each individual stock in the portfolio, so that the stock prices move in a very unpredictable way.

Studies show that while prices move randomly, there is a subtle upward drift in the price over time. One reason is that employees at the company work to improve the value of the company. Another is that dividends and profits are often reinvested. As a conservative measure, the pseudorandom data does not include the upward drift. Trading commissions, SEC fees, and slippage are factored in.
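The coin-and-die procedure described above can be sketched in a few lines of Python. This is only an illustration, not the StepsTeam generator: the function name, the use of integer cents, and the one-cent price floor are assumptions added for the example, and commissions, fees, and slippage are not modeled here.

```python
import random


def random_walk_prices(start_cents, steps, seed=None):
    """Generate a driftless random-walk price series, in integer cents.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seeded generator so runs can be repeated
    prices = [start_cents]
    for _ in range(steps):
        direction = 1 if rng.random() < 0.5 else -1  # virtual coin flip
        size = rng.randint(1, 6)                     # virtual die roll, 1 to 6 cents
        # Assumption: the price is floored at one cent so it never goes to zero or below.
        prices.append(max(prices[-1] + direction * size, 1))
    return prices


if __name__ == "__main__":
    series = random_walk_prices(start_cents=10_000, steps=10, seed=1)
    print([cents / 100 for cents in series])
```

Because heads and tails are equally likely and the die is the same in both directions, the expected price change per step is zero, which matches the conservative choice of leaving out the upward drift.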

Simulated Quote Data

The simulated data has fat tails.

Curated Historical Data

Bad data leads to bad decisions. A common problem in the trading community is the use of bad data during the statistical optimization testing. The bad data leads to inaccurate trading simulation test results. StepsTeam uses professionally curated data that is acquired from a third party company. The dataset has undergone numerous rigorous checks and data quality is monitored daily by a dedicated data quality assessment team. The data is also reviewed by many professionals and is research ready.

Survivorship Bias

Some data sources fail to include stock and ETF symbols that no longer exist. When running simulations, it is important to include symbols that no longer exist, or the test results will be skewed to the upside. For example, a test portfolio may be built to include all stocks priced from $0.50 to $0.60. After the test is run, the results will look very positive because the companies that went to $0 are not included. Without the complete data set, the algorithm would be blind to the possibility of bankruptcy and the resulting losses. An IndexUniverse data analysis reveals that 22 percent of all ETFs currently trading in the U.S. are at high risk of closure. ETFs that have been delisted are included in the data as well.

Adjusted for Splits

When a stock split occurs, the data is adjusted to reflect this so that the data is more or less continuous. If the stock price is 20 and then a 2:1 split occurs, bringing the price to 10, all data before the split date is divided by 2. The split factor data column is then multiplied by 2. The data thus reflects how much money an investor would expect to make by investing at the beginning of a period and selling at the end of the period without having to take split discontinuities into account.
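As a rough illustration of the 2:1 example above, the following Python sketch divides every close before the split date by the split ratio and multiplies the split factor by the same ratio. The record layout (`date`, `close`, `split_factor`), the function name, and the choice to apply the factor only to pre-split rows are assumptions made for the example, not a description of the actual dataset.

```python
def adjust_for_split(bars, split_date, ratio):
    """Return a copy of bars with pre-split rows adjusted for a ratio:1 split.

    bars: list of dicts with 'date' (ISO 'YYYY-MM-DD' string), 'close',
    and 'split_factor'. Hypothetical layout for illustration only.
    """
    adjusted = []
    for bar in bars:
        bar = dict(bar)  # copy so the input rows are not modified
        if bar["date"] < split_date:  # ISO date strings compare chronologically
            bar["close"] = bar["close"] / ratio
            bar["split_factor"] = bar["split_factor"] * ratio
        adjusted.append(bar)
    return adjusted


if __name__ == "__main__":
    raw = [
        {"date": "2020-01-02", "close": 20.0, "split_factor": 1.0},
        {"date": "2020-01-03", "close": 10.0, "split_factor": 1.0},  # split day
    ]
    for row in adjust_for_split(raw, "2020-01-03", 2):
        print(row)
```

With this convention, multiplying an adjusted close by its split factor recovers the price that was originally quoted on that day.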

Adjusted for dividends

When a dividend is paid, the historical data is similarly adjusted to be more or less continuous. Dividends are paid to all shareholders who hold the stock at the close of regular markets on the day before the ex-dividend date (some literature refers to this day as the Record Date, although the formal record date is set by the company and may fall on a different day). Thus, shares purchased on or after the ex-dividend date are not entitled to the dividend payout. Because of this, share prices will typically drop by roughly the amount of the dividend on the ex-dividend date. However, this does not represent a loss of equity, because the owner of the stock during this period receives a dividend payment to compensate for the drop. In effect, the shareholder sees no profit or loss due to dividends. For this reason, a dividend adjustment is applied to keep prices continuous when a dividend is paid.
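The text does not say which adjustment formula is used. One common convention, shown here purely as an assumption, scales every price before the ex-dividend date by the factor 1 - dividend / previous close. The function name and the list-of-closes input are hypothetical.

```python
def adjust_for_dividend(closes, ex_index, dividend):
    """Back-adjust closes before the ex-dividend day by a proportional factor.

    closes: closing prices in chronological order.
    ex_index: position of the ex-dividend day in closes.
    Assumption: proportional back-adjustment, one convention among several.
    """
    if ex_index <= 0 or ex_index >= len(closes):
        raise ValueError("ex_index must point to a day that has a prior close")
    previous_close = closes[ex_index - 1]
    factor = 1.0 - dividend / previous_close
    return [c * factor if i < ex_index else c for i, c in enumerate(closes)]


if __name__ == "__main__":
    # A $1.00 dividend with the price dropping from 100 to 99 on the ex-dividend day.
    print(adjust_for_dividend([100.0, 100.0, 99.0, 99.0], ex_index=2, dividend=1.0))
```

In this example the adjusted series is flat at about 99, so a simulation sees neither a profit nor a loss from the dividend drop.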

Suspicious Quote Detection

A proprietary feed handler employs sophisticated algorithms to flag quotes as suspicious. In general, a suspicious quote is one that deviates substantially from prevailing market prices and could indicate a reporting error by an exchange or some other trading issue. Because quotes flagged as suspicious are actual quotes reported by the exchanges, the algorithm does not remove them from the database but simply flags them. Moreover, the trading robot itself detects suspicious quotes in real time. It is possible for the broker’s computer systems to send an incorrect quote. For example, rather than sending $8.00, the quote may temporarily be sent as $80.00. The trading robot calculates the percentage move each time quotes are retrieved and will temporarily halt trading until the quote can be automatically confirmed.
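The percentage-move check in the trading robot might look something like the sketch below. The 20 percent threshold and the function name are assumptions chosen for the example. The actual limit and the confirmation step are not described here.

```python
def is_suspicious(previous_price, new_price, max_move_pct=20.0):
    """Flag a quote whose move from the previous quote exceeds max_move_pct.

    max_move_pct is a hypothetical threshold for illustration only.
    """
    if previous_price <= 0:
        raise ValueError("previous_price must be positive")
    move_pct = abs(new_price - previous_price) / previous_price * 100.0
    return move_pct > max_move_pct


if __name__ == "__main__":
    # The $8.00 quote arriving as $80.00 is a 900 percent move and is flagged.
    print(is_suspicious(8.00, 80.00))
    # An ordinary five-cent move is not flagged.
    print(is_suspicious(8.00, 8.05))
```

A robot using a check like this would pause new orders while the flag is set and resume once a later quote confirms or corrects the price.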

Symbol Change Tracking

The data has automated symbol change tracking. Thus, if a stock symbol changes for any reason, the data under the new symbol is automatically merged with the data under the old symbol. Market events are also tracked through collaboration with over a dozen trading firms and hedge funds. These special events include instances when trading of a particular symbol is halted (e.g. special announcements, SEC investigations, NASDAQ circuit breakers), as well as mergers and acquisitions activity.

Quotes are Sequenced

Sometimes trades are reported late by exchanges. These out-of-sequence trades can be far from the prevailing market prices at the time they are reported. Because of this, the data is filtered to remove these trades to avoid skewing the low and high of each bar. The minute resolution data files are available for all NASDAQ and NYSE listed stocks starting from January 2002 to the present.
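To show why the filtering matters, here is a small sketch that builds one minute bar while skipping late-reported trades. The `(price, out_of_sequence)` tuple layout and the function name are assumptions for the example. The real feed's flags are not described in this document.

```python
def minute_bar(trades):
    """Build an open/high/low/close bar from one minute of trades.

    trades: list of (price, out_of_sequence) tuples in report order.
    Trades flagged out_of_sequence are ignored (hypothetical flag).
    """
    prices = [price for price, late in trades if not late]
    if not prices:
        return None  # no in-sequence trades during this minute
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
    }


if __name__ == "__main__":
    trades = [(10.00, False), (10.02, False), (9.50, True), (10.01, False)]
    print(minute_bar(trades))
```

Without the filter, the late 9.50 print would become the low of the bar even though the market was trading near 10.00 when it was reported.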

Check Realism

For each simulated trade, StepsTeam checks against the data whether the order could actually have been executed at that price at the historical time.

Predictable Price Patterns

Andrew W. Lo and Archie Craig MacKinlay, professors of finance at the MIT Sloan School of Management and the University of Pennsylvania, respectively, have presented evidence that they believe shows the random walk hypothesis to be wrong. Their book, A Non-Random Walk Down Wall Street, presents a number of tests and studies that reportedly support the view that there are trends in the stock market and that the stock market is somewhat predictable.

Real Time Trading Data

As part of the testing process, StepsTeam will paper trade your robot. The robot will use the level 2 data from your broker. Buying and selling are simulated without using actual money, so there is no risk of loss. It is important to remember that while the robot may do well with all four types of data, performance during tests does not guarantee future profits. Nonetheless, if a strategy never works in simulations, it is very unlikely to work in reality. Therefore, it is still critical to perform rigorous testing before risking real money.


The client will receive a report on how the simulated data was generated and how the historical data is curated.