PROJECTS

Walk-Forward Machine Learning Model - Market Data Analysis

PROJECT SUMMARY

This project builds a machine learning model to estimate the probability of short-term price movements in cryptocurrency markets.

The model focuses on identifying high-probability trading signals rather than predicting exact prices. Walk-forward validation is used to simulate real trading conditions and avoid look-ahead bias, ensuring that performance reflects realistic deployment scenarios.

HOW TO RUN

The project can be executed locally to generate predictions for the next trading period.
1. Clone the repository
2. Install dependencies
3. Run the main script
The script is pre-configured and does not require modification to produce output.

View full instructions and code:

https://github.com/Cadez123/Crypto_walk_forward_analysis_with_random_forest.git

PROBLEM

Financial markets are noisy, non-stationary, and difficult to model. Traditional backtesting methods often produce overly optimistic results due to data leakage and unrealistic assumptions.

This project investigates whether a machine learning model can provide a measurable statistical edge when evaluated under realistic, time-dependent conditions.

APPROACH

Historical OHLCV data collected using the Binance API, enabling access to high-frequency and multi-timeframe market data
Feature engineering using technical indicators (EMA, RSI, MACD, ATR, Bollinger Bands, ADX)
Target variable defined as whether the next day’s high exceeds the current close multiplied by a fixed threshold factor (1.01, 1.015, or 1.02)
Random Forest classifier used to estimate probabilities of the target event
Walk-forward validation applied to simulate real-world model usage
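
The target definition above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `make_target`, the column names `high` and `close`, and the reading of the threshold as a multiplier on the close (for example 1.02) are assumptions, and the prices are invented.

```python
import pandas as pd


def make_target(df: pd.DataFrame, threshold: float = 1.02) -> pd.Series:
    """Label a row 1 if the next row's high exceeds close * threshold, else 0.

    The last row has no next day, so its label is left missing (<NA>).
    """
    next_high = df["high"].shift(-1)  # tomorrow's high, aligned to today's row
    target = (next_high > df["close"] * threshold).astype("Int64")
    target.loc[next_high.isna()] = pd.NA  # unknown future: no label
    return target


# Hand-made candles, for illustration only (assumed column names).
candles = pd.DataFrame(
    {
        "high": [101.0, 103.5, 102.0, 106.0],
        "close": [100.0, 101.0, 101.5, 105.0],
    }
)
labels = make_target(candles, threshold=1.02)
```

Rows whose label is missing would need to be dropped before training, since their outcome is not yet known.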

Walk-forward validation ensures that the model is always trained on past data and tested on unseen future data. This prevents look-ahead leakage and makes overfitting visible in the out-of-sample results, so the reported figures better reflect real-world performance.
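
A walk-forward loop of this kind might look like the sketch below. It assumes an expanding training window, a fixed-size test block, and synthetic data; the function name `walk_forward_proba` and the parameters `initial_train`, `test_size`, and `n_estimators` are illustrative choices, not settings taken from the repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def walk_forward_proba(X, y, initial_train=200, test_size=30, n_estimators=100):
    """Return out-of-sample P(target = 1) for every row after the first window.

    Each fold trains on rows [0, split) and predicts rows [split, split + test_size),
    so no prediction uses data from its own or a later period.
    """
    X, y = np.asarray(X), np.asarray(y)
    proba = np.full(len(y), np.nan)  # NaN marks rows that were never predicted
    for split in range(initial_train, len(y), test_size):
        end = min(split + test_size, len(y))
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
        model.fit(X[:split], y[:split])  # past data only
        classes = list(model.classes_)
        if 1 in classes:
            # Pick the predict_proba column that belongs to class 1.
            proba[split:end] = model.predict_proba(X[split:end])[:, classes.index(1)]
        else:
            proba[split:end] = 0.0  # training window contained no positive case
    return proba


# Synthetic features and labels, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(320, 4))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=320) > 0).astype(int)
proba_demo = walk_forward_proba(X_demo, y_demo, n_estimators=50)
```

An expanding window keeps all history in every fold; a rolling window of fixed length is a common alternative when older data is thought to be less relevant.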

RESULTS

Model performance was evaluated using lift: the hit rate (precision) of high-confidence predictions divided by the baseline probability of the target event.
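
As a rough sketch of the metric, lift can be computed as below. The probability cutoff of 0.6 used to define "high-confidence" is an assumption for illustration (the cutoff used in the project is not stated here), and the labels and probabilities are made up.

```python
import numpy as np


def lift(y_true, proba, cutoff=0.6):
    """Hit rate among predictions with proba >= cutoff, divided by the base rate."""
    y_true = np.asarray(y_true, dtype=float)
    proba = np.asarray(proba, dtype=float)
    selected = proba >= cutoff   # the high-confidence subset
    base_rate = y_true.mean()    # share of positives among all rows
    if not selected.any() or base_rate == 0:
        return float("nan")      # lift is undefined in these cases
    return y_true[selected].mean() / base_rate


# Invented labels and predicted probabilities, for illustration only.
y = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
p = [0.9, 0.2, 0.7, 0.65, 0.1, 0.8, 0.3, 0.4, 0.5, 0.2]
value = lift(y, p, cutoff=0.6)
```

In this toy case four predictions pass the cutoff and three of them are hits (0.75), against a base rate of 0.4, giving a lift of 1.875.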

Across multiple months, the model consistently achieved lift values above 1.0, indicating an improvement over random selection. For the 1.02 target, for example, monthly lift ranged from roughly 1.12 to 1.38 in stronger periods and stayed close to or above 1.20 in several months.

On average, the model achieved:

~1.08 lift for the 1.01 target
~1.21 lift for the 1.015 target
~1.29 lift for the 1.02 target

In these results, higher target thresholds coincided with higher lift, even though the base probability of hitting the target is lower.

When evaluated across different crypto assets, the model maintained consistent lift above 1.0 for the majority of assets, with some assets reaching significantly higher values. This suggests that the model is not dependent on a single asset, but captures broader market patterns.

Overall, the results indicate that the model identifies higher-quality trading opportunities across both the time dimension (months) and the cross-sectional dimension (assets).

TECH STACK

Python
pandas, numpy
scikit-learn
Binance API
REST API integration
ta (technical analysis library)

KEY TAKEAWAY

This project demonstrates the ability to design, evaluate, and validate machine learning models in time-dependent environments, with a strong focus on realistic performance and practical applicability.

LIMITATIONS & FUTURE WORK

Model does not account for transaction costs or slippage, so the reported lift does not translate directly into net trading profit
Performance may vary across different market regimes
Future improvements could include additional features, hyperparameter tuning, and deployment as an API