End-to-end ML & Deep Learning system for forecasting NIFTY IT, Metal & Financial Services indices using 8 algorithms on 8,918+ trading records.
Multi-year historical OHLCV data for three NSE NIFTY sector indices, sourced from the National Stock Exchange of India.
| File | Index | Records | Period |
|---|---|---|---|
| NIFTY.csv | All 3 Indices | 8,918 | 2003–2021 |
| NIFTY IT_data.csv | NIFTY IT | 4,355 | 2003–2021 |
| NIFTY FIN SERVICE_data.csv | NIFTY Fin Svc | 2,232 | 2012–2021 |
| NIFTY METAL_data.csv | NIFTY Metal | 2,331 | 2011–2021 |
| Column | Type | Description |
|---|---|---|
| Date | datetime | Trading date |
| Open | float | Opening index price |
| High | float | Intraday high |
| Low | float | Intraday low |
| Close | float | Closing price ← TARGET |
| Volume | int | Shares traded |
| Turnover | float | Notional value (INR) |
| Indices | string | Label-encoded index name |
NIFTY IT is India's IT sector index, covering TCS, Infosys, Wipro, and HCL. It spans 2003–2021, the longest time series in the dataset, with a strong growth trend after 2017.
NIFTY Metal tracks the metals and mining sector, including JSW Steel, Tata Steel, and Hindalco. It is highly cyclical and correlated with global commodity prices and China demand.
NIFTY Financial Services covers HDFC Bank, ICICI, Bajaj Finance, and Kotak. The financial sector is heavily correlated with interest rate cycles and RBI policy decisions. The data spans 2012–2021.
Stock markets are inherently volatile and driven by countless interacting factors. Retail investors and fund managers struggle to identify reliable price direction signals from raw OHLCV data alone. Without a systematic, data-driven approach, investment decisions are largely reactive — made after the price has already moved.
The Indian equity market (NSE) lacks accessible, open-source ML-based prediction tools specifically designed for sector indices like NIFTY IT, Metal, and Financial Services — which have distinct cyclical behaviors and risk profiles.
This project builds a comprehensive, end-to-end stock market trend prediction system targeting NSE NIFTY sector indices. Historical OHLCV data spanning 2003–2021 is processed through a multi-model ML framework comprising SVR (RBF and linear kernels), Random Forest, Linear Regression, KNN, Decision Tree, ElasticNet, and an LSTM deep learning model.
Polynomial feature expansion and temporal decomposition enrich the feature space. All models are evaluated on held-out test sets using MSE and R², with results exposed through an interactive Flask web interface that allows real-time algorithm comparison.
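The feature pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the code in utils.py: the synthetic price series, the choice of calendar parts, the 80/20 chronological split, and the use of Linear Regression as the example model are all assumptions made for the sketch.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for one index's Date/Open/Close columns (not real NIFTY data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
close = 10_000 + np.cumsum(rng.normal(0, 50, size=len(dates)))
df = pd.DataFrame({
    "Date": dates,
    "Open": close + rng.normal(0, 20, size=len(dates)),
    "Close": close,
})

# Temporal decomposition: split the date into numeric calendar parts.
features = pd.DataFrame({
    "year": df["Date"].dt.year,
    "month": df["Date"].dt.month,
    "day": df["Date"].dt.day,
    "dayofweek": df["Date"].dt.dayofweek,
    "Open": df["Open"],
})

# Degree-2 polynomial expansion: adds squares and pairwise products.
poly = PolynomialFeatures(degree=2, include_bias=False)
X = poly.fit_transform(features)
y = df["Close"].to_numpy()

# Chronological hold-out (assumed): the last 20% of rows form the test set.
split = int(len(df) * 0.8)
model = LinearRegression().fit(X[:split], y[:split])
pred = model.predict(X[split:])

mse = mean_squared_error(y[split:], pred)
r2 = r2_score(y[split:], pred)
print(f"features={X.shape[1]}  MSE={mse:.2f}  R2={r2:.3f}")
```

Five input columns expand to 20 features at degree 2, so the expansion grows quickly as more raw columns are added.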
8 algorithms spanning classical regression, SVMs, ensemble methods, and deep learning — all implemented in utils.py.
| Algorithm | Type | Key Hyperparams | Expected R² | Strength | Limitation |
|---|---|---|---|---|---|
| Random Forest | Ensemble | n_est=15, poly deg=2 | 0.97–0.99 | Non-linear, feature importance | Can overfit |
| LSTM | Deep Learning | look_back=1, MinMaxScaler | 0.95–0.99 | Temporal dependencies | Needs large data |
| SVR (RBF) | SVM | C=1000, γ=0.1 | 0.90–0.96 | Robust to outliers | Slow on large sets |
| SVR (Linear) | SVM | C=1000 | 0.85–0.92 | Fast, stable | Misses non-linearity |
| Linear Regression | Baseline | — | 0.85–0.93 | Interpretable | Linear only |
| ElasticNet | Regularised | L1+L2, rs=0 | 0.82–0.90 | Less overfit | Linear assumption |
| KNN | Instance | k=2 | 0.70–0.85 | No training phase | Noise sensitive |
| Decision Tree | Baseline | Unpruned | 0.95–1.0 train | Interpretable | Overfits badly |
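The LSTM row lists look_back=1 and MinMaxScaler. One common way to prepare a price series under those settings is sketched below; the helper name `make_windows` and the toy prices are hypothetical, and the actual preprocessing in utils.py may differ.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def make_windows(series, look_back=1):
    """Return (X, y): X[i] holds `look_back` consecutive values, y[i] the next one."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X), np.array(y)

# Toy closing prices (illustrative only).
close = np.array([100.0, 102.0, 101.0, 105.0, 110.0]).reshape(-1, 1)

# Scale to [0, 1]. In real use, fit the scaler on the training portion only.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(close).ravel()

look_back = 1
X, y = make_windows(scaled, look_back=look_back)

# Recurrent layers conventionally expect (samples, timesteps, features).
X = X.reshape(X.shape[0], look_back, 1)
print(X.shape, y)

# Predictions in scaled space map back to prices with inverse_transform.
restored = scaler.inverse_transform(y.reshape(-1, 1)).ravel()
```

With look_back=1 each sample sees only the previous day's value, so the network has very little temporal context; a larger window is one of the first things to try.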
All charts use real NIFTY data extracted from the dataset (2020–2021 period).
Mathematical and statistical foundations underpinning the feature engineering, model training, and evaluation pipeline.
Enter a date and index to get predicted Open & Close prices from all 5 algorithms, plus a comparative accuracy chart.
Translating model outputs into actionable investment and technology strategy for financial firms.
Given LSTM's lowest MSE in comparative testing, deploy it as the primary engine for 1–5 day ahead price forecasting. Use RF as a secondary validator. Only act on trades where both models agree on direction.
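The agreement rule can be expressed as a small helper. This is a sketch under the assumption that both models output a predicted price for the same horizon; the function name `agreed_signal` is hypothetical.

```python
import numpy as np

def agreed_signal(last_close, lstm_pred, rf_pred):
    """Return +1 (up), -1 (down), or 0 (no trade) from two price predictions.

    A trade direction is emitted only when both models predict a move
    in the same direction relative to the last known close.
    """
    lstm_dir = np.sign(lstm_pred - last_close)
    rf_dir = np.sign(rf_pred - last_close)
    if lstm_dir == rf_dir and lstm_dir != 0:
        return int(lstm_dir)
    return 0

print(agreed_signal(100.0, 102.0, 101.0))  # both above the last close
print(agreed_signal(100.0, 102.0, 99.0))   # models disagree
```

Requiring agreement trades fewer signals for, hopefully, fewer false ones; whether that trade-off pays off is something only a backtest can show.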
In regulated financial environments (SEBI compliance), explainability is mandatory. Random Forest provides SHAP-compatible feature importance — use it wherever model decisions must be audited or explained to clients.
NIFTY IT outperforms during tech booms; NIFTY Metal leads commodity cycles; Fin Services tracks rate cycles. Use the multi-index comparative model to allocate capital across sectors based on predicted momentum differentials.
Do not use predictions in isolation during high-volatility events (budget days, RBI policy announcements, global crises). Implement an ATR (Average True Range) filter — pause automated signals when ATR exceeds 2× its 20-day average.
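A possible implementation of the ATR filter is sketched below. The text does not fix the ATR period, so a 14-day simple moving average of the true range is assumed here (Wilder's smoothing is a common alternative); the function name `atr_pause_mask` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def atr_pause_mask(df, atr_period=14, avg_window=20, multiple=2.0):
    """Return a boolean Series: True where automated signals should pause."""
    prev_close = df["Close"].shift(1)
    # True range: the largest of the three candidate ranges for each day.
    true_range = pd.concat(
        [
            df["High"] - df["Low"],
            (df["High"] - prev_close).abs(),
            (df["Low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    atr = true_range.rolling(atr_period).mean()
    atr_avg = atr.rolling(avg_window).mean()
    # Comparisons against NaN (warm-up rows) evaluate to False.
    return atr > multiple * atr_avg

# Toy data: 50 calm days followed by one very wide-range day.
n = 51
bars = pd.DataFrame({
    "High": np.full(n, 100.5),
    "Low": np.full(n, 99.5),
    "Close": np.full(n, 100.0),
})
bars.loc[n - 1, ["High", "Low"]] = [150.0, 50.0]

pause = atr_pause_mask(bars)
print(pause.iloc[40], pause.iloc[-1])
```

In the toy data the filter stays off during the calm stretch and switches on for the final wide-range day.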
Market regimes shift. Retrain all models monthly on a rolling 2-year window. Use walk-forward validation (not random split) to prevent data leakage and ensure test performance reflects real-world generalization.
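Walk-forward validation on a rolling window can be sketched with scikit-learn's `TimeSeriesSplit`. The sizes below are assumptions: roughly 504 trading days for a 2-year window and 21 trading days for one month of testing. The synthetic data and the Linear Regression model are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered data (placeholder for real features and closes).
rng = np.random.default_rng(1)
n = 800
X = np.arange(n, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + rng.normal(0, 5, size=n)

# Each fold trains on at most 504 rows and tests on the 21 rows that follow.
splitter = TimeSeriesSplit(n_splits=6, max_train_size=504, test_size=21)

folds = list(splitter.split(X))
fold_mse = []
for train_idx, test_idx in folds:
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print([round(m, 2) for m in fold_mse])
```

Every training index precedes every test index within a fold, which is the property a random split does not give.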
Simulate 2 years of trades using model signals on historical data before going live. Track Sharpe ratio, max drawdown, and win rate. A model with good MSE can still lose money if the prediction timing is slightly off.
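The three backtest statistics can be computed from a price series and a signal series as sketched below. The assumptions are loud ones: no transaction costs, a zero risk-free rate, 252 trading days per year, and a signal on day t applied to the return from t to t+1. The function name `backtest_metrics` is hypothetical.

```python
import numpy as np

def backtest_metrics(close, signals, periods_per_year=252):
    """Compute Sharpe ratio, max drawdown, and win rate for daily signals.

    signals[t] is +1 (long), -1 (short), or 0 (flat), decided at the close
    of day t and held until the close of day t+1.
    """
    close = np.asarray(close, dtype=float)
    signals = np.asarray(signals, dtype=float)

    daily_ret = np.diff(close) / close[:-1]
    strat_ret = signals[:-1] * daily_ret

    # Equity curve starts at 1.0 so a loss on the first day counts as drawdown.
    equity = np.concatenate([[1.0], np.cumprod(1.0 + strat_ret)])
    running_peak = np.maximum.accumulate(equity)
    max_drawdown = float(np.max(1.0 - equity / running_peak))

    std = strat_ret.std(ddof=1)
    sharpe = float(np.sqrt(periods_per_year) * strat_ret.mean() / std) if std > 0 else float("nan")

    # Win rate is measured only over days with an open position.
    active = strat_ret[signals[:-1] != 0]
    win_rate = float((active > 0).mean()) if active.size else float("nan")

    return {"sharpe": sharpe, "max_drawdown": max_drawdown, "win_rate": win_rate}

metrics = backtest_metrics([100.0, 110.0, 99.0, 108.9], [1, 1, 1, 0])
print(metrics)
```

Adding a per-trade cost term to `strat_ret` is usually the first refinement, since frequent signal flips can erase a thin edge.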
The yfinance library is already in requirements.txt. Configure it to auto-fetch latest NSE NIFTY data daily at 4pm IST (after market close) and append to the training dataset for continuous model freshness.
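A sketch of the fetch-and-append step follows. The `^CNXIT` symbol is the Yahoo Finance ticker for NIFTY IT; the helper names `fetch_latest` and `append_new_rows`, the 5-day lookback, and the column selection are assumptions. The scheduler that would call this daily after market close is left out.

```python
import pandas as pd

def fetch_latest(symbol="^CNXIT", period="5d"):
    """Download recent daily bars with yfinance (requires network access)."""
    import yfinance as yf  # imported lazily so the merge helper works offline

    bars = yf.Ticker(symbol).history(period=period)
    # Drop the exchange timezone and keep plain calendar dates.
    bars.index = bars.index.tz_localize(None).normalize()
    cols = ["Open", "High", "Low", "Close", "Volume"]
    return bars[cols].rename_axis("Date").reset_index()

def append_new_rows(existing, fetched):
    """Merge fetched rows into the dataset, keeping the newest row per date."""
    combined = pd.concat([existing, fetched], ignore_index=True)
    combined["Date"] = pd.to_datetime(combined["Date"])
    return (
        combined.drop_duplicates(subset="Date", keep="last")
        .sort_values("Date")
        .reset_index(drop=True)
    )

# Offline demonstration with made-up prices.
existing = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-04"]),
    "Close": [100.0, 101.0],
})
fetched = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05"]),
    "Close": [101.5, 103.0],
})
merged = append_new_rows(existing, fetched)
print(merged)
```

Fetching a few days rather than one and de-duplicating by date makes the job safe to re-run after a missed day or a holiday.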
Pure OHLCV models miss earnings releases, RBI rate decisions, and global macro events. Augment with a simple event calendar filter — suppress or weight down model signals on known high-impact event dates.
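An event calendar filter can be as simple as scaling signals on listed dates. In this sketch the function name `apply_event_filter` is hypothetical and the dates and signals are illustrative; a weight of 0 suppresses the signal and a weight between 0 and 1 scales it down.

```python
import pandas as pd

def apply_event_filter(signals, event_dates, weight=0.0):
    """Scale signals by `weight` on event dates; leave other dates unchanged.

    signals: Series of trade signals indexed by date.
    event_dates: iterable of dates known in advance to be high impact.
    """
    events = pd.to_datetime(list(event_dates))
    adjusted = signals.astype(float).copy()
    mask = adjusted.index.isin(events)
    adjusted.loc[mask] = adjusted.loc[mask] * weight
    return adjusted

# Illustrative signals and a one-entry event calendar.
signals = pd.Series(
    [1, -1, 1],
    index=pd.to_datetime(["2021-02-01", "2021-02-02", "2021-02-03"]),
)
suppressed = apply_event_filter(signals, ["2021-02-01"], weight=0.0)
halved = apply_event_filter(signals, ["2021-02-01"], weight=0.5)
print(suppressed.tolist(), halved.tolist())
```

Because the calendar is just a list of dates, it can be maintained by hand for scheduled events such as policy announcements.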
This project is for educational and research purposes only. All predictions are based on historical data and statistical models. Past performance does not guarantee future results. Stock market investments involve risk including potential loss of principal. Always conduct your own due diligence before making any investment decisions. This system does not account for macroeconomic factors, geopolitical events, or company-specific news.
Roadmap to transform this portfolio project into a production-grade real-time trading intelligence system.
Use the yfinance library (already in requirements.txt) to fetch live NSE data. Schedule a Python cron job (using APScheduler or Celery) to run at 4:00 PM IST daily after market close:

    import yfinance as yf
    ticker = yf.Ticker("^CNXIT")
    df = ticker.history(period="1d")

The ta library is already in requirements.txt but unused. Add RSI(14), MACD, Bollinger Bands, ATR, OBV, and MFI as model features. These indicators encode market momentum, volatility, and volume pressure, which may improve prediction accuracy over OHLCV alone.

Built by Pavan Kumar Nallabothula, Data Analyst specializing in ML, forecasting, and financial data systems.
Professional Data Analyst with expertise in machine learning, predictive modelling, and financial data systems. This project is part of a portfolio demonstrating end-to-end ML pipeline development — from raw data ingestion through feature engineering, multi-model training, evaluation, and web deployment.