Smart Money Concepts (SMC) Trading Signal Generator for XAUUSD

This project processes hourly XAUUSD (Gold) price data to generate trading signals based on Smart Money Concepts (SMC) patterns, including Order Blocks (OB), Fair Value Gaps (FVG), and Change of Character (CHOCH). It uses a Long Short-Term Memory (LSTM) neural network to predict trading signals (Buy=1, Sell=0, Hold=2) and includes take-profit (TP) and stop-loss (SL) calculations for trading strategies.

Features

  • Data Preprocessing: Processes a large dataset (~150,000 rows) of hourly XAUUSD OHLCV data.
  • SMC Signal Generation: Identifies SMC patterns using vectorized operations for efficiency.
  • Feature Engineering: Extracts SMC-derived features like higher highs (hh), lower lows (ll), trend, body size, and volume ratio.
  • Sequence Creation: Generates sequences (window size=60) for time-series modeling.
  • LSTM Model: Trains an LSTM model with class-weighted loss to handle imbalanced classes (many Hold signals).
  • Evaluation: Provides a classification report with precision, recall, and F1-score for Sell, Buy, and Hold classes.
  • Optimization: Handles large datasets with chunked sequence creation, memory-efficient data types, and early stopping.
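The feature names listed above (hh, ll, trend, body_size, volume_ratio) can be computed with vectorized pandas operations. The sketch below is illustrative only: the exact definitions used by preprocess_smc_optimized.py are not documented here, so the one-bar comparison for hh/ll, the `hh - ll` trend encoding, the rolling-mean volume baseline, and the helper name `add_smc_features` are all assumptions.

```python
import pandas as pd

def add_smc_features(df, volume_window=20):
    """Add assumed SMC-style features without Python-level loops."""
    out = df.copy()
    # Assumed: a higher high / lower low is measured against the previous bar.
    out["hh"] = (out["High"] > out["High"].shift(1)).astype("int8")
    out["ll"] = (out["Low"] < out["Low"].shift(1)).astype("int8")
    # Assumed encoding: +1 for a higher high, -1 for a lower low, 0 otherwise.
    out["trend"] = out["hh"] - out["ll"]
    # Candle body size as the absolute open-to-close distance.
    out["body_size"] = (out["Close"] - out["Open"]).abs()
    # Assumed: volume relative to its rolling mean over `volume_window` bars.
    avg_volume = out["Volume"].rolling(volume_window, min_periods=1).mean()
    out["volume_ratio"] = out["Volume"] / avg_volume
    return out

# Made-up bars, for illustration only.
bars = pd.DataFrame({
    "Open":   [100.0, 101.0, 103.0, 102.0],
    "High":   [101.5, 103.5, 103.2, 102.5],
    "Low":    [99.5, 100.8, 101.0, 100.0],
    "Close":  [101.0, 103.0, 102.0, 100.5],
    "Volume": [10.0, 30.0, 20.0, 20.0],
})
feats = add_smc_features(bars, volume_window=2)
print(feats[["hh", "ll", "trend", "body_size", "volume_ratio"]])
```

Because every column is derived from whole-column comparisons and rolling windows, the same function should scale to the full hourly dataset without per-row iteration.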

Requirements

  • Python 3.8+
  • Libraries:
    pandas==2.0.3
    numpy==1.24.3
    scikit-learn==1.2.2
    datasets==2.14.4
    torch==2.0.1
    
  • Hardware:
    • CPU (multi-core recommended for parallel processing)
    • GPU (optional, for faster training with CUDA)
    • At least 16 GB RAM for handling ~150,000 rows
  • Input Data:
    • XAUUSD_H1.csv: Hourly OHLCV data with columns Time, Open, High, Low, Close, Volume. The file is tab-separated despite the .csv extension.
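Since the input file is tab-separated, it needs an explicit separator when loaded with pandas. The following is a minimal sketch, not the project's actual loader: the helper name `load_ohlcv` is hypothetical, the inline sample rows are made up, and the timestamp format of the real file is an assumption (adjust the parsing if your export differs).

```python
import io
import pandas as pd

# Tiny inline sample standing in for XAUUSD_H1.csv (values are made up).
SAMPLE = (
    "Time\tOpen\tHigh\tLow\tClose\tVolume\n"
    "2024-01-02 01:00\t2063.4\t2066.0\t2063.0\t2065.2\t1874\n"
    "2024-01-02 00:00\t2062.5\t2064.1\t2061.8\t2063.4\t1520\n"
)

def load_ohlcv(source):
    """Read tab-separated OHLCV data, downcast to float32, sort by time."""
    df = pd.read_csv(source, sep="\t", parse_dates=["Time"])
    numeric_cols = ["Open", "High", "Low", "Close", "Volume"]
    # float32 halves the memory footprint compared with the float64 default.
    df = df.astype({col: "float32" for col in numeric_cols})
    return df.sort_values("Time").reset_index(drop=True)

df = load_ohlcv(io.StringIO(SAMPLE))  # pass "XAUUSD_H1.csv" for the real file
print(df.dtypes)
```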

Installation

  1. Clone the repository:
    git clone <repository-url>
    cd <repository-directory>
    
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Ensure the dataset (XAUUSD_H1.csv) is placed in the project directory.

Usage

  1. Preprocess Data: Run the preprocessing script to generate SMC signals and create a Hugging Face Dataset:

    python preprocess_smc_optimized.py
    
    • Input: XAUUSD_H1.csv (~150,000 rows)
    • Output: /xauusd_smc_dataset (saved dataset with sequences and labels)
    • Features: Open, High, Low, Close, Volume, hh, ll, trend, body_size, volume_ratio
    • Labels: Buy=1, Sell=0, Hold=2
    • Sequence length: 60 hours
    • Train-test split: 80/20
  2. Train and Evaluate Model: Run the training script to train an LSTM model and generate a classification report:

    python train_evaluate_lstm_smc.py
    
    • Loads /xauusd_smc_dataset
    • Trains an LSTM model with class-weighted loss and early stopping
    • Outputs training logs (train loss, test loss, test accuracy) and a classification report
    • Saves the best model to /best_lstm_model.pth
  3. Example Output:

    Epoch 1/100, Train Loss: 0.571448, Test Loss: 0.568098, Test Accuracy: 0.9834
    ...
    Epoch 100/100, Train Loss: 0.232735, Test Loss: 0.235532, Test Accuracy: 0.9650
    Classification Report:
                  precision    recall  f1-score   support
    Sell          0.XX      0.XX      0.XX      XXXX
    Buy           0.XX      0.XX      0.XX      XXXX
    Hold          0.XX      0.XX      0.XX     XXXXX
    accuracy                           0.XX     29988
    macro avg      0.XX      0.XX      0.XX     29988
    weighted avg   0.XX      0.XX      0.XX     29988
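The sequence creation and 80/20 split from step 1 can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the code in preprocess_smc_optimized.py: the function names are hypothetical, each window is assumed to be labeled by its last row, and the split is assumed to be chronological (the README only states the 80/20 ratio).

```python
import numpy as np

def iter_sequence_chunks(features, labels, window=60, chunk_size=10_000):
    """Yield (X, y) chunks. X[i] holds `window` consecutive rows; y[i] is the
    label of the last row in that window (an assumed labeling choice)."""
    n_sequences = len(features) - window + 1
    for start in range(0, n_sequences, chunk_size):
        stop = min(start + chunk_size, n_sequences)
        # Zero-copy view of shape (stop - start, n_features, window).
        view = np.lib.stride_tricks.sliding_window_view(
            features[start : stop + window - 1], window, axis=0
        )
        # Reorder to (n, window, n_features) and materialise as float32.
        X = np.ascontiguousarray(view.transpose(0, 2, 1), dtype=np.float32)
        y = labels[start + window - 1 : stop + window - 1].astype(np.int64)
        yield X, y

def chronological_split(X, y, train_fraction=0.8):
    """Keep the earliest sequences for training and the latest for testing."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# Synthetic stand-in data: 200 rows, 3 features, labels cycling 0/1/2.
features = np.arange(200 * 3, dtype=np.float32).reshape(200, 3)
labels = np.arange(200) % 3
chunks = list(iter_sequence_chunks(features, labels, window=60, chunk_size=50))
X = np.concatenate([c[0] for c in chunks])
y = np.concatenate([c[1] for c in chunks])
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(X.shape, train_X.shape, test_X.shape)
```

Only one chunk of windows is materialised at a time, which keeps peak memory bounded when the chunks are written to disk as they are produced instead of being concatenated as in this small demo.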
    

Project Structure

  • preprocess_smc_optimized.py: Generates SMC signals, creates sequences, and saves the dataset.
  • train_evaluate_lstm_smc.py: Trains an LSTM model and evaluates performance with a classification report.
  • XAUUSD_H1.csv: Input dataset (not included; provide your own).
  • /xauusd_smc_dataset: Saved Hugging Face dataset.
  • /best_lstm_model.pth: Saved best model weights.

Performance Optimizations

  • Vectorized SMC Calculations: Uses pandas boolean indexing instead of loops for signal generation.
  • Chunked Sequence Creation: Processes sequences in chunks (10,000 rows) to manage memory for ~150,000 rows.
  • Class Imbalance Handling: Applies class weights ([1.0, 1.0, 0.1] for Sell, Buy, Hold, following the label indices 0, 1, 2) to prioritize Buy/Sell signals over Hold.
  • Early Stopping: Stops training if test loss doesn’t improve for 10 epochs.
  • Memory Efficiency: Uses float32 for features, int64 for labels, and saves dataset to disk.
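To make the class weighting and early stopping above concrete, here is a small sketch written in NumPy and plain Python so the arithmetic is visible. It is not the training script's code: in PyTorch the weighting would normally be done by passing a weight tensor to `torch.nn.CrossEntropyLoss`, and the `EarlyStopping` helper below is a hypothetical name. The weight order assumes index 0 = Sell, 1 = Buy, 2 = Hold.

```python
import numpy as np

CLASS_WEIGHTS = np.array([1.0, 1.0, 0.1])  # assumed order: Sell, Buy, Hold

def weighted_cross_entropy(logits, targets, weights=CLASS_WEIGHTS):
    """Cross-entropy where each sample is scaled by its class weight and the
    sum is divided by the total weight (a weighted mean over the batch)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(targets)), targets]
    sample_weights = weights[targets]
    return float((sample_weights * per_sample).sum() / sample_weights.sum())

class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# A confident mistake on a Hold sample costs a tenth of the same mistake on Sell.
logits = np.array([[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]])
print(weighted_cross_entropy(logits, np.array([0, 0])))  # both samples are Sell
print(weighted_cross_entropy(logits, np.array([0, 2])))  # one Sell, one Hold

stopper = EarlyStopping(patience=2)
stop_flags = [stopper.step(loss) for loss in [1.0, 0.9, 0.95, 0.96]]
```

With a Hold weight of 0.1, the model is penalised far less for missing the majority class, which is why overall accuracy can drop while Buy/Sell recall improves.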

Future Improvements

  • Threshold Tuning: Adjust SMC thresholds (e.g., SL to 0.7%) for XAUUSD volatility.
  • Additional Features: Add OB/FVG/CHOCH flags as features.
  • Alternative Models: Experiment with Transformer models for better sequence modeling.
  • Parallel Processing: Use dask for larger datasets.
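For the threshold tuning idea, the arithmetic behind percentage-based TP/SL levels can be sketched as below. This is an assumption-laden illustration: the function name `tp_sl_levels` is hypothetical, the 0.7% stop-loss comes from the example above, and the 2:1 reward-to-risk ratio used to place the take-profit is an invented default, not a value from this project.

```python
def tp_sl_levels(entry, side, sl_pct=0.007, reward_to_risk=2.0):
    """Return (take_profit, stop_loss) prices for a trade opened at `entry`.

    sl_pct is the stop distance as a fraction of the entry price;
    reward_to_risk (assumed) scales that distance for the take-profit.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    risk = entry * sl_pct
    direction = 1 if side == "buy" else -1
    stop_loss = entry - direction * risk
    take_profit = entry + direction * risk * reward_to_risk
    return round(take_profit, 2), round(stop_loss, 2)

# Hypothetical entry price of 2000.00 on XAUUSD.
print(tp_sl_levels(2000.0, "buy"))
print(tp_sl_levels(2000.0, "sell"))
```

Keeping the stop distance as a parameter makes it straightforward to sweep several values (for example 0.5%, 0.7%, 1.0%) and compare the resulting label distributions.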

License

MIT License

Contact

For issues or contributions, please open an issue or contact semsankken.

Generated on September 8, 2025, 16:15 +07
