Model Card for MatroidNN

Model Details

Model Description

Model type: Neural Network with Matroid-based Feature Selection (MatroidNN)

Version: 1.0

Framework: PyTorch

Last updated: February 27, 2025

Overview

MatroidNN is a neural network architecture that uses matroid theory for feature selection. To reduce feature redundancy, it selects a maximal independent set of features before the network is trained, so the classifier only sees features that are not strongly dependent on one another.
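
The selection idea can be illustrated with a minimal sketch. It assumes that "independence" is tested by pairwise absolute Pearson correlation against a threshold, and that features are scanned greedily in column order. The function name `greedy_independent_features` is hypothetical and is not the MatroidFeatureSelector API; the actual selector may differ.

```python
import numpy as np


def greedy_independent_features(X, threshold=0.7):
    """Greedily keep features whose absolute Pearson correlation with
    every already-kept feature is below `threshold`.

    Hypothetical helper for illustration only.
    """
    # Absolute correlation matrix between columns (features).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in range(X.shape[1]):
        # Keep feature j only if no kept feature is strongly correlated with it.
        if all(corr[j, k] < threshold for k in selected):
            selected.append(j)
    return selected


rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))                               # three independent features
noisy_copy = base[:, [0]] + 0.05 * rng.normal(size=(500, 1))   # near-duplicate of feature 0
X = np.hstack([base, noisy_copy])

selected = greedy_independent_features(X, threshold=0.7)
print(selected)  # the near-duplicate column (index 3) is dropped
```

A greedy scan is used here because greedy selection is the standard way to build a maximal independent set in a matroid; pairwise correlation thresholding is only a stand-in for the independence test.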

Model Architecture

  • Feature Selection Component: MatroidFeatureSelector using correlation-based dependency analysis
  • Neural Network: 3-layer feedforward network with batch normalization and dropout
  • Input: Varies based on the number of features selected by the matroid selector
  • Hidden Layers: Configurable hidden layer sizes (default 64 → 32)
  • Output: Multi-class classification (configurable number of classes)
  • Parameters: ~5K-10K parameters (varies based on input/output dimensions)
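
A network matching the description above might look like the following sketch. The class name `FeedForwardSketch`, the ReLU activations, and the dropout rate of 0.3 are assumptions made for illustration; the card does not state them, and the packaged MatroidNN class may be built differently.

```python
import torch
from torch import nn


class FeedForwardSketch(nn.Module):
    """Three linear layers with batch normalization and dropout (illustrative)."""

    def __init__(self, input_size, hidden_sizes=(64, 32), output_size=3, dropout=0.3):
        super().__init__()
        h1, h2 = hidden_sizes
        self.net = nn.Sequential(
            nn.Linear(input_size, h1),
            nn.BatchNorm1d(h1),
            nn.ReLU(),            # activation choice is an assumption
            nn.Dropout(dropout),  # dropout rate is an assumption
            nn.Linear(h1, h2),
            nn.BatchNorm1d(h2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(h2, output_size),
        )

    def forward(self, x):
        # Returns raw logits; apply softmax only when probabilities are needed.
        return self.net(x)


model = FeedForwardSketch(input_size=12)
model.eval()  # use running batch-norm statistics and disable dropout
with torch.no_grad():
    logits = model(torch.randn(4, 12))
print(logits.shape)
```

The input size of 12 stands in for however many features the selector keeps on a given dataset.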

Uses

Direct Use

MatroidNN is designed for classification tasks where feature redundancy is a potential issue. It's particularly useful for:

  • High-dimensional datasets with correlated features
  • Feature selection in biological/medical data
  • Financial prediction with multicollinear variables
  • Any classification task where feature independence is desired

Out-of-Scope Use

This model is not intended for:

  • Regression tasks (without modification)
  • Time series prediction (without temporal adaptations)
  • Raw image or text classification (without appropriate feature extraction)

Training Data

The model was developed and tested using synthetic data with deliberate feature dependencies. For real-world applications, the model should be retrained on domain-specific data.

Training Dataset

  • Type: Synthetic data with controlled dependencies
  • Size: 1000 samples (default), configurable
  • Features: 20 initial features (default), configurable
  • Classes: 3 classes (default), configurable
  • Distribution: Equal class distribution in the synthetic data
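
The card does not include the data generator, so the following is only a sketch of how data with these defaults could be produced. It assumes 5 informative features and builds the remaining 15 as noisy linear mixes of them; the function name `make_dependent_data` and the `n_informative` parameter are hypothetical.

```python
import numpy as np


def make_dependent_data(n_samples=1000, n_features=20, n_classes=3,
                        n_informative=5, seed=0):
    rng = np.random.default_rng(seed)
    # Near-equal class sizes: cycle through the class labels, then shuffle.
    y = rng.permutation(np.arange(n_samples) % n_classes)
    # Informative features: Gaussian noise shifted by a class-specific mean.
    class_means = rng.normal(scale=2.0, size=(n_classes, n_informative))
    informative = class_means[y] + rng.normal(size=(n_samples, n_informative))
    # Dependent features: random linear mixes of the informative ones plus small noise.
    n_dependent = n_features - n_informative
    mixing = rng.normal(size=(n_informative, n_dependent))
    dependent = informative @ mixing + 0.1 * rng.normal(size=(n_samples, n_dependent))
    return np.hstack([informative, dependent]), y


X, y = make_dependent_data()
print(X.shape, np.bincount(y))  # (1000, 20) [334 333 333]
```

Because 1000 is not divisible by 3, the classes are balanced to within one sample.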

Performance

Metrics

On synthetic test data with 3 classes:

  • Accuracy: 94.0%
  • Macro-average F1-score: 0.93
  • Per-class metrics:
    • Class 0: Precision 0.96, Recall 1.00, F1 0.98
    • Class 1: Precision 0.86, Recall 0.86, F1 0.86
    • Class 2: Precision 0.97, Recall 0.93, F1 0.95
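
As a consistency check, the per-class F1 values and the macro average can be recomputed from the reported precision and recall. The snippet uses only the numbers listed above.

```python
# (precision, recall) per class, as reported above.
per_class = {
    0: (0.96, 1.00),
    1: (0.86, 0.86),
    2: (0.97, 0.93),
}


def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


f1_scores = {c: round(f1(p, r), 2) for c, (p, r) in per_class.items()}
# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = round(sum(f1_scores.values()) / len(f1_scores), 2)
print(f1_scores, macro_f1)  # {0: 0.98, 1: 0.86, 2: 0.95} 0.93
```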

Factors

Performance may vary based on:

  • Feature correlation structure in the dataset
  • Number of initial features and their information content
  • Class distribution balance
  • Rank threshold parameter in the MatroidFeatureSelector

Limitations

  • The matroid-based feature selection uses correlation as a proxy for independence, which may not capture all forms of dependency
  • The current implementation assumes numerical features and may require adaptation for categorical features
  • Feature selection is performed once before training and does not adapt during training
  • The rank threshold parameter requires careful tuning based on the dataset
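
The first limitation is easy to demonstrate: a feature can be fully determined by another and still have a Pearson correlation close to zero. The example below is a generic illustration and is not taken from the model's code.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)   # symmetric around zero
y = x ** 2                        # completely determined by x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(abs(r) < 1e-8)  # True: correlation sees no relationship
```

A correlation-based selector would treat `x` and `y` as independent and keep both, even though `y` carries no information beyond `x`.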

Ethical Considerations

  • Feature selection might unintentionally exclude features that are important for fairness considerations
  • The model inherits any biases present in the training data
  • Results should be interpreted with caution in high-stakes applications, with human oversight

Technical Specifications

Hardware Requirements

  • Training: CUDA-capable GPU recommended for larger datasets
  • Inference: CPU sufficient for most applications

Software Requirements

  • Python 3.8+
  • PyTorch 1.8+
  • NumPy 1.20+
  • scikit-learn 0.24+

Training Hyperparameters

  • Batch size: 32 (default)
  • Learning rate: 0.001 (default)
  • Optimizer: Adam
  • Loss function: Cross-Entropy Loss
  • Epochs: Determined by early stopping on validation loss (patience=10)
  • Feature selection rank threshold: 0.7 (default, configurable)
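
These settings could be wired together roughly as follows. This is a sketch, not the project's training script: the function name `train_with_early_stopping`, the `max_epochs` cap, and restoring the best weights at the end are assumptions. The toy model at the bottom is a stand-in so the example runs on its own.

```python
import copy

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_with_early_stopping(model, X_train, y_train, X_val, y_val,
                              lr=1e-3, batch_size=32, patience=10, max_epochs=200):
    loader = DataLoader(TensorDataset(X_train, y_train),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    best_loss, best_state, stale_epochs = float("inf"), None, 0

    for _ in range(max_epochs):
        model.train()
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()

        # Validation loss on the full validation set decides when to stop.
        model.eval()
        with torch.no_grad():
            val_loss = criterion(model(X_val), y_val).item()

        if val_loss < best_loss:
            best_loss, stale_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # no improvement for `patience` consecutive epochs

    if best_state is not None:
        model.load_state_dict(best_state)  # restore best weights (assumption)
    return best_loss


torch.manual_seed(0)
X = torch.randn(128, 4)
y = (X[:, 0] > 0).long() + (X[:, 1] > 0).long()   # toy labels in {0, 1, 2}
toy_model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
best = train_with_early_stopping(toy_model, X[:96], y[:96], X[96:], y[96:],
                                 max_epochs=20)
print(f"best validation loss: {best:.3f}")
```

When the model contains batch normalization, a final training batch of size 1 raises an error in training mode, so passing `drop_last=True` to the `DataLoader` may be needed depending on the dataset size.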

How to Use

from matroid_nn import MatroidFeatureSelector, MatroidNN

# Initialize feature selector
selector = MatroidFeatureSelector(rank_threshold=0.7)

# Apply feature selection
X_train_selected = selector.fit_transform(X_train)
X_test_selected = selector.transform(X_test)

# Create the model (the training loop is not shown here)
model = MatroidNN(
    input_size=X_train_selected.shape[1],
    hidden_size=64,
    output_size=num_classes
)