Model Card for RandomForest (NBC vs. Fox News Headline Classifier)

NOTE: This is NOT our final model. It is one of the secondary models we explored while developing our final model; the final model lives in the GBTrees repository on HuggingFace.

Model Details

This model classifies news headlines as either NBC or Fox News.

Model Description

  • Developed by: Jack Bader, Kaiyuan Wang, Pairan Xu
  • Task: Binary classification (NBC News vs. Fox News)
  • Preprocessing: TF-IDF vectorization applied to the headline text (see the training sketch after this list)
    • stop_words = "english"
    • max_features = 1000
  • Model type: Random Forest
  • Framework: scikit-learn
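
For reference, here is a minimal sketch of how a model with this configuration could be trained. The file name train_data.csv and the 'title'/'label' column names are assumptions mirroring the evaluation code later in this card; the actual training run (and whatever search produced best_rf_model.pkl) may have differed.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training data with 'title' and 'label' columns
train_df = pd.read_csv("train_data.csv")  # assumed file name

# TF-IDF vectorization with the settings listed above
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X_train = tfidf_vectorizer.fit_transform(train_df["title"])

# Random forest classifier (hyperparameters not documented in this card)
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, train_df["label"])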

Metrics

  • Accuracy Score (see the snippet after this list)
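
Accuracy here is simply the fraction of test headlines whose predicted outlet matches the true label. As a minimal sketch, it can be computed with scikit-learn as follows (y_test and y_pred are the variables produced by the evaluation code below):

from sklearn.metrics import accuracy_score

# Fraction of correctly classified headlines
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")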

Model Evaluation

import pandas as pd
import joblib
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report

# Mount Google Drive (the test CSV is stored there)
from google.colab import drive
drive.mount('/content/drive')

# Load test set
test_df = pd.read_csv("/content/drive/MyDrive/test_data_random_subset.csv", encoding="Windows-1252")

# Log in with a Hugging Face access token
# (token can be found in the repo as Token.docx)
!huggingface-cli login

# Download the model checkpoint from the Hub
model_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="best_rf_model.pkl")

# Download the fitted TF-IDF vectorizer from the Hub
vectorizer_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="tfidf_vectorizer.pkl")

# Load the model
rf_model = joblib.load(model_path)

# Load the vectorizer
tfidf_vectorizer = joblib.load(vectorizer_path)

# Extract the headlines from the test set
X_test = test_df['title']

# Transform the headlines into TF-IDF features
X_test_transformed = tfidf_vectorizer.transform(X_test)

# Make predictions with the random forest model
y_pred = rf_model.predict(X_test_transformed)

# Extract the 'label' column as the target
y_test = test_df['label']

# Print the classification report (precision, recall, F1, and accuracy)
print(classification_report(y_test, y_pred))
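
As a quick sanity check, the same artifacts can classify a single new headline. This is a minimal sketch that reuses the rf_model and tfidf_vectorizer objects loaded above; the example headline is made up, and mapping the predicted label back to NBC vs. Fox follows whatever encoding the 'label' column used during training.

# Classify one hypothetical headline with the loaded vectorizer and model
headline = ["Senate passes new budget bill after late-night session"]
features = tfidf_vectorizer.transform(headline)
print(rf_model.predict(features))  # predicted label, per the training encoding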