# Model description

This model is an attempt to solve the 2025 FrugalAI challenge. It is a scikit-learn pipeline that lemmatizes input text, vectorizes it with TF-IDF, and classifies it with a random forest.
## Intended uses & limitations

The model performs better than random label assignment (accuracy ≈ 0.59 on the evaluation set), but there is still substantial room for improvement.
## Training Procedure

The model was trained as a scikit-learn `Pipeline` with three steps: a `FunctionTransformer` lemmatization step, a `TfidfVectorizer` with a custom tokenizer and stop-word list, and a `RandomForestClassifier` with default settings. The full hyperparameters are listed below.
### Hyperparameters

| Hyperparameter | Value |
|---|---|
memory | |
steps | [('lemmatizer', FunctionTransformer(func=<function lemmatize_X at 0x7f2c3cd63ca0>)), ('tfidf', TfidfVectorizer(max_df=0.95, min_df=2, stop_words=['if', 'when', 'most', 'ourselves', 'your', 'having', "didn't", '@', "you've", 'hasn', 'at', "mightn't", "mustn't", 'these', "it's", 'our', 'had', 'll', 'too', 'this', 'by', 'it', 'further', 'wasn', 'before', 'all', '{', 'herself', 'other', 'above', ...], tokenizer=<function tokenize_quote at 0x7f2c3cdaea60>)), ('rf', RandomForestClassifier())] |
transform_input | |
verbose | False |
lemmatizer | FunctionTransformer(func=<function lemmatize_X at 0x7f2c3cd63ca0>) |
tfidf | TfidfVectorizer(max_df=0.95, min_df=2, stop_words=['if', 'when', 'most', 'ourselves', 'your', 'having', "didn't", '@', "you've", 'hasn', 'at', "mightn't", "mustn't", 'these', "it's", 'our', 'had', 'll', 'too', 'this', 'by', 'it', 'further', 'wasn', 'before', 'all', '{', 'herself', 'other', 'above', ...], tokenizer=<function tokenize_quote at 0x7f2c3cdaea60>) |
rf | RandomForestClassifier() |
lemmatizer__accept_sparse | False |
lemmatizer__check_inverse | True |
lemmatizer__feature_names_out | |
lemmatizer__func | <function lemmatize_X at 0x7f2c3cd63ca0> |
lemmatizer__inv_kw_args | |
lemmatizer__inverse_func | |
lemmatizer__kw_args | |
lemmatizer__validate | False |
tfidf__analyzer | word |
tfidf__binary | False |
tfidf__decode_error | strict |
tfidf__dtype | <class 'numpy.float64'> |
tfidf__encoding | utf-8 |
tfidf__input | content |
tfidf__lowercase | True |
tfidf__max_df | 0.95 |
tfidf__max_features | |
tfidf__min_df | 2 |
tfidf__ngram_range | (1, 1) |
tfidf__norm | l2 |
tfidf__preprocessor | |
tfidf__smooth_idf | True |
tfidf__stop_words | ['if', 'when', 'most', 'ourselves', 'your', 'having', "didn't", '@', "you've", 'hasn', 'at', "mightn't", "mustn't", 'these', "it's", 'our', 'had', 'll', 'too', 'this', 'by', 'it', 'further', 'wasn', 'before', 'all', '{', 'herself', 'other', 'above', 'needn', 'than', 'i', 'not', 'was', 'few', 'both', 'd', 'now', 'has', ')', '&', '`', 'who', 'whom', '"', 'through', 'me', 'myself', '>', 'and', "'", 'which', 've', 'were', 'aren', 'doesn', 'that', ...] |
tfidf__strip_accents | |
tfidf__sublinear_tf | False |
tfidf__token_pattern | (?u)\b\w\w+\b |
tfidf__tokenizer | <function tokenize_quote at 0x7f2c3cdaea60> |
tfidf__use_idf | True |
tfidf__vocabulary | |
rf__bootstrap | True |
rf__ccp_alpha | 0.0 |
rf__class_weight | |
rf__criterion | gini |
rf__max_depth | |
rf__max_features | sqrt |
rf__max_leaf_nodes | |
rf__max_samples | |
rf__min_impurity_decrease | 0.0 |
rf__min_samples_leaf | 1 |
rf__min_samples_split | 2 |
rf__min_weight_fraction_leaf | 0.0 |
rf__monotonic_cst | |
rf__n_estimators | 100 |
rf__n_jobs | |
rf__oob_score | False |
rf__random_state | |
rf__verbose | 0 |
rf__warm_start | False |
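The hyperparameters above can be assembled into a runnable sketch of the pipeline. The original `lemmatize_X` and `tokenize_quote` functions are not included in this card, so the versions below are hypothetical stand-ins, and the long custom stop-word list is elided:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-ins: the card's lemmatize_X and tokenize_quote are
# custom functions whose source is not included in the card.
def lemmatize_X(X):
    return [doc.lower() for doc in X]

def tokenize_quote(doc):
    return doc.split()

pipe = Pipeline(steps=[
    ("lemmatizer", FunctionTransformer(func=lemmatize_X)),
    # The card also passes a long custom stop-word list, elided here.
    ("tfidf", TfidfVectorizer(max_df=0.95, min_df=2, tokenizer=tokenize_quote)),
    ("rf", RandomForestClassifier()),
])

# Tiny illustrative corpus, not the challenge data.
texts = ["climate change is real", "the earth is flat",
         "warming trends are real", "ice caps are melting"]
labels = [1, 0, 1, 1]
pipe.fit(texts, labels)
print(pipe.predict(["the climate is changing"]))
```

Note that because a custom `tokenizer` is supplied, the `token_pattern` parameter shown in the table is ignored by `TfidfVectorizer`.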
### Model Plot

Pipeline(steps=[('lemmatizer', FunctionTransformer(func=<function lemmatize_X at 0x7f2c3cd63ca0>)), ('tfidf', TfidfVectorizer(max_df=0.95, min_df=2, stop_words=['if', 'when', 'most', 'ourselves', 'your', 'having', "didn't", '@', "you've", 'hasn', 'at', "mightn't", "mustn't", 'these', "it's", 'our', 'had', 'll', 'too', 'this', 'by', 'it', 'further', 'wasn', 'before', 'all', '{', 'herself', 'other', 'above', ...], tokenizer=<function tokenize_quote at 0x7f2c3cdaea60>)), ('rf', RandomForestClassifier())])
## Evaluation Results

| Metric | Value |
|---|---|
accuracy | 0.5873666940114848 |
f1_score | 0.5666496543166571 |
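For reference, the two reported metrics can be computed with `sklearn.metrics`. The card does not state which averaging mode was used for the F1 score, so the macro average below is an assumption; the labels are a toy example, not the challenge data:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy labels, for illustration only (not the challenge data).
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print(accuracy_score(y_true, y_pred))              # 0.8
print(f1_score(y_true, y_pred, average="macro"))   # 0.8 on this toy example
```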
## How to Get Started with the Model

[More Information Needed]
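The card does not specify how the model is serialized or distributed, so the following is only a sketch of a typical workflow for a scikit-learn pipeline, assuming a `joblib` dump with a hypothetical file name. The stand-in training step exists only to make the loading step runnable:

```python
import joblib
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical file name: the card does not state how the model is serialized.
MODEL_PATH = "model.joblib"

# Stand-in: train and persist a small pipeline so the loading step is runnable.
demo = Pipeline([("tfidf", TfidfVectorizer()),
                 ("rf", RandomForestClassifier(random_state=0))])
demo.fit(["the earth is warming", "climate change is a hoax",
          "sea levels keep rising", "global warming is fake news"],
         [1, 0, 1, 0])
joblib.dump(demo, MODEL_PATH)

# Getting started: load the serialized pipeline and classify a quote.
model = joblib.load(MODEL_PATH)
pred = model.predict(["sea levels are rising"])
print(pred)
```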
## Model Card Authors

This model card is written by the following authors:

[More Information Needed]
## Model Card Contact

You can contact the model card authors through the following channels:

[More Information Needed]
## Citation

Below you can find information related to citation.

BibTeX:

[More Information Needed]
## Confusion Matrix

(Confusion matrix plot not rendered in this export.)
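A confusion matrix for this kind of binary classifier can be produced with `sklearn.metrics.confusion_matrix`; the labels below are a toy example, not the model's actual predictions:

```python
from sklearn.metrics import confusion_matrix

# Toy labels, for illustration only.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[2 0]
           #  [1 2]]
```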