
	---
	language:
	- multilingual
	- de
	- en
	license: mit
	library_name: sklearn
	tags:
	- sklearn
	- skops
	- text-classification
	- english
	- german
	datasets:
	- philipp-zettl/GGU-xx
	model_format: pickle
	model_file: GGU-CLF.pkl
	get_started_code: "```python\nimport pickle\nwith open(pkl_filename, 'rb') as file:\n\
	\ clf = pickle.load(file)\n```"
	model_card_authors: https://huggingface.co/philipp-zettl
	limitations: This model is ready to be used in production.
	model_description: GGU (Greeting/Gratitude/Unknown) classifier for natural language chat messages.
	model_id: GGU-CLF
	funded_by: https://huggingface.co/easybits
	repo: https://huggingface.co/philipp-zettl/GGU-CLF
	widget:
	- example_title: 'Greeting (English #1)'
	  text: Hey there
	- example_title: 'Greeting (English #2)'
	  text: Good to see you
	- example_title: Greeting (German)
	  text: Guten Morgen
	- example_title: 'Gratitude (English #1)'
	  text: Thank you
	- example_title: 'Gratitude (English #2)'
	  text: Cheers mate
	---

	# Model description

	This is a Multinomial Naive Bayes model trained on a custom dataset.
	A TF-IDF vectorizer over character n-grams (1 to 3 characters, within word boundaries) is used for vectorization.
	It is used to classify user text into the classes:
	- 0: Greeting
	- 1: Gratitude
	- 2: Unknown

	## Intended uses & limitations

	### Direct use

	Use this model to classify messages from natural language chats.

	### Out Of Scope Usage

	The model was not trained on multi-sentence samples, so avoid passing them. English and German are the officially tested and supported languages; any other language is considered out of scope.
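One possible way to work within the single-sentence limitation is to split a longer chat message into sentence-like chunks and classify each chunk separately. The sketch below is a naive heuristic, not part of the model: the helper name `split_sentences` is hypothetical, and splitting on `.`, `!` and `?` will mis-handle abbreviations such as "z. B.".

```python
import re

# Split after '.', '!' or '?' when followed by whitespace (naive heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(message):
    """Return the non-empty sentence-like chunks of a chat message."""
    return [part.strip() for part in _SENTENCE_END.split(message) if part.strip()]


print(split_sentences("Hey there! Thanks a lot for your help."))
```

Each returned chunk can then be passed to the classifier on its own.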

	## Training Procedure


	This model was trained using the [philipp-zettl/GGU-xx](https://huggingface.co/datasets/philipp-zettl/GGU-xx) dataset.

	You can find its performance metrics under [Evaluation Results](#evaluation-results).


	### Hyperparameters

	<details>
	<summary> Click to expand </summary>

	| Hyperparameter | Value |
	|---------------------|---------------------------------------------------------------------------------------------------------------------------|
	| memory | |
	| steps | [('vect', TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3))), ('clf', MultinomialNB(alpha=0.112))] |
	| verbose | False |
	| vect | TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3)) |
	| clf | MultinomialNB(alpha=0.112) |
	| vect__analyzer | char_wb |
	| vect__binary | False |
	| vect__decode_error | strict |
	| vect__dtype | <class 'numpy.float64'> |
	| vect__encoding | utf-8 |
	| vect__input | content |
	| vect__lowercase | False |
	| vect__max_df | 1.0 |
	| vect__max_features | |
	| vect__min_df | 1 |
	| vect__ngram_range | (1, 3) |
	| vect__norm | l2 |
	| vect__preprocessor | |
	| vect__smooth_idf | True |
	| vect__stop_words | |
	| vect__strip_accents | |
	| vect__sublinear_tf | False |
	| vect__token_pattern | (?u)\b\w\w+\b |
	| vect__tokenizer | |
	| vect__use_idf | True |
	| vect__vocabulary | |
	| clf__alpha | 0.112 |
	| clf__class_prior | |
	| clf__fit_prior | True |
	| clf__force_alpha | True |

	</details>
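To make the `vect__analyzer = char_wb` and `vect__ngram_range = (1, 3)` settings more concrete, the following sketch lists the character n-grams such an analyzer would extract from a short message. It is a simplified re-implementation for illustration only, not scikit-learn's code, and the function name `char_wb_ngrams` is hypothetical. Because `vect__lowercase` is `False`, upper- and lower-case characters yield different features.

```python
def char_wb_ngrams(text, min_n=1, max_n=3):
    """Character n-grams built inside word boundaries, each word padded with spaces."""
    ngrams = []
    for word in text.split():
        padded = f" {word} "  # the padding marks the word boundary
        for n in range(min_n, max_n + 1):
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


print(char_wb_ngrams("Hi"))
```

The vectorizer then weights the counts of these n-grams with TF-IDF before they reach the Naive Bayes classifier.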

	### Model Plot

	<pre>Pipeline(steps=[('vect', TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3))), ('clf', MultinomialNB(alpha=0.112))])</pre>

	## Evaluation Results

	| Metric | Value |
	|----------|----------|
	| accuracy | 0.951691 |
	| f1 score | 0.951691 |

	### Evaluation Methods

	The model is evaluated on validation data from the dataset's test split, using accuracy and F1-score with micro average.
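With exactly one label per sample, every misclassification counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro-averaged precision, recall and F1 all coincide with accuracy. That is consistent with the two identical values reported above. The sketch below demonstrates this on hypothetical labels that are not taken from the GGU-xx dataset; the function names are illustrative.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multi-class predictions."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels (0 = greeting, 1 = gratitude, 2 = unknown).
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 0, 2, 0]

print(micro_f1(y_true, y_pred), accuracy(y_true, y_pred))
```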

	#### Confusion matrix

	![Confusion matrix](confusion_matrix.png)

	### Classification Report

	<details>
	<summary> Click to expand </summary>

	| index | precision | recall | f1-score | support |
	|--------------|-------------|----------|------------|-----------|
	| greeting | 0.926471 | 0.969231 | 0.947368 | 65 |
	| gratitude | 0.982456 | 0.888889 | 0.933333 | 63 |
	| unknown | 0.95122 | 0.987342 | 0.968944 | 79 |
	| macro avg | 0.953382 | 0.948487 | 0.949882 | 207 |
	| weighted avg | 0.952955 | 0.951691 | 0.951331 | 207 |

	</details>
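The aggregate rows of the classification report can be cross-checked from the per-class rows. The sketch below copies the recall, F1 and support values from the table above and recomputes the overall accuracy and the macro and weighted F1 averages; variable names are illustrative.

```python
# Per-class values copied from the classification report above.
report = {
    "greeting": {"recall": 0.969231, "f1": 0.947368, "support": 65},
    "gratitude": {"recall": 0.888889, "f1": 0.933333, "support": 63},
    "unknown": {"recall": 0.987342, "f1": 0.968944, "support": 79},
}

total = sum(row["support"] for row in report.values())
# recall * support recovers the number of correctly classified samples per class.
correct = sum(round(row["recall"] * row["support"]) for row in report.values())

accuracy = correct / total
macro_f1 = sum(row["f1"] for row in report.values()) / len(report)
weighted_f1 = sum(row["f1"] * row["support"] for row in report.values()) / total

print(f"{correct}/{total} correct, accuracy={accuracy:.6f}")
print(f"macro F1={macro_f1:.6f}, weighted F1={weighted_f1:.6f}")
```

The recomputed values agree with the reported accuracy and with the `macro avg` and `weighted avg` rows up to rounding.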

	# How to Get Started with the Model

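	```python
	import pickle

	# Path to the downloaded model file (`model_file` in the metadata above)
	pkl_filename = "GGU-CLF.pkl"
	with open(pkl_filename, 'rb') as file:
	    clf = pickle.load(file)
	```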

	# Model Card Authors

	This model card was written by the following authors:

	[philipp-zettl](https://huggingface.co/philipp-zettl/)