Model description

This is a random forest classifier (scikit-learn RandomForestClassifier) trained on the JeVeuxAider dataset. As input, the model takes text embeddings encoded with camembert-base (768 dimensions), pooled into 2304 numeric features (avg_1 ... max_768).

Intended uses & limitations

This model is not ready to be used in production.

Training Procedure

[More Information Needed]

Hyperparameters


Hyperparameter	Value
memory
steps	[('columntransformer', ColumnTransformer(transformers=[('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=689))]), Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8', 'avg_9', 'avg_10', ... 'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764', 'max_765', 'max_766', 'max_767', 'max_768'], dtype='object', length=2304))], verbose_feature_names_out=False)), ('randomforestclassifier', RandomForestClassifier(max_depth=15, max_features=20, min_samples_split=10, random_state=42))]
verbose	False
columntransformer	ColumnTransformer(transformers=[('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=689))]), Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8', 'avg_9', 'avg_10', ... 'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764', 'max_765', 'max_766', 'max_767', 'max_768'], dtype='object', length=2304))], verbose_feature_names_out=False)
randomforestclassifier	RandomForestClassifier(max_depth=15, max_features=20, min_samples_split=10, random_state=42)
columntransformer__n_jobs
columntransformer__remainder	drop
columntransformer__sparse_threshold	0.3
columntransformer__transformer_weights
columntransformer__transformers	[('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=689))]), Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8', 'avg_9', 'avg_10', ... 'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764', 'max_765', 'max_766', 'max_767', 'max_768'], dtype='object', length=2304))]
columntransformer__verbose	False
columntransformer__verbose_feature_names_out	False
columntransformer__num	Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=689))])
columntransformer__num__memory
columntransformer__num__steps	[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('pca', PCA(n_components=689))]
columntransformer__num__verbose	False
columntransformer__num__imputer	SimpleImputer(strategy='median')
columntransformer__num__scaler	StandardScaler()
columntransformer__num__pca	PCA(n_components=689)
columntransformer__num__imputer__add_indicator	False
columntransformer__num__imputer__copy	True
columntransformer__num__imputer__fill_value
columntransformer__num__imputer__keep_empty_features	False
columntransformer__num__imputer__missing_values	nan
columntransformer__num__imputer__strategy	median
columntransformer__num__imputer__verbose	deprecated
columntransformer__num__scaler__copy	True
columntransformer__num__scaler__with_mean	True
columntransformer__num__scaler__with_std	True
columntransformer__num__pca__copy	True
columntransformer__num__pca__iterated_power	auto
columntransformer__num__pca__n_components	689
columntransformer__num__pca__n_oversamples	10
columntransformer__num__pca__power_iteration_normalizer	auto
columntransformer__num__pca__random_state
columntransformer__num__pca__svd_solver	auto
columntransformer__num__pca__tol	0.0
columntransformer__num__pca__whiten	False
randomforestclassifier__bootstrap	True
randomforestclassifier__ccp_alpha	0.0
randomforestclassifier__class_weight
randomforestclassifier__criterion	gini
randomforestclassifier__max_depth	15
randomforestclassifier__max_features	20
randomforestclassifier__max_leaf_nodes
randomforestclassifier__max_samples
randomforestclassifier__min_impurity_decrease	0.0
randomforestclassifier__min_samples_leaf	1
randomforestclassifier__min_samples_split	10
randomforestclassifier__min_weight_fraction_leaf	0.0
randomforestclassifier__n_estimators	100
randomforestclassifier__n_jobs
randomforestclassifier__oob_score	False
randomforestclassifier__random_state	42
randomforestclassifier__verbose	0
randomforestclassifier__warm_start	False

Model Plot

Pipeline(steps=[('columntransformer',ColumnTransformer(transformers=[('num',Pipeline(steps=[('imputer',SimpleImputer(strategy='median')),('scaler',StandardScaler()),('pca',PCA(n_components=689))]),Index(['avg_1', 'avg_2', 'avg_3', 'avg_4', 'avg_5', 'avg_6', 'avg_7', 'avg_8','avg_9', 'avg_10',...'max_759', 'max_760', 'max_761', 'max_762', 'max_763', 'max_764','max_765', 'max_766', 'max_767', 'max_768'],dtype='object', length=2304))],verbose_feature_names_out=False)),('randomforestclassifier',RandomForestClassifier(max_depth=15, max_features=20,min_samples_split=10,random_state=42))])


Evaluation Results

Metric	Value
accuracy	0.964661
f1 score	0.964637
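
Metrics of this kind are typically computed on a held-out split with scikit-learn. The sketch below shows one way to do it. The card does not say which F1 averaging was used, so `average="weighted"` is an assumption, the helper name `evaluate` is hypothetical, and the labels are toy values, not data from this model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score


def evaluate(y_true, y_pred, average="weighted"):
    """Return accuracy, F1 and the confusion matrix for a set of predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # The averaging mode is an assumption; the card does not state it.
        "f1 score": f1_score(y_true, y_pred, average=average),
        "confusion matrix": confusion_matrix(y_true, y_pred),
    }


# Toy labels for illustration only (not JeVeuxAider data).
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0]
results = evaluate(y_true, y_pred)
```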

Confusion Matrix

How to Get Started with the Model

[More Information Needed]

Model Card Authors

huynhdoo

Model Card Contact

You can contact the model card authors through the following channels: [More Information Needed]

Citation

BibTeX

@inproceedings{...,year={2023}}

get_started_code

import pickle

# pkl_filename: path to the pickled pipeline file
with open(pkl_filename, 'rb') as file:
    pipe = pickle.load(file)
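
Once the pipeline is loaded, predictions need a table with the same column names used at training time. As a hedged sketch, assuming the unpickled object is the fitted scikit-learn pipeline described in this card, the expected columns can be read back from the fitted `ColumnTransformer` instead of being typed by hand. The helper names `expected_columns` and `predict_frame` are hypothetical.

```python
import pandas as pd


def expected_columns(pipe):
    """List the input columns the fitted ColumnTransformer selects."""
    transformer = pipe.named_steps["columntransformer"]
    # transformers_ holds (name, fitted_transformer, columns) tuples.
    _, _, columns = transformer.transformers_[0]
    return list(columns)


def predict_frame(pipe, features: pd.DataFrame):
    """Predict labels for a DataFrame of precomputed embedding features."""
    columns = expected_columns(pipe)
    missing = [c for c in columns if c not in features.columns]
    if missing:
        raise ValueError(
            f"missing {len(missing)} feature columns, e.g. {missing[:3]}"
        )
    # Reorder to the training layout and drop any extra columns.
    return pipe.predict(features[columns])
```

Calling `predict_frame(pipe, features)` with a DataFrame that lacks some of the expected columns raises a `ValueError` that names a few of them, which is usually easier to act on than an error from deep inside the pipeline.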