Multi2ConvAI-Quality: French logistic regression model using fastText embeddings
This model was developed in the Multi2ConvAI project:
- domain: Quality (more details about our use cases: (en, de))
- language: French (fr)
- model type: logistic regression
- embeddings: fastText embeddings
How to run
Requires:
- multi2convai
- serialized fastText embeddings (see last section of this readme or these instructions)
Run with one line of code
After installing multi2convai and making the serialized fastText embeddings
available locally, you can run:
# assumes working dir is the root of the cloned multi2convai repo
python scripts/run_inference.py -m multi2convai-quality-fr-logreg-ft
>>> Create pipeline for config: multi2convai-quality-fr-logreg-ft.
>>> Created a LogisticRegressionFasttextPipeline for domain: 'quality' and language 'fr'.
>>>
>>> Enter your text (type 'stop' to end execution): Lancer le programme
>>> 'Lancer le programme' was classified as 'neo.start' (confidence: 0.8943)
How to run the model using multi2convai
After installing multi2convai and making the serialized fastText embeddings
available locally, you can run:
# assumes working dir is the root of the cloned multi2convai repo
from pathlib import Path
from multi2convai.pipelines.inference.base import ClassificationConfig
from multi2convai.pipelines.inference.logistic_regression_fasttext import (
    LogisticRegressionFasttextConfig,
    LogisticRegressionFasttextPipeline,
)
language = "fr"
domain = "quality"
# 1. Define paths of model, label dict and embeddings
model_file = "model.pth"
label_dict_file = "label_dict.json"
embedding_path = Path(
    "../models/embeddings/fasttext/fr/wiki.200k.fr.embed"
)
vocabulary_path = Path(
    "../models/embeddings/fasttext/fr/wiki.200k.fr.vocab"
)
# 2. Create and setup pipeline
model_config = LogisticRegressionFasttextConfig(
    model_file, embedding_path, vocabulary_path
)
config = ClassificationConfig(language, domain, label_dict_file, model_config)
pipeline = LogisticRegressionFasttextPipeline(config)
pipeline.setup()
# 3. Run intent classification on a text of your choice
label = pipeline.run("Lancer le programme")
label
>>> Label(string='neo.start', ratio='0.8943')
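The returned label carries both the predicted intent and its confidence, so a caller may want to reject low-confidence predictions. The following is a minimal sketch of such a check, not part of multi2convai: the `Label` dataclass is a hypothetical stand-in that only mirrors the `string` and `ratio` fields printed above, and `accept_label` and its default threshold of 0.5 are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Label:
    """Hypothetical stand-in mirroring the fields shown in the output above."""

    string: str
    ratio: Union[float, str]


def accept_label(label: Label, threshold: float = 0.5) -> Optional[str]:
    """Return the intent name if its confidence reaches the threshold, else None."""
    # float() also copes with a ratio that is provided as a string.
    confidence = float(label.ratio)
    return label.string if confidence >= threshold else None


print(accept_label(Label("neo.start", 0.8943)))        # neo.start
print(accept_label(Label("neo.start", 0.8943), 0.95))  # None
```

With a real pipeline, the object returned by `pipeline.run(...)` could be passed in place of the stand-in, provided it exposes the same two fields.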
Download and serialize fastText
# assumes working dir is the root of the cloned multi2convai repo
mkdir -p models/fasttext/fr
curl https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.fr.vec --output models/fasttext/fr/wiki.fr.vec
python scripts/serialize_fasttext.py -r fasttext/fr/wiki.fr.vec -v fasttext/fr/wiki.200k.fr.vocab -e fasttext/fr/wiki.200k.fr.embed -n 200000
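To illustrate what limiting the vocabulary involves, here is a minimal sketch that reads the first entries of a fastText `.vec` text file. It is not the project's `serialize_fasttext.py`, and the on-disk format that script writes is not described here. The sketch assumes the usual `.vec` layout (a header line with the number of words and the vector dimension, then one token and its vector components per line) and assumes that `-n 200000` keeps the first 200000 entries; `read_top_vectors` is a hypothetical helper name.

```python
import io
from typing import List, TextIO, Tuple


def read_top_vectors(handle: TextIO, limit: int) -> Tuple[List[str], List[List[float]]]:
    """Read at most `limit` word vectors from a fastText .vec text stream."""
    header = handle.readline().split()
    dim = int(header[1])  # header line: "<number of words> <dimension>"
    words: List[str] = []
    vectors: List[List[float]] = []
    for line in handle:
        if len(words) >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip malformed lines
        # The last `dim` fields are the vector; everything before is the token.
        words.append(" ".join(parts[:-dim]))
        vectors.append([float(value) for value in parts[-dim:]])
    return words, vectors


# Tiny in-memory sample instead of the real wiki.fr.vec download.
sample = io.StringIO("3 2\nle 0.1 0.2\nde 0.3 0.4\nprogramme 0.5 0.6\n")
words, vectors = read_top_vectors(sample, limit=2)
print(words)    # ['le', 'de']
print(vectors)  # [[0.1, 0.2], [0.3, 0.4]]
```

For the real file, the stream could be opened with `open("models/fasttext/fr/wiki.fr.vec", encoding="utf-8")` and the function called with `limit=200000`.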