### How to use

#### Requirements

The examples below require `transformers` and `sentencepiece`, both of which can be installed using `pip`.

```sh
pip install transformers sentencepiece
```

#### Pipelines 🚀

In case you are not familiar with Transformers, you can use pipelines instead.

Note that pipelines cannot return _no answer_ for a question.

```python
from transformers import pipeline

model_name = "SajjadAyoubi/lm-roberta-large-fa-qa"
qa_pipeline = pipeline("question-answering", model=model_name, tokenizer=model_name)

text = "سلام من سجاد ایوبی هستم ۲۰ سالمه و به پردازش زبان طبیعی علاقه دارم"
questions = ["اسمم چیه؟", "چند سالمه؟", "به چی علاقه دارم؟"]

# Ask each question against the same context and print the predicted span
for question in questions:
    print(qa_pipeline({"context": text, "question": question}))

# Output:
# {'score': 0.4839823544025421, 'start': 8, 'end': 18, 'answer': 'سجاد ایوبی'}
# {'score': 0.3747948706150055, 'start': 24, 'end': 32, 'answer': '۲۰ سالمه'}
# {'score': 0.5945395827293396, 'start': 38, 'end': 55, 'answer': 'پردازش زبان طبیعی'}
```

#### Manual approach 🔥

Using the manual approach, it is possible to get _no answer_ for a question, with even better performance.

- PyTorch

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from src.utils import AnswerPredictor

model_name = "SajjadAyoubi/lm-roberta-large-fa-qa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

text = "سلام من سجاد ایوبی هستم ۲۰ سالمه و به پردازش زبان طبیعی علاقه دارم"
questions = ["اسمم چیه؟", "چند سالمه؟", "به چی علاقه دارم؟"]

# AnswerPredictor is defined in src/utils.py; see that file for details
predictor = AnswerPredictor(model, tokenizer, device="cpu", n_best=10)
# One copy of the context per question
preds = predictor(questions, [text] * 3, batch_size=3)

for k, v in preds.items():
    print(v)
```

This produces output like the following:

```text
100%|██████████| 1/1 [00:00<00:00, 3.56it/s]
{'score': 8.040637016296387, 'text': 'سجاد ایوبی'}
{'score': 9.901972770690918, 'text': '۲۰'}
{'score': 12.117212295532227, 'text': 'پردازش زبان طبیعی'}
```

- TensorFlow 2.X

```python
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
from src.utils import TFAnswerPredictor

model_name = "SajjadAyoubi/lm-roberta-large-fa-qa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForQuestionAnswering.from_pretrained(model_name)

text = "سلام من سجاد ایوبی هستم ۲۰ سالمه و به پردازش زبان طبیعی علاقه دارم"
questions = ["اسمم چیه؟", "چند سالمه؟", "به چی علاقه دارم؟"]

# TFAnswerPredictor is defined in src/utils.py; see that file for details
predictor = TFAnswerPredictor(model, tokenizer, n_best=10)
# One copy of the context per question
preds = predictor(questions, [text] * 3, batch_size=3)

for k, v in preds.items():
    print(v)
```

This produces output like the following:

```text
100%|██████████| 1/1 [00:00<00:00, 3.56it/s]
{'score': 8.040637016296387, 'text': 'سجاد ایوبی'}
{'score': 9.901972770690918, 'text': '۲۰'}
{'score': 12.117212295532227, 'text': 'پردازش زبان طبیعی'}
```

Alternatively, you can run the whole demonstration in the [HowToUse iPython Notebook on Google Colab](https://colab.research.google.com/github/sajjjadayobi/PersianQA/blob/main/notebooks/HowToUse.ipynb).
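
To give an intuition for how a manual approach can return _no answer_, here is a minimal sketch of the common SQuAD 2.0-style decision rule: the score of the best answer span (start logit plus end logit) is compared with the score of the special first token. This is an illustration only and may differ from what `AnswerPredictor` in `src/utils.py` actually does. The function name `best_span_or_no_answer`, its parameters, and the toy logits are all hypothetical.

```python
from typing import List, Optional, Tuple


def best_span_or_no_answer(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
    null_threshold: float = 0.0,
) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) of the best span, or None for "no answer".

    Assumption: index 0 is the special first token, whose start + end
    logits serve as the "no answer" score.
    """
    null_score = start_logits[0] + end_logits[0]

    best = None
    # Try every span that starts after the special token and is short enough.
    for start in range(1, len(start_logits)):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)

    # Report "no answer" when the null score beats the best span by the margin.
    if best is None or null_score - best[2] > null_threshold:
        return None
    return best


# Toy logits, made up for illustration (not produced by the model).
start = [0.1, 0.2, 5.0, 0.3, 0.1]
end = [0.1, 0.1, 0.4, 6.0, 0.2]
print(best_span_or_no_answer(start, end))  # (2, 3, 11.0)

# Here the first token dominates, so the function reports no answer.
start = [7.0, 0.2, 1.0, 0.3, 0.1]
end = [6.5, 0.1, 0.4, 1.5, 0.2]
print(best_span_or_no_answer(start, end))  # None
```

In practice the logits would come from the model's `start_logits` and `end_logits` outputs, and `null_threshold` would be tuned on a validation set.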