
Question Difficulty Classification Model

Introduction

This project classifies question-answer pairs by difficulty as Easy, Medium, or Hard. You can pass the model a single question-answer pair, with the question and answer separated by a comma, or a list of such pairs. For this task, I fine-tuned the pre-trained bert-base-cased model on the Question-Answer Dataset from Carnegie Mellon University.
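The input format described above can be sketched with a small helper. The names `format_pair` and `to_model_input` are hypothetical and not part of this repository, and stripping surrounding whitespace is an assumption of this sketch rather than a requirement of the model.

```python
from typing import List, Sequence, Tuple, Union

Pair = Tuple[str, str]


def format_pair(question: str, answer: str) -> str:
    """Join one question and its answer into a single comma-separated string."""
    return f"{question.strip()},{answer.strip()}"


def to_model_input(pairs: Union[Pair, Sequence[Pair]]) -> List[str]:
    """Accept one (question, answer) tuple or a sequence of them; always return a list of strings."""
    # A single pair is a 2-item sequence of strings; wrap it so both cases share one code path.
    if len(pairs) == 2 and all(isinstance(p, str) for p in pairs):
        pairs = [pairs]
    return [format_pair(q, a) for q, a in pairs]


print(to_model_input(("What is 2+2?", "4")))
```

The resulting list of strings has the same shape as the `input_text` list used in the usage section of this card.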

Model Details

Model Description: This model is a fine-tuned checkpoint of bert-base-cased, which was pretrained on a large corpus of English data in a self-supervised fashion. It reaches an accuracy of 95 on the dev set (for comparison, the bert-base-uncased version reaches an accuracy of 97).

  • Developed by: Hugging Face
  • Model Type: Text Classification
  • Language(s): English
  • License: Apache-2.0
  • Parent Model: For more details about BERT, we encourage users to check out the bert-base-cased model card.
  • Resources for more information:

Dependencies

  • Transformers
  • TensorFlow
  • Python 3.7.13
  • NumPy

How to use the model

  1. Import Essential Libraries
from transformers import TFBertModel
from transformers import BertTokenizer
import tensorflow as tf
  2. Load the Model and Tokenizer
# Replace the placeholder string with the local path to the downloaded model.
questionclassification_model = tf.keras.models.load_model("<path to the model>")
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
  3. Essential Functions
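def prepare_data(input_text):
    """Tokenize question-answer strings into the input tensors for the model."""
    # Accept a single string as well as a list of strings.
    if isinstance(input_text, str):
        input_text = [input_text]

    token = tokenizer.batch_encode_plus(
        input_text,
        max_length=256,  # same max_length as used for fine-tuning
        truncation=True,
        padding='max_length',
        add_special_tokens=True,
        return_tensors='tf'
    )
    # The float64 cast is kept from the original code.
    return {
        'input_ids': tf.cast(token['input_ids'], tf.float64),
        'attention_mask': tf.cast(token['attention_mask'], tf.float64)
    }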

def make_prediction(model, processed_data, classes=('Easy', 'Medium', 'Hard')):
    """Return the predicted label for each input and the raw class probabilities."""
    probs = model.predict(processed_data)
    indices = probs.argmax(axis=1)  # index of the most probable class per row
    outcls = [classes[i] for i in indices]
    return outcls, probs

  4. Make predictions on a list of question-answer pairs

input_text = ["What is gandhi commonly considered to be?,Father of the nation in india","What is the long-term warming of the planets overall temperature called?, Global Warming"]
processed_data = prepare_data(input_text)
result,prob = make_prediction(questionclassification_model, processed_data=processed_data)
# Print each predicted label with the probability of that class.
for label, class_probs in zip(result, prob):
    print(f"{label} : {max(class_probs)}")
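The decoding step in `make_prediction` can be tried without loading the model. The following is a minimal pure-Python sketch of the same argmax logic; the helper name `decode` is hypothetical, and the probabilities are made up for illustration, not real model output.

```python
CLASSES = ("Easy", "Medium", "Hard")  # same order as in make_prediction


def decode(probs, classes=CLASSES):
    """Return a (label, confidence) tuple for each row of class probabilities."""
    results = []
    for row in probs:
        # Index of the largest probability in this row (pure-Python argmax).
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((classes[best], row[best]))
    return results


# Made-up probabilities for illustration only.
fake_probs = [[0.1, 0.7, 0.2], [0.05, 0.15, 0.8]]
for label, confidence in decode(fake_probs):
    print(f"{label} : {confidence}")
```

Each printed line pairs a label with the probability the model assigned to it, which is the same information the loop above prints for real predictions.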

Risks, Limitations and Biases

  • The model rarely predicts the Easy category.
  • 90% of the easy questions in the dataset are yes/no questions.
  • Very few datasets for question difficulty classification are publicly available.
  • Only subject-matter experts can create a reliable dataset for this task. Otherwise, the model will generate wrong results.
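One way to check the first limitation on your own inputs is to count how often each class is predicted. This is a minimal sketch; `label_distribution` is a hypothetical helper, and the label list below is invented for illustration.

```python
from collections import Counter


def label_distribution(labels, classes=("Easy", "Medium", "Hard")):
    """Return the share of each class among the predicted labels."""
    counts = Counter(labels)
    total = len(labels)
    # Guard against an empty list so the function never divides by zero.
    return {c: (counts[c] / total if total else 0.0) for c in classes}


# Invented labels standing in for the first return value of make_prediction.
predicted = ["Hard", "Hard", "Medium", "Easy"]
print(label_distribution(predicted))
```

A very small share for Easy on a varied set of questions would be consistent with the skew described above.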

Training

Training Data

I used the Question-Answer Dataset by Carnegie Mellon University for this task.

Training Procedure

Fine-tuning hyper-parameters
  • learning_rate = 1e-5
  • decay = 1e-6
  • optimizer = adam
  • loss function = categorical cross entropy
  • max_length = 256
  • num_train_epochs = 10
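To make two of these settings concrete, here is a pure-Python sketch of categorical cross-entropy on one-hot labels and of a decayed learning rate. It is not the training code. It assumes the labels were one-hot encoded in the order Easy, Medium, Hard, and that `decay` means the time-based decay of legacy Keras optimizers, `lr / (1 + decay * step)`; neither detail is confirmed by this card, and all function names are hypothetical.

```python
import math

CLASSES = ("Easy", "Medium", "Hard")  # assumed label order


def one_hot(label, classes=CLASSES):
    """Encode a class label as a one-hot vector."""
    return [1.0 if c == label else 0.0 for c in classes]


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Clip predictions at eps so log(0) cannot occur.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def decayed_learning_rate(step, base_lr=1e-5, decay=1e-6):
    """Assumed time-based decay: base_lr / (1 + decay * step)."""
    return base_lr / (1.0 + decay * step)


loss = categorical_cross_entropy(one_hot("Hard"), [0.1, 0.2, 0.7])
print(loss)                           # equals -log(0.7)
print(decayed_learning_rate(0))       # the base learning rate at step 0
print(decayed_learning_rate(10_000))  # slightly smaller after 10,000 steps
```

Under this assumption the learning rate shrinks very slowly: with decay = 1e-6 it takes about one million update steps to halve.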