Edit model card

Model Card for ronigold/dictalm2.0-instruct-fine-tuned

This is a fine-tuned version of the Dicta-IL dictalm2.0-instruct model, specifically tailored for generating question-answer pairs based on Hebrew Wikipedia excerpts. The model was fine-tuned to improve its ability in understanding and generating natural questions and their corresponding answers in Hebrew.

Model Details

Model Description

The model, ronigold/dictalm2.0-instruct-fine-tuned, is a fine-tuned version of the dictalm2.0-instruct model on a synthetically generated dataset. This dataset was created by the model itself using excerpts from the Hebrew Wikipedia, which then were used to generate questions and answers, thereby enriching the model's capacity in this specific task.

  • Developed by: Roni Goldshmidt
  • Model type: Transformer-based, fine-tuned Dicta-IL dictalm2.0-instruct
  • Language(s) (NLP): Hebrew
  • License: MIT
  • Finetuned from: dicta-il/dictalm2.0-instruct

Uses

Direct Use

The model is ideal for educational and informational applications, where generating contextual question-answer pairs from textual content is needed, particularly in the Hebrew language.

Out-of-Scope Use

The model is not intended for generating answers where factual accuracy from unverified sources is critical, such as medical advice or legal information.

Bias, Risks, and Limitations

While the model is robust in generating context-relevant Q&A pairs, it may still inherit or amplify biases present in the training data, which primarily comes from Wikipedia. Users should critically evaluate the model output, especially in sensitive contexts.

Recommendations

It is recommended to use this model with an additional layer of human oversight when used in sensitive or critical applications to ensure the accuracy and appropriateness of the content generated.

How to Get Started with the Model

To get started, load the model using the Transformers library by Hugging Face:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "ronigold/dictalm2.0-instruct-fine-tuned"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Training Details

Training Data

The training data consists of synthetic question-answer pairs generated from the Hebrew Wikipedia. This data was then used to fine-tune the model using specific loss functions and optimization strategies to improve its performance in generating similar pairs.

# Example of setting up training in PyTorch using the Transformers library
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=eval_dataset            # evaluation dataset
)

trainer.train()

Training Procedure

Training Hyperparameters

  • Training regime: Mixed precision training (fp16) to optimize GPU usage and speed up training while maintaining precision.
# Configuration for mixed precision training
from transformers import set_seed

set_seed(42)  # Set seed for reproducibility

# Adding mixed precision policy
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()

# Training loop
for epoch in range(int(training_args.num_train_epochs)):
    model.train()
    for batch in train_dataloader:
        optim.zero_grad()
        with autocast():  # applies mixed precision
            outputs = model(**batch)
            loss = outputs.loss
        scaler.scale(loss).backward()
        scaler.step(optim)
        scaler.update()

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a separate holdout set, also generated synthetically in a similar manner as the training set.

Factors

  • Domains: The evaluation considered various domains within the Hebrew Wikipedia to ensure generalizability across different types of content.
  • Difficulty: The questions varied in complexity to test the model's ability to handle both straightforward and more complex queries.

Metrics

The evaluation metrics used include F1 score and exact match (EM), measuring the accuracy of the answers generated by the model.

Results

The model achieved an F1 score of 88% and an exact match rate of 75%, indicating strong performance in generating accurate answers, especially in context to the synthesized questions.

Technical Specifications

Model Architecture and Objective

The model follows a transformer-based architecture with modifications to optimize for question generation and answering tasks.

Compute Infrastructure

Training was performed on cloud GPUs, specifically using NVIDIA Tesla V100s, which provided the necessary compute power for efficient training.

Environmental Impact

Citation

BibTeX:

@misc{ronigold_dictalm2.0_instruct_finetuned_2024,
  author = {Goldshmidt, Roni},
  title = {Hebrew QA Fine-tuned Model},
  year = {2024},
  publisher = {Hugging Face's Model Hub},
  journal = {Hugging Face's Model Hub}
}

More Information

For more detailed usage, including advanced configurations and tips, refer to the repository README or contact the model authors. This model is part of a broader initiative to enhance NLP capabilities in the Hebrew language, aiming to support developers and researchers interested in applying advanced AI techniques to Hebrew texts.

Model Card Authors

  • Roni Goldshmidt: Main researcher and developer of the fine-tuned model.

Model Card Contact

For any questions or feedback about the model, contact via Hugging Face profile or directly at ronigoldsmid@gmail.com.

Downloads last month
1,185
Safetensors
Model size
7.25B params
Tensor type
FP16
·