vaishnavkoka committed on
Commit
53696b7
verified
1 Parent(s): efb7838

Update README.md

Files changed (1): README.md (+62, −2)
README.md CHANGED
```diff
@@ -17,8 +17,68 @@ base_model:
 library_name: transformers
 tags:
 - llama
-- gemma
 - sqaud
 - fine
 - tuned
----
+---
```
library_name: transformers
tags:
- llama
- sqaud
- fine
- tuned
---

## 1. Overview
This repository documents the fine-tuning of the Llama-3.2-1B model on SQuAD (the Stanford Question Answering Dataset). The task trains the model to answer questions based on a given context passage; fine-tuning aligns the pre-trained Llama model with the objective of extractive question answering.
## 2. Model Information
- Model used: `meta-llama/Llama-3.2-1B`
- Pre-trained parameters: approximately 1.03 billion, verified during setup and consistent with the official documentation.
- Fine-tuned parameters: the parameter count is unchanged after fine-tuning, since fine-tuning updates existing weights rather than adding new ones.
## 3. Dataset and Task Details
### Dataset: SQuAD
The Stanford Question Answering Dataset (SQuAD) is a benchmark dataset for extractive question answering. It contains passages with corresponding questions and answer spans extracted directly from the text.

### Task Objective
Given a passage and a question, the model is trained to identify the span of text in the passage that answers the question.
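SQuAD represents each answer as the answer string plus its character offset into the passage, which is what makes the task "extractive". A minimal, self-contained sketch of that span representation (the `context` and `answer_text` values here are illustrative, not taken from the dataset):

```python
# SQuAD stores each answer as the answer string plus its character
# offset into the context; the model learns to predict that span.
context = "Llama is a model developed by Meta AI for language understanding tasks."
answer_text = "Meta AI"

# Character-level span, analogous to the dataset's `answers` field
answer_start = context.find(answer_text)
answer_end = answer_start + len(answer_text)

# Slicing the context with the span recovers the stored answer text
assert context[answer_start:answer_end] == answer_text
print(answer_start, answer_end)  # 30 37
```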
## 4. Fine-Tuning Approach
- Train-test split: an 80:20 split with stratified sampling (seed 1 for reproducibility), ensuring a balanced distribution of passages and questions across the train and test subsets.
- Tokenization: context and question pairs were tokenized with padding and truncation to a uniform maximum length of 512 tokens.
- Model training: fine-tuning ran for three epochs at a learning rate of 3e-5, with gradient accumulation and early stopping to improve training efficiency and prevent overfitting.
- Hardware: training used GPU acceleration to handle the model size and long token sequences efficiently.
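The README does not include the preprocessing code itself. As a pure-Python illustration of the seeded 80:20 split described above (the function name and toy examples are hypothetical; the actual run may well have used the `datasets` library's built-in `train_test_split`):

```python
import random

def split_80_20(examples, seed=1):
    """Deterministic 80:20 split: shuffle with a fixed seed, then slice."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

examples = [{"id": i} for i in range(1000)]
train, test = split_80_20(examples)
print(len(train), len(test))  # 800 200
```

Because the seed is fixed, rerunning the split yields the same partition, which is what makes the reported train/test results reproducible.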
## 5. Results and Observations
- Zero-shot vs. fine-tuned performance: without fine-tuning, the pre-trained Llama model showed limited ability to answer questions accurately. Fine-tuning significantly improved the model's performance on metrics such as F1 score, exact match, and ROUGE.
- Fine-tuning benefits: training on the SQuAD dataset gave the model a deeper understanding of context and its relationship to specific queries, improving its ability to extract precise answer spans.
- Model parameters: the parameter count remained unchanged during fine-tuning, underscoring that the performance gains came from optimizing existing weights rather than structural changes.
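For reference, exact match and F1 for SQuAD-style evaluation are computed over normalized answer strings. A self-contained sketch in the spirit of the official SQuAD evaluation script (an illustration, not the exact evaluation code used for this model):

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, drop articles and punctuation, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Meta AI", "meta ai"))                  # 1.0
print(round(f1_score("the Meta AI team", "Meta AI"), 2))  # 0.8
```

Exact match rewards only verbatim (normalized) agreement, while F1 gives partial credit for overlapping tokens, which is why both are typically reported together.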
## 6. How to Use the Fine-Tuned Model
Install the necessary libraries:

```bash
pip install transformers datasets
```

Load the fine-tuned model:

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "<your-huggingface-repo>/squad-llama-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
```

Make predictions:

```python
import torch

context = "Llama is a model developed by Meta AI designed for natural language understanding tasks."
question = "Who developed Llama?"

inputs = tokenizer(question, context, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# The highest-scoring start and end logits mark the predicted answer span
start_idx = outputs.start_logits.argmax()
end_idx = outputs.end_logits.argmax()

answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1])
print(f"Predicted Answer: {answer}")
```
## 7. Key Takeaways
- Fine-tuning Llama on SQuAD equips it to handle extractive question-answering tasks with high accuracy and precision.
- The model's parameter count does not change during fine-tuning, highlighting that performance gains come from weight updates rather than architectural modifications.
- The gap between zero-shot and fine-tuned performance demonstrates the necessity of task-specific training for strong results on this benchmark.
## 8. Acknowledgments
- Hugging Face, for providing seamless tools for model fine-tuning and evaluation.
- The Stanford Question Answering Dataset, for serving as a robust benchmark for extractive QA tasks.