This repository contains ONLY the LoRA adapters, not the full model!

Base model: https://huggingface.co/mesolitica/malaysian-tinyllama-1.1b-16k-instructions-v4

Fine-tuned on this dataset: https://huggingface.co/datasets/kaiimran/malaysia-tweets-sentiment

Following this tutorial: https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing
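
Since this repo ships only the adapters, they have to be loaded on top of the base model. Below is a minimal loading-and-inference sketch using `transformers` and `peft`; the adapter repo id placeholder and the prompt text are assumptions (follow the linked tutorial for the exact prompt template used during fine-tuning).

```python
# Minimal sketch: attach the LoRA adapters to the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mesolitica/malaysian-tinyllama-1.1b-16k-instructions-v4"
ADAPTER_ID = "<this-repo-id>"  # placeholder: the id of this adapter repository

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # applies the LoRA weights

# Hypothetical prompt -- the linked tutorial defines the exact template.
prompt = "Klasifikasikan sentimen tweet ini: 'Saya sangat suka tempat ni!'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```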

Evaluation on the test dataset

  1. Accuracy: 0.9455

    • Interpretation: Approximately 94.55% of the predictions made by the model are correct. This is a high accuracy rate, indicating that the model performs well on the test dataset overall.
  2. Precision: 0.9936

    • Interpretation: Out of all the positive predictions made by the model, 99.36% were correct. This suggests that the model is very good at identifying true positive cases and has a very low false positive rate.
  3. Recall: 0.8980

    • Interpretation: Out of all the actual positive cases in the dataset, the model correctly identified 89.80% of them. This is a good recall rate, but it is noticeably lower than the precision, indicating that there are some false negatives (i.e., positive cases that the model failed to identify).
  4. F1 Score: 0.9434

    • Interpretation: The F1 score is the harmonic mean of precision and recall, balancing the two. An F1 score of 0.9434 indicates that the model achieves a good balance between precision and recall.
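
For reference, these four numbers are the standard binary-classification metrics. Below is a minimal sketch of how they can be computed with scikit-learn (the label arrays are illustrative placeholders), followed by a check that the reported F1 really is the harmonic mean of the reported precision and recall:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))

# Sanity check on the reported numbers: F1 is the harmonic mean of
# precision and recall, so 2PR / (P + R) should reproduce the F1 above.
p, r = 0.9936, 0.8980
print(round(2 * p * r / (p + r), 4))  # 0.9434, matching the reported F1
```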

Overall Assessment

  • High Precision: The model has an excellent precision score, meaning it is highly reliable in predicting positive sentiment without mistakenly labeling too many negative cases as positive.
  • Good Recall: The recall score is also good, but slightly lower than precision, suggesting that there are some positive cases that the model misses.
  • Balanced Performance: The F1 score indicates that the model maintains a good balance between precision and recall, which is crucial for tasks like sentiment analysis.

Considerations for Improvement

  • Recall Improvement: Since recall is lower than precision, strategies to improve it could include:
    • Data Augmentation: Adding more training data, particularly positive samples, might help the model learn to identify positive cases better (a minimal oversampling sketch follows this list).
    • Hyperparameter Tuning: Adjusting training settings such as the number of epochs or the learning rate.
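
As one concrete example of the data-side option, the positive class could be oversampled before fine-tuning. Here is a minimal sketch using the `datasets` library; the `label` column name and the 0/1 encoding are assumptions about the dataset schema:

```python
from datasets import load_dataset, concatenate_datasets

ds = load_dataset("kaiimran/malaysia-tweets-sentiment", split="train")

# Assumption: the dataset has a "label" column where 1 marks positive tweets.
positives = ds.filter(lambda row: row["label"] == 1)

# Duplicate the positive examples once and reshuffle, so fine-tuning
# sees proportionally more positive cases (which should help recall).
augmented = concatenate_datasets([ds, positives]).shuffle(seed=42)
print(len(ds), "->", len(augmented))
```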

Conclusion

The model shows strong performance, with particularly high precision and a good overall F1 score. The slightly lower recall suggests room for improvement, but the current metrics indicate that the model is very effective for binary sentiment analysis.
