Alessandro Ercolani

giux78

AI & ML interests

NLP, Reinforcement Learning, Semantics, Computational Neuroscience

Posts

🎉 Super @DeepMount00 just released Gemma_QA_ITA_v3, leading the RAG task on the Italian LLM_ITA_LEADERBOARD. The model is a fine-tuned version of Gemma 2B.
Model details: DeepMount00/Gemma_QA_ITA_v3
Explore the full RAG rankings here: FinancialSupport/open_ita_llm_leaderboard, under the Classifica RAG section.
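
If you want to try the model yourself, here is a minimal sketch using transformers. Note that the Contesto/Domanda prompt format below is my assumption, not the model's documented template, so check the model card for the exact format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DeepMount00/Gemma_QA_ITA_v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# RAG-style usage: answer a question grounded in a retrieved passage.
# NOTE: this Contesto/Domanda format is an assumption -- see the model
# card for the actual prompt template.
context = "La Torre di Pisa fu completata nel 1372."
question = "Quando fu completata la Torre di Pisa?"
prompt = f"Contesto: {context}\nDomanda: {question}\nRisposta:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```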
While evaluating fine-tuned 7B Italian open-source LLMs, I collected many data points and put together a very simple exploratory analysis (see the sketch after the list below). My hypotheses, based on the data:

- MMLU is hard to improve when fine-tuning a base model on a different language
- fine-tuning, even on a single GPU, can improve the base model by 5% to 10% on common tasks, and by much more on specific use cases given the right training time and data
- fine-tuning can specialize a model well, but at the cost of losing some foundational knowledge
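
If you want to poke at the numbers yourself, here is a minimal pandas sketch that pulls the spreadsheet linked below straight into a DataFrame via Google Sheets' CSV export endpoint. The column names in the commented-out delta computation are my assumption, not the sheet's actual headers.

```python
import pandas as pd

# A Google Sheet can be read directly through its CSV export endpoint;
# the sheet ID below comes from the link in this post.
sheet_id = "1MBcxy1loK8eIycZG4DN84Q2ejZ0jSjxUBgoShHDR6IY"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"

df = pd.read_csv(url)
print(df.head())

# Hypothetical column names, purely for illustration -- replace them
# with the actual headers in the sheet:
# delta_pct = (df["fine_tuned_score"] - df["base_score"]) / df["base_score"] * 100
```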

Here is the data: https://docs.google.com/spreadsheets/d/1MBcxy1loK8eIycZG4DN84Q2ejZ0jSjxUBgoShHDR6IY/edit?usp=sharing
Here is the Colab: https://colab.research.google.com/drive/1ra4_skG5QYWSYOzvagOoIoj4bibQD8Gw?usp=sharing
And here is an article with some considerations: https://medium.com/@giuxale/an-analyses-on-italian-llms-models-evaluations-51bffe1d44d1