---
language:
- pt
- en
license: cc
tags:
- text-generation-inference
- transformers
- mistral
- gguf
- brazil
- brasil
- portuguese
metrics:
- name: assin2_rte f1_macro
  type: assin2_rte
  value: 90.13
- name: assin2_rte acc
  type: assin2_rte
  value: 90.16
- name: assin2_sts pearson
  type: assin2_sts
  value: 71.51
- name: assin2_sts mse
  type: assin2_sts
  value: 68.03
- name: bluex acc
  type: bluex
  value: 47.98
- name: enem acc
  type: enem
  value: 58.43
- name: faquad_nli f1_macro
  type: faquad_nli
  value: 64.24
- name: faquad_nli acc
  type: faquad_nli
  value: 67.69
- name: hatebr_offensive_binary f1_macro
  type: hatebr_offensive_binary
  value: 83.61
- name: hatebr_offensive_binary acc
  type: hatebr_offensive_binary
  value: 83.71
- name: oab_exams acc
  type: oab_exams
  value: 38.41
- name: portuguese_hate_speech_binary f1_macro
  type: portuguese_hate_speech_binary
  value: 61.87
- name: portuguese_hate_speech_binary acc
  type: portuguese_hate_speech_binary
  value: 63.22
base_model: mistralai/Mistral-7B-Instruct-v0.2
pipeline_tag: text-generation
model-index:
- name: CabraMistral7b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 60.81
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 46.87
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 38.59
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 90.27
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 72.25
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 64.35
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 83.15
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 64.82
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia-temp/tweetsentbr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 64.8
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
      name: Open Portuguese LLM Leaderboard
---

# Cabra Mistral 7b v2

This model is a fine-tune of [Mistral 7b Instruct 0.2](https://huggingface.co/mistralai/mistral-7b-instruct-v0.2) on the internal Cabra 10k dataset. It is optimized for Portuguese and improves on the base model across several Brazilian benchmarks.

**Check out our other models: [Cabra](https://huggingface.co/collections/botbot-ai/models-6604c2069ceef04f834ba99b).**

## Model Details

### Model: Mistral 7b Instruct 0.2

Mistral-7B-v0.1 is a transformer model with the following architectural choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer

### Dataset: Cabra 10k

Internal dataset for fine-tuning. We will release it soon.

### Quantization / GGUF

Several quantized (GGUF) versions are available in the "quantanization" branch.

### Example

```
[INST] who is Elon Musk? [/INST]Elon Musk é um empreendedor, inventor e capitalista americano. Ele é o fundador, CEO e CTO da SpaceX, CEO da Neuralink e fundador do The Boring Company. Musk também é o proprietário do Twitter.
```

### Training parameters

```
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
```

### Framework

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.2

## Usage

For now, the model is intended for research purposes. Possible research areas and tasks include:

- Research on generative models.
- Investigating and understanding the limitations and biases of generative models.

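
The prompt format shown in the example above can be sketched in Python as follows. This is a minimal illustration, not an official usage snippet: the helper names `build_prompt` and `generate` are hypothetical, the repository id `nicolasdec/CabraMistral7b` is assumed from the leaderboard links in this card, and the sketch assumes the tokenizer adds the beginning-of-sequence token itself.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user turn in the [INST] ... [/INST] markers used in the example."""
    return f"[INST] {user_message.strip()} [/INST]"


def generate(user_message: str,
             model_id: str = "nicolasdec/CabraMistral7b",  # assumed repository id
             max_new_tokens: int = 128) -> str:
    """Generate a reply with Transformers (downloads the full 7B weights when called)."""
    # Imported lazily so the prompt helper works without these packages installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


prompt = build_prompt("who is Elon Musk?")
print(prompt)  # [INST] who is Elon Musk? [/INST]
```

Calling `generate("who is Elon Musk?")` would run the full model, so it is left out of the sketch; only the prompt construction is executed here.
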
**Commercial use is prohibited. Research only.**

### Evals

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------------------------------|---------|-------------------------|--------|----------|--------|----------|
| assin2_rte | 1.1 | all | 15 | f1_macro | 0.9013 | ± 0.0043 |
| | | all | 15 | acc | 0.9016 | ± 0.0043 |
| assin2_sts | 1.1 | all | 15 | pearson | 0.7151 | ± 0.0074 |
| | | all | 15 | mse | 0.6803 | ± N/A |
| bluex | 1.1 | all | 3 | acc | 0.4798 | ± 0.0107 |
| | | exam_id__USP_2019 | 3 | acc | 0.375 | ± 0.044 |
| | | exam_id__USP_2021 | 3 | acc | 0.3462 | ± 0.0382 |
| | | exam_id__USP_2020 | 3 | acc | 0.4107 | ± 0.0379 |
| | | exam_id__UNICAMP_2018 | 3 | acc | 0.4815 | ± 0.0392 |
| | | exam_id__UNICAMP_2020 | 3 | acc | 0.4727 | ± 0.0389 |
| | | exam_id__UNICAMP_2021_1 | 3 | acc | 0.413 | ± 0.0418 |
| | | exam_id__UNICAMP_2019 | 3 | acc | 0.42 | ± 0.0404 |
| | | exam_id__UNICAMP_2022 | 3 | acc | 0.5897 | ± 0.0456 |
| | | exam_id__USP_2022 | 3 | acc | 0.449 | ± 0.041 |
| | | exam_id__USP_2024 | 3 | acc | 0.6341 | ± 0.0434 |
| | | exam_id__UNICAMP_2024 | 3 | acc | 0.6 | ± 0.0422 |
| | | exam_id__USP_2023 | 3 | acc | 0.5455 | ± 0.0433 |
| | | exam_id__UNICAMP_2023 | 3 | acc | 0.5349 | ± 0.044 |
| | | exam_id__USP_2018 | 3 | acc | 0.4815 | ± 0.0393 |
| | | exam_id__UNICAMP_2021_2 | 3 | acc | 0.5098 | ± 0.0403 |
| enem | 1.1 | all | 3 | acc | 0.5843 | ± 0.0075 |
| | | exam_id__2010 | 3 | acc | 0.5726 | ± 0.0264 |
| | | exam_id__2009 | 3 | acc | 0.6 | ± 0.0264 |
| | | exam_id__2014 | 3 | acc | 0.633 | ± 0.0268 |
| | | exam_id__2022 | 3 | acc | 0.6165 | ± 0.0243 |
| | | exam_id__2012 | 3 | acc | 0.569 | ± 0.0265 |
| | | exam_id__2013 | 3 | acc | 0.5833 | ± 0.0274 |
| | | exam_id__2016_2 | 3 | acc | 0.5203 | ± 0.026 |
| | | exam_id__2011 | 3 | acc | 0.6325 | ± 0.0257 |
| | | exam_id__2023 | 3 | acc | 0.5778 | ± 0.0246 |
| | | exam_id__2016 | 3 | acc | 0.595 | ± 0.0258 |
| | | exam_id__2017 | 3 | acc | 0.5517 | ± 0.0267 |
| | | exam_id__2015 | 3 | acc | 0.563 | ± 0.0261 |
| faquad_nli | 1.1 | all | 15 | f1_macro | 0.6424 | ± 0.0138 |
| | | all | 15 | acc | 0.6769 | ± 0.013 |
| hatebr_offensive_binary | 1 | all | 25 | f1_macro | 0.8361 | ± 0.007 |
| | | all | 25 | acc | 0.8371 | ± 0.007 |
| oab_exams | 1.5 | all | 3 | acc | 0.3841 | ± 0.006 |
| | | exam_id__2011-03 | 3 | acc | 0.3636 | ± 0.0279 |
| | | exam_id__2014-14 | 3 | acc | 0.475 | ± 0.0323 |
| | | exam_id__2016-21 | 3 | acc | 0.4125 | ± 0.0318 |
| | | exam_id__2012-06a | 3 | acc | 0.3875 | ± 0.0313 |
| | | exam_id__2014-13 | 3 | acc | 0.325 | ± 0.0303 |
| | | exam_id__2015-16 | 3 | acc | 0.425 | ± 0.032 |
| | | exam_id__2010-02 | 3 | acc | 0.4 | ± 0.0283 |
| | | exam_id__2012-08 | 3 | acc | 0.3875 | ± 0.0314 |
| | | exam_id__2011-05 | 3 | acc | 0.375 | ± 0.0312 |
| | | exam_id__2017-22 | 3 | acc | 0.4 | ± 0.0316 |
| | | exam_id__2018-25 | 3 | acc | 0.4125 | ± 0.0318 |
| | | exam_id__2012-09 | 3 | acc | 0.3636 | ± 0.0317 |
| | | exam_id__2017-24 | 3 | acc | 0.3375 | ± 0.0304 |
| | | exam_id__2016-20a | 3 | acc | 0.3125 | ± 0.0299 |
| | | exam_id__2012-06 | 3 | acc | 0.425 | ± 0.0318 |
| | | exam_id__2013-12 | 3 | acc | 0.4375 | ± 0.0321 |
| | | exam_id__2016-20 | 3 | acc | 0.45 | ± 0.0322 |
| | | exam_id__2013-11 | 3 | acc | 0.4 | ± 0.0316 |
| | | exam_id__2015-17 | 3 | acc | 0.4231 | ± 0.0323 |
| | | exam_id__2015-18 | 3 | acc | 0.4 | ± 0.0316 |
| | | exam_id__2017-23 | 3 | acc | 0.35 | ± 0.0308 |
| | | exam_id__2010-01 | 3 | acc | 0.2471 | ± 0.0271 |
| | | exam_id__2011-04 | 3 | acc | 0.375 | ± 0.0313 |
| | | exam_id__2016-19 | 3 | acc | 0.4103 | ± 0.0321 |
| | | exam_id__2013-10 | 3 | acc | 0.3375 | ± 0.0305 |
| | | exam_id__2012-07 | 3 | acc | 0.3625 | ± 0.031 |
| | | exam_id__2014-15 | 3 | acc | 0.3846 | ± 0.0318 |
| portuguese_hate_speech_binary | 1 | all | 25 | f1_macro | 0.6187 | ± 0.0119 |
| | | all | 25 | acc | 0.6322 | ± 0.0117 |

# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/nicolasdec/CabraMistral7b)

| Metric | Value |
|--------------------------|--------|
|Average |**65.1**|
|ENEM Challenge (No Images)| 60.81|
|BLUEX (No Images) | 46.87|
|OAB Exams | 38.59|
|Assin2 RTE | 90.27|
|Assin2 STS | 72.25|
|FaQuAD NLI | 64.35|
|HateBR Binary | 83.15|
|PT Hate Speech Binary | 64.82|
|tweetSentBR | 64.80|
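
The reported average can be cross-checked against the nine per-task scores in the table above. The sketch below assumes the average is a simple unweighted mean of those scores; the leaderboard's exact aggregation is not stated in this card, and the names `scores` and `mean_score` are illustrative only.

```python
# Per-task scores copied from the leaderboard table above.
scores = {
    "ENEM Challenge (No Images)": 60.81,
    "BLUEX (No Images)": 46.87,
    "OAB Exams": 38.59,
    "Assin2 RTE": 90.27,
    "Assin2 STS": 72.25,
    "FaQuAD NLI": 64.35,
    "HateBR Binary": 83.15,
    "PT Hate Speech Binary": 64.82,
    "tweetSentBR": 64.80,
}


def mean_score(values):
    """Unweighted arithmetic mean (an assumption about how the average is computed)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())
print(round(average, 1))  # 65.1
```

Under that assumption, the mean of the nine scores rounds to the 65.1 shown in the table.
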