phpaiola committed on
Commit
776a6f9
1 Parent(s): a781b05

Upload 9 files

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ bode-13b-alpaca-f16.gguf filter=lfs diff=lfs merge=lfs -text
37
+ bode-13b-alpaca-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
38
+ bode-13b-alpaca-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
39
+ bode-13b-alpaca-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
40
+ bode-13b-alpaca-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
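The attributes above route each listed file through Git LFS. As a rough sketch of how those patterns select files, the snippet below collects every pattern carrying `filter=lfs` and checks file names against them. It assumes Python's `fnmatch` is a close enough stand-in for gitattributes matching on plain file names (real Git matching has extra rules for directories and `**`), and the helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical.

```python
from fnmatch import fnmatchcase

# Attribute lines copied from the .gitattributes hunk in this commit.
GITATTRIBUTES = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
bode-13b-alpaca-f16.gguf filter=lfs diff=lfs merge=lfs -text
bode-13b-alpaca-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
bode-13b-alpaca-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
bode-13b-alpaca-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
bode-13b-alpaca-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text: str) -> list[str]:
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(filename: str, patterns: list[str]) -> bool:
    """Approximate check: does a plain file name match any LFS pattern?"""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


PATTERNS = lfs_patterns(GITATTRIBUTES)
for name in ("bode-13b-alpaca-q8_0.gguf", "README.md", "archive.zip"):
    print(name, is_lfs_tracked(name, PATTERNS))
```

Each GGUF file is listed by its exact name here, so a new quantization added later would need its own line unless a wildcard pattern is used instead.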
Logo_Bode_LLM_GGUF.jpeg ADDED
README.md CHANGED
@@ -1,3 +1,125 @@
1
  ---
2
  license: mit
3
+ language:
4
+ - pt
5
+ - en
6
+ metrics:
7
+ - accuracy
8
+ - f1
9
+ - precision
10
+ - recall
11
+ pipeline_tag: text-generation
12
+ tags:
13
+ - LLM
14
+ - Portuguese
15
+ - Bode
16
+ - Alpaca
17
+ - Llama 2
18
+ inference: false
19
  ---
20
+
21
+ # BODE - GGUF VERSION
22
+
23
+ <!--- PROJECT LOGO -->
24
+ <p align="center">
25
+ <img src="https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-gguf/resolve/main/Logo_Bode_LLM_GGUF.jpeg" alt="Bode Logo" width="400" style="margin-left:auto; margin-right:auto; display:block"/>
26
+ </p>
27
+
28
+ This repository contains the 7B-parameter Bode model in GGUF format, in 32-bit and 16-bit versions as well as 8-, 5-, and 4-bit quantized versions.
29
+
30
+ Bode is a large language model (LLM) for Portuguese, developed from Llama 2 by fine-tuning on the Alpaca dataset as translated into Portuguese by the authors of Cabrita. The model is designed for natural language processing tasks in Portuguese, such as text generation, machine translation, and text summarization.
31
+ The goal of developing BODE is to address the scarcity of LLMs for Portuguese. Classic models, such as LLaMA itself, can respond to prompts in Portuguese, but they are prone to many grammatical errors and sometimes generate responses in English. Few Portuguese models are available for free use and, to our knowledge, there are no available models with 13B parameters or more trained specifically on Portuguese data.
32
+
33
+ See the [paper](https://arxiv.org/abs/2401.02909) for more information about Bode.
34
+
35
+
36
+ # About the GGUF format
37
+
38
+ A model in GGUF format can be used for inference with llama.cpp, on either CPU or GPU, and with other compatible libraries and tools, such as:
39
+ * [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
40
+ * [KoboldCpp](https://github.com/LostRuins/koboldcpp)
41
+ * [LM Studio](https://lmstudio.ai/)
42
+ * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui)
43
+ * [ctransformers](https://github.com/marella/ctransformers)
44
+ * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
45
+
46
+
47
+ ## Model Details
48
+
49
+ - **Base Model:** Llama 2
50
+ - **Training Dataset:** Alpaca
51
+ - **Language:** Portuguese
52
+
53
+ ## Available Versions
54
+
55
+ | Number of parameters | PEFT | Model |
56
+ | :-: | :-: | :-: |
57
+ | 7b | &check; | [recogna-nlp/bode-7b-alpaca-pt-br](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br) |
58
+ | 13b | &check; | [recogna-nlp/bode-13b-alpaca-pt-br](https://huggingface.co/recogna-nlp/bode-13b-alpaca-pt-br)|
59
+ | 7b | | [recogna-nlp/bode-7b-alpaca-pt-br-no-peft](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-no-peft) |
60
+ | 7b-gguf | | [recogna-nlp/bode-7b-alpaca-pt-br-gguf](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-gguf) |
61
+
62
+ ## Usage
63
+
64
+ Below is an example that uses the 8-bit quantized version (q8_0) with ctransformers and LangChain:
65
+
66
+ ```python
67
+
68
+ # Required installs (the leading ! is notebook cell syntax)
69
+ !pip install ctransformers
70
+ !pip install langchain
71
+
72
+ # LangChain's CTransformers wrapper loads the GGUF file through ctransformers
73
+
74
+ from langchain.chains import LLMChain
75
+ from langchain.prompts import PromptTemplate
76
+ from langchain.llms import CTransformers
77
+
78
+ template = """Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta que complete adequadamente o pedido.
79
+
80
+ ### Instrução:
81
+ {instruction}
82
+
83
+ ### Resposta:"""
84
+
85
+ prompt = PromptTemplate(template=template, input_variables=["instruction"])
86
+
87
+ llm = CTransformers(model="recogna-nlp/bode-7b-alpaca-pt-br-gguf", model_file="bode-7b-alpaca-q8_0.gguf", model_type='llama')
88
+ llm_chain = LLMChain(prompt=prompt, llm=llm)
89
+
90
+ response = llm_chain.run("O que é um bode?")
91
+ print(response)
92
+ # Example response (may vary with the sampling temperature): Um bode é um animal de quatro patas e membros postiados atrás, com um corpo alongado e coberto por pelagem escura.
93
+
94
+ ```
95
+
96
+ ## Training and Data
97
+
98
+ The Bode model was trained by fine-tuning Llama 2 on the Portuguese Alpaca dataset, which is an instruction-based dataset. Training was carried out on the Santos Dumont supercomputer at LNCC, through Fundunesp project 2019/00697-8.
99
+
100
+ ## Citation
101
+
102
+ If you want to use Bode in your research, you can cite this [paper](https://arxiv.org/abs/2401.02909), which discusses the model in more detail. Cite it as follows:
103
+
104
+
105
+ ```
106
+ @misc{bode2024,
107
+ title={Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task},
108
+ author={Gabriel Lino Garcia and Pedro Henrique Paiola and Luis Henrique Morelli and Giovani Candido and Arnaldo Cândido Júnior and Danilo Samuel Jodas and Luis C. S. Afonso and Ivan Rizzo Guilherme and Bruno Elias Penteado and João Paulo Papa},
109
+ year={2024},
110
+ eprint={2401.02909},
111
+ archivePrefix={arXiv},
112
+ primaryClass={cs.CL}
113
+ }
114
+ ```
115
+
116
+ ## Contributions
117
+
118
+ Contributions to improve this model are welcome. Feel free to open issues and pull requests.
119
+
120
+ ## Acknowledgments
121
+
122
+ We thank the Laboratório Nacional de Computação Científica (LNCC/MCTI, Brazil) for providing the high-performance computing (HPC) resources of the SDumont supercomputer.
123
+
124
+
125
+ ```
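The README usage example fills an Alpaca-style template that has a single `{instruction}` placeholder, so the variable names declared for the prompt must match that placeholder. The following is a minimal standard-library sketch of that check, independent of LangChain. The helper names `template_fields` and `build_prompt` are hypothetical and are not part of the Bode repository.

```python
from string import Formatter

# Alpaca-style template copied from the README usage example.
TEMPLATE = """Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta que complete adequadamente o pedido.

### Instrução:
{instruction}

### Resposta:"""


def template_fields(template: str) -> set[str]:
    """Return the placeholder names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def build_prompt(instruction: str) -> str:
    """Fill the template with one instruction (hypothetical helper)."""
    return TEMPLATE.format(instruction=instruction)


fields = template_fields(TEMPLATE)
prompt = build_prompt("O que é um bode?")
print(fields)
print(prompt)
```

Comparing `template_fields(TEMPLATE)` with the declared input variables before building a chain catches a mismatch such as declaring `question` for a template that expects `instruction`.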
USE_POLICY.md ADDED
@@ -0,0 +1,47 @@
1
+ # Bode Acceptable Use Policy
2
+
3
+ Bode was obtained by fine-tuning Llama 2, so we follow the same Use Policy established by Meta. If you access or use Bode, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at [ai.meta.com/llama/use-policy](http://ai.meta.com/llama/use-policy).
4
+
5
+ ## Prohibited Uses
6
+ We want everyone to use Bode safely and responsibly. You agree you will not use, or allow others to use, Bode to:
7
+
8
+ 1. Violate the law or others’ rights, including to:
9
+ 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
10
+ 1. Violence or terrorism
11
+ 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
12
+ 3. Human trafficking, exploitation, and sexual violence
13
+ 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
14
+ 5. Sexual solicitation
15
+ 6. Any other criminal activity
16
+ 2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
17
+ 3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
18
+ 4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
19
+ 5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
20
+ 6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Bode Materials
21
+ 7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
22
+
23
+
24
+
25
+ 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Bode related to the following:
26
+ 1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic in Arms Regulations (ITAR) maintained by the United States Department of State
27
+ 2. Guns and illegal weapons (including weapon development)
28
+ 3. Illegal drugs and regulated/controlled substances
29
+ 4. Operation of critical infrastructure, transportation technologies, or heavy machinery
30
+ 5. Self-harm or harm to others, including suicide, cutting, and eating disorders
31
+ 6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
32
+
33
+
34
+
35
+ 3. Intentionally deceive or mislead others, including use of Bode related to the following:
36
+ 1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
37
+ 2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
38
+ 3. Generating, promoting, or further distributing spam
39
+ 4. Impersonating another individual without consent, authorization, or legal right
40
+ 5. Representing that the use of Bode or outputs are human-generated
41
+ 6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
42
+ 4. Fail to appropriately disclose to end users any known dangers of your AI system
43
+
44
+ Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
45
+
46
+ * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: [LlamaUseReport@meta.com](mailto:LlamaUseReport@meta.com)
47
+
bode-13b-alpaca-f16.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1ed6138128eae9db8f4241a82b14f22ee5eaa02a7f8099f40a183b66b86c7568
3
+ size 26033303520
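Each `.gguf` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer and checking a downloaded file against it could look like the following. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the hunk above.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the bode-13b-alpaca-f16.gguf hunk.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:1ed6138128eae9db8f4241a82b14f22ee5eaa02a7f8099f40a183b66b86c7568
size 26033303520
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and digest against a parsed pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte model files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Checking the size first is a cheap way to reject a truncated download before hashing roughly 26 GB of data.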
bode-13b-alpaca-q4_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c1e3a63d238810679b31dfd35ff87f0cd5f27c4cff935c5a62f322b61b5a3814
3
+ size 7365834752
bode-13b-alpaca-q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:28289a9f6ef85f327915e8cc035a322d5fa5fcb89262bd6f091cf5061d5c9590
3
+ size 7865956352
bode-13b-alpaca-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9109e6486c353e6963b7d007cc3092fd7a8c755fa1e692192982dd94ffe57e7a
3
+ size 9229924352
bode-13b-alpaca-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f8383130426cb14d61fee5c21e1b376ff3099aee1665e62af83ef29e441a771
3
+ size 13831319520
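The sizes recorded in the five LFS pointers allow a rough estimate of how many bits each file spends per weight. The sketch below assumes the f16 file stores about 2 bytes per weight and ignores GGUF metadata and any tensors kept at a different precision, so the results are approximations rather than exact format constants.

```python
# File sizes in bytes, copied from the Git LFS pointers in this commit.
SIZES = {
    "f16": 26033303520,
    "q4_0": 7365834752,
    "q4_k_m": 7865956352,
    "q5_k_m": 9229924352,
    "q8_0": 13831319520,
}

# Assumption: the f16 file is almost entirely 16-bit weights (2 bytes each).
approx_params = SIZES["f16"] / 2

# Approximate bits per weight and size relative to the f16 file.
bits_per_weight = {name: size * 8 / approx_params for name, size in SIZES.items()}
ratio_vs_f16 = {name: size / SIZES["f16"] for name, size in SIZES.items()}

print(f"approx. parameters: {approx_params / 1e9:.2f}B")
for name in SIZES:
    print(f"{name:>7}: {bits_per_weight[name]:.2f} bits/weight, "
          f"{ratio_vs_f16[name]:.0%} of f16 size")
```

Under this assumption the estimate comes out near 13B parameters, which agrees with the `13b` in the file names, and the quantized files land somewhat above their nominal bit widths because block-wise quantization also stores scale values.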
config.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "model_type": "llama"
3
+ }