Upload 9 files

Files changed (10)

.gitattributes +5 -0
Logo_Bode_LLM_GGUF.jpeg +0 -0
README.md +122 -0
USE_POLICY.md +47 -0
bode-13b-alpaca-f16.gguf +3 -0
bode-13b-alpaca-q4_0.gguf +3 -0
bode-13b-alpaca-q4_k_m.gguf +3 -0
bode-13b-alpaca-q5_k_m.gguf +3 -0
bode-13b-alpaca-q8_0.gguf +3 -0
config.json +3 -0

.gitattributes CHANGED

@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+bode-13b-alpaca-f16.gguf filter=lfs diff=lfs merge=lfs -text
+bode-13b-alpaca-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+bode-13b-alpaca-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+bode-13b-alpaca-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+bode-13b-alpaca-q8_0.gguf filter=lfs diff=lfs merge=lfs -text

Logo_Bode_LLM_GGUF.jpeg ADDED

README.md CHANGED

@@ -1,3 +1,125 @@
 ---
 license: mit
+language:
+- pt
+- en
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+pipeline_tag: text-generation
+tags:
+- LLM
+- Portuguese
+- Bode
+- Alpaca
+- Llama 2
+inference: false
 ---
+# BODE - GGUF VERSION
+<!--- PROJECT LOGO -->
+<p align="center">
+  <img src="https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-gguf/resolve/main/Logo_Bode_LLM_GGUF.jpeg" alt="Bode Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+</p>
+This repository contains the Bode model with 13B parameters in GGUF format, in 32- and 16-bit versions as well as 8-, 5-, and 4-bit quantized versions.
+Bode is a large language model (LLM) for Portuguese, developed from Llama 2 by fine-tuning on the Alpaca dataset as translated into Portuguese by the authors of Cabrita. The model is designed for Portuguese natural language processing tasks such as text generation, machine translation, text summarization, and more.
+Bode was developed to address the scarcity of LLMs for Portuguese. Established models such as LLaMA itself can respond to prompts in Portuguese, but they make frequent grammatical errors and sometimes answer in English. Few Portuguese models are freely available and, to the best of our knowledge, none with 13B parameters or more has been trained specifically on Portuguese data.
+See the [paper](https://arxiv.org/abs/2401.02909) for more information about Bode.
+# About the GGUF format
+The GGUF format lets the model be used for inference with llama.cpp, on either CPU or GPU, as well as with other compatible libraries and tools, such as:
+* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+* [KoboldCpp](https://github.com/LostRuins/koboldcpp)
+* [LM Studio](https://lmstudio.ai/)
+* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui)
+* [ctransformers](https://github.com/marella/ctransformers)
+* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
+## Model Details
+- **Base Model:** Llama 2
+- **Training Dataset:** Alpaca
+- **Language:** Portuguese
+## Available Versions
+| Number of parameters           | PEFT | Model                                                                                       |
+| :-:                            | :-:  |  :-:                                                                                         |
+| 7b                             | &check; | [recogna-nlp/bode-7b-alpaca-pt-br](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br)  |
+| 13b                            | &check; | [recogna-nlp/bode-13b-alpaca-pt-br](https://huggingface.co/recogna-nlp/bode-13b-alpaca-pt-br)|
+| 7b                             |    | [recogna-nlp/bode-7b-alpaca-pt-br-no-peft](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-no-peft)  |
+| 7b-gguf                             |    | [recogna-nlp/bode-7b-alpaca-pt-br-gguf](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-gguf)  |
+## Usage
+The following example runs the 8-bit quantized file from the 7B GGUF repository using ctransformers and LangChain:
+```python
+# Install the required packages (notebook syntax)
+!pip install ctransformers
+!pip install langchain
+from langchain.chains import LLMChain
+from langchain.prompts import PromptTemplate
+from langchain.llms import CTransformers
+# Alpaca-style prompt template (kept in Portuguese)
+template = """Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta que complete adequadamente o pedido.
+### Instrução:
+{instruction}
+### Resposta:"""
+# The input variable name must match the {instruction} placeholder in the template
+prompt = PromptTemplate(template=template, input_variables=["instruction"])
+llm = CTransformers(model="recogna-nlp/bode-7b-alpaca-pt-br-gguf", model_file="bode-7b-alpaca-q8_0.gguf", model_type='llama')
+llm_chain = LLMChain(prompt=prompt, llm=llm)
+response = llm_chain.run("O que é um bode?")
+print(response)
+# Example output (may vary with sampling temperature): Um bode é um animal de quatro patas e membros postiados atrás, com um corpo alongado e coberto por pelagem escura.
+```
+## Training and Data
+Bode was trained by fine-tuning Llama 2 on the Portuguese Alpaca dataset, an instruction-based dataset. Training was carried out on the Santos Dumont supercomputer at LNCC, through Fundunesp project 2019/00697-8.
+## Citation
+If you use Bode in your research, please cite this [paper](https://arxiv.org/abs/2401.02909), which discusses the model in more detail, as follows:
+```
+    @misc{bode2024,
+      title={Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task},
+      author={Gabriel Lino Garcia and Pedro Henrique Paiola and Luis Henrique Morelli and Giovani Candido and Arnaldo Cândido Júnior and Danilo Samuel Jodas and Luis C. S. Afonso and Ivan Rizzo Guilherme and Bruno Elias Penteado and João Paulo Papa},
+      year={2024},
+      eprint={2401.02909},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
+## Contributions
+Contributions to improve this model are welcome. Feel free to open issues and pull requests.
+## Acknowledgments
+We thank the Laboratório Nacional de Computação Científica (LNCC/MCTI, Brazil) for providing HPC resources of the SDumont supercomputer.
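
Review note: the README's usage example relies on an Alpaca-style template with an `{instruction}` placeholder. The sketch below builds the same prompt with plain `str.format`, without LangChain, which may help when calling a GGUF runtime directly. The helper name `build_prompt` is hypothetical and not part of the Bode repository.

```python
# Alpaca-style prompt copied from the README (kept in Portuguese).
TEMPLATE = (
    "Abaixo está uma instrução que descreve uma tarefa. "
    "Escreva uma resposta que complete adequadamente o pedido.\n"
    "### Instrução:\n"
    "{instruction}\n"
    "### Resposta:"
)


def build_prompt(instruction: str) -> str:
    """Fill the template with one instruction (hypothetical helper)."""
    # Strip surrounding whitespace so the section markers stay on their own lines.
    return TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("O que é um bode?"))
```

The resulting string can be passed as the prompt to whichever runtime loads the `.gguf` file; the placeholder name in the template and the keyword passed to `format` must agree.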

USE_POLICY.md ADDED

	@@ -0,0 +1,47 @@

+# Bode Acceptable Use Policy
+Bode was obtained from fine-tuning Llama 2, so we followed the same Use Policy established by Meta. If you access or use Bode, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at [ai.meta.com/llama/use-policy](http://ai.meta.com/llama/use-policy).
+## Prohibited Uses
+We want everyone to use Bode safely and responsibly. You agree you will not use, or allow others to use, Bode to:
+1. Violate the law or others’ rights, including to:
+    1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
+        1. Violence or terrorism
+        2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
+        3. Human trafficking, exploitation, and sexual violence
+        4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
+        5. Sexual solicitation
+        6. Any other criminal activity
+    2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
+    3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
+    4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
+    5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
+    6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Bode Materials
+    7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
+2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Bode related to the following:
+    1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
+    2. Guns and illegal weapons (including weapon development)
+    3. Illegal drugs and regulated/controlled substances
+    4. Operation of critical infrastructure, transportation technologies, or heavy machinery
+    5. Self-harm or harm to others, including suicide, cutting, and eating disorders
+    6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
+3. Intentionally deceive or mislead others, including use of Bode related to the following:
+    1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
+    2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
+    3. Generating, promoting, or further distributing spam
+    4. Impersonating another individual without consent, authorization, or legal right
+    5. Representing that the use of Bode or outputs are human-generated
+    6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
+4. Fail to appropriately disclose to end users any known dangers of your AI system
+Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
+* Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: [LlamaUseReport@meta.com](mailto:LlamaUseReport@meta.com)

bode-13b-alpaca-f16.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1ed6138128eae9db8f4241a82b14f22ee5eaa02a7f8099f40a183b66b86c7568
+size 26033303520
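
Review note: each `.gguf` entry in this commit is a Git LFS pointer (a `version` line, an `oid sha256:...` line, and a `size` line), not the model file itself. A minimal sketch of checking a downloaded file against such a pointer follows. The function names and the local file path are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()  # tolerate diff "+" prefixes
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 oids")
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["digest"]


# Pointer for the f16 file, copied from this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1ed6138128eae9db8f4241a82b14f22ee5eaa02a7f8099f40a183b66b86c7568
size 26033303520
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["algo"], pointer["size"])
    # verify_file("bode-13b-alpaca-f16.gguf", pointer)  # local path is an assumption
```

Comparing the size before hashing keeps the check fast when a multi-gigabyte download was cut short.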

bode-13b-alpaca-q4_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c1e3a63d238810679b31dfd35ff87f0cd5f27c4cff935c5a62f322b61b5a3814
+size 7365834752

bode-13b-alpaca-q4_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28289a9f6ef85f327915e8cc035a322d5fa5fcb89262bd6f091cf5061d5c9590
+size 7865956352

bode-13b-alpaca-q5_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9109e6486c353e6963b7d007cc3092fd7a8c755fa1e692192982dd94ffe57e7a
+size 9229924352

bode-13b-alpaca-q8_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0f8383130426cb14d61fee5c21e1b376ff3099aee1665e62af83ef29e441a771
+size 13831319520
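
Review note: the `size` fields of the five pointers give a quick way to compare the quantized variants. The rough sketch below assumes the f16 file stores about 16 bits per weight and ignores GGUF metadata and any tensors kept at higher precision; it estimates the effective bits per weight of each file from its size relative to f16. The names `SIZES` and `approx_bits_per_weight` are illustrative only.

```python
# File sizes in bytes, copied from the Git LFS pointers in this commit.
SIZES = {
    "f16": 26033303520,
    "q8_0": 13831319520,
    "q5_k_m": 9229924352,
    "q4_k_m": 7865956352,
    "q4_0": 7365834752,
}


def approx_bits_per_weight(sizes, reference="f16", reference_bits=16.0):
    """Scale the reference bit width by each file's size relative to the reference."""
    ref_size = sizes[reference]
    return {name: reference_bits * size / ref_size for name, size in sizes.items()}


if __name__ == "__main__":
    for name, bits in approx_bits_per_weight(SIZES).items():
        print(f"{name:>7}: {SIZES[name] / 1e9:6.2f} GB, ~{bits:.2f} bits/weight")
```

Under these assumptions the q8_0 and q4_0 estimates land near 8.5 and 4.5 bits, consistent with those formats storing a per-block scale alongside the quantized values.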

config.json ADDED

	@@ -0,0 +1,3 @@

+{
+    "model_type": "llama"
+}