willyninja30 committed on
Commit
818dc71
1 Parent(s): 931e4bd

Update README.md

Files changed (1)
  1. README.md +1 -30
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
  - languages
  pipeline_tag: text-generation
  ---
- This model is the latest version of Llama 2 70B, fine-tuned on 50,000 high-quality French tokens. We built our own training dataset by taking an extract of the French Dataset from Enno and removing Alpaca-style text translated from English.
+ ARIA is the latest version of Llama 2 70B, fine-tuned on 50,000 high-quality French tokens. We built our own training dataset by taking an extract of the French Dataset from Enno and removing Alpaca-style text translated from English.

  The goal is to increase model quality on French and general topics.

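A minimal sketch of the dataset-cleaning step described in the hunk above, assuming a Hugging Face `datasets` workflow. The dataset path, column name, and Alpaca markers are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch: keep native French rows, drop Alpaca-style entries translated from English.
# Dataset path, column name, and marker strings are hypothetical placeholders.
from datasets import load_dataset

# Hypothetical source; the card only says "the French Dataset from Enno".
raw = load_dataset("some-org/french-instructions", split="train")

ALPACA_MARKERS = (
    "Below is an instruction that describes a task",
    "### Instruction:",
    "### Response:",
)

def is_native_french(example):
    text = example["text"]
    # Drop rows that still carry Alpaca prompt boilerplate, a common sign of
    # machine-translated English data rather than native French text.
    return not any(marker in text for marker in ALPACA_MARKERS)

clean = raw.filter(is_native_french)
clean.to_json("aria_french_train.jsonl")
```

Filtering on prompt boilerplate is only a cheap heuristic for spotting translated Alpaca records; a fuller pipeline would likely add language identification and deduplication on top of it.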
@@ -36,7 +36,6 @@ We are also applying rope scaling as experimental approach used by several othe
  ## Model Details
  *Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the [website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License before requesting access here.*

- Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM.

  **Model Developers** Meta

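The hunk header above mentions applying rope scaling as an experimental approach. As a rough illustration of what that looks like in recent versions of the `transformers` library, linear RoPE scaling can be requested at load time; the scaling factor and generation settings below are illustrative assumptions, not the values used for ARIA.

```python
# Sketch: load a Llama 2 checkpoint with linear RoPE scaling so positions are
# stretched beyond the 4k pretraining context. Factor 2.0 is an assumed example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # requires accepting the Meta license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    rope_scaling={"type": "linear", "factor": 2.0},  # ~8k effective context
)

prompt = "Explique brièvement ce qu'est la Tour Eiffel."  # "Briefly explain what the Eiffel Tower is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```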
@@ -48,39 +47,11 @@ Meta developed and publicly released the Llama 2 family of large language models

  **Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

-
- ||Training Data|Params|Content Length|GQA|Tokens|LR|
- |---|---|---|---|---|---|---|
- |Llama 2|*A new mix of publicly available online data*|7B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>|
- |Llama 2|*A new mix of publicly available online data*|13B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>|
- |Llama 2|*A new mix of publicly available online data*|70B|4k|&#10004;|2.0T|1.5 x 10<sup>-4</sup>|
-
- *Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch size of 4M tokens. The bigger model (70B) uses Grouped-Query Attention (GQA) for improved inference scalability.
-
- **Model Dates** Llama 2 was trained between January 2023 and July 2023.
-
- **Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.
-
  **License** A custom commercial license is available at: [https://ai.meta.com/resources/models-and-libraries/llama-downloads/](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)

  **Research Paper** ["Llama-2: Open Foundation and Fine-tuned Chat Models"](https://arxiv.org/abs/2307.09288)

- ## Intended Use
- **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.

- **Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.
-
- ## Hardware and Software
- **Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.
-
- **Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta's sustainability program.
-
- ||Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO<sub>2</sub>eq)|
- |---|---|---|---|
- |Llama 2 7B|184320|400|31.22|
- |Llama 2 13B|368640|400|62.44|
- |Llama 2 70B|1720320|400|291.42|
- |Total|3311616||539.00|

  **CO<sub>2</sub> emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others.
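A back-of-the-envelope check of the carbon table removed above: energy is GPU-hours times per-GPU power, and dividing the reported emissions by that energy gives the implied grid carbon intensity. The per-model hours and the 400 W figure come from the table; the derived intensity is an inference, not a number stated in the card.

```python
# Sketch: reproduce the energy arithmetic behind the (removed) carbon table.
gpu_hours = {"7B": 184_320, "13B": 368_640, "70B": 1_720_320}
power_w = 400  # peak per-GPU power, already adjusted for power usage efficiency

for name, hours in gpu_hours.items():
    energy_mwh = hours * power_w / 1_000_000  # W·h -> MWh
    print(f"Llama 2 {name}: {energy_mwh:,.0f} MWh")

total_hours = sum(gpu_hours.values())          # 3,311,616 GPU-hours
total_mwh = total_hours * power_w / 1_000_000  # ~1,325 MWh
implied_kg_per_kwh = 539_000 / (total_mwh * 1_000)  # 539 tCO2eq over total kWh
print(f"Total: {total_mwh:,.0f} MWh, implied ~{implied_kg_per_kwh:.2f} kg CO2eq/kWh")
```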