library_name: transformers
tags: []
Model Card for Meta-Llama-3-8B
Meta-Llama-3-8B is a large language model developed by Meta as part of the Llama 3 family, intended for text generation and natural language understanding tasks. It uses a transformer architecture. The Llama 3 family is released in pre-trained and instruction-tuned variants; this card describes the pre-trained 8B model.
Model Details
Model Description
This is the model card for Meta-Llama-3-8B, part of the Llama 3 model family, which includes models with 8 billion and 70 billion parameters. The model is pre-trained on a diverse dataset of publicly available text and is intended for both research and commercial use, particularly in applications that require natural language understanding and generation.
- Developed by: Meta
- Model type: Auto-regressive language model
- Language(s) (NLP): English
- License: Meta Llama 3 Community License (a custom commercial license)
- Finetuned from model: Not applicable
Model Sources
- Repository: Meta-Llama-3-8B on Hugging Face
- Paper: Meta AI Blog
- Demo: Hugging Face Chat Demo
Uses
Direct Use
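The pre-trained model can be used directly for text generation tasks such as text completion and content creation. For chatbots and other assistant-style interactive applications, the instruction-tuned variant (Meta-Llama-3-8B-Instruct) is better suited.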
Downstream Use
When fine-tuned, the model can be adapted to specific tasks such as sentiment analysis, question answering, or summarization.
Out-of-Scope Use
The model should not be used to generate harmful content, spam, impersonation, or for any other activity prohibited by the Meta Llama 3 Acceptable Use Policy.
Bias, Risks, and Limitations
Recommendations
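Like other large language models, Meta-Llama-3-8B can produce inaccurate, biased, or otherwise objectionable output. Users should be aware of these limitations, and developers should perform safety testing and tuning tailored to their specific application before deployment, especially in sensitive settings.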
How to Get Started with the Model
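import torch
from transformers import pipeline
model_id = "meta-llama/Meta-Llama-3-8B"
# Load the weights in bfloat16; device_map="auto" (requires accelerate) spreads layers across available devices.
generator = pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
)
# The pre-trained model continues the prompt rather than answering it like a chat assistant.
outputs = generator("Hello, how are you today?", max_new_tokens=50)
print(outputs[0]["generated_text"])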
Training Details
Training Data
The model was trained on a mix of publicly available online data, totaling over 15 trillion tokens.
Training Procedure
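Both the 8B and 70B models use Grouped-Query Attention (GQA) for improved inference scalability. The pre-trained model was trained with a next-token prediction objective; the instruction-tuned variants were additionally trained with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).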
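To make the GQA idea concrete, here is a minimal pure-Python sketch of single-token attention in which several query heads share one key/value head. It is an illustration of the technique, not Meta's implementation; the toy inputs are made up, and the shape used for the KV-cache arithmetic (32 layers, 32 query heads, 8 KV heads, head dimension 128, 2-byte bfloat16 values) is an assumption based on the published Llama 3 8B configuration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kv_head_for(query_head, n_q_heads, n_kv_heads):
    """Index of the key/value head shared by a given query head."""
    group_size = n_q_heads // n_kv_heads
    return query_head // group_size

def gqa_attend(queries, keys, values):
    """Single-token GQA attention.

    queries: n_q_heads vectors of length head_dim (the current token)
    keys, values: n_kv_heads lists of seq_len vectors (the cached context)
    Returns one output vector per query head.
    """
    n_q, n_kv = len(queries), len(keys)
    if n_q % n_kv != 0:
        raise ValueError("number of query heads must be a multiple of KV heads")
    head_dim = len(queries[0])
    outputs = []
    for h, q in enumerate(queries):
        g = kv_head_for(h, n_q, n_kv)  # query heads in the same group reuse K/V
        scores = [dot(q, k) / math.sqrt(head_dim) for k in keys[g]]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values[g]))
                        for d in range(head_dim)])
    return outputs

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Cache cost of one token: a key and a value vector per KV head, per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

# Usage: 4 query heads share 2 KV heads over 3 cached tokens (head_dim = 2).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
keys = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]]]
values = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
          [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]]
out = gqa_attend(queries, keys, values)
print([kv_head_for(h, 4, 2) for h in range(4)])  # [0, 0, 1, 1]

# Assumed Llama 3 8B shape: 32 layers, 8 KV heads, head_dim 128, 2-byte bfloat16.
print(kv_cache_bytes_per_token(32, 8, 128, 2))   # 131072 bytes per token
print(kv_cache_bytes_per_token(32, 32, 128, 2))  # 524288 with one KV head per query head
```

Under these assumed dimensions, sharing each KV head across four query heads shrinks the per-token KV cache by a factor of four, which is where the inference-scalability benefit comes from.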
Training Hyperparameters
- Training regime: Mixed precision (bfloat16)
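bfloat16 keeps float32's sign bit and all 8 exponent bits but only 7 of its 23 mantissa bits, so it preserves dynamic range while giving up precision. The sketch below shows that trade-off by rounding a float32 bit pattern to its upper 16 bits (round-to-nearest-even) and converting back. It is an illustrative helper, not part of any training library, and it omits special handling for NaN.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Round a float to bfloat16 and return the 16-bit pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # float32 bit pattern
    lsb = (bits >> 16) & 1           # lowest bit that survives the truncation
    rounding_bias = 0x7FFF + lsb     # round half to even
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float (exact)."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

for value in (1.0, -2.5, 3.14159):
    bf16 = bf16_bits_to_float(float_to_bf16_bits(value))
    print(value, "->", bf16)  # 3.14159 becomes 3.140625
```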
Evaluation
Testing Data, Factors & Metrics
Testing Data
Testing was performed using standard natural language processing benchmarks.
Metrics
Metrics include perplexity, accuracy, and other task-specific measures commonly reported for language models.
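Perplexity is the exponential of the average negative log-likelihood a model assigns to the tokens of held-out text, PPL = exp(-(1/N) * sum_i log p(x_i | x_<i)). The sketch below computes it from per-token log-probabilities; the probabilities are hypothetical and do not come from any Llama 3 evaluation.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over a sequence of tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities a model assigned to each observed token.
probs = [0.5, 0.25, 0.125, 0.5]
log_probs = [math.log(p) for p in probs]
print(round(perplexity(log_probs), 4))  # 3.3636 (= 2 ** 1.75)
```

A model that spreads probability uniformly over k choices at every step has a perplexity of exactly k, which is why lower perplexity indicates a more confident fit to the text.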
Results
Llama 3 models have shown strong performance across a range of benchmarks and are competitive with other large-scale language models.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator.
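- Hardware Type: NVIDIA H100-80GB GPUs (TDP of 700 W)
- Hours used: approximately 1.3M GPU hours for pre-training the 8B model, as reported by Meta
- Cloud Provider: None; training ran on Meta's Research SuperCluster and production clusters
- Compute Region: Not specified
- Carbon Emitted: approximately 390 tCO2eq for pre-training the 8B model, as reported by Meta, all of which was offset by Meta's sustainability program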
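Calculators of this kind essentially multiply energy use by the carbon intensity of the power grid. The sketch below is a simplified version of that estimate, assuming energy = GPU hours x per-GPU power x data-center PUE; every number in the usage line is hypothetical and chosen only to show the arithmetic.

```python
def estimate_emissions_kg(gpu_hours, gpu_power_watts, carbon_intensity_kg_per_kwh, pue=1.0):
    """Rough CO2eq estimate in kg.

    gpu_hours: total accelerator hours
    gpu_power_watts: average (or peak) power draw per GPU
    carbon_intensity_kg_per_kwh: grid emissions factor for the compute region
    pue: power usage effectiveness of the data center (1.0 = no overhead)
    """
    energy_kwh = gpu_hours * gpu_power_watts / 1000 * pue
    return energy_kwh * carbon_intensity_kg_per_kwh

# Hypothetical run: 1,000 GPU hours at 400 W, PUE 1.1, grid at 0.4 kg CO2eq/kWh.
print(round(estimate_emissions_kg(1000, 400, 0.4, pue=1.1), 1))  # 176.0
```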
Technical Specifications
This repository provides the Llama 3 weights as a sharded checkpoint, which makes inference and fine-tuning easier on low- to mid-range hardware.
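Sharded Hugging Face checkpoints ship an index file (`model.safetensors.index.json`) whose `weight_map` records which shard file holds each tensor, so a loader can process one shard at a time instead of materializing a single huge file. `from_pretrained` reads this index for you; the sketch below only illustrates the layout, and the index contents (tensor names, shard count, `total_size`) are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical index in the layout of model.safetensors.index.json.
index_json = """
{
  "metadata": {"total_size": 1234},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
"""

def tensors_by_shard(index):
    """Group tensor names by the shard file that stores them."""
    groups = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        groups[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(groups.items())}

index = json.loads(index_json)
for shard, names in tensors_by_shard(index).items():
    print(shard, names)
```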
Model Architecture and Objective
Llama 3 is an auto-regressive language model that uses an optimized decoder-only transformer architecture, with a context length of 8,192 tokens, designed for text generation and understanding.
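As a sanity check on the "8B" in the name, the parameter count can be worked out from the architecture's configuration. This back-of-the-envelope sketch assumes the values from the published `config.json` for Meta-Llama-3-8B (verify them against the repository), a Llama-style block with bias-free attention projections, a three-matrix SwiGLU MLP, two RMSNorm weight vectors per layer, and untied input/output embeddings.

```python
# Assumed to match the published config.json for Meta-Llama-3-8B.
CONFIG = {
    "vocab_size": 128256,
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
}

def count_parameters(cfg):
    d = cfg["hidden_size"]
    head_dim = d // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim   # GQA: smaller K/V projections
    attn = d * d + 2 * d * kv_dim + d * d            # q, k, v, o projections (no biases)
    mlp = 3 * d * cfg["intermediate_size"]           # gate, up, down (SwiGLU)
    norms = 2 * d                                    # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * d
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * d
    final_norm = d
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm

total = count_parameters(CONFIG)
print(f"{total:,} parameters")                         # 8,030,261,248 parameters
print(f"~{total * 2 / 1e9:.1f} GB of weights in bfloat16")  # ~16.1 GB
```

Rotary position embeddings add no learned parameters, so they do not appear in the count.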
Compute Infrastructure
Hardware
Pre-training used NVIDIA H100-80GB GPUs on Meta's Research SuperCluster and production clusters.
Software
The model is compatible with PyTorch and Hugging Face's transformers library.
Citation
BibTeX:
@misc{meta2024llama3,
title={Introducing Meta Llama 3: The most capable openly available LLM to date},
author={{Meta AI}},
howpublished={Meta AI Blog},
year={2024},
url={https://ai.meta.com/blog/meta-llama-3/}
}
APA:
Meta AI. (2024, April 18). Introducing Meta Llama 3: The most capable openly available LLM to date. Meta AI Blog. https://ai.meta.com/blog/meta-llama-3/
Glossary
- Auto-regressive model: A type of model that generates sequences by predicting the next element based on previous elements.
- Transformer architecture: A neural network architecture built on self-attention, widely used for modeling sequential data, particularly in NLP.
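The auto-regressive loop described in the glossary can be sketched in a few lines: predict a distribution over the next token given the tokens so far, pick one, append it, and repeat. The "model" here is a hypothetical hand-written table, not Llama 3, and it only looks at the last token for brevity; a real model conditions on the whole prefix. Decoding is greedy (always take the most probable token), which is one of several possible sampling strategies.

```python
# Hypothetical next-token distributions, keyed on the most recent token.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "end": 0.3},
    "a": {"model": 0.5, "</s>": 0.5},
    "model": {"generates": 0.8, "</s>": 0.2},
    "generates": {"text": 0.9, "</s>": 0.1},
    "text": {"</s>": 1.0},
    "end": {"</s>": 1.0},
}

def toy_model(tokens):
    """Stand-in language model: next-token distribution for the given prefix."""
    return NEXT_TOKEN_PROBS.get(tokens[-1], {"</s>": 1.0})

def greedy_generate(model, prefix, max_new_tokens=10, eos="</s>"):
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        dist = model(tokens)                   # condition on everything so far
        next_token = max(dist, key=dist.get)   # greedy: most probable token
        tokens.append(next_token)              # feed the prediction back in
        if next_token == eos:
            break
    return tokens

print(greedy_generate(toy_model, ["<s>"]))
# ['<s>', 'the', 'model', 'generates', 'text', '</s>']
```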
More Information
For more details about Llama 3, visit the Meta Llama website.
Model Card Authors
realshyfox
Model Card Contact
For any questions or feedback, please contact the Meta AI team via their GitHub repository.