metadata

library_name: transformers
tags: []

Model Card for Meta-Llama-3-8B

Meta-Llama-3-8B is an 8-billion-parameter language model developed by Meta as part of the Llama 3 family, built for text generation and natural language understanding tasks. It uses a transformer architecture. The Llama 3 family is released in pre-trained and instruction-tuned variants; this card covers the pre-trained 8B model.

Model Details

Model Description

Meta-Llama-3-8B belongs to the Llama 3 model family, which includes models with 8 billion and 70 billion parameters. The model is pre-trained on a diverse dataset of publicly available text and is intended for both research and commercial use, particularly for applications requiring natural language understanding and generation.

Developed by: Meta
Model type: Auto-regressive language model
Language(s) (NLP): English
License: Meta Llama 3 Community License (Meta's custom commercial license)
Finetuned from model: Not applicable

Model Sources

Repository: Meta-Llama-3-8B on Hugging Face
Paper: Meta AI Blog
Demo: Hugging Face Chat Demo

Uses

Direct Use

The model can be used directly for text generation tasks, including but not limited to chatbots, content creation, and interactive applications.

Downstream Use

When fine-tuned, the model can be adapted to specific tasks such as sentiment analysis and question answering.

Out-of-Scope Use

The model should not be used for generating harmful content, spam, impersonation, or any other unethical activities as outlined in Meta's usage policy.

Bias, Risks, and Limitations

Recommendations

Users should be aware of the model's limitations and biases. It is recommended to use the model responsibly and with caution, especially in sensitive applications.

How to Get Started with the Model

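import torch
from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3-8B"

# Build a text-generation pipeline. The result gets its own name so it
# does not shadow the imported `pipeline` function.
generator = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},  # load weights in bfloat16
    device_map="auto",  # place the model on the available devices
)
print(generator("Hello, how are you today?"))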

Training Details

Training Data

The model was trained on a mix of publicly available online data, totaling over 15 trillion tokens.

Training Procedure

The model uses Grouped-Query Attention (GQA) for improved inference scalability. This base model is pre-trained only; the instruction-tuned Llama 3 variants are additionally trained with supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF).
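
As a rough illustration of grouped-query attention, the following pure-Python sketch lets several query heads share one key/value head. The function names, toy head counts, and vectors are assumptions made for this example, not the model's actual configuration or Meta's implementation. It handles a single query position and omits masking, projections, and batching.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]


def grouped_query_attention(q_heads, k_heads, v_heads):
    """q_heads: [n_q_heads][head_dim]; k_heads, v_heads: [n_kv_heads][seq_len][head_dim]."""
    n_q, n_kv = len(q_heads), len(k_heads)
    if n_kv == 0 or n_q % n_kv != 0:
        raise ValueError("query heads must divide evenly into key/value heads")
    group_size = n_q // n_kv
    outputs = []
    for head, query in enumerate(q_heads):
        kv = head // group_size  # heads in the same group share one K/V head
        outputs.append(attend(query, k_heads[kv], v_heads[kv]))
    return outputs


# Toy example (hypothetical sizes): 4 query heads share 2 key/value heads.
k_heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
v_heads = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
q_heads = [[1.0, 0.0] for _ in range(4)]
gqa_out = grouped_query_attention(q_heads, k_heads, v_heads)
```

Only the key/value heads need to be cached during generation, so sharing them across query heads shrinks that cache by the ratio of query heads to key/value heads.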

Training Hyperparameters

Training regime: Mixed precision (bfloat16)
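
To give a feel for the precision involved, the sketch below rounds a float32 value to bfloat16 precision using only the standard library. It relies on the fact that bfloat16 keeps float32's sign bit and 8 exponent bits but only 7 mantissa bits. The helper name `to_bfloat16` is hypothetical, and the code is an illustration rather than part of any training stack.

```python
import struct


def to_bfloat16(x):
    """Round a Python float to the nearest bfloat16 value, returned as a float.

    The input is first converted to float32; values outside the float32
    range raise OverflowError from struct.pack.
    """
    if x != x:  # NaN: leave unchanged
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that get dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


rounded_pi = to_bfloat16(3.14159265)
```

Here 3.14159265 becomes 3.140625, which shows that bfloat16 keeps only about two to three significant decimal digits while preserving float32's dynamic range.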

Evaluation

Testing Data, Factors & Metrics

Testing Data

Testing was performed using standard natural language processing benchmarks.

Metrics

Metrics are those commonly reported on standard language-model benchmarks, such as perplexity, accuracy, and other task-specific measures.
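
Perplexity, one of the metrics named above, is the exponential of the average negative log-likelihood per token. A minimal sketch, assuming the per-token natural-log probabilities have already been obtained from a model (the function name and inputs are illustrative):

```python
import math


def perplexity(token_log_probs):
    """Perplexity from the natural-log probabilities of each observed token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every observed token is as
# uncertain as a uniform choice among 4 options, so perplexity is about 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)
```

Lower perplexity means the model assigned higher probability to the text it was evaluated on.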

Results

Llama 3 models have demonstrated strong performance on various benchmarks, showing competitive results with other large-scale language models.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator.

Hardware Type: NVIDIA H100-80GB GPUs
Hours used: Not specified
Cloud Provider: Not used
Compute Region: Not specified
Carbon Emitted: Not specified
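
Calculators of this kind typically multiply energy use by the carbon intensity of the electricity grid. The sketch below shows that arithmetic. Every input value is a placeholder chosen for illustration, since this card does not specify hours, region, or emissions, and the function name and the optional PUE factor are assumptions rather than the calculator's actual interface.

```python
def estimate_co2_kg(gpu_hours, gpu_power_kw, grid_kg_co2_per_kwh, pue=1.0):
    """Estimated emissions in kg CO2eq: energy (kWh) times grid carbon intensity."""
    if min(gpu_hours, gpu_power_kw, grid_kg_co2_per_kwh, pue) < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = gpu_hours * gpu_power_kw * pue  # pue scales for datacenter overhead
    return energy_kwh * grid_kg_co2_per_kwh


# Placeholder inputs only; none of these numbers describe this model.
example_kg = estimate_co2_kg(gpu_hours=100, gpu_power_kw=0.4, grid_kg_co2_per_kwh=0.5)
```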

Technical Specifications

This is a sharded version of the Llama 3 model, intended to make inference and fine-tuning easier on low- to mid-range hardware.
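
Sharding splits a checkpoint's tensors across several smaller files so that no single file has to be loaded at once. The following is a minimal sketch of one way to plan such a split, using greedy packing by size. The tensor names, sizes, and file-name pattern are hypothetical, and this is not necessarily how the shards for this model were produced.

```python
def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily pack tensors, in order, into shards of at most max_shard_bytes.

    A tensor larger than the limit gets a shard of its own.
    Returns a dict mapping tensor name -> shard file name.
    """
    if max_shard_bytes <= 0:
        raise ValueError("max_shard_bytes must be positive")
    shards, current, current_bytes = [], [], 0
    for name, size in tensor_sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    weight_map = {}
    for index, names in enumerate(shards, start=1):
        file_name = f"model-{index:05d}-of-{len(shards):05d}.bin"
        for name in names:
            weight_map[name] = file_name
    return weight_map


# Hypothetical tensor names and byte sizes.
toy_sizes = {"embed": 400, "layer0": 300, "layer1": 300, "head": 400}
shard_plan = plan_shards(toy_sizes, max_shard_bytes=700)
```

A loader can then use a name-to-file map like `shard_plan` to open only the file that holds the tensor it needs next.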

Model Architecture and Objective

Llama 3 is an auto-regressive transformer-based language model designed for text generation and understanding.

Compute Infrastructure

Hardware

Training used clusters of NVIDIA H100-80GB GPUs.

Software

The model is compatible with PyTorch and Hugging Face's transformers library.

Citation

BibTeX:

@article{meta2024llama3,
  title={Meta Llama 3: An Open-Source Large Language Model},
  author={Meta AI},
  journal={Meta AI Blog},
  year={2024},
  url={https://ai.meta.com/blog/meta-llama-3/}
}

APA:

Meta AI. (2024). Meta Llama 3: An Open-Source Large Language Model. Meta AI Blog. Retrieved from https://ai.meta.com/blog/meta-llama-3/

Glossary

Auto-regressive model: A type of model that generates sequences by predicting the next element based on previous elements.
Transformer architecture: A neural network architecture built on self-attention for processing sequential data, widely used for NLP tasks.
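
The auto-regressive definition above can be made concrete with a toy greedy decoding loop. The bigram score table and function names are invented for illustration; a real language model would supply next-token scores from a neural network over a large vocabulary and condition on the whole context.

```python
# Hypothetical bigram table: scores for the next token given the last token.
BIGRAM_SCORES = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "sat": {"<eos>": 0.9, "the": 0.1},
}


def next_token_scores(tokens):
    """Toy stand-in for a model: condition only on the most recent token."""
    return BIGRAM_SCORES.get(tokens[-1], {"<eos>": 1.0})


def generate(prompt_tokens, max_new_tokens, eos="<eos>"):
    """Greedy auto-regressive decoding: append the best next token, then repeat.

    Expects a non-empty prompt.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        if best == eos:
            break
        tokens.append(best)  # the new token becomes context for the next step
    return tokens


generated = generate(["the"], max_new_tokens=5)
```

Each step feeds the sequence produced so far back in as input, which is the defining property of auto-regressive generation.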

More Information

For more details about the Llama models, visit the Meta Llama website.

Model Card Authors

realshyfox

Model Card Contact

For any questions or feedback, please contact the Meta AI team via their GitHub repository.