Transformers documentation

LLaMA

Transformers

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v4.56.0).

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

to get started

LLaMA

개요

LLaMA 모델은 Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample에 의해 제안된 LLaMA: Open and Efficient Foundation Language Models에서 소개되었습니다. 이 모델은 7B에서 65B개의 파라미터까지 다양한 크기의 기초 언어 모델을 모아놓은 것입니다.

논문의 초록은 다음과 같습니다:

“LLaMA는 7B에서 65B개의 파라미터 수를 가진 기초 언어 모델의 모음입니다. 우리는 수조 개의 토큰으로 모델을 훈련시켰고, 공개적으로 이용 가능한 데이터셋만을 사용하여 최고 수준의 모델을 훈련시킬 수 있음을 보여줍니다. 특히, LLaMA-13B 모델은 대부분의 벤치마크에서 GPT-3 (175B)를 능가하며, LLaMA-65B는 최고 수준의 모델인 Chinchilla-70B와 PaLM-540B에 버금가는 성능을 보입니다. 우리는 모든 모델을 연구 커뮤니티에 공개합니다.”

팁:

LLaMA 모델의 가중치는 이 양식을 작성하여 얻을 수 있습니다.
가중치를 다운로드한 후에는 이를 변환 스크립트를 사용하여 Hugging Face Transformers 형식으로 변환해야합니다. 변환 스크립트를 실행하려면 아래의 예시 명령어를 참고하세요:

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path

변환을 하였다면 모델과 토크나이저는 다음과 같이 로드할 수 있습니다:

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/output/path")
model = LlamaForCausalLM.from_pretrained("/output/path")

스크립트를 실행하기 위해서는 모델을 float16 정밀도로 전부 로드할 수 있을 만큼의 충분한 CPU RAM이 필요합니다. (가장 큰 버전의 모델이 여러 체크포인트로 나뉘어 있더라도, 각 체크포인트는 모델의 각 가중치의 일부를 포함하고 있기 때문에 모든 체크포인트를 RAM에 로드해야 합니다) 65B 모델의 경우, 총 130GB의 RAM이 필요합니다.

LLaMA 토크나이저는 sentencepiece를 기반으로 하는 BPE 모델입니다. sentencepiece의 특징 중 하나는 시퀀스를 디코딩할 때 첫 토큰이 단어의 시작이라면 (예를 들어 “Banana”), 토크나이저는 문자열 앞에 공백을 추가하지 않는다는 것입니다.

이 모델은 BlackSamorez의 기여와 함께, zphang에 의해 제공되었습니다. Hugging Face에서의 구현 코드는 GPT-NeoX를 기반으로 하며 여기에서 찾을 수 있고, 저자의 코드 원본은 여기에서 확인할 수 있습니다.

원래 LLaMA 모델을 기반으로 Meta AI에서 몇 가지 후속 작업을 발표했습니다:

Llama2: Llama2는 구조적인 몇 가지 수정(Grouped Query Attention)을 통해 개선된 버전이며, 2조 개의 토큰으로 사전 훈련이 되어 있습니다. Llama2에 대한 자세한 내용은 이 문서를 참고하세요.

리소스

LLaMA를 시작하는 데 도움이 될 Hugging Face 및 커뮤니티(🌎로 표시)의 공식 자료 목록입니다. 여기에 자료를 제출하고 싶다면 Pull Request를 올려주세요! 추가할 자료는 기존의 자료와 중복되지 않고 새로운 내용을 보여주는 것이 좋습니다.

Text Classification

LLaMA 모델을 텍스트 분류 작업에 적용하기 위한 프롬프트 튜닝 방법에 대한 노트북 🌎

Question Answering

Stack Exchange에서 질문에 답하는 LLaMA를 훈련하는 방법을 위한 StackLLaMA: RLHF로 LLaMA를 훈련하는 실전 가이드 🌎

⚗️ 최적화

제한된 메모리를 가진 GPU에서 xturing 라이브러리를 사용하여 LLaMA 모델을 미세 조정하는 방법에 대한 노트북 🌎

⚡️ 추론

🤗 PEFT 라이브러리의 PeftModel을 사용하여 LLaMA 모델을 실행하는 방법에 대한 노트북 🌎
LangChain을 사용하여 PEFT 어댑터 LLaMA 모델을 로드하는 방법에 대한 노트북 🌎

🚀 배포

🤗 PEFT 라이브러리와 사용자 친화적인 UI로 LLaMA 모델을 미세 조정하는 방법에 대한 노트북 🌎
Amazon SageMaker에서 텍스트 생성을 위해 Open-LLaMA 모델을 배포하는 방법에 대한 노트북 🌎

Transformers

LLaMA

개요

리소스

LlamaConfig

class transformers.LlamaConfig

LlamaTokenizer

class transformers.LlamaTokenizer

build_inputs_with_special_tokens

get_special_tokens_mask

create_token_type_ids_from_sequences

save_vocabulary

LlamaTokenizerFast

class transformers.LlamaTokenizerFast

build_inputs_with_special_tokens

get_special_tokens_mask

create_token_type_ids_from_sequences

update_post_processor

save_vocabulary

LlamaModel

class transformers.LlamaModel

forward

LlamaForCausalLM

class transformers.LlamaForCausalLM

forward

LlamaForSequenceClassification

class transformers.LlamaForSequenceClassification

forward