#40: Research title suggestion (opened about 2 months ago by AbdulsamadW)
#39: Can togethercomputer/LLaMA-2-7B-32K be used for zero-shot classification? (opened 6 months ago by Gonzalomoreno01)
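A note on #39: the base model has no classification head, but a causal LM can do zero-shot classification by scoring each candidate label as a continuation of a prompt. A minimal sketch, assuming an illustrative sentiment task (the prompt template, input text, and label set are made up; `trust_remote_code=True` follows the model card):

```python
# Zero-shot classification with a causal LM: score each candidate label
# as a continuation of a prompt and pick the highest-likelihood one.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "togethercomputer/LLaMA-2-7B-32K"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # repo ships custom modeling code
)

text = "The battery died after two hours."  # illustrative input
labels = ["positive", "negative"]           # illustrative label set

def label_logprob(text: str, label: str) -> float:
    # Mean log-likelihood per token of the full prompt+label string.
    # Crude (it averages over the shared prompt too), but workable
    # when the candidate labels have similar lengths.
    prompt = f"Review: {text}\nSentiment: {label}"
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()

print(max(labels, key=lambda l: label_logprob(text, l)))
```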
#38: Adding `safetensors` variant of this model (opened 8 months ago by SFconvertbot)
#36: Adding `safetensors` variant of this model (opened 11 months ago by SFconvertbot)
#34: Adding Evaluation Results (opened about 1 year ago by leaderboard-pr-bot)
#33: How to use on GPU (opened about 1 year ago by p2991459)
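On #33: the usual transformers recipe is to load the weights in half precision and let `accelerate` place them on the GPU. A minimal sketch (the prompt is illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "togethercomputer/LLaMA-2-7B-32K"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" (needs `pip install accelerate`) spreads the weights
# across available GPUs; torch.float16 halves memory versus float32.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```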
#32: Could you share more details about fine-tuning LLaMA-2-7B into LLaMA-2-7B-32K, such as the fine-tuning steps and batch size? (opened about 1 year ago by Mooler)
#30: Adding `safetensors` variant of this model (opened about 1 year ago by efy9002)
#29: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' (4 replies; opened about 1 year ago by shubhamagarwal92)
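On #29: this error usually means float16 weights ended up on the CPU, where PyTorch has no half-precision matmul kernel; the fix is to move the model to a GPU, or fall back to float32 (or bfloat16) on CPU. A sketch of that guard:

```python
import torch
from transformers import AutoModelForCausalLM

model_name = "togethercomputer/LLaMA-2-7B-32K"

if torch.cuda.is_available():
    # fp16 matmul is implemented on GPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, trust_remote_code=True
    ).to("cuda")
else:
    # CPU kernels lack fp16 ("addmm_impl_cpu_" not implemented for 'Half'),
    # so keep full precision (or try bfloat16, which CPU does support).
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float32, trust_remote_code=True
    )
```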
#28: Using the Accelerate API to train models on multiple GPUs (8 replies; opened about 1 year ago by ajash)
#27: Keep getting an error while loading tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K") (5 replies; opened about 1 year ago by AIHero123)
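On #27: without the traceback the cause is a guess, but two common culprits for LLaMA tokenizers are a missing `sentencepiece`/`protobuf` install and a failing fast-tokenizer conversion. A hedged fallback sketch:

```python
# Assumes `pip install sentencepiece` has been run: the fast tokenizer
# needs the `tokenizers` package, the slow one needs `sentencepiece`.
from transformers import AutoTokenizer

model_name = "togethercomputer/LLaMA-2-7B-32K"
try:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
except Exception:
    # Fall back to the slow SentencePiece tokenizer if conversion fails.
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

print(tokenizer("hello world").input_ids)
```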
#26: [AUTOMATED] Model Memory Requirements (opened about 1 year ago by model-sizer-bot)
#25: Installed `pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary`, but flash_llama still errors out (4 replies; opened about 1 year ago by ajash)
#24: Adding `safetensors` variant of this model (opened about 1 year ago by brigs)
#23: Quantizations for llama.cpp (4 replies; opened about 1 year ago by rozek)
#21: Endpoint configuration on AWS SageMaker (1 reply; opened about 1 year ago by NABARKA)
#20: Adding `safetensors` variant of this model (opened about 1 year ago by efy9002)
#19: protofile.proto: A file with this name is already in the pool (1 reply; opened about 1 year ago by surya-narayanan)
#18: Is LLaMA-2-7B-32K already fine-tuned for answering questions from long text? (1 reply; opened over 1 year ago by MathewOpt)
#17: Fix RuntimeError: pad attn scores back to original query sequence length, instead of unpadded sequence length (i.e. no change) (1 reply; opened over 1 year ago by Birchlabs)
#16: How can specific information be eliminated in an LLM? (opened over 1 year ago by kiopuy)
#15: !pip install flash-attn --no-build-isolation (3 replies; opened over 1 year ago by NivYO)
#14: Instead of flash_attn it should be flash_attn_2_cuda; this is causing a deployment issue in TGI/DJL (1 reply; opened over 1 year ago by monuminu)
#12: RoPE scaling and max_position_embeddings (2 replies; opened over 1 year ago by ag0)
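On #12: this model reaches 32K context via linear RoPE position interpolation, so its config should pair `max_position_embeddings` with a `rope_scaling` factor of roughly the ratio of new to original context length, 32768 / 4096 = 8. A sketch for inspecting those fields (the printed values are what I would expect, not verified here):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "togethercomputer/LLaMA-2-7B-32K", trust_remote_code=True
)
print(config.max_position_embeddings)  # expected: 32768
print(config.rope_scaling)             # expected: {"type": "linear", "factor": 8.0}

# Linear interpolation rescales positions before computing RoPE angles:
# position p is treated as p / factor, so 32768 positions map into the
# 4096-position range the base model was pretrained on.
```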
#11: Getting strange tokens after fine-tuning with QLoRA (2 replies; opened over 1 year ago by monuminu)
#10: Training diverges when used with Llama 2 70B and 4-bit QLoRA (3 replies; opened over 1 year ago by alyssavance)
#9: Do you have plans to make a chat model (LLaMA-2-7B-32K-Chat)? If so, any idea when it would come out? (3 replies; opened over 1 year ago by barpy)
#8: How to train a LLaMA-2-7B-32K from LLaMA-2-7B? (7 replies; opened over 1 year ago by Sayoyo)
#7: Upload introduction_cn.ipynb (opened over 1 year ago by DORA1222)
#6: Problem with generating anything (3 replies; opened over 1 year ago by wempoo)
#5: Plans for a 13B version? (3 replies; opened over 1 year ago by rombodawg)
#4: GGML Version (8 replies; opened over 1 year ago by s3nh)
#3: Model on your API Playground (7 replies; opened over 1 year ago by 1littlecoder)
#2: How to fine-tune with PEFT QLoRA and SFTTrainer? (12 replies; opened over 1 year ago by NickyNicky)
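On #2: a QLoRA run combines 4-bit `bitsandbytes` quantization, a PEFT `LoraConfig`, and TRL's `SFTTrainer`. A sketch assuming the older trl (~0.7) argument names (`dataset_text_field` and `max_seq_length` moved into `SFTConfig` in later versions); the dataset and LoRA hyperparameters are illustrative:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

model_name = "togethercomputer/LLaMA-2-7B-32K"

# 4-bit NF4 quantization: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token

# Low-rank adapters on the attention projections (illustrative choice).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("imdb", split="train[:1%]")  # illustrative dataset

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    max_seq_length=1024,
)
trainer.train()
```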
#1: What is the VRAM requirement of this model? (5 replies; opened over 1 year ago by Said2k)
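On #1, a back-of-the-envelope answer: 7B parameters at 2 bytes each (fp16) is about 13 GiB for the weights alone, and the KV cache grows linearly with context length, which matters for a 32K model. The arithmetic (layer and hidden-size numbers are the standard LLaMA-2-7B values):

```python
# Rough VRAM estimate for LLaMA-2-7B at fp16 (2 bytes per value).
params = 7e9
weight_gib = params * 2 / 1024**3
print(f"weights: ~{weight_gib:.1f} GiB")  # ~13 GiB

# KV cache: 2 (K and V) * layers * seq_len * hidden_size * 2 bytes.
# LLaMA-2-7B: 32 layers, hidden size 4096, full multi-head attention.
layers, hidden = 32, 4096
for seq_len in (4096, 32768):
    kv_gib = 2 * layers * seq_len * hidden * 2 / 1024**3
    print(f"KV cache @ {seq_len} tokens: ~{kv_gib:.1f} GiB")  # ~2 / ~16 GiB
```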