What is the max sequence length that the model can compute if I use flash attention?
#20 opened 23 days ago by halfmoon039

Do I need to apply_chat_template before Supervised Fine-tuning Gemma-1.1-7b-it?
#19 opened 27 days ago by Syax19

Is 1.1 trained from the same SFT model as 1.0?
#18 opened about 1 month ago by chujiezheng

Finetune error: "triu_tril_cuda_template" not implemented for 'BFloat16'
#17 opened about 1 month ago by Saicy (1 reply)

Update README.md
#16 opened about 1 month ago by ssalvo41

TemplateError: System role not supported
#15 opened about 1 month ago by luogy (3 replies)

Consider adding <start_of_context> and <stop_of_context> or similar special tokens for context ingestion.
#13 opened about 1 month ago by qnixsynapse

loss padding_side
#12 opened about 1 month ago by NickyNicky

Why is this completely broken?
#11 opened about 1 month ago by rombodawg (2 replies)

Number of parameters
#9 opened about 1 month ago by HugoLaurencon (7 replies)