Li Tan PRO
tanliboy
tanliboy's activity
Text Classification with LLMs
4
#30 opened 24 days ago
by
dss107
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
4
#120 opened 5 days ago
by
erildo
Have you deleted your GitHub page?
2
#10 opened about 8 hours ago
by
xwzy6
Sliding window vs. Global Attention
5
#41 opened 15 days ago
by
tanliboy
GSM8K Evaluation Result: 84.5 vs. 76.95
8
#81 opened about 1 month ago
by
tanliboy
Gemma2-2b training uses much more memory!
1
#23 opened 5 days ago
by
bubbleseller
GemmaSdpaAttention vs GemmaAttention
2
#71 opened 6 days ago
by
canqin001
Fix Llama 3.1 Chat Template to Properly Handle add_generation_prompt
9
#26 opened 15 days ago
by
Tostino
🍭 Fine-tuning support for Qwen2-VL-7B-Instruct
4
#1 opened 6 days ago
by
study-hjt
Batch Inference causes degraded performance
1
#43 opened 9 days ago
by
tanliboy
Evaluation Result
#15 opened 12 days ago
by
tanliboy
How is this dataset supposed to be used to evaluate the model?
4
#1 opened 19 days ago
by
realdanielbyrne
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
2
#18 opened about 1 month ago
by
lcahill
Llama-3-Instruct with Langchain keeps talking to itself
10
#147 opened 2 months ago
by
fahim9778
Pruning
7
#24 opened 19 days ago
by
dhivakarsa
Bad test results using lm-evaluation-harness
4
#68 opened 6 months ago
by
smart-liu
Are two BOS token IDs correct?
4
#97 opened 23 days ago
by
hpsun
Fine-tuning data templates: please help
1
#32 opened 22 days ago
by
Cagatayd
add_special_tokens=False results in poor generation
3
#80 opened 5 months ago
by
DMaksimov
Why is "bos_token": null, in tokenizer_config.json?
6
#15 opened 23 days ago
by
3Simplex
TTS support?
3
#4 opened 24 days ago
by
yukiarimo
The base model doesn't generate coherently
4
#9 opened 2 months ago
by
migtissera
Fine-tuning Hyperparameters
6
#27 opened about 2 months ago
by
tanliboy
Error: size mismatch for model.layers.0.self_attn.q_proj.weight:
2
#6 opened about 2 months ago
by
tanliboy
dtype: float32 in base model vs. dtype: bfloat16 in the instruction fine-tuned model
#32 opened about 2 months ago
by
tanliboy
TypeError: arange() received an invalid combination of arguments
4
#12 opened 2 months ago
by
darrenbudiman
TypeError: 'NoneType' object cannot be interpreted as an integer
2
#3 opened 3 months ago
by
tanliboy
Crash in Fine-tuning
4
#14 opened 3 months ago
by
tanliboy
"bos_token": "<s>" vs. "<|endoftext|>"
1
#20 opened 3 months ago
by
tanliboy
Difference in chat templates between Phi-3-small-8k-instruct and Phi-3-medium-4k-instruct
1
#4 opened 4 months ago
by
tanliboy