Hu Zang

zanghu

AI & ML interests

None yet

Recent Activity

reacted to joaogante's post with 🤗 3 days ago
New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥

Are you tired of having `top_k` arbitrarily discard high-quality continuations? Or `top_p` forgetting to exclude low-probability tokens, derailing your generation? Try out the new `min_p` flag in `generate`, fresh from a PR merged today! 🥬

Min P is a dynamic token filter -- as opposed to Top K, which keeps the K most likely tokens, and Top P, which keeps the most likely tokens up to a fixed cumulative probability, both static filters. Min P takes a base probability (defined in the `min_p` flag) and multiplies it by the probability of the most likely token in the distribution for the next token. All tokens less likely than the resulting value are filtered. What happens with this strategy?

👉 High-probability token present -> aggressive filter (we don't want to miss that high-probability case and risk derailing generation)
👉 No high-probability token present -> relaxed filter (there are many continuation possibilities that the model finds plausible)

You should set `min_p` to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired with temperature > 1.

Kudos to @kalomaze and @menhguin for creating this technique 🔥 Read their discussion in the original issue for benchmarks (https://github.com/huggingface/transformers/issues/27670)

Copy-pasteable version of the example in the image below here: https://pastebin.com/VqXNtuxd

Have fun experimenting! 😎
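A minimal sketch of the recipe the post describes, assuming a transformers release that includes the merged `min_p` generation flag; the checkpoint, prompt, and sampling values below are illustrative, not the example from the post's image:

```python
# Sketch: Min P sampling via the `min_p` flag in transformers.generate
# (assumes a transformers version with min_p support; model and prompt are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # any causal LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Once upon a time", return_tensors="pt")

# min_p keeps only tokens whose probability is at least
# min_p * (probability of the most likely next token); a low value (0.05-0.1)
# paired with temperature > 1 is the setting suggested for creative generation.
outputs = model.generate(
    **inputs,
    do_sample=True,
    min_p=0.05,
    temperature=1.5,
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```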
reacted to joaogante's post with 👍 3 days ago
liked a Space 16 days ago
bigcode/bigcode-models-leaderboard

Organizations

None yet

zanghu's activity

New activity in Qwen/Qwen2.5-0.5B 21 days ago
New activity in moka-ai/m3e-large over 1 year ago