Joao Gante

joaogante


joaogante's activity

posted an update 14 days ago
New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥

Are you tired of top_k arbitrarily discarding high-quality continuations? Or of top_p letting low-probability tokens through and derailing your generation? Try the new min_p flag in generate, fresh from a PR merged today! 🥬

Min P is a dynamic token filter -- as opposed to Top K (which keeps the K most likely tokens) and Top P (which keeps the most likely tokens up to a fixed cumulative probability), both of which are static filters. Min P takes a base probability (set with the min_p flag) and multiplies it by the probability of the most likely token in the next-token distribution; all tokens less likely than the resulting value are filtered out. What happens with this strategy? (see the toy sketch after the list)
👉 A high-probability token is present -> aggressive filter (when the model is confident, we don't want to risk derailing generation with unlikely tokens)
👉 No high-probability token is present -> relaxed filter (the model finds many continuations plausible, so we keep the options open)
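
Here's a toy sketch of the filter math (not the actual transformers implementation), just to make the dynamic threshold concrete:

```python
# Toy illustration of the Min P filter: the threshold scales with the
# probability of the most likely token, so sharp distributions are filtered
# aggressively and flat distributions are left mostly alone.
import torch

def min_p_filter(probs: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    threshold = min_p * probs.max()                       # dynamic threshold
    filtered = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    return filtered / filtered.sum()                      # renormalize

confident = torch.tensor([0.80, 0.10, 0.05, 0.05])  # sharp distribution
flat = torch.tensor([0.30, 0.28, 0.22, 0.20])        # flat distribution
print(min_p_filter(confident))  # threshold 0.08 -> the two 0.05 tokens are dropped
print(min_p_filter(flat))       # threshold 0.03 -> every token survives
```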

You should set min_p to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired with temperature > 1.
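
For reference, a minimal usage sketch -- the checkpoint and prompt are just placeholders, swap in whatever you normally use:

```python
# Minimal sketch of Min P sampling with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The best thing about open source is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,      # Min P is a sampling filter, so sampling must be on
    min_p=0.08,          # recommended range: 0.05 - 0.1
    temperature=1.5,     # pairs well with temperatures above 1
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```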

Kudos to @kalomaze and @menhguin for creating this technique 🔥 Read their discussion in the original issue for benchmarks (https://github.com/huggingface/transformers/issues/27670)

A copy-pasteable version of the example in the image below is here: https://pastebin.com/VqXNtuxd

Have fun experimenting! 😎
posted an update 24 days ago
Adding a long prompt can help you fight LLM hallucinations. However, if you know exactly how you want your LLM output constrained, there are much better strategies! 💪

Did you know you can force your LLM to ALWAYS generate a valid JSON file? Or to follow a well-defined answer template? You can do that and more with the 🤗 transformers-compatible outlines library.

Not only does it let you control your LLM -- it also makes your text generation application faster! 🔥 The more constrained your text generation is, the bigger the speedups you'll see!
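
As a taste, here's a rough sketch of JSON-constrained generation with outlines (0.x-era API at the time of writing; the checkpoint and schema are illustrative, so check the outlines docs for your version):

```python
# Rough sketch of JSON-constrained generation with outlines (0.x-era API).
# The checkpoint and schema are illustrative placeholders.
from pydantic import BaseModel
import outlines

class Character(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, Character)

result = generator("Describe a character for a fantasy novel.")
print(result)  # always parses into a valid Character -- never malformed JSON
```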

Follow @remi and other outlines folks to stay on top of the constrained generation game 🧠
replied to mayank-mishra's post 2 months ago

In transformers, the main blocker is backward compatibility -- we assume in many places that batched inputs come with a fixed input length. Once we lift this requirement without breaking backward compatibility, it should be a nice addition! 👍

(Perhaps nested tensors will help)
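
For the curious, here is a tiny illustration of what nested tensors offer -- a batch whose rows keep their own lengths, with no padding (torch.nested is still a prototype feature, so treat this as a sketch):

```python
# Tiny sketch: a "batch" of variable-length sequences with no padding,
# using PyTorch's prototype nested tensors.
import torch

seq_a = torch.tensor([101, 2023, 2003, 1037, 2742, 102])  # 6 tokens
seq_b = torch.tensor([101, 2178, 2742, 102])              # 4 tokens
batch = torch.nested.nested_tensor([seq_a, seq_b])

for seq in batch.unbind():  # each row keeps its original length
    print(seq.shape)
```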

replied to their post 4 months ago

@MaziyarPanahi no accuracy penalty at all :) The only catch on the transformers side is that you are limited to a batch size of one (and even that is not a technical limitation of the technique -- we simply haven't built that code path yet)

posted an update 4 months ago
Up to 3x faster LLM generation with no extra resources/requirements - ngram speculation has landed in 🤗 transformers! 🏎️💨

All you need to do is add prompt_lookup_num_tokens=10 to your generate call, and you'll get faster LLMs 🔥
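
Something like this (the checkpoint and prompt are placeholders; the speedup is largest when the output reuses chunks of the input, e.g. summarization or code editing):

```python
# Minimal sketch of ngram speculation (prompt lookup) in transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "Summarize the following text: ..."  # input-grounded tasks benefit most
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    prompt_lookup_num_tokens=10,  # enables ngram speculation
    max_new_tokens=100,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```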


How does it work? 🤔

Start with assisted generation, where a smaller model generates candidate sequences that the main model then verifies. The net result is a significant speedup when the main model agrees with the candidates! However, it requires a smaller model that was trained similarly 😕
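
For context, classic assisted generation looks roughly like this (model names are placeholders; the assistant must share the main model's tokenizer):

```python
# Rough sketch of classic assisted generation: a smaller assistant model
# drafts candidate tokens for the main model to verify. Model names are
# placeholders -- both must share the same tokenizer/vocabulary.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
assistant = AutoModelForCausalLM.from_pretrained("gpt2")  # smaller, same vocab

inputs = tokenizer("Assisted generation works by", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```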

The idea introduced (and implemented) by Apoorv Saxena is to gather the candidate sequences from the input text itself: if the latest generated ngram appears in the input, use the continuation found there as a candidate. No smaller model is required, and you still get significant speedups 🔥
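
In toy form (not the actual transformers implementation, which tries several ngram sizes), the candidate selection looks like this:

```python
# Toy illustration of prompt-lookup candidate selection: find the most recent
# earlier occurrence of the latest ngram and reuse what follows it.
def prompt_lookup_candidates(tokens, ngram_size=2, num_candidate_tokens=10):
    ngram = tokens[-ngram_size:]
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == ngram:
            follow = tokens[start + ngram_size:start + ngram_size + num_candidate_tokens]
            if follow:
                return follow  # candidate continuation for the main model to verify
    return []  # no match -> fall back to regular decoding

tokens = "the cat sat on the mat because the cat".split()
print(prompt_lookup_candidates(tokens))  # ['sat', 'on', 'the', 'mat', ...]
```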

In fact, the overhead of gathering and testing the candidates is so small that you should use this technique whenever possible!

Here is the code example that produces the outputs shown in the video: https://pastebin.com/bms6XtR4

Have fun 🤗