Akhil B

hakunamatata1997

AI & ML interests

ML, DL, Gen AI, NLP, Computer Vision, XAI

hakunamatata1997's activity

Reacted to m-ric's post with 🔥 2 days ago
Great feature alert: You can now use any Space as a tool for your transformers.agent! 🛠️🔥🔥

This lets you take the coolest spaces, like FLUX.1-dev, and use them in agentic workflows with a few lines of code! 🧑‍💻

In the video below, I set up my fake vacation pictures where I'm awesome at surfing (I'm really not) 🏄

Head to the doc to learn this magic 👉 https://huggingface.co/docs/transformers/main/en/agents_advanced#import-a-space-as-a-tool-
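For context, a minimal sketch of what this looks like, following the linked doc page. The Space ID, agent class, and LLM engine below are illustrative choices, not the only supported ones:

```python
from transformers import Tool
from transformers.agents import ReactCodeAgent, HfApiEngine

# Wrap any Gradio Space from the Hub as a callable tool
image_generation_tool = Tool.from_space(
    "black-forest-labs/FLUX.1-dev",           # the Space to import
    name="image_generator",
    description="Generate an image from a text prompt",
)

# Hand the tool to an agent and let it decide when to call it
agent = ReactCodeAgent(
    tools=[image_generation_tool],
    llm_engine=HfApiEngine("Qwen/Qwen2.5-72B-Instruct"),  # assumed engine model
)
agent.run(
    "Improve this prompt, then generate an image of it.",
    prompt="Someone surfing like a pro",
)
```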
replied to their post 6 months ago

Tried SadTalker; it takes too long. D-ID is proprietary. I'm looking for something open source. Tried Wav2Lip, and also enhancing its output with GFPGAN; the quality is good, but I want something faster.
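Regarding the Wav2Lip + GFPGAN combination mentioned above, here is a minimal sketch of that enhancement pass using GFPGAN's public GFPGANer API. The file names and checkpoint path are placeholders, and the per-frame face restoration loop is exactly where the speed is lost, which matches the latency complaint:

```python
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # assumed local checkpoint, downloaded separately
    upscale=1,                    # keep original resolution for speed
    arch="clean",
    channel_multiplier=2,
)

reader = cv2.VideoCapture("wav2lip_output.mp4")  # hypothetical Wav2Lip result
fps = reader.get(cv2.CAP_PROP_FPS)
writer = None

while True:
    ok, frame = reader.read()
    if not ok:
        break
    # enhance() detects faces, restores them, and pastes them back into the frame
    _, _, restored = restorer.enhance(frame, has_aligned=False, paste_back=True)
    if writer is None:
        h, w = restored.shape[:2]
        writer = cv2.VideoWriter(
            "enhanced.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h)
        )
    writer.write(restored)

reader.release()
writer.release()
# Note: cv2.VideoWriter drops audio; re-mux the Wav2Lip audio track with ffmpeg.
```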

posted an update 6 months ago
I'm working on talking-head generation that takes audio and video as input. Can someone suggest a good existing architecture that can generate videos with low latency, or that can run in real time?
replied to their post 6 months ago

Yeah, tried Qwen-VL; it's poor at understanding text. Qwen-VL-Plus and Max are good but not open-sourced 😪

replied to their post 6 months ago

@merve More specifically: something that understands text in images well enough that the VLM's responses are accurate.

posted an update 6 months ago
Can someone suggest a good open-source vision model that performs well at OCR?
replied to their post 6 months ago

On this point, I want to suggest a new rule: users can upload their models to a public space, but once uploaded they cannot delete them 😅. What do you say, @clem @julien-c?

posted an update 6 months ago
Why did Salesforce remove SFR-Iterative-DPO-LLaMA-3-8B-R? Any ideas?
Reacted to akhaliq's post with 🔥 8 months ago
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (2404.07143)

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
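Since the abstract fully specifies the mechanism, here is a minimal single-head PyTorch sketch of it (not the authors' code): per-segment causal softmax attention, linear-attention retrieval from a compressive memory using the paper's ELU+1 feature map, and a learned gate mixing the two. Class and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def sigma(x):
    # ELU + 1 feature map from the paper; keeps features positive
    return F.elu(x) + 1.0

class InfiniAttentionHead(torch.nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.wq = torch.nn.Linear(d, d, bias=False)
        self.wk = torch.nn.Linear(d, d, bias=False)
        self.wv = torch.nn.Linear(d, d, bias=False)
        self.beta = torch.nn.Parameter(torch.zeros(1))  # memory/local gate
        self.d = d

    def forward(self, segments):
        """segments: list of [batch, seg_len, d] chunks of one long sequence."""
        b = segments[0].shape[0]
        M = torch.zeros(b, self.d, self.d)  # compressive memory: fixed size,
        z = torch.zeros(b, self.d, 1)       # so cost is bounded in input length
        outs = []
        for x in segments:
            q, k, v = self.wq(x), self.wk(x), self.wv(x)
            # 1) long-term retrieval from memory (linear attention read-out)
            a_mem = (sigma(q) @ M) / (sigma(q) @ z + 1e-6)
            # 2) masked local softmax attention within the segment
            n = x.shape[1]
            mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
            scores = (q @ k.transpose(-1, -2)) / self.d**0.5
            a_loc = torch.softmax(scores.masked_fill(mask, float("-inf")), -1) @ v
            # 3) learned sigmoid gate mixes long-term and local context
            g = torch.sigmoid(self.beta)
            outs.append(g * a_mem + (1.0 - g) * a_loc)
            # 4) compress this segment's KV into memory for later segments
            M = M + sigma(k).transpose(-1, -2) @ v
            z = z + sigma(k).sum(dim=1, keepdim=True).transpose(-1, -2)
        return torch.cat(outs, dim=1)

# Usage: a 4096-token input processed as four 1024-token segments
head = InfiniAttentionHead(d=64)
x = torch.randn(2, 4096, 64)
out = head(list(x.split(1024, dim=1)))  # -> [2, 4096, 64]
```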
Reacted to lewtun's post with ❤️ 8 months ago
Introducing Zephyr 141B-A35B 🪁:

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Yesterday, Mistral released their latest base model (via magnet link of course 😅) and the community quickly converted it to transformers format and pushed it to the Hub: mistral-community/Mixtral-8x22B-v0.1

Early evals of this model looked extremely strong, so we teamed up with Argilla and KAIST AI to cook up a Zephyr recipe with a few new alignment techniques that came out recently:

๐Ÿง‘โ€๐Ÿณ Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm developed by @JW17 and @nlee-208 and @j6mes and does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO.

🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences that has been developed by our friends at Argilla. To create this dataset, they took the excellent Capybara SFT dataset from @LDJnr LDJnr/Capybara and converted it into a preference dataset by augmenting the final turn with responses from new LLMs that were then ranked by GPT-4.

What we find especially neat about this approach is that training on 7k samples only takes ~1.3h on 4 H100 nodes, yet produces a model that is very strong on chat benchmarks like IFEval and BBH.
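A minimal sketch of what an ORPO run looks like with TRL's ORPOTrainer; this is not the exact Zephyr-141B recipe (which used Mixtral-8x22B on the hardware above), and the model, hyperparameters, and TRL argument names here are illustrative:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# A small base model stands in so the sketch fits on one GPU
model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The 7k Capybara preference pairs from Argilla mentioned above; depending
# on the TRL version, you may need to flatten each row into plain-text
# "prompt" / "chosen" / "rejected" columns first.
dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

args = ORPOConfig(
    output_dir="zephyr-orpo-sketch",
    beta=0.1,                          # weight of the odds-ratio penalty
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    logging_steps=10,
)

# ORPO needs no separate SFT stage and no frozen reference model: the
# odds-ratio term is added directly to the NLL loss on chosen responses.
trainer = ORPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```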

Kudos to @alvarobartt @JW17 and @nlee-208 for this very nice and fast-paced collab!

For more details on the paper and dataset, check out our collection: HuggingFaceH4/zephyr-orpo-6617eba2c5c0e2cc3c151524
replied to m-ric's post 8 months ago

Has anyone researched frameworks or tools currently being used to build agents for production? I've been looking into this, but most of them aren't production-ready.

posted an update 9 months ago
Hello fellow huggers!