Akhil B

hakunamatata1997

AI & ML interests

Gen AI, NLP, Computer Vision, XAI

Organizations

hakunamatata1997's activity

replied to their post 25 days ago

Tried SadTalker, but it takes too much time. D-ID is proprietary, and I'm looking for something open source. I also tried Wav2Lip and enhancing its output with GFPGAN; the output is good, but I want something fast.

posted an update 25 days ago
I'm working on talking head generation that takes audio and video as input. Can someone suggest a good existing architecture that can generate videos with low latency, or can we make it run in real time?
replied to their post about 1 month ago

Yeah, I tried QwenVL; it's poor at understanding text. QwenVL-Plus and Max are good but not open source 😪

replied to their post about 1 month ago

@merve To be more specific, I'm looking for something that understands text in images well enough that the VLM's responses are accurate.

posted an update about 1 month ago
Can someone suggest a good open-source vision model that performs well at OCR?
replied to their post about 1 month ago

On this point, I want to suggest a new rule: users can upload their models to a public space, but once uploaded they cannot delete them 😅. What do you say, @clem @julien-c?

posted an update about 1 month ago
Why did Salesforce remove SFR-Iterative-DPO-LLaMA-3-8B-R? Any ideas?
replied to m-ric's post 3 months ago

Has anyone researched the frameworks or tools currently being used to build agents for production? I've been doing some research, but most of them are not suitable for production.

posted an update 4 months ago
Hello fellow huggers!
  • 2 replies