Shetu Mohanto

shetumohanto

AI & ML interests

GenAI | MLOps | AI agents | Computer Vision


Organizations

Hugging Face Discord Community · open/acc · Smol Community

shetumohanto's activity

reacted to prithivMLmods's post with 🔥 1 day ago
Dropping some custom fine-tunes based on SigLIP2, with a single-label classification problem type! 🌀🧀

- AI vs Deepfake vs Real : prithivMLmods/AI-vs-Deepfake-vs-Real-Siglip2
- Deepfake Detect : prithivMLmods/Deepfake-Detect-Siglip2
- Fire Detection : prithivMLmods/Fire-Detection-Siglip2
- Deepfake Quality Assess : prithivMLmods/Deepfake-Quality-Assess-Siglip2
- Guard Against Unsafe Content : prithivMLmods/Guard-Against-Unsafe-Content-Siglip2

🌠 Collection: prithivMLmods/siglip2-custom-67bcdb2de8fe96b99fb4e19e
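These are standard single-label classifiers, so (assuming the checkpoints ship the usual transformers image-classification config) inference is a few lines with the pipeline API; the input file below is a hypothetical placeholder:

```python
# A minimal sketch, assuming these checkpoints expose the standard
# transformers image-classification head (model id taken from the
# list above; the input file name is hypothetical).
from PIL import Image
from transformers import pipeline

classifier = pipeline(
    "image-classification",
    model="prithivMLmods/Deepfake-Detect-Siglip2",
)

image = Image.open("suspect_frame.jpg")  # hypothetical local image
for pred in classifier(image):
    print(f"{pred['label']}: {pred['score']:.3f}")
```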
reacted to clem's post with 👍🔥 11 days ago
What are the best organizations to follow on @huggingface?

Off the top of my head:
- Deepseek (35,000 followers): https://huggingface.co/deepseek-ai
- Meta Llama (27,000 followers): https://huggingface.co/meta-llama
- Black Forest Labs (11,000 followers): https://huggingface.co/black-forest-labs
- OpenAI (5,000 followers): https://huggingface.co/openai
- Nvidia (16,000 followers): https://huggingface.co/nvidia
- Microsoft (9,000 followers): https://huggingface.co/microsoft
- AllenAI (2,000 followers): https://huggingface.co/allenai
- Mistral (5,000 followers): https://huggingface.co/mistralai
- xAI (600 followers): https://huggingface.co/xai-org
- Stability AI (16,000 followers): https://huggingface.co/stabilityai
- Qwen (16,000 followers): https://huggingface.co/Qwen
- GoogleAI (8,000 followers): https://huggingface.co/google
- Unsloth (3,000 followers): https://huggingface.co/unsloth
- Bria AI (4,000 followers): https://huggingface.co/briaai
- NousResearch (1,300 followers): https://huggingface.co/NousResearch

Bonus, the agent course org with 17,000 followers: https://huggingface.co/agents-course
  • 1 reply
Β·
reacted to mmhamdy's post with ❤️ 11 days ago
🎉 We're excited to introduce MemoryCode, a novel synthetic dataset designed to rigorously evaluate LLMs' ability to track and execute coding instructions across multiple sessions. MemoryCode simulates realistic workplace scenarios where a mentee (the LLM) receives coding instructions from a mentor amidst a stream of both relevant and irrelevant information.

💡 But what makes MemoryCode unique?! The combination of the following:

✅ Multi-Session Dialogue Histories: MemoryCode consists of chronological sequences of dialogues between a mentor and a mentee, mirroring real-world interactions between coworkers.

✅ Interspersed Irrelevant Information: Critical instructions are deliberately interspersed with unrelated content, replicating the information overload common in office environments.

✅ Instruction Updates: Coding rules and conventions can be updated multiple times throughout the dialogue history, requiring LLMs to track and apply the most recent information.

✅ Prospective Memory: Unlike previous datasets that cue information retrieval, MemoryCode requires LLMs to spontaneously recall and apply relevant instructions without explicit prompts.

✅ Practical Task Execution: LLMs are evaluated on their ability to use the retrieved information to perform practical coding tasks, bridging the gap between information recall and real-world application.

📌 Our Findings

1️⃣ While even small models can handle isolated coding instructions, the performance of top-tier models like GPT-4o dramatically deteriorates when instructions are spread across multiple sessions.

2️⃣ This performance drop isn't simply due to the length of the context. Our analysis indicates that LLMs struggle to reason compositionally over sequences of instructions and updates. They have difficulty keeping track of which instructions are current and how to apply them.

🔗 Paper: From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions (2502.13791)
📦 Code: https://github.com/for-ai/MemoryCode
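The prospective-memory setup is easy to picture in code. Below is a toy sketch of the idea; the session texts, rule format, and helper names are hypothetical, not the dataset's actual schema:

```python
# Toy sketch of the multi-session, latest-instruction-wins setup described
# above. Session texts, rule format, and helper names are hypothetical,
# not the actual MemoryCode schema.
sessions = [
    "Mentor: prefix all function names with 'x_'.",
    "Mentor: the cafeteria menu changed today.",                   # irrelevant filler
    "Mentor: update: prefix function names with 'fn_' instead.",   # supersedes session 1
]

def build_prompt(sessions: list[str], task: str) -> str:
    # Concatenate the full dialogue history, then pose the task without
    # any explicit cue to retrieve the naming rule (prospective memory).
    history = "\n\n".join(f"Session {i + 1}:\n{s}" for i, s in enumerate(sessions))
    return f"{history}\n\nTask: {task}"

prompt = build_prompt(sessions, "Write a function that adds two numbers.")
# A model that tracks updates should name the function fn_add, not x_add.
print(prompt)
```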
reacted to csabakecskemeti's post with 🤗 11 days ago
Testing training on AMD/ROCm for the first time!

I've got my hands on an AMD Instinct MI100. Used, it's about the same price as a V100, but on paper it has more TOPS (V100: 14 TOPS vs. MI100: 23 TOPS), and its HBM has a faster clock, so the memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).

For LoRA training in this quick test I could not get the bitsandbytes (bnb) config to work, so I'm running the fine-tune on the full-size model.

I'll share everything I've learned about the install, setup, and settings in a blog post, together with the 3D design for the cooling shroud.
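For reference, a minimal full-precision LoRA setup with peft (no bitsandbytes quantization) looks like the sketch below; the base model id is an illustrative placeholder, not the model used in this test:

```python
# A minimal full-precision LoRA sketch with peft, i.e. no bitsandbytes
# quantization, matching the fallback described above. The base model id
# is an illustrative placeholder.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",        # placeholder base model
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                             # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()    # only the adapter weights are trainable
```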
reacted to davidberenstein1957's post with 🔥❤️ 3 months ago
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process makes dataset creation a non-technical breeze, letting anyone create datasets and models in minutes, without writing any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator
  • 4 replies
Β·
reacted to lewtun's post with 🔥❤️ 3 months ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
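The core idea is compact: sample many candidate solutions and let a step-wise (process) reward model pick the best one. Below is a toy best-of-N loop, one of the simplest strategies in this family; generate_candidates and prm_score are random stand-ins, not the search-and-learn API:

```python
# Toy best-of-N with a step-wise (process) reward model. The two helpers
# are random stand-ins, not the search-and-learn API.
import random

def generate_candidates(problem: str, n: int) -> list[list[str]]:
    # Stand-in: each candidate solution is a list of reasoning steps.
    return [[f"step {i}" for i in range(random.randint(2, 5))] for _ in range(n)]

def prm_score(steps: list[str]) -> float:
    # Stand-in PRM: score each step, aggregate with min (a common choice).
    return min(random.random() for _ in steps)

def best_of_n(problem: str, n: int = 16) -> list[str]:
    # More test-time compute = larger n = more chances to find a good path.
    candidates = generate_candidates(problem, n)
    return max(candidates, key=prm_score)

print(best_of_n("Solve: 12 * 13"))
```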
  • 2 replies
Β·
reacted to aaditya's post with ❤️ 3 months ago
Last Week in Medical AI: Top Research Papers/Models 🔥
🏅 (December 7 – December 14, 2024)

Medical LLM & Other Models
- PediaBench: Chinese Pediatric LLM
  - Comprehensive pediatric dataset
  - Advanced benchmarking platform
  - Chinese healthcare innovation
- BiMediX: Bilingual Medical LLM
  - Multilingual medical expertise
  - Diverse medical knowledge integration
  - Cross-cultural healthcare insights
- MMedPO: Vision-Language Medical LLM
  - Clinical multimodal optimization
  - Advanced medical image understanding
  - Precision healthcare modeling

Frameworks and Methodologies
- TOP-Training: Medical Q&A Framework
- Hybrid RAG: Secure Medical Data Management
- Zero-Shot ATC Clinical Coding
- Chest X-Ray Diagnosis Architecture
- Medical Imaging AI Democratization

Benchmarks & Evaluations
- KorMedMCQA: Korean Healthcare Licensing Benchmark
- Large Language Model Medical Tasks
- Clinical T5 Model Performance Study
- Radiology Report Quality Assessment
- Genomic Analysis Benchmarking

Medical LLM Applications
- BRAD: Digital Biology Language Model
- TCM-FTP: Herbal Prescription Prediction
- LLaSA: Activity Analysis via Sensors
- Emergency Department Visit Predictions
- Neurodegenerative Disease AI Diagnosis
- Kidney Disease Explainable AI Model

Ethical AI & Privacy
- Privacy-Preserving LLM Mechanisms
- AI-Driven Digital Organism Modeling
- Biomedical Research Automation
- Multimodality in Medical Practice

Full thread in detail: https://x.com/OpenlifesciAI/status/1867999825721242101
reacted to di-zhang-fdu's post with 🚀 3 months ago
reacted to sayakpaul's post with 🔥 3 months ago
reacted to hexgrad's post with 🔥 3 months ago
self.brag(): Kokoro finally got 300 votes in Pendrokar/TTS-Spaces-Arena after @Pendrokar was kind enough to add it 3 weeks ago.
Discounting the small sample size of votes, I think it is safe to say that hexgrad/Kokoro-TTS is currently a top 3 model among the contenders in that Arena. This is notable because:
- At 82M params, Kokoro is one of the smaller models in the Arena
- MeloTTS has 52M params
- F5 TTS has 330M params
- XTTSv2 has 467M params
reacted to merve's post with 👍 3 months ago
The authors of ColPali trained a retrieval model based on SmolVLM 🤠 https://huggingface.co/vidore/colsmolvlm-alpha
TL;DR:

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 💗
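For context, ColPali-style retrievers score documents with late interaction (MaxSim): each query-token embedding is matched against every document-patch embedding, the best match per query token is kept, and the maxima are summed. A minimal sketch with random placeholder tensors (shapes are illustrative):

```python
# Minimal late-interaction (MaxSim) scoring, the mechanism behind
# ColPali-style retrievers such as ColSmolVLM. Tensors are random
# placeholders; real embeddings come from the model.
import torch

query = torch.randn(12, 128)   # 12 query-token embeddings, 128-dim
doc = torch.randn(900, 128)    # 900 document-patch embeddings

# For each query token keep its best-matching patch, then sum the maxima.
sim = query @ doc.T                        # (12, 900) token-patch similarities
score = sim.max(dim=1).values.sum()
print(f"late-interaction score: {score.item():.2f}")
```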
reacted to andito's post with 🔥❤️ 3 months ago
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned in a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
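As a starting point, inference should follow the standard transformers vision-to-text flow; a minimal sketch is below (see the blog and model card for the exact recipe, and note document.png is a hypothetical input):

```python
# A minimal inference sketch following the standard transformers
# vision-to-text flow; see the blog and model card for the exact recipe.
# document.png is a hypothetical input image.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("document.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this document."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```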
reacted to openfree's post with ❤️ 3 months ago
Hackathon: 1-Minute Creative Innovation with AI
Total Prize: 20,000 USD (USDT)

"One-minute creation by AI Coding Autonomous Agent MOUSE-I"
Hosted by VIDraft | Organized by Korea AI Promotion Association (KAIPA)

🌟 Revolutionary Era of AI Coding
"Creating a web app in just one minute" - This is no longer just imagination, but reality. With the emergence of AI Coding Autonomous Agent MOUSE-I ("One-minute creation by AI Coding Autonomous Agent MOUSE-I"), we are witnessing a new era of software development.

πŸ† Period & Prizes
Period: November 28 - December 23, 2024

Total Prize: 20,000 USD(USDT)

πŸ†
Top Rank: 10,000 USDT
Highest HuggingFace Trending Rank

❀️
Top Likes: 5,000 USDT
Most Likes

πŸ’«
Top Creative: 5,000 USDT
Most Innovative

🚀 Participation Process
1. Start with MOUSE-I
• Access https://VIDraft-mouse1.hf.space
• Read the notice: VIDraft/Mouse-Hackathon
• Generate basic web app code in 1 minute
• Create unlimited content: games, dashboards, landing pages, utilities, etc.

2. Creative Development
• Develop freely on top of the MOUSE-I-generated code
• Additional languages like Python can be used

3. Submission
• Public deployment on Hugging Face
• Register in Static mode
• Required in README.md:

short_description: "One-minute creation by AI Coding Autonomous Agent MOUSE-I"

📅 Key Dates
• Submission Deadline: December 23, 2024, midnight (NYC time)
• Winners Announcement: December 24, 2024

✨ Participant Benefits
• Full ownership and copyright of all creations
• Experience a new paradigm of AI coding
• Multiple submissions allowed from the same account
• Contact: arxivgpt@gmail.com

"Give Yourself the Best Christmas Gift
reacted to as-cle-bert's post with 🔥 3 months ago
Hi HuggingFacers! 🤗
I'm thrilled to introduce my latest project: SenTrEv (Sentence Transformers Evaluator), a Python package that offers simple, customizable evaluation of text-retrieval accuracy and time performance for Sentence Transformers-compatible text embedders on PDF data! 📊

Learn more in my LinkedIn post: https://www.linkedin.com/posts/astra-clelia-bertelli-583904297_python-embedders-semanticsearch-activity-7266754133557190656-j1e3

And on the GitHub repo: https://github.com/AstraBert/SenTrEv
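For a sense of what such an evaluation measures, here is a generic sketch of a retrieval-accuracy and timing check using sentence-transformers directly; this is NOT SenTrEv's actual API, and the chunks and query are hypothetical:

```python
# Generic sketch of the kind of retrieval-accuracy and timing check that
# SenTrEv automates. Uses sentence-transformers directly; NOT SenTrEv's API.
import time
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["First chunk of PDF text ...", "Second chunk of PDF text ..."]
queries = ["query whose answer lives in the second chunk"]
relevant = [1]  # index of the relevant chunk for each query

start = time.perf_counter()
chunk_emb = model.encode(chunks, convert_to_tensor=True)
query_emb = model.encode(queries, convert_to_tensor=True)
hits = util.semantic_search(query_emb, chunk_emb, top_k=1)
elapsed = time.perf_counter() - start

accuracy = sum(h[0]["corpus_id"] == r for h, r in zip(hits, relevant)) / len(queries)
print(f"top-1 accuracy: {accuracy:.2f}, wall time: {elapsed:.3f}s")
```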

Have fun! 🍕