jointriple (Triple)

posted an update 1 day ago

Post

1299

OpenAI is losing money on the $200/month subscription 🤯. It's crazy how expensive it is to run these largest LLMs:

- ChatGPT Pro costs $200/month ($2,400/year) and is still unprofitable for OpenAI due to higher-than-expected usage.
- OpenAI reportedly expected losses of about $5 billion on revenue of $3.7 billion last year, with ChatGPT alone once costing an estimated $700,000 per day to operate. 💸🔥
- They build strong models and do great research. Whether this business model will work in the long run is one of the biggest questions in the AI economy today.

Source with the numbers 👇
https://techcrunch.com/2025/01/05/openai-is-losing-money-on-its-pricey-chatgpt-pro-plan-ceo-sam-altman-says/

3 replies

·

MoritzLaurer

posted an update 2 days ago

Post

2066

🚀 Releasing a new zeroshot-classifier based on ModernBERT! Some key takeaways:

- ⚡ Speed & efficiency: It's multiple times faster and uses significantly less memory than DeBERTav3. You can use larger batch sizes and enabling bf16 (instead of fp16) gave me a ~2x speed boost as well
- 📉 Performance tradeoff: It performs slightly worse than DeBERTav3 on average across my zeroshot classification task collection
- 🧠 Use cases: I recommend using it for scenarios requiring speed and a larger context window (8k).
- 💡 What’s next? I’m preparing a newer version trained on better + longer synthetic data to fully leverage the 8k context window and improve upon the training mix of my older zeroshot-v2.0 models. I also hope that there will be a multilingual variant in the future.

Great work by https://huggingface.co/answerdotai !

If you’re looking for a high-speed zeroshot classifier, give it a try!

📄 Resources below: 👇
Base model: MoritzLaurer/ModernBERT-base-zeroshot-v2.0
Large model: MoritzLaurer/ModernBERT-large-zeroshot-v2.0
Updated zeroshot collection: MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f
ModernBERT collection with paper: answerdotai/modernbert-67627ad707a4acbf33c41deb

MoritzLaurer

posted an update 19 days ago

Post

2592

Quite excited by the ModernBERT release! 0.15/0.4B small, 2T modern pre-training data and tokenizer with code, 8k context window, great efficient model for embeddings & classification!

This will probably be the basis for many future SOTA encoders! And I can finally stop using DeBERTav3 from 2021 :D

Congrats @answerdotai , @LightOnIO and collaborators like @tomaarsen !

Paper and models here 👇https://huggingface.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb

3 replies

·

MoritzLaurer

posted an update 22 days ago

Post

1172

"Open-source AI: year in review 2024": amazing Space with lots of data-driven insights into AI in 2024! Check it out 👇
huggingface/open-source-ai-year-in-review-2024

2 replies

·

MoritzLaurer

posted an update 27 days ago

Post

1284

I've been building a small library for working with prompt templates on the HF hub: pip install prompt-templates. Motivation:

The community currently shares prompt templates in a wide variety of formats: in datasets, in model cards, as strings in .py files, as .txt/.yaml/.json/.jinja2 files etc. This makes sharing and working with prompt templates unnecessarily complicated.

Prompt templates are currently the main hyperparameter that people tune when building complex LLM systems or agents. If we don't have a common standard for sharing them, we cannot systematically test and improve our systems. After comparing different community approaches, I think that working with modular .yaml or .json files is the best approach.

The prompt-templates library :
- proposes a standard for sharing prompts (entirely locally or on the HF hub)
- provides some utilities that are interoperable with the broader ecosystem

Try it:

# !pip install prompt-templates
from prompt_templates import PromptTemplateLoader 
prompt_template = PromptTemplateLoader.from_hub(repo_id="MoritzLaurer/closed_system_prompts", filename="claude-3-5-artifacts-leak-210624.yaml")

The library is in early stages, feedback is welcome!

More details in the docs: https://github.com/MoritzLaurer/prompt_templates/

1 reply

·

josejointriple

updated a model 3 months ago

jointriple/brand_classification_20241021_model_distilbert_0_9957

Updated Oct 21, 2024 • 2

MoritzLaurer

posted an update 3 months ago

Post

4560

#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper into this, it was surprisingly easy to find new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zeroshot models - although I didn't know much about fine-tuning at the time.

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times after a few months. Seeing my research being useful for people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic cooperation and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D

4 replies

·

MoritzLaurer

posted an update 4 months ago

Post

2305

The new NIM Serverless API by HF and Nvidia is a great option if you want a reliable API for open-weight LLMs like Llama-3.1-405B that are too expensive to run on your own hardware.

- It's pay-as-you-go, so it doesn't have rate limits like the standard HF Serverless API and you don't need to commit to hardware like for a dedicated endpoint.
- It works out-of-the box with the new v0.25 release of our huggingface_hub.InferenceClient
- It's specifically tailored to a small collection of popular open-weight models. For a broader selection of open models, we recommend using the standard HF Serverless API.
- Note that you need a token from an Enterprise Hub organization to use it.

Details in this blog post: https://huggingface.co/blog/inference-dgx-cloud
Compatible models in this HF collection: nvidia/nim-serverless-inference-api-66a3c6fcdcb5bbc6e975b508
Release notes with many more features of huggingface_hub==0.25.0: https://github.com/huggingface/huggingface_hub/releases/tag/v0.25.0

Copy-pasteable code in the first comment:

13 replies

·

MoritzLaurer

posted an update 4 months ago

Post

1628

Why would you fine-tune a model if you can just prompt an LLM? The new paper "What is the Role of Small Models in the LLM Era: A Survey" provides a nice pro/con overview. My go-to approach combines both:

1. Start testing an idea by prompting an LLM/VLM behind an API. It's fast and easy and I avoid wasting time on tuning a model on a task that might not make it into production anyways.

2. The LLM/VLM then needs to be manually validated. Anyone seriously considering putting AI into production has to do at least some manual validation. Setting up a good validation pipeline with a tool like Argilla is crucial and it can be reused for any future experiments. Note: you can use LLM-as-a-judge to automate some evals, but you always also need to validate the judge!

3. Based on this validation I can then (a) either just continue using the prompted LLM if it is accurate enough and it makes sense financially given my load; or (b) if the LLM is not accurate enough or too expensive to run in the long-run, I reuse the existing validation pipeline to annotate some additional data for fine-tuning a smaller model. This can be sped up by reusing & correcting synthetic data from the LLM (or just pure distillation).

Paper: https://arxiv.org/pdf/2409.06857
Argilla docs: https://docs.argilla.io/latest/
Argilla is also very easy to deploy with Hugging Face Spaces (or locally): https://huggingface.co/new-space?template=argilla%2Fargilla-template-space