Moritz Laurer (MoritzLaurer)

Post
Why does Meta invest millions in Llama 3 and then make it available for free? Here is Zuckerberg's explanation to investors in the Q4 2023 earnings call:

"The second part of our playbook is open source software infrastructure. Our long-standing strategy has been to build and open source general infrastructure while keeping our specific product implementations proprietary.

[...] First, open source software is typically safer and more secure, as well as more compute efficient to operate due to all the ongoing feedback, scrutiny, and development from the community. This is a big deal because safety is one of the most important issues in AI. Efficiency improvements and lowering the compute costs also benefit everyone including us.

Second, open source software often becomes an industry standard, and when companies standardize on building with our stack, that then becomes easier to integrate new innovations into our products. That’s subtle, but the ability to learn and improve quickly is a huge advantage and being an industry standard enables that.

Third, open source is hugely popular with developers and researchers. We know that people want to work on open systems that will be widely adopted, so this helps us recruit the best people at Meta, which is a very big deal for leading in any new technology area.

And again, we typically have unique data and build unique product integrations anyway, so providing infrastructure like Llama as open source doesn't reduce our main advantages. This is why our long-standing strategy has been to open source general infrastructure and why I expect it to continue to be the right approach for us going forward."

Full earnings call transcript: https://s21.q4cdn.com/399680738/files/doc_financials/2023/q4/META-Q4-2023-Earnings-Call-Transcript.pdf

Post
🆕 Releasing a new series of 8 zeroshot classifiers: better performance, fully commercially usable thanks to synthetic data, up to 8192 tokens of context, and they run on any hardware.

Summary:
🤖 The zeroshot-v2.0-c series replaces commercially restrictive training data with synthetic data generated with mistralai/Mixtral-8x7B-Instruct-v0.1 (Apache 2.0). All models are released under the MIT license.
🦾 The best model performs 17 percentage points better across 28 tasks than facebook/bart-large-mnli (the most downloaded commercially-friendly baseline).
🌏 The series includes a multilingual variant fine-tuned from BAAI/bge-m3 for zeroshot classification in 100+ languages, with a context window of 8192 tokens.
🪶 The models are small at 0.2-0.6B parameters, so they run on any hardware. The base-size models are more than 2x faster than bart-large-mnli while performing significantly better.
🤏 The models are not generative LLMs; they are efficient encoder-only models specialized in zeroshot classification through the universal NLI task (see the usage sketch after this list).
🤑 For use cases where commercially restrictive training data is not an issue, I've also trained variants with even more human data for improved performance.
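
A minimal usage sketch with the transformers zero-shot-classification pipeline (the model ID, example text, and labels here are just illustrative assumptions; any model from the collection linked below works the same way):

```python
from transformers import pipeline

# Model ID is an illustrative assumption; pick any model from the collection below.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v2.0",
)

text = "The ECB raised interest rates by 25 basis points to curb inflation."
candidate_labels = ["economy", "politics", "sports", "technology"]

output = classifier(
    text,
    candidate_labels,
    # Each label is wrapped into an NLI hypothesis via this template;
    # tailoring it to your task can improve accuracy.
    hypothesis_template="This text is about {}",
    multi_label=False,  # set True if several labels can apply at once
)
print(output["labels"][0], output["scores"][0])
```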

Next steps:
✍️ I'll publish a blog post with more details soon.
🔮 There are several improvements I'm planning for v2.1. Especially the multilingual model has room for improvement.

All models are available for download in this Hugging Face collection: MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f

These models are an extension of the approach explained in this paper, but with additional synthetic data: https://arxiv.org/abs/2312.17543
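
Under the hood, zeroshot classification via NLI treats the input text as the premise and wraps each candidate label into a hypothesis; the model's entailment probability becomes the label score. Here is a minimal sketch of that mechanism (reusing the illustrative model ID from above, and reading the head's label layout from the model config rather than assuming it):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "MoritzLaurer/deberta-v3-large-zeroshot-v2.0"  # illustrative assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The text to classify is the NLI premise; the candidate label
# is turned into an NLI hypothesis.
premise = "The new GPU delivers excellent performance per watt."
hypothesis = "This text is about technology."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Inspect the label layout instead of hardcoding an index; NLI heads
# differ (e.g. entailment/not_entailment vs. three-class setups).
probs = torch.softmax(logits, dim=-1)[0]
for idx, label in model.config.id2label.items():
    print(label, round(probs[idx].item(), 3))
```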