Luc Georges's picture

Luc Georges

mcpotato

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face's profile picture Scanned Tokens's profile picture Hugging Face H4's profile picture Inference Endpoints's profile picture Blog-explorers's profile picture hf-security's profile picture Dev Mode Explorers's profile picture Hugging Face Discord Community's profile picture HF MC Players's profile picture

mcpotato's activity

posted an update 10 days ago
view post
Post
2388
Stoked to announce we've partnered with JFrog to continue improving safety on the Hub! 🐾

Their model scanner brings new scanning capabilities to the table, aimed at reducing alert fatigue.

More on that in our blog post: https://huggingface.co/blog/jfrog
  • 1 reply
·
reacted to dvilasuero's post with đŸš€đŸ”„ 9 months ago
view post
Post
8185
Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: launching partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, or releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!
·
reacted to trisfromgoogle's post with 🚀 11 months ago
view post
Post
1855
Very excited to share the first two official Gemma variants from Google! Today at Google Cloud Next, we announced cutting-edge models for code and research!

First, google/codegemma-release-66152ac7b683e2667abdee11 - a new set of code-focused Gemma models at 2B and 7B, in both pretrained and instruction-tuned variants. These exhibit outstanding performance on academic benchmarks and (in my experience) real-life usage. Read more in the excellent HuggingFace blog: https://huggingface.co/blog/codegemma

Second, ( google/recurrentgemma-release-66152cbdd2d6619cb1665b7a), which is based on the outstanding Google DeepMind research in Griffin: https://arxiv.org/abs/2402.19427. RecurrentGemma is a research variant that enables higher throughput and vastly improved memory usage. We are excited about new architectures, especially in the lightweight Gemma sizes, where innovations like RecurrentGemma can scale modern AI to many more use cases.

For details on the launches of these models, check out our launch blog -- and please do not hesitate to send us feedback. We are excited to see what you build with CodeGemma and RecurrentGemma!

Huge thanks to the Hugging Face team for helping ensure that these models work flawlessly in the Hugging Face ecosystem at launch!
·
reacted to loubnabnl's post with â€ïžđŸ€— about 1 year ago
view post
Post
⭐ Today we’re releasing The Stack v2 & StarCoder2: a series of 3B, 7B & 15B code generation models trained on 3.3 to 4.5 trillion tokens of code:

- StarCoder2-15B matches or outperforms CodeLlama 34B, and approaches DeepSeek-33B on multiple benchmarks.
- StarCoder2-3B outperforms StarCoderBase-15B and similar sized models.
- The Stack v2 a 4x larger dataset than the Stack v1, resulting in 900B unique code tokens 🚀
As always, we released everything from models and datasets to curation code. Enjoy!

🔗 StarCoder2 collection: bigcode/starcoder2-65de6da6e87db3383572be1a
🔗 Paper: https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view
🔗 BlogPost: https://huggingface.co/blog/starcoder2
🔗 Code Leaderboard: bigcode/bigcode-models-leaderboard
reacted to clem's post with đŸ€— about 1 year ago
view post
Post
Is synthetic data the future of AI? đŸ”„đŸ”„đŸ”„

@HugoLaurencon @Leyo & @VictorSanh are introducing HuggingFaceM4/WebSight , a multimodal dataset featuring 823,000 pairs of synthetically generated HTML/CSS codes along with screenshots of the corresponding rendered websites to train GPT4-V-like models đŸŒđŸ’»

While crafting their upcoming foundation vision language model, they faced the challenge of converting website screenshots into usable HTML/CSS codes. Most VLMs suck at this and there was no public dataset available for this specific task, so they decided to create their own.

They prompted existing LLMs to generate 823k HTML/CSS codes of very simple websites. Through supervised fine-tuning of a vision language model on WebSight, they were able to generate the code to reproduce a website component, given a screenshot.

You can explore the dataset here: HuggingFaceM4/WebSight

What do you think?
·
reacted to abhishek's post with đŸ€— about 1 year ago
view post
Post
Hello Huggers! đŸ€—
·