Just tested Argilla's new data annotation feature - it's a game changer for AI project quality.
Upload CSVs, work with published datasets, or improve existing ones directly on HuggingFace Hub. Setup took < 2 minutes, no code needed (see example below where I selected a dataset to classify tweets into categories).
Real world impact: Missing in Chicago won a Pulitzer using a similar approach - 200 volunteers labeled police misconduct files to train their model. That's the power of good data annotation.
Three immediate use cases I see:
- Build collaborative training sets with your community (surprisingly underused in AI journalism)
- Turn your website chatbot logs into high-quality fine-tuning data
- Compare generated vs published content (great for SEO headlines)
Works for solo projects or teams up to 100 people. All integrated with HuggingFace Hub for immediate model training.
Interesting to see tools like this making data quality more accessible. Data quality is the hidden driver of AI success that we don't talk about enough.
Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs all in one place. Ready to level up your model comparison game?
Announcing the Foundation Model Development Cheatsheet!
My first Post ever to announce the release of a fantastic collaborative resource to support model developers across the full development stack: The FM Development Cheatsheet available here: https://fmcheatsheet.org/
The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools in order to make it as easy as possible for developers to adopt best practices, covering among other aspects: data selection, curation, and governance; accurate and limitations-aware documentation; energy efficiency throughout the training phase; thorough capability assessments and risk evaluations; environmentally and socially conscious deployment strategies.
We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice.
🤯 Plot twist: Size isn't everything in AI! A lean 32B parameter model just showed up to the party and outperformed a 70B one. Efficiency > Scale? The AI world just got more interesting...
Cohere For AI released Aya Expanse, a new family of multilingual models (8B and 32B) spanning 23 popular languages.
This is no Woodstock AI but will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face Hub.
1,000 spots available, first come, first served, with some surprises during the stream!
I feel like this incredible resource hasn't gotten the attention it deserves in the community!
@clefourrier and the HuggingFace evaluation team more generally put together a fantastic guidebook covering a lot about evaluation, from basics to advanced tips.
Collaboration between UK AI Safety Institute and Gray Swan AI to create a dataset for measuring harmfulness of LLM agents.
The benchmark contains both harmful and benign task sets across 11 categories, with varied difficulty levels and detailed evaluation that tests not only success rate but also tool-level accuracy.
We provide refusal and accuracy metrics across a wide range of models in both no-attack and prompt-attack scenarios.
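For anyone wanting to reproduce this kind of summary on their own runs, here is a minimal sketch of how refusal rate and a mean score could be aggregated per model and scenario. Everything in it is an assumption for illustration: the record fields (`model`, `scenario`, `refused`, `score`), the model name, and the numbers are hypothetical and are not the benchmark's actual schema, scoring code, or results.

```python
from collections import defaultdict

# Hypothetical per-task records. Field names and values are made up for
# illustration; they are not taken from the benchmark or its results.
records = [
    {"model": "model-a", "scenario": "no_attack", "refused": True, "score": 0.0},
    {"model": "model-a", "scenario": "no_attack", "refused": False, "score": 1.0},
    {"model": "model-a", "scenario": "prompt_attack", "refused": False, "score": 1.0},
    {"model": "model-a", "scenario": "prompt_attack", "refused": False, "score": 0.5},
]


def summarize(rows):
    """Group records by (model, scenario) and average refusals and scores."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["model"], row["scenario"])].append(row)

    summary = {}
    for key, group in buckets.items():
        n = len(group)
        summary[key] = {
            # Fraction of tasks where the model refused to act.
            "refusal_rate": sum(r["refused"] for r in group) / n,
            # Mean task score, used here as a stand-in for "accuracy".
            "mean_score": sum(r["score"] for r in group) / n,
        }
    return summary


summary = summarize(records)
for (model, scenario), metrics in sorted(summary.items()):
    print(model, scenario, metrics)
```

Running the same aggregation separately on the harmful and benign sets would make it easier to tell a model that refuses harmful tasks from one that simply fails at everything.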
Just started going through the latest "State of AI Report 2024", and I cannot get over the predictions!
The report predicts major developments in AI over the next 12 months, including a $10B+ investment from a sovereign state into a large US AI lab, triggering national security scrutiny, and a viral app created by someone without coding skills.
It forecasts changes in data collection practices due to frontier labs facing trials, softer-than-expected EU AI Act implementations, and the rise of an open-source alternative to OpenAI GPT-4 that outperforms it on benchmarks.
NVIDIA's dominance will remain largely unchallenged, investment in humanoid robots will decline, Apple's on-device AI research will gain momentum, and a research paper by an AI scientist will be accepted at a major conference.
Lastly, a GenAI-based video game is expected to achieve breakout success.
Yet to go through all 200+ pages... will post summarized thoughts later.