AI & ML interests

Data Quality

Recent Activity

argilla-internal-testing's activity

davidberenstein1957 posted an update 1 day ago

🚨 New Bonus Unit: Tracing & Evaluating Your Agent! 🚨

Learn how to transform your agent from a simple demo into a robust, reliable product ready for real users.

UNIT: https://huggingface.co/learn/agents-course/bonus-unit2/introduction

In this unit, you'll learn:
- Offline Evaluation – Benchmark and iterate your agent using datasets.
- Online Evaluation – Continuously track key metrics such as latency, costs, and user feedback.
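
If you want a head start before the unit, the tracing side boils down to instrumenting your agent with OpenTelemetry and pointing the exporter at Langfuse. Here's a minimal sketch for a smolagents agent (the endpoint and env-var names are my assumptions; the unit has the exact setup):

import base64
import os

from openinference.instrumentation.smolagents import SmolagentsInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Authenticate against Langfuse's OTLP endpoint with your project keys.
auth = base64.b64encode(
    f"{os.environ['LANGFUSE_PUBLIC_KEY']}:{os.environ['LANGFUSE_SECRET_KEY']}".encode()
).decode()
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"

# Every smolagents run is now exported as a trace you can inspect and score.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))
SmolagentsInstrumentor().instrument(tracer_provider=provider)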

Happy testing and improving!

Thanks Langfuse team!

burtenshaw posted an update 6 days ago

The Hugging Face Agents Course now includes three major agent frameworks!

🔗 https://huggingface.co/agents-course

This includes LlamaIndex, LangChain, and our very own smolagents. We've worked to integrate the three frameworks in distinctive ways so that learners can reflect on when and where to use each.

This also means that you can follow the course if you're already familiar with one of these frameworks, and soak up some of the fundamental knowledge in earlier units.

Hopefully, this makes the agents course open to as many people as possible.

burtenshaw posted an update 13 days ago

The open LLM leaderboard is complete, retired, dead, ‘ascended to a higher plane’. And in its shadow we have an amazing range of leaderboards built and maintained by the community.

In this post, I just want to list some of those great leaderboards that you should bookmark for staying up to date:

- Chatbot Arena LLM Leaderboard is the first port of call for checking out the best models. It’s not the fastest to update, because scores depend on humans actually using the models, but it’s worth the wait. lmarena-ai/chatbot-arena-leaderboard

- OpenVLM Leaderboard is great for getting scores on vision language models. opencompass/open_vlm_leaderboard

- Ai2 are doing a great job on RewardBench and I hope they keep it up because reward models are the unsexy workhorse of the field. allenai/reward-bench

- The GAIA leaderboard is great for evaluating agent applications. gaia-benchmark/leaderboard

🤩 This seems like such a sustainable way of building for the long term, where rather than leaning on a single company to evaluate all LLMs, we share the load.

burtenshaw posted an update 13 days ago

Still speed-running Gemma 3 to think. Today I focused on setting up GPU-poor hardware to run GRPO.

This is a plain TRL and PEFT notebook that works on Apple silicon or a Colab T4. It uses the 1B variant of Gemma 3 and a reasoning version of the GSM8K dataset.

🧑‍🍳 There’s more still in the oven like releasing models, an Unsloth version, and deeper tutorials, but hopefully this should bootstrap your projects.

Here’s a link to the 1B notebook: https://colab.research.google.com/drive/1mwCy5GQb9xJFSuwt2L_We3eKkVbx2qSt?usp=sharing
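
To give a flavour of what’s inside, the GPU-poor trick is training a small LoRA adapter on the 1B model rather than a full fine-tune. A rough sketch (the rank and target modules here are illustrative; the notebook has the real values):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Train only a small set of adapter weights so GRPO fits on a T4 or Apple silicon.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()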

burtenshaw posted an update 14 days ago

Everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I've found. Let's go!

1. You have to install everything from main and nightly. This is what I'm working with to get Unsloth and TRL running:

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft


plus these with --no-deps:

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly
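
Put together as pip commands, that's roughly:

pip install git+https://github.com/huggingface/transformers@main \
    git+https://github.com/huggingface/trl.git@main bitsandbytes peft
pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git@nightly \
    git+https://github.com/unslothai/unsloth.git@nightly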


2. Will Brown's code to turn GSM8K into a reasoning dataset is a nice toy experiment: https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
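
The core of that approach is pulling GSM8K's final answers (marked with ####) out of the solutions so a reward function can check them. Roughly (an illustrative sketch, not the gist verbatim):

from datasets import load_dataset

def extract_hash_answer(text):
    # GSM8K solutions end with "#### <final answer>".
    if "####" not in text:
        return None
    return text.split("####")[1].strip()

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"answer": extract_hash_answer(x["answer"])})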

3. With a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.

4. So far, none of my runs have undermined the outputs after 1 epoch. Therefore, I'm mainly experimenting with bigger LoRA adapters.

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",  # 8-bit optimizer to save memory on small GPUs
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,  # completions sampled per prompt for GRPO's group comparison
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,  # leave room for reasoning within a 1024 budget
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)
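
This config then gets handed to the trainer. For completeness, the wiring looks roughly like this (the dataset and reward function here are placeholders, not from my run):

from trl import GRPOTrainer

trainer = GRPOTrainer(
    model=model,                        # the PEFT-wrapped Gemma 3 model
    reward_funcs=[correctness_reward],  # e.g. a GSM8K answer-checking reward
    args=training_args,
    train_dataset=dataset,
)
trainer.train()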


5. Vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. But there's no need to load the model differently in transformers or Unsloth:

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")


If you want an introduction to GRPO, check out the reasoning course; it walks you through the algorithm, theory, and implementation in a smooth way.

https://huggingface.co/reasoning-course

burtenshaw posted an update 14 days ago

Here’s a notebook to make Gemma reason with GRPO & TRL. I made this whilst prepping the next unit of the reasoning course:

In this notebook, I combine Google’s model with some community tooling:

- First, I load the model from the Hugging Face Hub with the latest transformers release, which supports Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab
- Then, I took Will Brown’s processing and reward functions to make reasoning chains from GSM8K (one such reward is sketched below)
- Finally, I used TRL’s GRPOTrainer to train the model
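
To make the reward functions concrete, here’s a toy one in the style of Will Brown’s, using TRL’s reward-function signature (the tag names and scoring are illustrative; the notebook has the real set):

import re

def format_reward(completions, **kwargs):
    # Reward completions that follow a <reasoning>...</reasoning><answer>...</answer> layout.
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]
    return [1.0 if re.search(pattern, r, re.DOTALL) else 0.0 for r in responses]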

The next step is to bring Unsloth AI in, then ship it in the reasoning course. Link to the notebook below:

https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing

burtenshaw posted an update 21 days ago

I’m super excited to work with @mlabonne to build the first practical example in the reasoning course.

🔗 https://huggingface.co/reasoning-course

Here's a quick walk through of the first drop of material that works toward the use case:

- A fundamental introduction to reinforcement learning, answering questions like ‘what is a reward?’ and ‘how do we create an environment for a language model?’

- Then it focuses on DeepSeek R1 by walking through the paper and highlighting key aspects. This is an old-school way to learn ML topics, but it always works.

- Next, it takes you to Transformers Reinforcement Learning (TRL) and demonstrates potential reward functions you could use. This is cool because it uses Marimo notebooks to visualise the rewards.

- Finally, Maxime walks us through a real training notebook that uses GRPO to reduce generation length (a toy version of such a reward is sketched below). I’m really into this because it works, and Maxime took the time to validate it and share assets and logs from his own runs for you to compare with.
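
As a toy illustration of that last use case, a length-based reward in TRL’s reward-function signature might look like this (purely illustrative; Maxime’s notebook has the validated version):

def length_reward(completions, **kwargs):
    # Score shorter generations higher to push GRPO toward brevity.
    lengths = [len(completion[0]["content"]) for completion in completions]
    longest = max(max(lengths), 1)
    return [1.0 - length / longest for length in lengths]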

Maxime’s work and notebooks have been a major part of the open source community over the last few years. I, like everyone, have learnt so much from them.

davidberenstein1957 posted an update 22 days ago

🥊 Epic Agent Framework Showdown! Available today!

🔵 In the blue corner, the versatile challenger with a proven track record of knowledge retrieval: LlamaIndex!

🛑 In the red corner, the defender, weighing in with lightweight efficiency: Hugging Face smolagents!

🔗 URL: https://huggingface.co/agents-course

We just published the LlamaIndex unit for the agents course, and it is set to offer a great contrast with the smolagents unit by looking at:

- What makes llama-index stand out
- How the LlamaHub is used for integrations
- Creating QueryEngine components (see the sketch below)
- Using agents and tools
- Agentic and multi-agent workflows
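
If you haven’t touched LlamaIndex before, the QueryEngine part of the unit centres on a pattern like this (a minimal sketch; it assumes documents in ./data and default LLM/embedding settings):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents, embed them into an index, and expose it as a query interface.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What is an agent?")
print(response)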

The team has been working flat-out on this for a few weeks. Supported by Logan Markewich and Laurie Voss over at LlamaIndex.

Who won? You decide!

davidberenstein1957 posted an update 23 days ago

🫸 New release: push vector search to the Hub with vicinity and work with any serialisable objects.

🧑‍🏫 Supported backends: KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER.

🔗 Example Repo: minishlab/my-vicinity-repo
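
A hedged sketch of what this enables (the from_vectors_and_items/push_to_hub names are my reading of the release; check the vicinity docs for the exact API):

import numpy as np
from vicinity import Backend, Vicinity

items = ["triforce", "master sword", "hylian shield"]
vectors = np.random.rand(len(items), 128)

# Build an index over any serialisable items with your backend of choice.
vicinity = Vicinity.from_vectors_and_items(
    vectors=vectors, items=items, backend_type=Backend.HNSW
)
print(vicinity.query(vectors[0], k=2))

# Then share it on the Hub, as in the example repo above.
vicinity.push_to_hub(repo_id="minishlab/my-vicinity-repo")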

burtenshaw posted an update 27 days ago

I made a real-time voice agent with FastRTC, smolagents, and Hugging Face Inference Providers. Check it out in this Space:

🔗 burtenshaw/coworking_agent
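
The core FastRTC pattern behind the space looks roughly like this (a simplified sketch; the real agent adds speech-to-text, the smolagents call, and text-to-speech inside the handler):

from fastrtc import ReplyOnPause, Stream

def respond(audio):
    # Placeholder handler: echo the caller's audio back.
    # The real space transcribes it, runs the agent, and speaks the reply.
    yield audio

stream = Stream(handler=ReplyOnPause(respond), modality="audio", mode="send-receive")
stream.ui.launch()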