Voxel51


AI & ML interests

Visual AI, Computer Vision, Multimodal AI, Data-Centric AI

Voxel51's activity

abhishek posted an update 26 days ago
🎉 SUPER BLACK FRIDAY DEAL 🎉

Train almost any model on a variety of tasks, such as LLM fine-tuning, text classification/regression, summarization, question answering, image classification/regression, object detection, tabular data, and more, for FREE using AutoTrain locally. 🔥
https://github.com/huggingface/autotrain-advanced
abhishek posted an update about 2 months ago
INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on the Hugging Face Hub using Python, running on Hugging Face servers. Choose from a number of GPU flavors, millions of models and dataset pairs, and 10+ tasks 🤗

To try it, install autotrain-advanced using pip. You can skip the dependencies by installing with --no-deps, but then you'd need to install some dependencies by hand.

"pip install autotrain-advanced"

Github repo: https://github.com/huggingface/autotrain-advanced
abhishek posted an update 4 months ago
🚨 NEW TASK ALERT 🚨
Extractive Question Answering: because sometimes generative is not all you need 😉
AutoTrain is the only open-source, no-code solution to offer so many tasks across different modalities. Current task count: 23 🚀
Check out the blog post on getting started with this task: https://huggingface.co/blog/abhishek/extractive-qa-autotrain
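For context, extractive QA returns a span copied from the provided context rather than generating new text. Here's a minimal sketch of what that looks like at inference time using the transformers pipeline (not AutoTrain itself; the model name is just an illustrative example):

from transformers import pipeline

# Extractive QA: the answer is a span selected from the context, with a confidence score
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="How many tasks does AutoTrain support?",
    context="AutoTrain is an open-source, no-code solution that currently supports 23 tasks.",
)
print(result["answer"], result["score"])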
harpreetsahota posted an update 7 months ago
The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.

I'll have a technical blog post out tomorrow doing some analysis on the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.

The dataset consists of the following fields:

- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if found
- category_name: The primary category of this paper, according to the [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to arXiv taxonomy
- keywords: Extracted using GPT-4o

Here's how I created the dataset 👇🏼

Generic code for building this dataset can be found [here](https://github.com/harpreetsahota204/CVPR-2024-Papers).

This dataset was built using the following steps:

- Scrape the CVPR 2024 website for accepted papers
- Use DuckDuckGo to search for a link to the paper's abstract on arXiv
- Use arXiv.py (python wrapper for the arXiv API) to extract the abstract and categories, and download the pdf for each paper
- Use pdf2image to save an image of each paper's first page
- Use GPT-4o to extract keywords from the abstract

Voxel51/CVPR_2024_Papers
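If you want to poke around the dataset yourself, here's a minimal sketch using the FiftyOne Hub integration shown later in this feed (assumes fiftyone is installed and the dataset loads under Voxel51/CVPR_2024_Papers):

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Load the CVPR 2024 papers dataset from the Hugging Face Hub into FiftyOne
dataset = fouh.load_from_hub("Voxel51/CVPR_2024_Papers")

# Browse the first-page images, abstracts, categories, and keywords in the App
session = fo.launch_app(dataset)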
abhishek posted an update 8 months ago
🚨 NEW TASK ALERT 🚨
🎉 AutoTrain now supports Object Detection! 🎉
Transform your projects with these powerful new features:
🔹 Fine-tune any supported model from the Hugging Face Hub
🔹 Seamless logging with TensorBoard or W&B
🔹 Support for local and hub datasets
🔹 Configurable training for tailored results
🔹 Train locally or leverage Hugging Face Spaces
🔹 Deployment-ready with API inference or Hugging Face endpoints
AutoTrain: https://hf.co/autotrain
abhishek posted an update 8 months ago
🚀🚀🚀🚀 Introducing AutoTrain Configs! 🚀🚀🚀🚀
Now you can train models using YAML config files! 💥 These configs are easy to understand and not at all overwhelming, so even a person with almost zero knowledge of machine learning can train state-of-the-art models without writing any code. Check out the example configs in the config directory of the autotrain-advanced GitHub repo, and feel free to share your own configs by creating a pull request 🤗
Github repo: https://github.com/huggingface/autotrain-advanced
abhishek posted an update 8 months ago
Trained another version of llama3-8b-instruct that beats the base model, this time without losing too many points on the GSM8K benchmark. Again, using AutoTrain 💥 pip install autotrain-advanced
Trained model: abhishek/autotrain-llama3-orpo-v2
abhishek posted an update 8 months ago
With AutoTrain, you can already fine-tune the latest llama3 models without writing a single line of code. Here's an example fine-tune of the llama3 8b model: abhishek/autotrain-llama3-no-robots
jamarks posted an update 8 months ago
FiftyOne Datasets <> Hugging Face Hub Integration!

As of yesterday's release of FiftyOne 0.23.8, the FiftyOne open-source library for dataset curation and visualization is now integrated with the Hugging Face Hub!

You can now load Parquet datasets from the hub and have them converted directly into FiftyOne datasets. To load MNIST, for example:

pip install -U fiftyone


import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub(
    "mnist",
    format="ParquetFilesDataset",
    classification_fields="label",
)
session = fo.launch_app(dataset)


You can also load FiftyOne datasets directly from the hub. Here's how you load the first 1000 samples from the VisDrone dataset:

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub("jamarks/VisDrone2019-DET", max_samples=1000)

# Launch the App
session = fo.launch_app(dataset)


And tying it all together, you can push your FiftyOne datasets directly to the hub:

import fiftyone.zoo as foz
import fiftyone.utils.huggingface as fouh

dataset = foz.load_zoo_dataset("quickstart")
fouh.push_to_hub(dataset, "my-dataset")


Major thanks to @tomaarsen @davanstrien @severo @osanseviero and @julien-c for helping to make this happen!!!

Full documentation and details here: https://docs.voxel51.com/integrations/huggingface.html#huggingface-hub
harpreetsahota posted an update 10 months ago
google/gemma-7b-it is super good!

I wasn't convinced at first, but after vibe-checking it...I'm quite impressed.

I've got a notebook here, which is kind of a framework for vibe-checking LLMs.

In this notebook, I take Gemma for a spin on a variety of prompts:
• [nonsensical tokens](harpreetsahota/diverse-token-sampler)
• [conversation where I try to get some PII](harpreetsahota/red-team-prompts-questions)
• [summarization ability](lighteval/summarization)
• [instruction following](harpreetsahota/Instruction-Following-Evaluation-for-Large-Language-Models)
• [chain of thought reasoning](ssbuild/alaca_chain-of-thought)

I then used LangChain evaluators (GPT-4 as judge) and tracked everything in LangSmith. I made public links to the traces so you can inspect the runs.

I hope you find this helpful, and I am certainly open to feedback, criticisms, or ways to improve.

Cheers:

You can find the notebook here: https://colab.research.google.com/drive/1RHzg0FD46kKbiGfTdZw9Fo-DqWzajuoi?usp=sharing
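If you want to reproduce the basic setup outside the notebook, here's a minimal sketch of loading google/gemma-7b-it with transformers. It covers only loading and a single generation, not the LangChain/LangSmith evaluation, and the prompt is just an example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load gemma-7b-it (gated model: requires accepting the license on the Hub)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# Format a single-turn prompt with the model's chat template
messages = [{"role": "user", "content": "Explain what a 'vibe check' of an LLM involves."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))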
harpreetsahota posted an update 11 months ago
✌🏼Two new models dropped today 👇🏽

1) 👩🏾‍💻 𝐃𝐞𝐜𝐢𝐂𝐨𝐝𝐞𝐫-𝟔𝐁

👉🏽 Supports 𝟖 𝐥𝐚𝐧𝐠𝐮𝐚𝐠𝐞𝐬: C, C#, C++, Go, Rust, Python, Java, and JavaScript.

👉🏽 Released under the 𝐀𝐩𝐚𝐜𝐡𝐞 𝟐.𝟎 𝐥𝐢𝐜𝐞𝐧𝐬𝐞

🥊 𝐏𝐮𝐧𝐜𝐡𝐞𝐬 𝐚𝐛𝐨𝐯𝐞 𝐢𝐭𝐬 𝐰𝐞𝐢𝐠𝐡𝐭 𝐜𝐥𝐚𝐬𝐬 𝐨𝐧 𝐇𝐮𝐦𝐚𝐧𝐄𝐯𝐚𝐥: Beats out CodeGen 2.5 7B and StarCoder 7B on most supported languages. Has a 3-point lead over StarCoderBase 15.5B for Python

💻 𝑻𝒓𝒚 𝒊𝒕 𝒐𝒖𝒕:

🃏 𝐌𝐨𝐝𝐞𝐥 𝐂𝐚𝐫𝐝: Deci/DeciCoder-6B

📓 𝐍𝐨𝐭𝐞𝐛𝐨𝐨𝐤: https://colab.research.google.com/drive/1QRbuser0rfUiFmQbesQJLXVtBYZOlKpB

🪧 𝐇𝐮𝐠𝐠𝐢𝐧𝐠𝐅𝐚𝐜𝐞 𝐒𝐩𝐚𝐜𝐞: Deci/DeciCoder-6B-Demo
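If you'd rather skip the notebook, here's a rough loading sketch with transformers (hedged: it assumes the repo ships custom modeling code, hence trust_remote_code=True, and the prompt and generation settings are only illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciCoder-6B"

# Assumption: the checkpoint uses custom modeling code on the Hub, so trust_remote_code=True
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)

# Illustrative code-completion prompt
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))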

2) 🎨 𝐃𝐞𝐜𝐢𝐃𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧 𝐯𝟐.𝟎

👉🏽 Produces quality images on par with Stable Diffusion v1.5, but 𝟐.𝟔 𝐭𝐢𝐦𝐞𝐬 𝐟𝐚𝐬𝐭𝐞𝐫 𝐢𝐧 𝟒𝟎% 𝐟𝐞𝐰𝐞𝐫 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐨𝐧𝐬

👉🏽 Employs a 𝐬𝐦𝐚𝐥𝐥𝐞𝐫 𝐚𝐧𝐝 𝐟𝐚𝐬𝐭𝐞𝐫 𝐔-𝐍𝐞𝐭 𝐜𝐨𝐦𝐩𝐨𝐧𝐞𝐧𝐭 𝐰𝐡𝐢𝐜𝐡 𝐡𝐚𝐬 𝟖𝟔𝟎 𝐦𝐢𝐥𝐥𝐢𝐨𝐧 𝐩𝐚𝐫𝐚𝐦𝐞𝐭𝐞𝐫𝐬.

👉🏽 Uses an optimized scheduler, 𝐒𝐪𝐮𝐞𝐞𝐳𝐞𝐝𝐃𝐏𝐌++, which 𝐜𝐮𝐭𝐬 𝐝𝐨𝐰𝐧 𝐭𝐡𝐞 𝐧𝐮𝐦𝐛𝐞𝐫 𝐨𝐟 𝐬𝐭𝐞𝐩𝐬 𝐧𝐞𝐞𝐝𝐞𝐝 𝐭𝐨 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞 𝐚 𝐪𝐮𝐚𝐥𝐢𝐭𝐲 𝐢𝐦𝐚𝐠𝐞 𝐟𝐫𝐨𝐦 𝟏𝟔 𝐭𝐨 𝟏𝟎.

👉🏽 Released under the 𝐂𝐫𝐞𝐚𝐭𝐢𝐯𝐞𝐌𝐋 𝐎𝐩𝐞𝐧 𝐑𝐀𝐈𝐋++-𝐌 𝐋𝐢𝐜𝐞𝐧𝐬𝐞.

💻 𝑻𝒓𝒚 𝒊𝒕 𝒐𝒖𝒕:

🃏 𝐌𝐨𝐝𝐞𝐥 𝐂𝐚𝐫𝐝: Deci/DeciDiffusion-v2-0

📓 𝐍𝐨𝐭𝐞𝐛𝐨𝐨𝐤: https://colab.research.google.com/drive/11Ui_KRtK2DkLHLrW0aa11MiDciW4dTuB

🪧 𝐇𝐮𝐠𝐠𝐢𝐧𝐠𝐅𝐚𝐜𝐞 𝐒𝐩𝐚𝐜𝐞: Deci/DeciDiffusion-v2-0

Help support the projects by liking the model cards and the spaces!

Cheers and happy hacking!
abhishek posted an update 11 months ago
Happy to announce the brand new, open-source Hugging Face Competitions platform 🚀 Now you can create a machine learning competition for your friends, colleagues, or the world for FREE* and host it on Hugging Face: the AI community building the future. Creating a competition requires only two steps: pip install competitions, then run competitions create and create your competition by answering a few questions 💥 Check out the GitHub repo: https://github.com/huggingface/competitions and docs: https://hf.co/docs/competitions
abhishek posted an update 12 months ago
Hello Huggers! 🤗