Clem 🤗 PRO

clem

AI & ML interests

multi-modal, time-series, biology and chemistry

Organizations

clem's activity

replied to danielhanchen's post 7 days ago
replied to fdaudens's post 9 days ago
replied to gsarti's post 9 days ago
posted an update 11 days ago
posted an update 12 days ago
Already almost 1,000 llama3 model variations have been shared publicly on HF (many more in private use at companies): https://huggingface.co/models?p=5&sort=trending&search=llama3.

Everyone should fine-tune their own models for their use cases, languages, industries, infra constraints, and more.

10,000 llama3 variants by the end of next week?
replied to visheratin's post 16 days ago
posted an update 17 days ago
We noticed that all the open-source models and datasets from https://huggingface.co/WizardLM, both in their personal Hugging Face account and in the Microsoft Hugging Face organization (https://huggingface.co/microsoft), have been made private by the author. This will cause some demos to fail (these models were collectively downloaded over a hundred thousand times a month).

This is the explanation that @WizardLM communicated a few hours ago: https://huggingface.co/posts/WizardLM/329547800484476#661e0d17bca1a6038b60503e

We apologize for the inconvenience and are trying to get in touch with the author and Microsoft to find a good resolution for community members. Let us know if you have any questions!
posted an update 18 days ago
posted an update 29 days ago
Introducing gretelai/synthetic_text_to_sql by https://huggingface.co/gretelai

It stands as the largest and most diverse synthetic Text-to-SQL dataset available to date.

The dataset includes:

- 105,851 records partitioned into 100,000 train and 5,851 test records
- ~23M total tokens, including ~12M SQL tokens
- Coverage across 100 distinct domains/verticals
- Comprehensive array of SQL tasks: data definition, retrieval, manipulation, analytics & reporting
- Wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions, set operations
- Database context, including table and view create statements
- Natural language explanations of what the SQL query is doing
- Contextual tags to optimize model training

Blogpost: https://gretel.ai/blog/synthetic-text-to-sql-dataset
Dataset: gretelai/synthetic_text_to_sql
replied to Smooke's post 30 days ago
replied to julien-c's post about 1 month ago
posted an update 2 months ago
Terribly excited about open-source + on-device AI these days! Great to see @qualcomm release 80+ models optimized and curated for their devices and chips on HF: https://huggingface.co/qualcomm

replied to dvilasuero's post 2 months ago

Unpopular opinion: this is the most impactful release of the day (because open)!

replied to DmitryRyumin's post 2 months ago

would be cool to have some integration with the HF hub

replied to trisfromgoogle's post 2 months ago
replied to stas's post 3 months ago
replied to victor's post 3 months ago
replied to manu's post 3 months ago

🇫🇷🇫🇷🇫🇷

replied to dvilasuero's post 3 months ago
replied to clefourrier's post 3 months ago
replied to julien-c's post 3 months ago
posted an update 3 months ago
posted an update 3 months ago
With the Google announcement last week, I think we're now officially the only AI startup out there that has commercial collaborations with all the major cloud providers (AWS, GCP, Azure) and hardware providers (Nvidia, AMD, Intel, Qualcomm, ...), making our vision of being the independent and agnostic platform for all AI builders truer than ever!

Let's go!
posted an update 3 months ago
replied to jimfan's post 3 months ago
posted an update 3 months ago
replied to abidlabs's post 3 months ago
replied to Norod78's post 3 months ago
posted an update 4 months ago
Most upvoted papers of 2023 on HF. What do you think will be the most prominent research topics in AI in 2024? (Also, don't forget to add your papers to the Hub this year!)

From: hysts/daily-papers
replied to their post 4 months ago
posted an update 4 months ago
Is synthetic data the future of AI? 🔥🔥🔥

@HugoLaurencon @Leyo & @VictorSanh are introducing HuggingFaceM4/WebSight, a multimodal dataset featuring 823,000 pairs of synthetically generated HTML/CSS code and screenshots of the corresponding rendered websites to train GPT4-V-like models 🌐💻

While crafting their upcoming foundation vision language model, they faced the challenge of converting website screenshots into usable HTML/CSS code. Most VLMs suck at this and there was no public dataset available for this specific task, so they decided to create their own.

They prompted existing LLMs to generate 823k HTML/CSS samples of very simple websites. Through supervised fine-tuning of a vision language model on WebSight, they were able to generate the code to reproduce a website component, given a screenshot.

You can explore the dataset here: HuggingFaceM4/WebSight

What do you think?
replied to abhishek's post 4 months ago