Hugging Face
19.4 TFLOPS
21
Mads
PRO
mhenrichsen
43 followers · 1 following
AI & ML interests
None yet
Recent Activity
replied to singhsidhukuldeep's post 10 days ago
Fascinating new research alert! Just read a groundbreaking paper on understanding Retrieval-Augmented Generation (RAG) systems and their performance factors. Key insights from this comprehensive study:

>> Architecture Deep Dive
The researchers analyzed RAG systems across 6 datasets (3 code-related, 3 QA-focused) using multiple LLMs. Their investigation revealed critical insights into four key design factors:

Document Types Impact:
• Oracle documents (ground truth) aren't always optimal
• Distracting documents significantly degrade performance
• Surprisingly, irrelevant documents boost code generation by up to 15.6%

Retrieval Precision:
• Performance varies dramatically by task
• QA tasks need 20-100% retrieval recall
• Perfect retrieval still fails up to 12% of the time on previously correct instances

Document Selection:
• More documents ≠ better results
• Adding documents can cause errors on previously correct samples
• Performance degradation increases ~1% per 5 additional documents in code tasks

Prompt Engineering:
• Most advanced prompting techniques underperform simple zero-shot prompts
• Technique effectiveness varies significantly across models and tasks
• Complex prompts excel at difficult problems but struggle with simple ones

>> Technical Implementation
The study utilized:
• Multiple retrievers including BM25, dense retrievers, and specialized models
• Comprehensive corpus of 70,956 unique API documents
• Over 200,000 API calls and 1,000+ GPU hours of computation
• Sophisticated evaluation metrics tracking both correctness and system confidence

💡 Key takeaway: RAG system optimization requires careful balancing of multiple factors - there's no one-size-fits-all solution.
replied to julien-c's post 10 days ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
new activity 28 days ago
syvai/hviske-v2: Punctuation and capital letters
Organizations
spaces
2
Sleeping • 🚀 Axolotl_Launcher
Runtime error • 4 • 🚀 DanskGPT
models
13
mhenrichsen/gemma-2b-it • Text Generation • Updated Feb 21 • 25
mhenrichsen/gemma-2b • Text Generation • Updated Feb 21 • 550 • 1
mhenrichsen/gemma-7b-it • Text Generation • Updated Feb 21 • 7
mhenrichsen/gemma-7b • Text Generation • Updated Feb 21 • 1.27k • 4
mhenrichsen/danskgpt-tiny-chat • Text Generation • Updated Jan 27 • 39 • 12
mhenrichsen/hestenettetLM • Text Generation • Updated Jan 16 • 800 • 3
mhenrichsen/danskgpt-tiny • Text Generation • Updated Jan 13 • 3.16k • 18
mhenrichsen/tinymix-8x1b • Text Generation • Updated Jan 2 • 17 • 1
mhenrichsen/hviske • Automatic Speech Recognition • Updated Dec 9, 2023 • 75 • 17
mhenrichsen/context-aware-splitter-1b-english • Text Generation • Updated Nov 30, 2023 • 20 • 8
datasets
6
mhenrichsen/creator • Viewer • Updated Dec 12, 2023 • 1k • 37
mhenrichsen/context-aware-splits-english • Viewer • Updated Nov 16, 2023 • 28k • 49 • 5
mhenrichsen/hestenettet • Viewer • Updated Nov 5, 2023 • 14.5k • 52 • 3
mhenrichsen/terra • Viewer • Updated Sep 27, 2023 • 25.4M • 656
mhenrichsen/context-aware-splits • Viewer • Updated Sep 17, 2023 • 12.3k • 49 • 3
mhenrichsen/alpaca_2k_test • Viewer • Updated Jul 22, 2023 • 2k • 11.9k • 25