Sam Witteveen

samwit

AI & ML interests

conversational AI, NLU

Recent Activity

upvoted an article 4 months ago
Welcome, Gradio 5
liked a model 4 months ago
Mozilla/Llama-3.2-3B-Instruct-llamafile
updated a model 5 months ago
samwit/cot_gf_plat_2k_phi35_full

Organizations

Red Dragon AI · Spaces-explorers · gg-tt

samwit's activity

reacted to akhaliq's post with 👍 12 months ago
In Search of Needles in a 10M Haystack

Recurrent Memory Finds What LLMs Miss

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to 10^4 elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 10^7 elements. This achievement marks a substantial leap, as it is by far the longest input processed by any open neural network model to date, demonstrating a significant improvement in processing capabilities for long sequences.
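The core idea of the recurrent memory approach can be sketched in a few lines: split a long input into fixed-size segments and carry a small memory state from one segment to the next, so only (memory + one segment) is ever processed at a time. The sketch below is purely illustrative; the function names, the set-based "memory", and the `fact:` needle format are invented for this example and are not the paper's actual RMT architecture, which carries learned memory-token embeddings between segments.

```python
def process_segment(memory, segment):
    """Stand-in for one forward pass over [memory; segment].

    Here the "model" just accumulates any needle facts it sees;
    a real recurrent memory transformer would instead return
    updated memory-token embeddings.
    """
    return memory | {tok for tok in segment if tok.startswith("fact:")}

def recurrent_read(tokens, segment_len=512):
    """Scan an arbitrarily long token stream in O(segment_len) windows,
    threading the memory state through every segment."""
    memory = set()
    for start in range(0, len(tokens), segment_len):
        memory = process_segment(memory, tokens[start:start + segment_len])
    return memory

# A toy 10,000-token "haystack" with two needles buried far apart --
# far beyond what a single fixed-size context window would cover.
haystack = ["filler"] * 10_000
haystack[137] = "fact:alice-has-key"
haystack[9_421] = "fact:key-opens-box"

found = recurrent_read(haystack)
```

Because the memory is the only state passed between segments, the per-step cost stays constant regardless of total input length, which is what lets the approach scale to inputs far longer than any single attention window.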