karmiq (Karel Minarik)

upvoted an article 7 months ago

Article

🇨🇿 BenCzechMark - Can your LLM Understand Czech?

Oct 1, 2024

• 20

upvoted a paper 10 months ago

Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

Paper • 2406.16678 • Published Jun 24, 2024 • 16

upvoted a collection 10 months ago

Nemotron 4 340B

Collection

Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 4 days ago • 162

upvoted a paper 11 months ago

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 47

upvoted an article 11 months ago

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

May 28, 2024

• 210

upvoted a paper about 1 year ago

Anticipatory Music Transformer

Paper • 2306.08620 • Published Jun 14, 2023 • 9

upvoted a collection about 1 year ago

Czech evaluation datasets

Collection

This collection should contain Czech evaluation datasets • 8 items • Updated Jan 14, 2024 • 3

upvoted 3 papers about 1 year ago

upvoted 7 papers over 1 year ago

Text Embeddings Reveal (Almost) As Much As Text

Paper • 2310.06816 • Published Oct 10, 2023 • 1

Shai: A large language model for asset management

Paper • 2312.14203 • Published Dec 21, 2023 • 6

Borges and AI

Paper • 2310.01425 • Published Sep 27, 2023 • 2

Recursively Summarizing Books with Human Feedback

Paper • 2109.10862 • Published Sep 22, 2021 • 1

An In-depth Look at Gemini's Language Abilities

Paper • 2312.11444 • Published Dec 18, 2023 • 1

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Paper • 2101.00027 • Published Dec 31, 2020 • 6

Jailbroken: How Does LLM Safety Training Fail?

Paper • 2307.02483 • Published Jul 5, 2023 • 13

upvoted a collection over 1 year ago

Zephyr 7B

Collection

Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated Apr 12, 2024 • 148

upvoted a paper over 1 year ago

FinGPT: Large Generative Models for a Small Language

Paper • 2311.05640 • Published Nov 3, 2023 • 32

Retrieval-Augmented Generation for Large Language Models: A Survey

Improving Text Embeddings with Large Language Models

Multilingual E5 Text Embeddings: A Technical Report
