arxiv:2412.04092

GEITje 7B Ultra: A Conversational Model for Dutch

Published on Dec 5
Authors:

Abstract

Language models have evolved rapidly but predominantly focus on English, often neglecting extensive pretraining in other languages. This has prompted initiatives to adapt powerful, English-centric models to other linguistic contexts through finetuning. For Dutch, one such recent endeavour is "GEITje", a model originally derived from the English-based Mistral 7B. Building on this foundational work, the current research extends the capabilities of GEITje through supervised finetuning on newly created, high-quality synthetic conversational datasets, as well as an additional preference alignment procedure on a synthetic feedback dataset. Both the developed models and the created datasets are openly available.
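The abstract does not name the preference alignment objective. As an illustration only, and assuming a Direct Preference Optimization (DPO)-style objective (a common choice for alignment on pairwise feedback data, not confirmed by this abstract), the loss for a single preference pair can be sketched as below. The function name `dpo_loss` and the default `beta=0.1` are hypothetical; the inputs are summed sequence log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model (for example, the supervised-finetuned checkpoint).

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Hypothetical DPO-style loss for one preference pair (illustrative sketch).

    Each argument is a summed sequence log-probability of a response given
    the prompt, under either the trainable policy or the frozen reference.
    """
    # Log-ratio of policy vs. reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin: positive when the policy favours the chosen response
    # more strongly than the reference model does.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically
    # stable form: max(-margin, 0) + log1p(exp(-|margin|)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Average the per-pair loss over a batch of 4-tuples of log-probs."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy and reference agree exactly: the loss equals log(2).
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy prefers the chosen response more than the reference does:
    # the loss drops below log(2).
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Minimizing this quantity pushes the model to raise the relative likelihood of preferred responses compared with the reference, while `beta` controls how far it may drift from that reference. In practice the log-probabilities come from token-level model outputs and the loss is averaged over a batch, which `mean_dpo_loss` mimics with plain numbers.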

