ctx window & languages?

#1
by JosephusCheung - opened
  1. config.json sets max_position_embeddings = 4096. Is that the same as the context window?
  2. What languages are included in the training data? Are languages other than Chinese and English explicitly excluded?

It has definitely been trained on Japanese.

Brackets are my translations:

Me: What is the meaning of life?
DeepSeek-LLM-67B-chat: The meaning of life is a philosophical question that has been debated throughout history. [response truncated]
Me: 日本語が分かりますか? [Do you understand Japanese?]
DeepSeek-LLM-67B-chat: はい、日本語を理解します。どのようにお手伝いできますか? [Yes, I understand Japanese, how can I help?]
Me: 最初の質問に対するあなたの答えを訳していただけませんか? [Could you translate the response to my first question?]
DeepSeek-LLM-67B-chat: はい、当然です。 [Yes, of course. (quite unnatural)]

「人生の意味とは何ですか?」[What is the meaning of life?]

[Translates what it defined as the meaning of life]

It understands Japanese well, but the text it produces is very much like that of Google Translate - understandable but not fully natural.

What I actually wanted to ask: the main languages are Chinese and English, but were other languages strictly filtered out of the data? In datasets like CC, for example, other languages are mixed in with the Chinese and English. The Yi model seems to have filtered these out strictly, which results in poor translation performance and subpar annotations for Wikipedia terms.

DeepSeek org

In response to your first question, yes, the current context window for DeepSeek LLM is set at 4K. We are actively working on developing a version with an extended context window to better accommodate longer sequences. This enhanced version will be released as soon as it's ready.
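For anyone who wants to check this locally, here is a minimal sketch of reading the limit from a config and testing whether a request fits. The inline JSON string stands in for config.json and carries only the key mentioned in this thread; the helper names `context_window` and `fits_in_context` and the token counts are hypothetical, chosen for illustration.

```python
import json

# Stand-in for config.json; only max_position_embeddings = 4096 comes from
# this thread. A real config file contains many more keys.
CONFIG_TEXT = '{"max_position_embeddings": 4096}'


def context_window(config_text: str) -> int:
    """Return max_position_embeddings from a config.json string."""
    config = json.loads(config_text)
    return int(config["max_position_embeddings"])


def fits_in_context(prompt_tokens: int, max_new_tokens: int, window: int) -> bool:
    """True if the prompt plus the planned generation stays within the window."""
    return prompt_tokens + max_new_tokens <= window


window = context_window(CONFIG_TEXT)
print(window)
print(fits_in_context(3500, 512, window))  # 4012 tokens in total
print(fits_in_context(3500, 700, window))  # 4200 tokens in total
```

The budget counts both the prompt and the generated tokens, since both occupy positions in the same window. Token counts depend on the tokenizer, so in practice they would come from the model's own tokenizer rather than from character or word counts.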

Regarding your second query, we did not intentionally exclude other languages in our training dataset. However, it predominantly comprises English and Chinese content. As a result, while the model is capable of processing various languages, its performance in languages other than English and Chinese may not be as robust.

We appreciate your understanding and interest in our project's development.

zdaxie changed discussion status to closed
