Arxiv-CS-RAG / sample_outputs.json
bishmoy's picture
added sample_outputs.json
7b3311b verified
raw
history blame
1.96 kB
{"search_placeholder": "### 10 Oct 2023 | [Mistral 7B](https://arxiv.org/abs/2310.06825) | [\u2b07\ufe0f](https://arxiv.org/pdf/2310.06825)\n*Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, L'elio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timoth'ee Lacroix, William El Sayed* \n\nWe introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered\nfor superior performance and efficiency. Mistral 7B outperforms Llama 2 13B\nacross all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and\ncode generation. Our model leverages grouped-query attention (GQA) for faster\ninference, coupled with sliding window attention (SWA) to effectively handle\nsequences of arbitrary length with a reduced inference cost. We also provide a\nmodel fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses\nthe Llama 2 13B -- Chat model both on human and automated benchmarks. Our\nmodels are released under the Apache 2.0 license.\n", "output_placeholder": "Mistral is a 7-billion-parameter language model engineered for superior performance and efficiency. It was introduced in the paper \"Mistral 7B: A Superior Large Language Model\" [1]. Mistral outperforms other language models like Llama 2 13B and Llama 1 34B in various benchmarks, including reasoning, mathematics, and code generation. The model uses grouped-query attention (GQA) and sliding window attention (SWA) for faster inference and handling sequences of arbitrary length with reduced inference cost. Additionally, a fine-tuned version of Mistral, Mistral 7B -- Instruct, was released, which surpasses Llama 2 13B -- Chat model on human and automated benchmarks [1]. \n[1] Mistral 7B: A Superior Large Language Model. (2023). Retrieved from https://arxiv.org/abs/2303.14311."}