SicariusSicariiStuff
committed on
Commit f3a522c • 1 Parent(s): 4445052
Update README.md
README.md CHANGED
@@ -18,7 +18,7 @@ language:
 
 <details>
 <summary><b>July 26th, 2024, moving on to LLAMA 3.1</b></summary>
 
-One step forward, one step backward. Many issues were solved, but a few new ones were encountered. As I already updated in my "blog" (https://huggingface.co/SicariusSicariiStuff/Blog_And_Updates#july-26th-2024), I originally wanted to finetune
+One step forward, one step backward. Many issues were solved, but a few new ones were encountered. As I already updated in my "blog" (https://huggingface.co/SicariusSicariiStuff/Blog_And_Updates#july-26th-2024), I originally wanted to finetune Gradient's 0.25M/1M/4M LLAMA3 8B model, but almost at the same time as I concluded that the model is really not that great even at 8k context, Zuck the CHAD dropped LLAMA 3.1.
 
 LLAMA 3.1 is **128k context**, which probably means that in practice it will be somewhat coherent at 32k context, as a guesstimate. Also, I've heard from several people who have done some early tests that the new LLAMA 3.1 8B is even better than the new Mistral Nemo 12B. IDK if that's true, but overall LLAMA 3.1 does seem to be a much better version of the "regular" LLAMA 3.
 