AuriAetherwiing committed on
Commit 66af367
1 Parent(s): cfb223c

Update README.md

Files changed (1)
  1. README.md +67 -68
README.md CHANGED
@@ -1,17 +1,79 @@
  ---
  library_name: transformers
- license: llama3.3
  base_model: meta-llama/Llama-3.3-70B-Instruct
  tags:
  - generated_from_trainer
  model-index:
  - name: dev/shm/EVA-LLaMA-3.33-70B-v0.1
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

@@ -386,67 +448,4 @@ deepspeed: deepspeed_configs/zero3_bf16.json
  weight_decay: 0.2
  ```

- </details><br>
-
- # dev/shm/EVA-LLaMA-3.33-70B-v0.1
-
- This model is a fine-tuned version of [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.0225
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 20
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.6108 | 0.0061 | 1 | 1.6226 |
- | 1.0653 | 0.2498 | 41 | 1.0166 |
- | 0.8656 | 0.4996 | 82 | 0.9681 |
- | 0.8904 | 0.7494 | 123 | 0.9443 |
- | 0.9196 | 0.9992 | 164 | 0.9317 |
- | 0.5136 | 1.2451 | 205 | 0.9584 |
- | 0.5903 | 1.4947 | 246 | 0.9509 |
- | 0.544 | 1.7443 | 287 | 0.9394 |
- | 0.5435 | 1.9939 | 328 | 0.9347 |
- | 0.2605 | 2.2420 | 369 | 1.0237 |
- | 0.2796 | 2.4916 | 410 | 1.0240 |
- | 0.305 | 2.7412 | 451 | 1.0220 |
- | 0.2457 | 2.9909 | 492 | 1.0225 |
-
-
- ### Framework versions
-
- - Transformers 4.45.1
- - Pytorch 2.5.1+cu124
- - Datasets 2.21.0
- - Tokenizers 0.20.3
 
  ---
  library_name: transformers
+ license: other
+ license_name: eva-llama3.3
  base_model: meta-llama/Llama-3.3-70B-Instruct
  tags:
  - generated_from_trainer
  model-index:
  - name: dev/shm/EVA-LLaMA-3.33-70B-v0.1
    results: []
+ datasets:
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - Nopm/Opus_WritingStruct
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
+ - allura-org/Celeste-1.x-data-mixture
+ - cognitivecomputations/dolphin-2.9.3
  ---

+ <h1>EVA LLaMA 3.33 70B v0.0</h1>
+
+ <p>
+ An RP/storywriting specialist model: a full-parameter finetune of Llama-3.3-70B-Instruct on a mixture of synthetic and natural data.<br>
+ It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.<br>
+ This model was built with Llama by Meta.<br>
+ </p>
+
+ <p>Prompt format is ChatML; a usage sketch with the recommended sampler values is shown below.</p>
+
+ <h3>Recommended sampler values:</h3>
+ <ul>
+ <li>Temperature: 1</li>
+ <li>Min-P: 0.05</li>
+ <li>Repetition Penalty: 1.03</li>
+ </ul>
+
+ <h3>Recommended SillyTavern preset (via CalamitousFelicitousness):</h3>
+ <ul><li><a href="https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2/blob/main/EV01.json">Master import</a></li></ul>
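As a rough illustration of the ChatML prompt format and the sampler values above, here is a minimal Transformers sketch. It is not part of the original card: the repo id and example messages are placeholders, and it assumes the released tokenizer ships a ChatML chat template and that enough GPU memory (or offloading) is available for a 70B model.

```python
# Illustrative sketch only (not from the model card). The repo id below is an
# assumption, not taken from this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a creative storywriting assistant."},
    {"role": "user", "content": "Write the opening scene of a rainy-night noir story."},
]
# apply_chat_template renders the ChatML turns (<|im_start|>role ... <|im_end|>)
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,          # recommended Temperature
    min_p=0.05,               # recommended Min-P
    repetition_penalty=1.03,  # recommended Repetition Penalty
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Backends that expose temperature, min_p and repetition_penalty directly (for example when serving the model behind an API for SillyTavern) can take the same three values as-is.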
+
+ <h3>Training data:</h3>
+ <ul>
+ <li>Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's <a href="https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16">card</a> for details.</li>
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+ <li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe.</li>
+ <li>A subset (2k rows) of Sonnet3.5-Charcard-Roleplay by Gryphe.</li>
+ <li>Synthstruct and SynthRP datasets by Epiculous.</li>
+ <li>A subset from Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
+ </ul>
+
+ <h3>Training time and hardware:</h3>
+ <ul><li>10 hours on 8xH100 SXM</li></ul>
+
+ <p>The model was created by Kearm, Auri and Cahvay.</p>
+ <h4>Special thanks:</h4><ul>
+ <li>to Cahvay for his work on dataset filtering,</li>
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data,</li>
+ <li>and to Allura-org for support, feedback, beta-testing and quality control of EVA models.</li></ul>
+
+ <h3>Licensing</h3>
+ <p>Llama-3.3-70B-Instruct by Meta is licensed under the <a href="https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE">Llama 3.3 Community License Agreement (hereafter referred to as the L3.3 license)</a> and is subject to the <a href="https://www.llama.com/llama3_3/use-policy">Acceptable Use Policy for Llama Materials</a>.<br>
+ This derivative is free for personal, research and commercial use under the terms of the L3.3 license, with one extra clause:<br>
+ - Infermatic Inc and any of its employees or paid associates cannot utilize, distribute, download, or otherwise make use of EVA models for any purpose.</p>
+
  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

 
  weight_decay: 0.2
  ```

+ </details><br>
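For context on the collapsed axolotl config above (which references the `deepspeed: deepspeed_configs/zero3_bf16.json` DeepSpeed ZeRO-3 bf16 config visible in the earlier hunk header): runs with a config like this are typically started through Accelerate. A rough, hypothetical launch sketch follows; the YAML filename is a placeholder, not taken from this commit.

```python
# Rough launch sketch, not taken from this commit. "eva_llama33.yaml" stands in
# for the axolotl config shown in the collapsed block above; axolotl itself
# applies the DeepSpeed ZeRO-3 bf16 settings referenced by the YAML's
# `deepspeed:` key.
import subprocess

subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", "eva_llama33.yaml"],
    check=True,  # raise CalledProcessError if the training process exits non-zero
)
```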