jrc committed
Commit 2bde5f8
1 Parent(s): dc1a633

Update README.md

Files changed (1): README.md +5 -4
README.md CHANGED

````diff
@@ -37,6 +37,8 @@ Phi3 was trained using [torchtune](https://github.com/pytorch/torchtune) and the
 tune run lora_finetune_distributed.py --config mini_lora.yaml
 ```
 
+You can see a full Weights & Biases run [here](https://api.wandb.ai/links/jcummings/hkey76vj).
+
 ### Training Data
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
@@ -45,13 +47,12 @@ This model was finetuned on the following datasets:
 
 * TIGER-Lab/MATH-plus: An advanced math-specific dataset with 894k samples.
 
-
 #### Hardware
 
 4 x NVIDIA A100 GPUs
 
 Max VRAM used per GPU: 29 GB
-Real time: 12 hours
+Real time: 10 hours
 
 ## Evaluation
 
@@ -64,7 +65,6 @@ tune run eleuther_eval --config eleuther_evaluation \
 batch_size=32
 ```
 
-
 | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
 |------------------------------------|-------|------|-----:|-----------|-----:|---|-----:|
 |minerva_math |N/A |none | 4|exact_match|0.1670|± |0.0051|
@@ -76,7 +76,8 @@ tune run eleuther_eval --config eleuther_evaluation \
 | - minerva_math_prealgebra | 1|none | 4|exact_match|0.3077|± |0.0156|
 | - minerva_math_precalc | 1|none | 4|exact_match|0.0623|± |0.0104|
 
+This shows a large improvement over the base Phi3 Mini model.
 
 ## Model Card Contact
 
-[More Information Needed]
+Drop me a line at @official_j3rck
````
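
The Eleuther-harness results table above is plain markdown, so its scores can be pulled out programmatically for comparison against the base model. A minimal sketch, assuming a hypothetical `parse_scores` helper (illustrative only, not part of torchtune or lm-eval):

```python
# Parse exact_match scores out of an Eleuther-style markdown results table.
# parse_scores is a hypothetical post-processing helper, not a library API.

def parse_scores(table_md: str) -> dict[str, float]:
    scores = {}
    for row in table_md.strip().splitlines():
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        # Keep only data rows whose Metric column is exact_match.
        if len(cells) < 6 or cells[4] != "exact_match":
            continue
        task = cells[0].lstrip("- ").strip()  # drop the "- " subtask prefix
        scores[task] = float(cells[5])
    return scores

table = """
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|------------------------------------|-------|------|-----:|-----------|-----:|---|-----:|
|minerva_math |N/A |none | 4|exact_match|0.1670|± |0.0051|
| - minerva_math_prealgebra | 1|none | 4|exact_match|0.3077|± |0.0156|
| - minerva_math_precalc | 1|none | 4|exact_match|0.0623|± |0.0104|
"""

print(parse_scores(table))
```

The same parsing would apply to a table dumped from a base-model run, making the claimed improvement easy to diff score-by-score.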