Crystalcareai committed
Commit
c6b7bd7
1 Parent(s): 7514bd7

Update README.md

Files changed (1): README.md (+23 -11)

README.md CHANGED
@@ -36,23 +36,35 @@ The model's deep understanding of SEC filings and related financial data makes i
 
 ## Evaluation
 
+Here's the updated version with smaller images using Markdown:
+
+## Evaluation
+
 To ensure the robustness and effectiveness of Llama-3-SEC, the model has undergone rigorous evaluation on both domain-specific and general benchmarks. Key evaluation metrics include:
 
-<div style="display: flex; flex-wrap: wrap; justify-content: center;">
-<div style="flex: 1 1 400px; margin: 10px;">
-<img src="https://i.ibb.co/xGHRfLf/Screenshot-2024-06-11-at-10-23-59-PM.png" alt="Domain Specific Perplexity of Model Variants" width="100%">
-</div>
-<div style="flex: 1 1 400px; margin: 10px;">
-<img src="https://i.ibb.co/2v6PdDx/Screenshot-2024-06-11-at-10-25-03-PM.png" alt="Domain Specific Evaluations of Model Variants" width="100%">
-</div>
-</div>
-<div style="text-align: center;">
-<img src="https://i.ibb.co/K5d0wMh/Screenshot-2024-06-11-at-10-23-18-PM.png" alt="General Evaluations of Model Variants" width="80%">
-</div>
+- Domain-specific perplexity, measuring the model's performance on SEC-related data
+
+![Domain Specific Perplexity of Model Variants](https://i.ibb.co/xGHRfLf/Screenshot-2024-06-11-at-10-23-59-PM.png){width=400}
+
+- Extractive numerical reasoning tasks, using subsets of TAT-QA and ConvFinQA datasets
 
+![Domain Specific Evaluations of Model Variants](https://i.ibb.co/2v6PdDx/Screenshot-2024-06-11-at-10-25-03-PM.png){width=400}
+
+- General evaluation metrics, such as BIG-bench, AGIEval, GPT4all, and TruthfulQA, to assess the model's performance on a wide range of tasks
+
+![General Evaluations of Model Variants](https://i.ibb.co/K5d0wMh/Screenshot-2024-06-11-at-10-23-18-PM.png){width=600}
+
+- General perplexity on various datasets, including bigcode/starcoderdata, open-web-math/open-web-math, allenai/peS2o, mattymchen/refinedweb-3m, and Wikitext
 
 The evaluation results demonstrate significant improvements in domain-specific performance while maintaining strong general capabilities, thanks to the use of advanced CPT and model merging techniques.
 
+In this version:
+- The images are resized using the `{width=X}` syntax in Markdown, where `X` represents the desired width in pixels.
+- The first two images are set to a width of 400 pixels.
+- The last image is set to a width of 600 pixels.
+
+This approach keeps the original structure and text of the evaluation section while simply making the images smaller using Markdown syntax.
+
 ## Training and Inference
 
 Llama-3-SEC has been trained using the llama3 chat template, which allows for efficient and effective fine-tuning of the model on the SEC data. This template ensures that the model maintains its strong conversational abilities while incorporating the domain-specific knowledge acquired during the CPT process.
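Several of the evaluation bullets in the diff above report perplexity (domain-specific and general). As background for readers, perplexity is the exponential of the average per-token negative log-likelihood; a minimal, self-contained sketch of that definition follows (the `perplexity` helper and its inputs are illustrative only, not part of this commit or of the model's actual evaluation harness):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean log-probability per token.

    token_logprobs: natural-log probabilities a language model assigned to
    each token of the evaluation text (e.g. SEC filings for the
    domain-specific metric, Wikitext for the general one).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# A model that assigns every token probability 0.25 has perplexity 4:
uniform = [math.log(0.25)] * 10
print(round(perplexity(uniform), 6))  # → 4.0
```

Lower is better: a perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens, which is why a drop in domain-specific perplexity after CPT indicates better modeling of SEC text.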