Crystalcareai committed
Commit: 1b82d97
1 parent: 4cc235f

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ Llama-3-SEC is a state-of-the-art domain-specific large language model trained o
  ## Model Details
 
  - **Base Model:** Meta-Llama-3-70B-Instruct
- - **Training Data:** 70B tokens of SEC filings data, carefully mixed with 1B tokens of general data from Together AI's RedPajama dataset to maintain a balance between domain-specific knowledge and general language understanding
+ - **Training Data:** 19B tokens of SEC filings data, carefully mixed with 1B tokens of general data from Together AI's RedPajama dataset: [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) to maintain a balance between domain-specific knowledge and general language understanding
  - **Training Method:** Continual Pre-Training (CPT) using the Megatron-Core framework, followed by model merging with the base model using the state-of-the-art TIES merging technique in the Arcee Mergekit toolkit
  - **Training Infrastructure:** AWS SageMaker HyperPod cluster with 4 nodes, each equipped with 32 H100 GPUs, ensuring efficient and scalable training of this massive language model
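For context on the unchanged **Training Method** bullet, TIES merges in Arcee's Mergekit are typically described by a YAML config and run with the `mergekit-yaml` CLI. The sketch below is a minimal, hypothetical illustration only: the checkpoint path, `density`, `weight`, `normalize`, and `dtype` values are assumptions, not the actual recipe used to produce Llama-3-SEC.

```python
import yaml  # pip install pyyaml

# Hypothetical mergekit TIES config: fold a domain-adapted (CPT) checkpoint
# back into the base model. Paths and parameter values are illustrative only.
ties_config = {
    "merge_method": "ties",
    "base_model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "models": [
        # The base model contributes no task vector of its own in TIES.
        {"model": "meta-llama/Meta-Llama-3-70B-Instruct"},
        {
            "model": "path/to/sec-cpt-checkpoint",  # hypothetical CPT output dir
            "parameters": {"density": 0.5, "weight": 0.5},
        },
    ],
    "parameters": {"normalize": True},
    "dtype": "bfloat16",
}

# Write the config; the merge itself would then be run with mergekit's CLI:
#   mergekit-yaml ties_merge.yml ./merged-model
with open("ties_merge.yml", "w") as f:
    yaml.safe_dump(ties_config, f, sort_keys=False)
```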