Update README.md
Browse files
README.md
CHANGED
@@ -52,7 +52,7 @@ The primary goal of releasing a patched version of this model was to address thi
|
|
52 |
|
53 |
## Details of the Adjustment
|
54 |
|
55 |
-
The [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model was pulled directly from HuggingFace and loaded using transformers. Then, the input embedding and output embedding
|
56 |
|
57 |
The special (untrained & problematic) tokens can be found by locating the rows where the entire row of the embedding values are all zeros, which imply they were not trained during the pretraining phase of the model from Meta. Such untrained tokens could lead to heavy computational issues, like gradient explosions or `NaN` gradients, during downstream fine-tuning on specific tasks.
|
58 |
|
|
|
52 |
|
53 |
## Details of the Adjustment
|
54 |
|
55 |
+
The [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model was pulled directly from HuggingFace and loaded using transformers. Then, the input embedding and output embedding values are retrieved using `model.get_input_embeddings().weight.data` and `model.get_output_embeddings().weight.data`. These 2 matrics are identical in shape, with each row representing a token id, and each column representing an embedding feature.
|
56 |
|
57 |
The special (untrained & problematic) tokens can be found by locating the rows where the entire row of the embedding values are all zeros, which imply they were not trained during the pretraining phase of the model from Meta. Such untrained tokens could lead to heavy computational issues, like gradient explosions or `NaN` gradients, during downstream fine-tuning on specific tasks.
|
58 |
|