Update README.md
README.md
CHANGED
@@ -5,12 +5,12 @@ license: apache-2.0

 ## Model Description

-This model is a fine-tuned version of the LLaDA 8B Base model, obtained through a specialized Supervised Fine-Tuning (SFT) process. It innovatively discards the complex attention mask design typically associated with block diffusion, while preserving full attention mechanisms. This allows the model to achieve block diffusion-style inference efficiently—leveraging KV cache for streamlined generation, outputting an EOS token upon completion of the response
+This model is a fine-tuned version of the LLaDA 8B Base model, obtained through a specialized Supervised Fine-Tuning (SFT) process. It innovatively discards the complex attention mask design typically associated with block diffusion, while preserving full attention mechanisms. This allows the model to achieve block diffusion-style inference efficiently—leveraging KV cache for streamlined generation, outputting an EOS token upon completion of the response to seamlessly exit the generation process.

 Key innovations:
 - **Full Attention Preservation**: Maintains standard full attention without the overhead of intricate masking.
 - **Block Diffusion Inference**: Enables iterative block-wise generation via KV cache management, ensuring coherent and controlled outputs.
-- **EOS
+- **EOS Handling**: Trained to naturally emit EOS tokens at response boundaries.

 This approach balances computational efficiency with high-quality generation, making it suitable for tasks requiring structured, multi-step reasoning.
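
The block-wise decoding with an EOS exit that the README text describes can be sketched in plain Python. This is a toy illustration under stated assumptions, not the model's actual inference code: `MASK_ID`, `EOS_ID`, `generate_blockwise`, and `make_toy_predictor` are hypothetical names, the confidence-based unmasking schedule is one common choice rather than something the README specifies, and the predictor is a deterministic stand-in for the network. The list of committed tokens marks the prefix whose keys and values a real implementation would keep in its KV cache.

```python
from math import ceil
from typing import Callable, List, Tuple

MASK_ID = -1  # hypothetical mask token id (assumption, not from the README)
EOS_ID = 2    # hypothetical EOS token id (assumption, not from the README)

# A predictor returns one (token, confidence) guess per position of the block.
Predictor = Callable[[List[int], List[int]], List[Tuple[int, float]]]


def generate_blockwise(
    prompt: List[int],
    predict_block: Predictor,
    block_size: int = 4,
    steps_per_block: int = 2,
    max_blocks: int = 8,
) -> List[int]:
    """Decode one block at a time and stop as soon as a block contains EOS."""
    committed = list(prompt)  # finished prefix; a real model would cache its KV
    for _ in range(max_blocks):
        block = [MASK_ID] * block_size
        for step in range(steps_per_block):
            masked = [i for i, tok in enumerate(block) if tok == MASK_ID]
            if not masked:
                break
            guesses = predict_block(committed, block)
            # Unmask an even share of the remaining positions, most confident first.
            quota = ceil(len(masked) / (steps_per_block - step))
            ranked = sorted(masked, key=lambda i: guesses[i][1], reverse=True)
            for i in ranked[:quota]:
                block[i] = guesses[i][0]
        if EOS_ID in block:
            # Keep tokens up to and including the first EOS, then exit.
            committed.extend(block[: block.index(EOS_ID) + 1])
            break
        committed.extend(block)
    return committed


def make_toy_predictor(prompt_len: int, target: List[int]) -> Predictor:
    """Stand-in for the network: spells out `target`, then pads with EOS."""
    def predict(committed: List[int], block: List[int]) -> List[Tuple[int, float]]:
        offset = len(committed) - prompt_len
        guesses = []
        for i in range(len(block)):
            pos = offset + i
            token = target[pos] if pos < len(target) else EOS_ID
            guesses.append((token, 1.0 / (i + 1)))  # earlier positions rank higher
        return guesses
    return predict


if __name__ == "__main__":
    prompt = [1]
    target = [10, 11, 12, 13, 14, 15, EOS_ID]
    print(generate_blockwise(prompt, make_toy_predictor(len(prompt), target)))
    # prints [1, 10, 11, 12, 13, 14, 15, 2]
```

In this sketch the second block is filled as `[14, 15, 2, 2]`; the loop keeps everything up to the first EOS and stops, so no further blocks are decoded.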