maomaocun/dLLM-Var-no-template

maomaocun committed about 1 month ago

Commit 3281e74 (verified) · 1 parent: 01fd92c

Update README.md

Files changed (1)

README.md +2 -2

@@ -5,12 +5,12 @@ license: apache-2.0
 ## Model Description
-This model is a fine-tuned version of the LLaDA 8B Base model, obtained through a specialized Supervised Fine-Tuning (SFT) process. It innovatively discards the complex attention mask design typically associated with block diffusion, while preserving full attention mechanisms. This allows the model to achieve block diffusion-style inference efficiently—leveraging KV cache for streamlined generation, outputting an EOS token upon completion of the response, and utilizing template-based SFT to seamlessly exit the generation process.
+This model is a fine-tuned version of the LLaDA 8B Base model, obtained through a specialized Supervised Fine-Tuning (SFT) process. It innovatively discards the complex attention mask design typically associated with block diffusion, while preserving full attention mechanisms. This allows the model to achieve block diffusion-style inference efficiently—leveraging KV cache for streamlined generation, outputting an EOS token upon completion of the response to seamlessly exit the generation process.
 Key innovations:
 - **Full Attention Preservation**: Maintains standard full attention without the overhead of intricate masking.
 - **Block Diffusion Inference**: Enables iterative block-wise generation via KV cache management, ensuring coherent and controlled outputs.
-- **EOS and Template Handling**: Trained to naturally emit EOS tokens at response boundaries.
+- **EOS Handling**: Trained to naturally emit EOS tokens at response boundaries.
 This approach balances computational efficiency with high-quality generation, making it suitable for tasks requiring structured, multi-step reasoning.
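The updated README describes block-wise decoding with a KV cache that ends once the model emits EOS, rather than relying on a template to exit. The Python sketch below illustrates only that control flow, under stated assumptions. The token ids `MASK` and `EOS`, the functions `make_toy_denoiser` and `block_diffusion_generate`, and all parameters are hypothetical. The toy denoiser is a stand-in for LLaDA, not this model's API. Confidence-based unmasking is one common choice for diffusion language models and is not necessarily the scheme this model uses. The committed prefix plays the role of what a real implementation would keep in its KV cache.

```python
import math
from typing import Callable, List, Tuple

MASK = -1  # hypothetical mask-token id
EOS = 0    # hypothetical end-of-sequence id

# A "denoiser" maps (committed prefix, partially masked block) to one
# (token, confidence) prediction per block position.
Denoiser = Callable[[List[int], List[int]], List[Tuple[int, float]]]


def make_toy_denoiser(target: List[int], prompt_len: int) -> Denoiser:
    """Hypothetical stand-in for the model: it always 'knows' the target
    response and is more confident about earlier positions in a block."""
    def denoise(prefix: List[int], block: List[int]) -> List[Tuple[int, float]]:
        start = len(prefix) - prompt_len  # response offset of block[0]
        preds = []
        for i in range(len(block)):
            pos = start + i
            tok = target[pos] if pos < len(target) else EOS
            preds.append((tok, 1.0 / (1 + i)))
        return preds
    return denoise


def block_diffusion_generate(denoise: Denoiser, prompt: List[int],
                             block_size: int = 4, steps: int = 2,
                             max_blocks: int = 8) -> List[int]:
    # Committed tokens; a real model would keep their keys/values in a KV cache.
    prefix = list(prompt)
    for _ in range(max_blocks):
        block = [MASK] * block_size
        for step in range(steps):
            masked = [i for i, t in enumerate(block) if t == MASK]
            if not masked:
                break
            # Unmask an equal share of the remaining positions per step;
            # on the last step this is all of them.
            k = math.ceil(len(masked) / (steps - step))
            preds = denoise(prefix, block)
            best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
            for i in best:
                block[i] = preds[i][0]
        # Stop at the first EOS instead of relying on a template to exit.
        if EOS in block:
            prefix.extend(block[:block.index(EOS)])
            break
        # Commit the finished block; only its new entries would join the cache.
        prefix.extend(block)
    return prefix[len(prompt):]
```

For example, `block_diffusion_generate(make_toy_denoiser([5, 6, 7, 8, 9, EOS], 2), [101, 102])` commits the block `[5, 6, 7, 8]`, then stops inside the second block at EOS and returns `[5, 6, 7, 8, 9]`. Once a block is committed it is never revisited, which is what allows its cached keys and values to be reused for later blocks.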