maomaocun committed (verified) · Commit 3281e74 · Parent(s): 01fd92c

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -5,12 +5,12 @@ license: apache-2.0
 
 ## Model Description
 
-This model is a fine-tuned version of the LLaDA 8B Base model, obtained through a specialized Supervised Fine-Tuning (SFT) process. It innovatively discards the complex attention mask design typically associated with block diffusion, while preserving full attention mechanisms. This allows the model to achieve block diffusion-style inference efficiently—leveraging KV cache for streamlined generation, outputting an EOS token upon completion of the response, and utilizing template-based SFT to seamlessly exit the generation process.
+This model is a fine-tuned version of the LLaDA 8B Base model, obtained through a specialized Supervised Fine-Tuning (SFT) process. It innovatively discards the complex attention mask design typically associated with block diffusion, while preserving full attention mechanisms. This allows the model to achieve block diffusion-style inference efficiently—leveraging KV cache for streamlined generation, outputting an EOS token upon completion of the response to seamlessly exit the generation process.
 
 Key innovations:
 - **Full Attention Preservation**: Maintains standard full attention without the overhead of intricate masking.
 - **Block Diffusion Inference**: Enables iterative block-wise generation via KV cache management, ensuring coherent and controlled outputs.
-- **EOS and Template Handling**: Trained to naturally emit EOS tokens at response boundaries.
+- **EOS Handling**: Trained to naturally emit EOS tokens at response boundaries.
 
 This approach balances computational efficiency with high-quality generation, making it suitable for tasks requiring structured, multi-step reasoning.
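
To make the decoding loop the README describes concrete, below is a minimal, runnable Python sketch of block diffusion-style generation with a growing KV cache and EOS-based exit. Everything in it (the `denoise_block` stand-in, the token ids, the block and step sizes) is an illustrative assumption, not this model's actual API; the real model replaces the toy denoiser with iterative denoising forward passes that attend, with standard full attention, over the cached key/value states.

```python
import random

EOS_ID = 2          # assumed EOS token id (illustrative)
MASK_ID = 126336    # assumed [MASK] token id (illustrative)
BLOCK_LEN = 8       # tokens denoised per block (illustrative)
NUM_STEPS = 4       # denoising steps per block (illustrative)


def denoise_block(context, block, num_steps):
    """Toy stand-in for the model's denoiser: each step unmasks some of the
    remaining MASK positions. The real model would condition every step on
    the cached key/value states of `context` via full attention."""
    block = list(block)
    for _ in range(num_steps):
        masked = [i for i, t in enumerate(block) if t == MASK_ID]
        for i in masked[: max(1, len(masked) // 2)]:
            # A real denoiser samples from predicted token distributions;
            # here we draw dummy ids and, occasionally, an EOS.
            block[i] = EOS_ID if random.random() < 0.05 else random.randint(10, 99)
    # Force any remaining masks to concrete tokens so the block is fully denoised.
    return [random.randint(10, 99) if t == MASK_ID else t for t in block]


def generate(prompt_ids, max_blocks=16):
    kv_cache = list(prompt_ids)   # stands in for cached key/value states
    output = []
    for _ in range(max_blocks):
        block = denoise_block(kv_cache, [MASK_ID] * BLOCK_LEN, NUM_STEPS)
        output.extend(block)
        kv_cache.extend(block)    # a finished block joins the cache as-is:
                                  # full attention, no block-diffusion mask
        if EOS_ID in block:       # EOS at the response boundary lets the
            break                 # loop exit generation cleanly
    return output


print(generate([101, 102, 103]))
```

The point of the sketch is the loop shape: each block starts fully masked, is denoised while attending to everything already in the cache, and is then appended to the cache unchanged. That is what lets ordinary KV caching work here without the specialized block-diffusion attention mask, and why emitting EOS at a block boundary is enough to end generation.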