Rename configuration_clex.py to configuration_llama_clex.py
configuration_clex.py → configuration_llama_clex.py (RENAMED)
@@ -34,11 +34,8 @@ class CLEXLlamaConfig(LlamaConfig):
     This is the configuration class to store the configuration of a [`LlamaModel`]. It is used to instantiate an LLaMA
     model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
     defaults will yield a similar configuration to that of the LLaMA-7B.
-
     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
     documentation from [`PretrainedConfig`] for more information.
-
-
     Args:
         vocab_size (`int`, *optional*, defaults to 32000):
             Vocabulary size of the LLaMA model. Defines the number of different tokens that can be represented by the
@@ -86,18 +83,13 @@ class CLEXLlamaConfig(LlamaConfig):
         these scaling strategies behave:
         https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
         experimental feature, subject to breaking API changes in future versions.
-
     Example:
-
     ```python
     >>> from transformers import LlamaModel, LlamaConfig
-
     >>> # Initializing a LLaMA llama-7b style configuration
     >>> configuration = LlamaConfig()
-
     >>> # Initializing a model from the llama-7b style configuration
     >>> model = LlamaModel(configuration)
-
     >>> # Accessing the model configuration
     >>> configuration = model.config
     ```"""
@@ -118,7 +110,6 @@ class CLEXLlamaConfig(LlamaConfig):
         self.log_scale = log_scale
         self.rope_theta = 10000
         self.max_position_embeddings = 4096
-        self.data_length = 4096
         self.rope_scaling = rope_scaling
         self._rope_scaling_validation()
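Aside from the rename and the docstring whitespace cleanup, the only functional change is dropping `self.data_length = 4096` from `__init__`; `rope_theta` and `max_position_embeddings` stay hard-coded to 10000 and 4096. If this repo is consumed as Hub custom code, the renamed module would normally be picked up through `trust_remote_code` rather than a direct import. A minimal sketch of that path, assuming the repo id below is a placeholder and that the repo's `auto_map` points at `configuration_llama_clex.CLEXLlamaConfig`:

```python
# Sketch only: the repo id is a placeholder and the auto_map wiring is an assumption,
# not something shown in this diff.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "your-org/clex-llama-7b",  # hypothetical repo id
    trust_remote_code=True,    # resolves the Hub-side configuration_llama_clex.py
)

# Per the last hunk, these two fields are fixed in __init__ regardless of saved values.
print(config.rope_theta, config.max_position_embeddings)  # 10000 4096
```

Loading through `AutoConfig` keeps callers insulated from the file rename, since the class is resolved from the repo's metadata rather than from a hard-coded module path.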