---
library_name: transformers
license: apache-2.0
language:
- ru
base_model:
- t-tech/T-lite-it-1.0
pipeline_tag: text-generation
---

# T-lite-it-1.0_Q4_0

T-lite-it-1.0_Q4_0 is a quantized version of the **T-lite-it-1.0** model, which is based on the Qwen 2.5 7B architecture and fine-tuned for Russian-language tasks. This version is optimized for memory-constrained environments, making it suitable for fine-tuning and inference on GPUs with as little as **8 GB of VRAM**. The quantization was performed with **BitsAndBytes**, reducing the model weights to 4-bit precision.

## Model Description

- **Language:** Russian
- **Base Model:** T-lite-it-1.0 (derived from Qwen 2.5 7B)
- **Quantization:** 4-bit precision using `BitsAndBytes`
- **Tasks:** Text generation, conversation, question answering, and chain-of-thought reasoning
- **Fine-Tuning Ready:** Suitable for further fine-tuning in low-resource environments
- **VRAM Requirements:** Fine-tuning and inference possible with **8 GB of VRAM** or more

## Usage

To load the model, ensure you have the required dependencies installed (`accelerate` is needed for `device_map="auto"`):

```bash
pip install transformers bitsandbytes accelerate
```

Then load the model with the following code (passing a `BitsAndBytesConfig` is the current idiom; the bare `load_in_4bit=True` argument is deprecated):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "MilyaShams/T-lite-it-1.0_Q4_0"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```

A short generation sketch is given at the end of this card.

## Fine-Tuning

The model is designed for fine-tuning under resource constraints. Use tools such as Hugging Face's `Trainer` or `peft` (Parameter-Efficient Fine-Tuning) to adapt the model to specific tasks.

Example configuration for fine-tuning (a minimal LoRA sketch is given at the end of this card):

- **Batch Size:** Adjust to fit within 8 GB of VRAM (e.g., `per_device_train_batch_size=2`).
- **Gradient Accumulation:** Use `gradient_accumulation_steps` to simulate larger effective batch sizes.

## Model Card Authors

[Milyausha Shamsutdinova](https://github.com/MilyaushaShamsutdinova)
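## Example: Text Generation

Below is a minimal generation sketch, assuming `model` and `tokenizer` have been loaded as in the **Usage** section above. T-lite-it-1.0 is an instruct model, so the conversation is formatted with its chat template; the Russian prompt and sampling settings are illustrative, not recommendations.

```python
# Assumes `model` and `tokenizer` are already loaded as in the Usage section.
messages = [
    # "Tell me briefly about Kazan." -- illustrative prompt
    {"role": "user", "content": "Расскажи кратко о Казани."},
]

# Format the conversation with the model's chat template and move it
# to the device the (sharded) model expects its inputs on.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```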
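## Example: LoRA Fine-Tuning

Below is a minimal LoRA fine-tuning sketch with `peft` and `Trainer` under the VRAM constraints described above, again assuming `model` and `tokenizer` have been loaded as in the **Usage** section. The hyperparameters, target modules, and toy dataset are illustrative assumptions, not tuned recommendations.

```python
# A minimal LoRA sketch; replace the toy dataset with your own corpus.
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Prepare the 4-bit model for training (casts norms, enables input gradients).
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections; the target module names
# follow the usual Qwen 2.5 naming, which this model inherits.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Toy dataset for illustration only.
texts = ["Казань — столица Татарстана.", "Москва — столица России."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="t-lite-it-1.0-q4-lora",
        per_device_train_batch_size=2,  # small batch to fit in ~8 GB of VRAM
        gradient_accumulation_steps=8,  # simulates an effective batch size of 16
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    # Pads batches and builds causal-LM labels from the inputs.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With `per_device_train_batch_size=2` and `gradient_accumulation_steps=8`, the effective batch size is 16 while only two samples are resident in memory at a time, which is what keeps the run within the 8 GB budget.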