---
library_name: transformers
license: apache-2.0
language:
- ru
base_model:
- t-tech/T-lite-it-1.0
pipeline_tag: text-generation
---

# T-lite-it-1.0_Q4_0

T-lite-it-1.0_Q4_0 is a quantized version of the **T-lite-it-1.0** model, which is based on the Qwen 2.5 7B architecture and fine-tuned for Russian-language tasks. This version is optimized for memory-constrained environments, making it suitable for fine-tuning and inference on GPUs with as little as **8 GB of VRAM**. The quantization was performed with **BitsAndBytes**, reducing the model weights to 4-bit precision.

## Model Description

- **Language:** Russian
- **Base Model:** T-lite-it-1.0 (derived from Qwen 2.5 7B)
- **Quantization:** 4-bit precision using `BitsAndBytes`
- **Tasks:** Text generation, conversation, question answering, and chain-of-thought reasoning
- **Fine-Tuning Ready:** Suitable for further fine-tuning in low-resource environments
- **VRAM Requirements:** Fine-tuning and inference possible with **8 GB of VRAM** or more

## Usage

To load the model, ensure you have the required dependencies installed (`accelerate` is needed for `device_map="auto"`):

```bash
pip install transformers bitsandbytes accelerate
```

Then load the model with the following code (passing a `BitsAndBytesConfig` is the current idiom; the bare `load_in_4bit=True` argument is deprecated):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "MilyaShams/T-lite-it-1.0_Q4_0"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```

A short generation sketch is given at the end of this card.

## Fine-Tuning

The model is designed for fine-tuning under resource constraints. Use tools such as Hugging Face's `Trainer` or `peft` (Parameter-Efficient Fine-Tuning) to adapt the model to specific tasks.

Example configuration for fine-tuning (a minimal LoRA sketch is given at the end of this card):

- **Batch Size:** Adjust to fit within 8 GB of VRAM (e.g., `per_device_train_batch_size=2`).
- **Gradient Accumulation:** Use `gradient_accumulation_steps` to simulate larger effective batch sizes.

## Model Card Authors

[Milyausha Shamsutdinova](https://github.com/MilyaushaShamsutdinova)
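## Example: Text Generation

Below is a minimal generation sketch, assuming `model` and `tokenizer` have been loaded as in the **Usage** section above. T-lite-it-1.0 is an instruct model, so the conversation is formatted with its chat template; the Russian prompt and sampling settings are illustrative, not recommendations.

```python
# Assumes `model` and `tokenizer` are already loaded as in the Usage section.
messages = [
    # "Tell me briefly about Kazan." -- illustrative prompt
    {"role": "user", "content": "Расскажи кратко о Казани."},
]

# Format the conversation with the model's chat template and move it
# to the device the (sharded) model expects its inputs on.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```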
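## Example: LoRA Fine-Tuning

Below is a minimal LoRA fine-tuning sketch with `peft` and `Trainer` under the VRAM constraints described above, again assuming `model` and `tokenizer` have been loaded as in the **Usage** section. The hyperparameters, target modules, and toy dataset are illustrative assumptions, not tuned recommendations.

```python
# A minimal LoRA sketch; replace the toy dataset with your own corpus.
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Prepare the 4-bit model for training (casts norms, enables input gradients).
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections; the target module names
# follow the usual Qwen 2.5 naming, which this model inherits.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Toy dataset for illustration only.
texts = ["Казань — столица Татарстана.", "Москва — столица России."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="t-lite-it-1.0-q4-lora",
        per_device_train_batch_size=2,  # small batch to fit in ~8 GB of VRAM
        gradient_accumulation_steps=8,  # simulates an effective batch size of 16
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    # Pads batches and builds causal-LM labels from the inputs.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With `per_device_train_batch_size=2` and `gradient_accumulation_steps=8`, the effective batch size is 16 while only two samples are resident in memory at a time, which is what keeps the run within the 8 GB budget.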