bangla-llama-4bit / README.md
asif00's picture
Update README.md
67354a2 verified
|
raw
history blame
No virus
1.22 kB
---
language:
- bn
license: apache-2.0
tags:
- transformers
- llama
- trl
- sft
base_model: unsloth/llama-3-8b-bnb-4bit
library_name: transformers
pipeline_tag: question-answering
---
How to use it:
# Use a pipeline as a high-level helper
```python
from transformers import pipeline
pipe = pipeline("question-answering", model="asif00/bangla-llama-4bit")
```
# Load model directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("asif00/bangla-llama-4bit")
model = AutoModelForCausalLM.from_pretrained("asif00/bangla-llama-4bit")
```
# To get a cleaned up version of the response, you can use:
```python
def generate_response(question, context):
inputs = tokenizer([
prompt.format(
question,
context,
""
)
], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)
responses = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
response_start = responses.find("### Response:") + len("### Response:")
response = responses[response_start:].strip()
return response
```