suyash2739
/

English_to_Hinglish_cmu_hinglish_dog

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

English_to_Hinglish_cmu_hinglish_dog / README.md

suyash2739's picture

Update README.md

4d3ad25 verified 4 months ago

|

No virus

2.49 kB

	---
	language:
	- en
	- hi
	license: apache-2.0
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	- Hinglish
	base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
	datasets:
	- cmu_hinglish_dog
	- suyash2739/Hinglish
	---
	# Better model

	I have just deployed a better model than this on [https://huggingface.co/suyash2739/English_to_Hinglish_fintuned_lamma_3_8b_instruct ]

	# Loss Curve

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65187b234965add2b08b2990/f-qJHUQGxN9yaXym_5u4V.png)

	# Evaluation Loss

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65187b234965add2b08b2990/6VsNF_rgDjXlubd4x8dMk.png)


	# Colab Files:
	- Model_Use.ipynb file to use the model
	- Hinglish_train_lamma_3_8b_instruct_.ipynb to see how the model is trained

	# Inference:

	```
	!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
	!pip install --no-deps xformers trl peft accelerate bitsandbytes
	```

	```python
	from unsloth import FastLanguageModel
	import torch
	max_seq_length = 2048
	dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
	load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

	model, tokenizer = FastLanguageModel.from_pretrained(
	model_name = "suyash2739/English_to_Hinglish_cmu_hinglish_dog",
	max_seq_length = max_seq_length,
	dtype = dtype,
	load_in_4bit = load_in_4bit,
	)
	```

	```python
	prompt = """Translate the input from English to Hinglish to give the response.

	### Input:
	{}

	### Response:
	{}"""

	```

	```python

	inputs = tokenizer(
	[
	prompt.format(
	"""This is a fine-tuned Hinglish translation model using Llama 3.""", # input
	"", # output - leave this blank for generation!
	)
	], return_tensors = "pt").to("cuda")

	from transformers import TextStreamer
	text_streamer = TextStreamer(tokenizer)
	```

	```python
	_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 2048)
	## ye ek fine-tuned Hinglish translation model hai jisme Llama 3 use kiya gaya hai

	```



	# Uploaded model

	- Developed by: suyash2739
	- License: apache-2.0
	- Finetuned from model : unsloth/llama-3-8b-Instruct-bnb-4bit

	This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)