lumaticai committed
Commit 637bce2 · verified · 1 Parent(s): 1e7f4fe

Update README.md

Files changed (1)
  1. README.md +116 -115

README.md CHANGED
@@ -52,7 +52,7 @@ We are continuously working on training and developing this model and improve it
  - **Shared by [Optional]:** LumaticAI
  - **Model type:** Language model
  - **Language(s) (NLP):** en, bn
- - **License:** apache-2.0
+ - **License:** mit
  - **Parent Model:** TinyLlama/TinyLlama-1.1B-Chat-v1.0


@@ -82,6 +82,120 @@ We are continuously working on training and developing this model and improve it
  Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.


+ # How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ### Pipeline
+
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from transformers import pipeline
+
+ def formatted_prompt(question) -> str:
+     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
+
+ hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
+
+ tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
+ pipe = pipeline(
+     "text-generation",
+     model=hub_model_name,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ from time import perf_counter
+ start_time = perf_counter()
+
+ prompt = formatted_prompt('হ্যালো')
+ sequences = pipe(
+     prompt,
+     do_sample=True,
+     temperature=0.1,
+     top_p=0.9,
+     num_return_sequences=1,
+     eos_token_id=tokenizer.eos_token_id,
+     max_new_tokens=256
+ )
+ for seq in sequences:
+     print(f"Result: {seq['generated_text']}")
+
+ output_time = perf_counter() - start_time
+ print(f"Time taken for inference: {round(output_time, 2)} seconds")
+ ```
+
+ ### Streaming Response (ChatGPT-, Bard-like)
+
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+ def formatted_prompt(question) -> str:
+     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
+
+ hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
+
+ tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
+ model = AutoModelForCausalLM.from_pretrained(hub_model_name)
+
+ prompt = formatted_prompt('prompt here')
+ inputs = tokenizer([prompt], return_tensors="pt")
+ streamer = TextStreamer(tokenizer)
+ _ = model.generate(**inputs, eos_token_id=[tokenizer.eos_token_id], streamer=streamer, max_new_tokens=256)
+ ```
+
+ ### Using Generation Config
+
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
+ from time import perf_counter
+
+ def formatted_prompt(question) -> str:
+     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
+
+ hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
+
+ tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
+ model = AutoModelForCausalLM.from_pretrained(hub_model_name)
+
+ prompt = formatted_prompt('হ্যালো')
+
+ # Check for GPU availability
+ if torch.cuda.is_available():
+     device = "cuda"
+ else:
+     device = "cpu"
+
+ # Move the model and inputs to the GPU (if available)
+ model.to(device)
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+ generation_config = GenerationConfig(
+     penalty_alpha=0.6,
+     do_sample=True,
+     top_k=5,
+     temperature=0.5,
+     repetition_penalty=1.2,
+     max_new_tokens=256,
+     pad_token_id=tokenizer.eos_token_id
+ )
+
+ start_time = perf_counter()
+ outputs = model.generate(**inputs, generation_config=generation_config)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ output_time = perf_counter() - start_time
+ print(f"Time taken for inference: {round(output_time, 2)} seconds")
+ ```
+
+ </details>
+
+
  # Training Details

  ## Training Data
@@ -215,117 +329,4 @@ lumatic-ai

  # Model Card Contact

- email : contact@lumaticai.com
-
- # How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- <details>
- <summary> Click to expand </summary>
-
- ### Pipeline
-
- ```
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from transformers import pipeline
-
- def formatted_prompt(question)-> str:
-     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
-
- hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
-
- tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
- pipe = pipeline(
-     "text-generation",
-     model=hub_model_name,
-     torch_dtype=torch.float16,
-     device_map="auto",
- )
-
- from time import perf_counter
- start_time = perf_counter()
-
- prompt = formatted_prompt('হ্যালো')
- sequences = pipe(
-     prompt,
-     do_sample=True,
-     temperature=0.1,
-     top_p=0.9,
-     num_return_sequences=1,
-     eos_token_id=tokenizer.eos_token_id,
-     max_new_tokens=256
- )
- for seq in sequences:
-     print(f"Result: {seq['generated_text']}")
-
- output_time = perf_counter() - start_time
- print(f"Time taken for inference: {round(output_time,2)} seconds")
- ```
-
- ### Streaming Response (ChatGPT, Bard like)
-
- ```
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
-
- def formatted_prompt(question)-> str:
-     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
-
- hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
-
- tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
- model = AutoModelForCausalLM.from_pretrained(hub_model_name)
-
- prompt = formatted_prompt('prompt here')
- inputs = tokenizer([prompt], return_tensors="pt")
- streamer = TextStreamer(tokenizer)
- _ = model.generate(**inputs, eos_token_id=[tokenizer.eos_token_id],streamer=streamer, max_new_tokens=256)
- ```
-
- ### Using Generation Config
-
- ```
- import torch
- from transformers import GenerationConfig
- from time import perf_counter
-
- def formatted_prompt(question)-> str:
-     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
-
- hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
-
- tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
- model = AutoModelForCausalLM.from_pretrained(hub_model_name)
-
- prompt = formatted_prompt('হ্যালো')
-
- # Check for GPU availability
- if torch.cuda.is_available():
-     device = "cuda"
- else:
-     device = "cpu"
-
- # Move model and inputs to the GPU (if available)
- model.to(device)
- inputs = tokenizer(prompt, return_tensors="pt").to(device)
-
- generation_config = GenerationConfig(
-     penalty_alpha=0.6,
-     do_sample=True,
-     top_k=5,
-     temperature=0.5,
-     repetition_penalty=1.2,
-     max_new_tokens=256,
-     pad_token_id=tokenizer.eos_token_id
- )
-
- start_time = perf_counter()
- outputs = model.generate(**inputs, generation_config=generation_config)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- output_time = perf_counter() - start_time
- print(f"Time taken for inference: {round(output_time, 2)} seconds")
- ```
-
- </details>
+ email : contact@lumaticai.com
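The quick-start snippets above hard-code the ChatML-style markers inside `formatted_prompt`. As a supplementary sketch (not taken from the model card), the same prompt can usually be built with `tokenizer.apply_chat_template`, assuming the BongLlama tokenizer ships a chat template inherited from TinyLlama-1.1B-Chat-v1.0; if it does not, keep using the helper above. The sampling settings here are illustrative, not values recommended by the card, and `apply_chat_template` requires a reasonably recent transformers release.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"

tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
model = AutoModelForCausalLM.from_pretrained(
    hub_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Assumption: the tokenizer defines a chat template (inherited from
# TinyLlama-1.1B-Chat-v1.0). If tokenizer.chat_template is None, fall back
# to the formatted_prompt() helper shown in the card.
messages = [{"role": "user", "content": "হ্যালো"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.5)

# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```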
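The "Streaming Response" example prints tokens to stdout through `TextStreamer`. When the chunks need to be consumed programmatically (for example in a web UI), `transformers.TextIteratorStreamer` yields text from a background generation thread instead. A minimal sketch under the same assumptions as above (model name and ChatML-style prompt copied from the card; the threading details are illustrative):

```
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"

tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
model = AutoModelForCausalLM.from_pretrained(hub_model_name)

question = "হ্যালো"
prompt = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
inputs = tokenizer([prompt], return_tensors="pt")

# skip_prompt=True so only newly generated text is yielded to the loop below
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a worker thread and consume chunks here
thread = Thread(target=model.generate, kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256))
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```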