AliMuhammad73 committed on
Commit 1fab0d9 · verified · 1 Parent(s): 9c19cf9

Update README.md

Files changed (1)
  1. README.md +37 -8
README.md CHANGED
@@ -1,9 +1,38 @@
- ---
- tags:
- - model_hub_mixin
- - pytorch_model_hub_mixin
- ---
 
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: [More Information Needed]
- - Docs: [More Information Needed]
+ # Custom Urdu LLM
+
+ This is a custom transformer-based large language model (LLM) for Urdu.
+
+ ## Model Details
+ - **Architecture:** Transformer (GPT-based)
+ - **Framework:** PyTorch
+ - **Tokenizer:** SentencePiece
+ - **Hyperparameters:**
+   - Vocabulary Size: 20,000
+   - Embedding Size: 768
+   - Attention Heads: 12
+   - Layers: 12
+   - Dropout: 0.2
+
+ ## Usage
+
+ First, download the `modeling_gpt.py` file from the repo. Then create a new Python file alongside it and use the following code to generate text with the model:
 
+ ```python
+ import torch
+ from modeling_gpt import GPTLanguageModel
+ from transformers import AutoTokenizer
+
+ model = GPTLanguageModel.from_pretrained("AliMuhammad73/testing-model")
+ tokenizer = AutoTokenizer.from_pretrained("AliMuhammad73/testing-model")
+
+ # Prompt in Urdu ("Pakistan is a country located in South Asia. It borders India, China, Afghanistan, and ")
+ prompt = "پاکستان ایک ایسا ملک ہے جو جنوبی ایشیا میں واقع ہے۔ اس کی سرحدیں ہندوستان، چین، افغانستان، اور "
+ # Tokenize the prompt and add a batch dimension: shape (1, sequence_length)
+ encoded = tokenizer.encode(prompt)
+ encoded_tensor = torch.tensor(encoded).unsqueeze(0)
+ output = model.generate(encoded_tensor, max_new_tokens=64)
+
+ # Decode the generated token IDs back into text
+ response = tokenizer.decode(output[0].squeeze().tolist())
+ print(response)
+ ```
+
+ ---
+ license: apache-2.0
+ ---
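
The hyperparameters listed under Model Details can be sanity-checked with a small calculation. The sketch below is not taken from `modeling_gpt.py`: `GPTConfig`, its field names, and the parameter estimate are hypothetical. It assumes a standard GPT-2-style block (four attention projections plus an MLP with 4x expansion), which the README does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    """Hypothetical container for the README's hyperparameters (names are illustrative)."""

    vocab_size: int = 20_000
    n_embd: int = 768
    n_head: int = 12
    n_layer: int = 12
    dropout: float = 0.2

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the embedding evenly across heads.
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        return self.n_embd // self.n_head

    def approx_params(self) -> int:
        # ASSUMPTION: each block has 4*d^2 attention weights (Q, K, V, output)
        # and 8*d^2 MLP weights (4x expansion). Biases, layer norms, positional
        # embeddings, and any untied output head are ignored.
        per_block = 12 * self.n_embd ** 2
        token_embedding = self.vocab_size * self.n_embd
        return self.n_layer * per_block + token_embedding


config = GPTConfig()
print(config.head_dim)         # 64
print(config.approx_params())  # 100294656
```

Under these assumptions, each attention head operates on 64 dimensions and the model has roughly 100 million parameters. The real count depends on details the README does not state, such as positional embeddings and whether the output head shares weights with the token embedding.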