Abinaya Mahendiran committed on
Commit 0388313
1 Parent(s): 80d39c5

Updated README

Files changed (1)
  1. README.md +12 -13
README.md CHANGED
@@ -1,7 +1,6 @@
 ---
 
 language: ta
-license: MIT
 datasets:
 - oscar
 - IndicNLP
@@ -19,13 +18,13 @@ To set up the project, run the following command,
 pip install -r requirements.txt
 ```
 
-## Model
+## Model:
 Pretrained model on the Tamil language using a causal language modeling (CLM) objective.
 
 ## Dataset Used:
 The GPT-2 model is trained on the [oscar dataset - ta](https://huggingface.co/datasets/oscar) and the [IndicNLP dataset - ta](https://indicnlp.ai4bharat.org/corpora/).
 
-## Intended uses & limitations
+## Intended uses & limitations:
 You can use the raw model for text generation, but it is mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=gpt) to look for fine-tuned versions on a task that interests you.
 
 ## How to pretrain the model:
@@ -57,14 +56,14 @@ python src/convert_flax_to_pytorch.py
 ```
 - Use the following snippet to perform language generation,
 ```python
-from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline
-model_name = 'abinayam/gpt-2-tamil'
-model = AutoModelWithLMHead.from_pretrained(model_name)
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-set_seed(42)
-input_text = "ஒரு ஊரிலே ஒரு காக்கைக்கு"
-max_len = 300
-no_seq = 5
-generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
-sequence = generator(input_text, max_length=max_len, num_return_sequences=no_seq)
+>>> from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline, set_seed
+>>> model_name = 'abinayam/gpt-2-tamil'
+>>> model = AutoModelWithLMHead.from_pretrained(model_name)
+>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
+>>> set_seed(42)
+>>> input_text = "ஒரு ஊரிலே ஒரு காக்கைக்கு"
+>>> max_len = 300
+>>> no_seq = 5
+>>> generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
+>>> sequence = generator(input_text, max_length=max_len, num_return_sequences=no_seq)
 ```
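For context on the updated snippet: the `text-generation` pipeline returns a list of dicts, each carrying a `'generated_text'` key. A minimal sketch of inspecting the output, reusing the `generator`, `input_text`, `max_len`, and `no_seq` names from the snippet above:

```python
# Minimal sketch: print the sequences produced by the generation snippet.
# Assumes `generator`, `input_text`, `max_len`, and `no_seq` are defined
# as in the README example above.
sequences = generator(input_text, max_length=max_len, num_return_sequences=no_seq)
for i, seq in enumerate(sequences):
    # Each pipeline result is a dict with a 'generated_text' key.
    print(f"--- sequence {i + 1} ---")
    print(seq['generated_text'])
```

Note that `AutoModelWithLMHead` is deprecated in recent `transformers` releases; `AutoModelForCausalLM` loads GPT-2-style checkpoints the same way.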