---
license: artistic-2.0
language:
- en
library_name: transformers
pipeline_tag: text2text-generation
tags:
- code
- keyword-generation
- t5
- english
---

## KeywordGen-v1 Model

KeywordGen-v1 is a T5-based model fine-tuned for keyword generation from a piece of text. Given an input text, the model will return relevant keywords.

### Model details

This model is based on the T5 base checkpoint and was fine-tuned on a custom dataset of text passages paired with their corresponding keywords. It generates keywords by predicting the relevant words or phrases present in the input text.

### Important usage note

This model is optimized for longer inputs. For the most accurate results, I recommend inputs of at least 4-5 sentences; shorter inputs may lead to suboptimal keyword generation.


### How to use

You can use this model in your application with the Hugging Face Transformers library. Here is an example:

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# Load the tokenizer and model
tokenizer = T5TokenizerFast.from_pretrained('mrutyunjay-patil/keywordGen-v1')
model = T5ForConditionalGeneration.from_pretrained('mrutyunjay-patil/keywordGen-v1')

# Define the input text (use at least 4-5 sentences for best results)
input_text = (
    "The city park reopened this weekend after a year of renovations. "
    "New walking trails wind past the lake, and the playground has been rebuilt. "
    "Local volunteers planted hundreds of native trees and flowers. "
    "Families gathered for a picnic to celebrate the reopening. "
    "Officials expect visitor numbers to double over the summer."
)

# Encode the input text
input_ids = tokenizer.encode(input_text, return_tensors='pt')

# Generate the keywords
outputs = model.generate(input_ids)

# Decode the output, dropping special tokens such as <pad> and </s>
keywords = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(keywords)
```
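
If you prefer a more compact setup, the same checkpoint can also be loaded through the Transformers `pipeline` API. The sketch below is a minimal example; the exact format of the generated output (for instance, how multiple keywords are separated) depends on the model's training and is not specified here, so treat the printed string as raw model output.

```python
from transformers import pipeline

# Text2text-generation pipeline wrapping the same checkpoint
keyword_generator = pipeline(
    "text2text-generation",
    model="mrutyunjay-patil/keywordGen-v1",
)

text = (
    "The city park reopened this weekend after a year of renovations. "
    "New walking trails wind past the lake, and the playground has been rebuilt. "
    "Local volunteers planted hundreds of native trees and flowers. "
    "Families gathered for a picnic to celebrate the reopening."
)

result = keyword_generator(text)
print(result[0]["generated_text"])
```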

### Limitations and bias

As this is the first version, the model may perform poorly on texts that differ substantially from its training data. It may also be biased toward the types of text or keywords that are overrepresented in the training data.