---
license: artistic-2.0
language:
- en
library_name: transformers
pipeline_tag: text2text-generation
tags:
- code
- keyword-generation
- t5
- english
---

# KeywordGen-v1 Model

KeywordGen-v1 is a T5-based model fine-tuned to generate keywords from a piece of text. Given an input text, the model returns relevant keywords.

### Model details

This model is based on the T5 base model and was fine-tuned on a custom dataset of texts paired with their corresponding keywords. The model generates keywords by predicting the relevant words or phrases present in the input text.

## Important Usage Note

This model is optimized for longer inputs and generates 3 keywords as output. For the most accurate results, I recommend using inputs of at least 4-5 sentences. Shorter inputs may lead to suboptimal keyword generation.

## Suggestion for Usage

This model was made to generate keywords from reviews. For best results, combine multiple reviews into a single text and give that to the model as one input.

### How to use

You can use this model in your application with the Hugging Face Transformers library. Make sure to prefix your input with "Keyword: " so that the model generates keywords. Here is an example:

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# Load the tokenizer and model
tokenizer = T5TokenizerFast.from_pretrained('mrutyunjay-patil/keywordGen-v1')
model = T5ForConditionalGeneration.from_pretrained('mrutyunjay-patil/keywordGen-v1')

# Define the input text (note the required "Keyword: " prefix)
input_text = (
    "Keyword: I recently purchased the new headphones and they are incredible. "
    "The sound quality is superb, providing crystal clear audio in all ranges. "
    "The noise-cancelling feature is very effective, blocking out almost all ambient noise. "
    "I also love the comfortable design - they fit perfectly over my ears and don't cause "
    "any discomfort, even after long periods of use. "
    "The battery life is also impressive, lasting up to 20 hours on a single charge. "
    "Overall, I'm extremely satisfied with this product."
)

# Encode the input text
input_ids = tokenizer.encode(input_text, return_tensors='pt')

# Generate the keywords
outputs = model.generate(input_ids)

# Decode the outputs, dropping special tokens such as <pad> and </s>
keywords = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(keywords)
```

### Limitations and bias

As this is the first version, the model might perform poorly on texts that are very different from those in the training data. It might also be biased towards the types of text or keywords that are overrepresented in the training data.
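
The usage suggestion above (combine several reviews into one input and add the "Keyword: " prefix) can be sketched as a small helper. This is only an illustrative sketch: the function names `build_keyword_input` and `generate_keywords` are hypothetical, and the whitespace cleanup, the 512-token truncation limit, and the `max_new_tokens` value are assumptions rather than settings documented for this model.

```python
from typing import Iterable

PREFIX = "Keyword: "  # input prefix required by this model card


def build_keyword_input(reviews: Iterable[str]) -> str:
    """Join several reviews into one prefixed input string (hypothetical helper)."""
    # Collapse internal whitespace in each review, then drop empty ones
    cleaned = [" ".join(review.split()) for review in reviews]
    cleaned = [review for review in cleaned if review]
    if not cleaned:
        raise ValueError("at least one non-empty review is required")
    return PREFIX + " ".join(cleaned)


def generate_keywords(reviews, tokenizer, model,
                      max_input_tokens=512, max_new_tokens=32):
    """Run an already-loaded tokenizer and model on the combined reviews."""
    text = build_keyword_input(reviews)
    # Assumption: truncate to 512 tokens, a common T5 input length;
    # the model card does not state a limit
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=max_input_tokens)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Return the raw decoded string; its exact format is not documented
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the `tokenizer` and `model` loaded as in the example under "How to use", calling `generate_keywords(list_of_reviews, tokenizer, model)` should return the decoded keyword string for the combined reviews.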