---
license: mit
language:
- en
pipeline_tag: text2text-generation
---

# T5-Base Job Description to Resume JSON

This model fine-tunes google/t5-base to convert job descriptions into structured resume JSON data.

## Model description

This model is based on the T5-base architecture and was fine-tuned on a dataset of 10,000 job description and resume pairs. It takes a job description as input and generates a JSON representation of a resume tailored to that job.

**Base model:** google/t5-base

**Fine-tuning task:** Text-to-JSON conversion

**Training data:** 10,000 job description and resume pairs

## Intended uses & limitations

**Intended uses:**

- Generating structured resume data from job descriptions
- Assisting job seekers in tailoring resumes to specific job postings
- Automating parts of the resume creation process

**Limitations:**

- Output quality depends on the detail and clarity of the input job description
- Generated resumes may require human review and editing
- The model may not capture nuanced or industry-specific requirements
- The model does not output "{" or "}"; it emits "LB>" and "RB>" in their place, respectively, so the output must be post-processed to restore the braces

## Training data

The model was trained on 10,000 pairs of job descriptions and corresponding resume JSON data. The data distribution and any potential biases in the training set are not specified.

## Training procedure

The model was fine-tuned using the standard T5 text-to-text framework. Specific hyperparameters and training details are not provided.

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration


def load_model_and_tokenizer(model_path):
    """Load the base T5 tokenizer and the fine-tuned model from model_path."""
    tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base")
    model = T5ForConditionalGeneration.from_pretrained(model_path)
    return tokenizer, model


def generate_text(prompt, tokenizer, model):
    """Generate text using the model based on the given prompt."""
    # Encode the input prompt as PyTorch tensors
    inputs = tokenizer(prompt, return_tensors="pt", padding=True)
    # Generate a single output sequence of at most 512 tokens
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=512,
        num_return_sequences=1,
    )
    # Decode the output tensor to human-readable text
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


def main():
    model_path = "nakamoto-yama/t5-resume-generation"
    print(f"Loading model and tokenizer from {model_path}")
    tokenizer, model = load_model_and_tokenizer(model_path)

    # Prompt in a loop until the user types "exit"
    while True:
        prompt = input("Enter a job description or title (or 'exit' to quit): ")
        if prompt.strip().lower() == "exit":
            break
        response = generate_text(
            f"generate resume JSON for the following job: {prompt}", tokenizer, model
        )
        # The model emits "LB>" and "RB>" in place of "{" and "}"
        response = response.replace("LB>", "{").replace("RB>", "}")
        print(f"Generated Response: {response}")


if __name__ == "__main__":
    main()
```

See the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the T5 developers for more examples.
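The example above restores the braces and prints the raw string. The following is a minimal sketch of one way to go a step further and parse the result, assuming the restored text is meant to be valid JSON, which this card does not guarantee. The helper names `restore_braces` and `parse_resume` and the sample fields are hypothetical and only illustrate the idea. The actual output schema of the model is not documented here.

```python
import json


def restore_braces(text: str) -> str:
    """Map the placeholder tokens "LB>" and "RB>" back to "{" and "}"."""
    return text.replace("LB>", "{").replace("RB>", "}")


def parse_resume(raw_output: str):
    """Return the resume as a dict, or None if the output is not a JSON object.

    Assumption: after brace restoration the text should be a single JSON
    object. Generated text can be truncated or malformed, so parsing
    failures are reported as None instead of raising.
    """
    candidate = restore_braces(raw_output).strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


if __name__ == "__main__":
    # Hypothetical model output, written by hand for illustration only
    sample = 'LB>"name": "Jane Doe", "skills": ["Python", "SQL"]RB>'
    resume = parse_resume(sample)
    if resume is None:
        print("Output was not valid JSON; review it manually.")
    else:
        print(json.dumps(resume, indent=2))
```

Returning `None` on failure keeps the caller in control: the raw string can still be shown to the user for manual review, in line with the limitation that generated resumes may need human editing.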
## Ethical considerations

This model automates part of the resume creation process, which could have implications for job seeking and hiring practices. Users should be aware of potential biases in the training data that may affect the generated resumes.

## Additional information

For more details on the base T5 model, refer to the [T5 paper](https://arxiv.org/abs/1910.10683) and the [google/t5-base model card](https://huggingface.co/google/t5-base).