| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> |
| <title>spaCy NER Training Guide</title> |
| <link |
| rel="stylesheet" |
| href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" |
| /> |
| <style> |
| body { |
| background-color: #121212; |
| font-family: "Poppins", sans-serif; |
| color: #e0e0e0; |
| margin: 0; |
| padding: 0; |
| } |
| h1, |
| h2 { |
| color: #007bff; |
| } |
| .step { |
| margin-bottom: 30px; |
| border: 1px solid #007bff; |
| border-radius: 5px; |
| padding: 20px; |
| background-color: #1e1e1e; |
| } |
| .btn-primary { |
| color: #fff; |
| background-color: #007bff; |
| border: 1px solid #007bff; |
| } |
| .btn-primary:hover { |
| background-color: transparent; |
| border: 1px solid #007bff; |
| } |
| </style> |
| </head> |
| <body> |
| <div class="container"> |
| <h1>spaCy NER Model Training Guide</h1> |
|
|
| <div class="step"> |
| <h2>Step 1: Upload Your Resume File</h2> |
| <p> |
| Upload a resume or document file for text extraction. Supported |
| formats include: |
| </p> |
| <ul> |
| <li>PDF</li> |
| <li>DOCX (Word Document)</li> |
| <li>RTF (Rich Text Format)</li> |
| <li>ODT (OpenDocument Text)</li> |
| <li>PNG, JPG, JPEG (Image Formats)</li> |
| <li>JSON</li> |
| </ul> |
| <p> |
| Ensure that your file is in one of the supported formats before |
| uploading. The system will extract and process the text from your |
| document automatically. |
| </p> |
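| <p> |
| As a rough illustration of the format check, a minimal sketch follows. |
| The function name <code>is_supported</code> and the extension set are |
| assumptions based on the list above, not the application's actual code. |
| </p> |

```python
from pathlib import Path

# Extensions inferred from the supported-format list in this guide.
# ".rtf" for the rich-text entry is an assumption.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".rtf", ".odt", ".png", ".jpg", ".jpeg", ".json",
}


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

| <p> |
| For example, <code>is_supported("resume.PDF")</code> is true, while a |
| <code>.txt</code> file or a name without an extension is rejected. |
| </p> |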
| <a href="{{ url_for('index') }}" class="btn btn-primary" |
| >Proceed to Upload</a |
| > |
| </div> |
|
|
| <div class="step"> |
| <h2>Step 2: Preview and Edit Extracted Text</h2> |
| <p> |
| After uploading your document, you will be shown a preview of the |
| extracted text. This preview allows you to edit the text if needed to |
| correct any extraction errors or remove unwanted content. Once you're |
| satisfied, click "Next" to proceed to Named Entity Recognition (NER) |
| annotations. |
| </p> |
| <a href="{{ url_for('text_preview') }}" class="btn btn-primary" |
| >Proceed to Text Preview</a |
| > |
| </div> |
|
|
| <div class="step"> |
| <h2>Step 3: Annotate Named Entities</h2> |
| <p> |
| In this step, you will review the Named Entity Recognition (NER) |
| results generated from your text. You can add new entity labels, |
| select the relevant text for each label, and make manual adjustments. |
| Once you have annotated the text with the appropriate labels, save |
| your annotations and export them in JSON format for model training. |
| </p> |
| <p> |
| Note: the following labels are available: "ABOUT", "CERTIFICATE", |
| "COMPANY", "CONTACT", "COURSE", "DOB", "EMAIL", "EXPERIENCE", "HOBBIES", |
| "INSTITUTE", "JOB_TITLE", "LANGUAGE", "LAST_QUALIFICATION_YEAR", "LINK", |
| "LOCATION", "PERSON", "PROJECTS", "QUALIFICATION", "SCHOOL", "SKILL", |
| "SOFT_SKILL", "UNIVERSITY", "YEARS_EXPERIENCE". |
| </p> |
| <p>Instructions:</p> |
| <ul> |
| <li>Click "Begin!" to load the extracted text.</li> |
| <li> |
| Highlight sections of the text and assign them to the available |
| labels. |
| </li> |
| <li>Add new labels if necessary.</li> |
| <li> |
| Once done, click "Export" to download your annotations as a JSON |
| file. |
| </li> |
| </ul> |
| <a href="{{ url_for('ner_preview') }}" class="btn btn-primary" |
| >Proceed to NER Annotation</a |
| > |
| </div> |
|
|
| <div class="step"> |
| <h2>Step 4: Save and Format JSON Data</h2> |
| <p> |
| Upload the annotated JSON file from the previous step. The system |
| will process and reformat the file so that it is compatible with |
| the spaCy model training process. After formatting, you can proceed to |
| the model training step. |
| </p> |
| <p>Instructions:</p> |
| <ul> |
| <li> |
| Upload the JSON file you downloaded after the annotation step. |
| </li> |
| <li>Click "Process" to reformat the file.</li> |
| <li> |
| Once processing is complete, click "Next" to proceed with training. |
| </li> |
| </ul> |
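| <p> |
| The kind of reformatting performed in this step can be sketched as |
| follows. This is only an illustration: the input shape (a list of |
| records with <code>text</code> and <code>annotations</code> keys) and the |
| function name <code>to_spacy_examples</code> are hypothetical, and the |
| application's real export format may differ. The output uses the |
| common <code>(text, {"entities": [(start, end, label)]})</code> layout, |
| and overlapping spans are dropped because spaCy's NER expects |
| non-overlapping entities. |
| </p> |

```python
import json


def to_spacy_examples(raw_json: str):
    """Convert exported annotations into (text, {"entities": [...]}) pairs.

    Assumed (hypothetical) input: a JSON list of objects shaped like
    {"text": str, "annotations": [{"start": int, "end": int, "label": str}]}.
    """
    records = json.loads(raw_json)
    examples = []
    for record in records:
        text = record["text"]
        spans = []
        last_end = 0
        # Sort by start offset so overlaps can be detected in a single pass.
        for ann in sorted(record["annotations"], key=lambda a: a["start"]):
            start, end, label = ann["start"], ann["end"], ann["label"]
            # Skip spans that are empty, out of range, or overlap the previous kept span.
            if not (0 <= start < end <= len(text)) or start < last_end:
                continue
            spans.append((start, end, label))
            last_end = end
        examples.append((text, {"entities": spans}))
    return examples
```

| <p> |
| Keeping the first span and skipping later overlapping ones is a |
| simple policy chosen for this sketch; a real tool might instead ask |
| the annotator to resolve the conflict. |
| </p> |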
| <a href="{{ url_for('json_file') }}" class="btn btn-primary" |
| >Proceed to Save JSON</a |
| > |
| </div> |
|
|
| <div class="step"> |
| <h2>Step 5: Train the NER Model</h2> |
| <p> |
| In this final step, the formatted JSON data is converted into the |
| spaCy training format and training of the NER model begins. You can |
| customize training by choosing the number of epochs (full passes over |
| the training data) and a version name for the trained model. |
| </p> |
| <p>Guidelines:</p> |
| <ul> |
| <li> |
| Number of epochs: Each epoch is one full pass over the training |
| data. More epochs give the model more opportunities to learn, but too |
| many can lead to overfitting. Start with 10 epochs as a baseline. |
| </li> |
| <li> |
| Model versioning: Provide a version name for this training session, |
| so you can keep track of different versions of the model. |
| </li> |
| </ul> |
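| <p> |
| One common way to guard against overfitting when choosing the epoch |
| count is early stopping. The sketch below assumes that a score on |
| held-out data (for example an F-score) is available after each epoch, |
| which this guide does not state; the function name |
| <code>pick_epoch</code> and the <code>patience</code> parameter are |
| hypothetical. |
| </p> |

```python
def pick_epoch(dev_scores, patience=3):
    """Return the 1-based epoch with the best held-out score.

    Scanning stops once `patience` epochs have passed without beating the
    best score seen so far. Returns 0 if no scores are given.
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best_score:
            # New best: remember where it happened.
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop looking further.
            break
    return best_epoch
```

| <p> |
| If the held-out score peaks early and then declines, the later epochs |
| are likely fitting noise in the training data, so a lower epoch count |
| is a reasonable choice for the next training run. |
| </p> |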
| <p> |
| Once the training is complete, you can download the latest version of |
| the trained model for use in production. |
| </p> |
| <a href="{{ url_for('spacy_file') }}" class="btn btn-primary" |
| >Proceed to Model Training</a |
| > |
| </div> |
| </div> |
|
|
| <script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script> |
| <script src="https://cdn.jsdelivr.net/npm/popper.js@1.16.1/dist/umd/popper.min.js"></script> |
| <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script> |
| </body> |
| </html> |
|
|