import streamlit
import spacy_streamlit
import spacy
from lxml import etree
import pandas as pd
from spacy import Language
from spacy.tokens import Doc

streamlit.set_page_config(layout="wide")

samples_test = {"FRAN_IR_050370.xml": "./samples/FRAN_IR_050370.xml"}

# TITLE APP
streamlit.title("NER4Archives visualizer")
streamlit.sidebar.title("NER4Archives visualizer")
streamlit.sidebar.write("## Motivation")
streamlit.sidebar.markdown("""
This application is a proof of concept for applying Named-Entity Recognition (NER), a token classification task, to XML EAD finding aids and for evaluating the resulting NER predictions.
In the context of the NER4Archives project (INRIA-ALMAnaCH/Archives nationales), the goal is to train NER models on an annotated dataset extracted from XML EAD finding aids and to test them on new data.
Most of the models available here are trained with the spaCy NLP framework and are available on the Hugging Face (HF) organisation hub. Other models may be added in the future.
The project also includes a downstream entity linking task, supported here by the spaCy fishing extension (based on entity-fishing).
NER4Archives - 2022