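# Chatbot Refugiados: extractive question answering over a local PDF ("data.pdf")
# using Haystack 1.x (FARMReader + TfidfRetriever) with a Gradio web interface.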
import gradio as gr
from haystack.nodes import FARMReader, PreProcessor, PDFToTextConverter, TfidfRetriever
from haystack.document_stores import InMemoryDocumentStore
from haystack.pipelines import ExtractiveQAPipeline

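# In-memory store for the preprocessed PDF passages, plus the Spanish
# extractive QA reader model loaded from the Hugging Face Hub.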
document_store = InMemoryDocumentStore()
model = "Saturdays/mdeberta-v3-base-squad2_refugees_dataset"
reader = FARMReader(model_name_or_path=model)
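# Split the extracted text into ~100-word passages (3-word overlap),
# respecting sentence boundaries, so retrieval works on short chunks.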
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=True,
    split_by="word",
    split_length=100,
    split_respect_sentence_boundary=True,
    split_overlap=3
)


def print_answers(results):
    """Return a list of {answer, score} dicts from a pipeline result, skipping None fields."""
    fields = ["answer", "score"]  # add "context" here to also return the surrounding passage
    answers = results["answers"]
    filtered_answers = []

    for ans in answers:
        filtered_ans = {
            field: getattr(ans, field)
            for field in fields
            if getattr(ans, field) is not None
        }
        filtered_answers.append(filtered_ans)

    return filtered_answers


def pdf_to_document_store(pdf_file):
    """Convert the PDF to text, preprocess it, and (re)index it in the document store."""
    document_store.delete_documents()  # drop any previously indexed documents
    converter = PDFToTextConverter(
        remove_numeric_tables=True, valid_languages=["es"])
    documents = [converter.convert(file_path=pdf_file, meta=None)[0]]
    preprocessed_docs = preprocessor.process(documents)
    document_store.write_documents(preprocessed_docs)
    return None


def predict(question):
    # Index the bundled PDF, then retrieve candidate passages and extract answers.
    # (The index and pipeline are rebuilt on every call; for a fixed PDF this
    # could be done once at startup instead.)
    pdf_to_document_store("data.pdf")
    retriever = TfidfRetriever(document_store=document_store)
    pipe = ExtractiveQAPipeline(reader, retriever)
    result = pipe.run(
        query=question,
        params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 3}},
    )
    answers = print_answers(result)
    return answers


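# Gradio UI: one Spanish-language text box in, extracted answers out as text
# (the UI strings are kept in Spanish to match the Spanish QA model).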
title = "Chatbot Refugiados"

iface = gr.Interface(fn=predict,
                     inputs=[gr.Textbox(lines=3, label='Haz una pregunta')],
                     outputs="text",
                     title=title,
                     theme="huggingface",
                     examples=['¿Dónde pedir ayuda?', '¿Qué hacer al llegar a España?']
                     )
iface.launch()