Update app.py
app.py
CHANGED
@@ -413,46 +413,6 @@ with gr.Blocks(title="Automated Enzyme Kinetics Extractor") as demo:
             search_output.value = initial_output  # assign the error message directly
         else:
             search_output.value = initial_output.to_html(classes='data', index=False, header=True)
-    with gr.Tab("Paper"):
-        gr.Markdown(
-            '''<h1 align="center"> Leveraging Large Language Models for Automated Extraction of Enzyme Kinetics Data from Scientific Literature </h1>
-            <p><strong>Abstract:</strong>
-            <br>Enzyme kinetics data reported in the literature are essential for guiding biomedical research, yet they are traditionally extracted manually, a process that is both time-consuming and error-prone, and no automated extraction pipeline exists for enzyme kinetics data. Although Large Language Models (LLMs) have advanced information extraction considerably in recent years, their inherent capability to process comprehensive scientific data, covering both precise extraction and objective evaluation, remains under-investigated. Achieving fully automated extraction with satisfactory accuracy, and providing a comprehensive standard for evaluating performance, therefore remain challenging tasks. This research introduces a novel framework that leverages LLMs for automatic information extraction from academic literature on enzyme kinetics. It integrates OCR conversion, content extraction, and output formatting through prompt engineering, marking a significant advance in automated data extraction for scientific research. We contribute a meticulously curated golden benchmark of 156 research articles, which serves both as an accurate validation tool and as a valuable resource for evaluating LLM capabilities in extraction tasks. This benchmark enables rigorous assessment of LLMs in scientific language comprehension, biomedical concept understanding, and tabular data interpretation. The best-performing model achieved a recall of 92% and a precision of 88%. Our approach culminates in the LLM Enzyme Kinetics Archive (LLENKA), a comprehensive dataset derived from 3,435 articles that offers the research community a structured, high-quality resource for enzyme kinetics data and facilitates future research. Our work harnesses the inherent capabilities of LLMs in an automated information extraction pipeline that enhances productivity, surpasses manual curation, and can serve as a paradigm for other fields.
-            <br>Figure 1: Pipeline for Enzyme Kinetics Data Extraction
-            </p>'''
-        )
-        gr.Image("static/img.png", label="Pipeline for Enzyme Kinetics Data Extraction")
-        gr.Markdown(
-            '''
-            <p align="center">Figure 1: Pipeline for Enzyme Kinetics Data Extraction
-            </p>'''
-        )
-        gr.Markdown(
-            '''
-
-            | Model | Overall Entries Extracted | Overall Correct Entries | Overall Recall | Overall Precision | Mean Recall by Paper | Mean Precision by Paper | Km Entries Extracted | Km Correct Entries | Km Recall | Km Precision | Kcat Entries Extracted | Kcat Correct Entries | Kcat Recall | Kcat Precision | Kcat/Km Entries Extracted | Kcat/Km Correct Entries | Kcat/Km Recall | Kcat/Km Precision |
-            |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-            | llama 3.1-405B | 8700 | 7839 | 0.72 | 0.90 | 0.80 | 0.89 | 2870 | 2648 | 0.74 | 0.92 | 2849 | 2594 | 0.73 | 0.91 | 2981 | 2597 | 0.69 | 0.87 |
-            | claude-3.5-sonnet-20240620 | 11348 | 9967 | 0.92 | 0.88 | 0.93 | 0.90 | 3840 | 3314 | 0.93 | 0.86 | 3732 | 3310 | 0.94 | 0.89 | 3776 | 3343 | 0.89 | 0.89 |
-            | GPT-4o | 9810 | 8703 | 0.80 | 0.89 | 0.85 | 0.90 | 3294 | 2932 | 0.82 | 0.89 | 3188 | 2892 | 0.82 | 0.91 | 3328 | 2879 | 0.77 | 0.87 |
-            | qwen-plus-0806 | 8673 | 7763 | 0.72 | 0.90 | 0.77 | 0.90 | 2932 | 2665 | 0.75 | 0.91 | 2914 | 2638 | 0.75 | 0.91 | 2827 | 2460 | 0.66 | 0.87 |
-
-            '''
-        )
-        gr.Markdown(
-            '''
-            <p align="center">
-            Table 1: Overall Performance of Various Models Examined on 156 Papers
-            </p>
-            <p><strong>Please note:</strong>
-            <br>1. Model versions: all models were tested in September 2024. The GPT-4o interface was tested on September 23, 2024; the other model versions are identified by name.
-            <br>2. Llama 3.1 is locally deployed, while the other models use online interfaces.
-            <br>3. All models were tested with a temperature of 0.3.
-            <br>4. Maximum output lengths vary by model, as discussed in our paper: GPT-4o allows 4096 tokens, Claude 3.5 allows 8192, Qwen-Plus allows 8000, and Llama 3.1 allows 4096.
-            <br>5. Due to local GPU resource limitations, Llama 3.1 uses a maximum input of 32k tokens.
-            </p>
-            '''
-        )

     extract_button.click(extract_pdf_pypdf, inputs=file_input, outputs=text_output)
     exp.click(update_input, outputs=model_input)
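For context, the retained lines at the top of this hunk render the search result either as a raw error string or as an HTML table built from a pandas DataFrame. A minimal sketch of that pattern, assuming `initial_output` is either an error message string or a DataFrame (the helper name `render_search_result` is hypothetical, not from app.py):

```python
import pandas as pd

def render_search_result(initial_output):
    """Return HTML for the search output component.

    Error path: a plain string is passed through unchanged.
    Success path: a DataFrame is rendered as an HTML table styled
    with the 'data' CSS class, without the index column.
    """
    if isinstance(initial_output, str):
        return initial_output
    return initial_output.to_html(classes='data', index=False, header=True)

# Example usage with toy data:
df = pd.DataFrame({'Enzyme': ['Trypsin'], 'Km (mM)': [0.35]})
print(render_search_result(df))              # HTML <table> markup
print(render_search_result('No results.'))   # error string, unchanged
```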
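The notes in the removed tab record the evaluation settings: temperature 0.3 and a model-specific output cap (e.g. 4096 tokens for GPT-4o). As a sketch of how those settings map onto a standard chat-completion call, here is an example using the OpenAI Python client; the prompt wording and variable names are illustrative assumptions, not taken from app.py:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

paper_text = "...OCR-converted article text..."  # placeholder input

# temperature=0.3 and max_tokens=4096 mirror the settings listed in the
# notes for GPT-4o; the other models used their own output caps.
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,
    max_tokens=4096,
    messages=[
        {"role": "system",
         "content": "Extract Km, kcat, and kcat/Km entries from the paper as a table."},
        {"role": "user", "content": paper_text},
    ],
)
print(response.choices[0].message.content)
```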
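The event wiring kept by this commit follows Gradio's standard Blocks pattern: components are declared inside `with gr.Blocks(...)` and `with gr.Tab(...)` contexts, and `.click()` handlers are registered at the Blocks level. A self-contained sketch of that structure; the "Extractor" tab name and the stub body of `extract_pdf_pypdf` are assumptions, and only the component names and the `.click` wiring come from the diff:

```python
import gradio as gr

def extract_pdf_pypdf(pdf_file):
    # Stub standing in for the real PDF-to-text function in app.py.
    return f"(text extracted from {pdf_file})" if pdf_file else ""

with gr.Blocks(title="Automated Enzyme Kinetics Extractor") as demo:
    with gr.Tab("Extractor"):  # assumed tab name
        file_input = gr.File(label="Upload PDF")
        extract_button = gr.Button("Extract")
        text_output = gr.Textbox(label="Extracted Text")
    # The "Paper" tab removed by this commit sat here as a sibling gr.Tab.
    extract_button.click(extract_pdf_pypdf, inputs=file_input, outputs=text_output)

if __name__ == "__main__":
    demo.launch()
```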