Storing curation lists (#9), opened by vah1dgh
Is there a plan to store curated queries from panelists for further investigation?
Here’s an example scenario that illustrates the potential workflow:
- A researcher seeks to understand the significance of a particular parameter in a model and what values have been reported in different references.
- They pose a question to the panelists and request the references they relied upon.
  - The references could be curated, and additional ones could be considered, with the potential to incorporate features such as file attachments and tools like Sider PDF for PDF reading and analysis.
- The researcher applies the reported parameter values to a model and examines their impact on a more complex or higher-level model.
  - A list of higher-level models that incorporate the parameter could also be included.
- The results of this investigation are documented and submitted along with the modified model.
- The model could then be evaluated and scored.
- If approved by the researcher, the report, along with the curated references, could be archived as a resource for other readers.
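To make the archiving step more concrete, here is a minimal sketch of what a stored curation record could look like. Every field name below is a hypothetical suggestion, not an existing schema:

from __future__ import annotations
from dataclasses import dataclass, field, asdict
import json

# Hypothetical schema for one archived curation record
@dataclass
class CurationRecord:
    question: str                   # the researcher's original query
    parameter: str                  # the model parameter under investigation
    references: list[str] = field(default_factory=list)  # curated citation keys or DOIs
    reported_values: list[float] = field(default_factory=list)
    higher_level_models: list[str] = field(default_factory=list)
    report: str = ""                # the submitted write-up
    score: float | None = None      # evaluation score, if assigned
    approved: bool = False

# Records could be archived as JSON Lines, one record per line
record = CurationRecord(
    question="What values of parameter k have been reported?",
    parameter="k",
    references=["smith2021", "doi:10.1000/example"],
    reported_values=[0.12, 0.15],
)
with open("curation_records.jsonl", "a") as f:
    f.write(json.dumps(asdict(record)) + "\n")

An append-only JSON Lines file would keep the archive simple to version and to load back as a Hugging Face dataset later, though any structured store would work.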
For PDF data extraction, since OpenAI does not provide a dedicated PDF-parsing API, a workable approach for now is to extract the text with pypdf and pass it to OpenAI's text-processing API:
pip install pypdf

import pypdf
from openai import OpenAI

# Extract text from every page of the PDF
pdf_reader = pypdf.PdfReader("document.pdf")
text = ""
for page in pdf_reader.pages:
    text += page.extract_text()

# Then send the extracted text to the OpenAI API
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": f"Extract key information from this text: {text}"}
    ],
)
As far as my time allows, I'd be happy to collaborate on these if you're interested.
Regarding the reference management system, we could also create and store the data in a new dataset on Hugging Face. This would let us do CRUD on BibTeX entries with code like this:
from datasets import load_dataset
import bibtexparser

# Load the BibTeX dataset from Hugging Face
dataset = load_dataset("your-dataset-name")

# Save to a temporary file and parse with bibtexparser
with open("temp.bib", "w") as bibtex_file:
    bibtex_file.write(dataset["train"][0]["content"])  # Adjust field name as needed

with open("temp.bib", "r") as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

# Process the parsed BibTeX entries
print(bib_database.entries)
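The snippet above covers the read side; for create/update, one option is to serialize the modified entries back to a BibTeX string and push the rebuilt dataset to the Hub. A rough sketch, continuing from the snippet above and assuming the same hypothetical "content" field plus write access to the repo:

from datasets import Dataset
import bibtexparser

# Add a new entry to the parsed database (entry keys follow bibtexparser v1 conventions)
bib_database.entries.append({
    "ENTRYTYPE": "article",
    "ID": "smith2021",        # hypothetical citation key
    "title": "Example Title",
    "author": "Smith, Jane",
    "year": "2021",
})

# Serialize back to a BibTeX string and rebuild the dataset
updated_bibtex = bibtexparser.dumps(bib_database)
updated_dataset = Dataset.from_dict({"content": [updated_bibtex]})

# Push the updated dataset back to the Hub (requires write access)
updated_dataset.push_to_hub("your-dataset-name")

Storing the whole .bib file as a single "content" string keeps the round trip trivial; splitting into one row per entry would make per-entry updates and deduplication easier at the cost of a slightly more involved parse step.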