Storing curation lists (#9), opened by vah1dgh
Is there a plan to store curated queries from panelists for further investigation?
Here’s an example scenario that illustrates the potential workflow:
- A researcher seeks to understand the significance of a particular parameter in a model and what values have been reported in different references.
- They pose a question to the panelists and request the references they relied upon.
  - The references could be curated, and additional ones could be considered, with the potential to incorporate features such as file attachments and tools like Sider PDF for PDF reading and analysis.
- The researcher applies the reported parameter values to a model and examines their impact on a more complex or higher-level model.
  - A list of higher-level models that incorporate the parameter could also be included.
- The results of this investigation are documented and submitted along with the modified model.
- The model could then be evaluated and scored.
- If approved by the researcher, the report, along with the curated references, could be archived as a resource for other readers.
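To make the archiving step more concrete, here is a minimal sketch of what a stored curation record could look like. Every field name below is a hypothetical suggestion, not an existing schema:

from __future__ import annotations
from dataclasses import dataclass, field, asdict
import json

# Hypothetical schema for one archived curation record
@dataclass
class CurationRecord:
    question: str                   # the researcher's original query
    parameter: str                  # the model parameter under investigation
    references: list[str] = field(default_factory=list)  # curated citation keys or DOIs
    reported_values: list[float] = field(default_factory=list)
    higher_level_models: list[str] = field(default_factory=list)
    report: str = ""                # the submitted write-up
    score: float | None = None      # evaluation score, if assigned
    approved: bool = False

# Records could be archived as JSON Lines, one record per line
record = CurationRecord(
    question="What values of parameter k have been reported?",
    parameter="k",
    references=["smith2021", "doi:10.1000/example"],
    reported_values=[0.12, 0.15],
)
with open("curation_records.jsonl", "a") as f:
    f.write(json.dumps(asdict(record)) + "\n")

An append-only JSON Lines file would keep the archive simple to version and to load back as a Hugging Face dataset later, though any structured store would work.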
For PDF data extraction, since OpenAI does not provide a dedicated PDF-parsing API, a workable approach for now is to extract the text with pypdf and pass it to OpenAI's text-processing API:
pip install pypdf

import pypdf
from openai import OpenAI

# Extract text from every page of the PDF
pdf_reader = pypdf.PdfReader("document.pdf")
text = ""
for page in pdf_reader.pages:
    text += page.extract_text()

# Then send the extracted text to the OpenAI API
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": f"Extract key information from this text: {text}"}
    ],
)
As far as my time allows, I'd be happy to collaborate on these if you're interested.
Regarding the reference management system, we could also create and store the data in a new dataset on Hugging Face. This would let us do CRUD on BibTeX entries with code like this:
from datasets import load_dataset
import bibtexparser

# Load the BibTeX dataset from Hugging Face
dataset = load_dataset("your-dataset-name")

# Save to a temporary file and parse with bibtexparser
with open("temp.bib", "w") as bibtex_file:
    bibtex_file.write(dataset["train"][0]["content"])  # Adjust field name as needed

with open("temp.bib", "r") as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

# Process the parsed BibTeX entries
print(bib_database.entries)
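The snippet above covers the read side; for create/update, one option is to serialize the modified entries back to a BibTeX string and push the rebuilt dataset to the Hub. A rough sketch, continuing from the snippet above and assuming the same hypothetical "content" field plus write access to the repo:

from datasets import Dataset
import bibtexparser

# Add a new entry to the parsed database (entry keys follow bibtexparser v1 conventions)
bib_database.entries.append({
    "ENTRYTYPE": "article",
    "ID": "smith2021",        # hypothetical citation key
    "title": "Example Title",
    "author": "Smith, Jane",
    "year": "2021",
})

# Serialize back to a BibTeX string and rebuild the dataset
updated_bibtex = bibtexparser.dumps(bib_database)
updated_dataset = Dataset.from_dict({"content": [updated_bibtex]})

# Push the updated dataset back to the Hub (requires write access)
updated_dataset.push_to_hub("your-dataset-name")

Storing the whole .bib file as a single "content" string keeps the round trip trivial; splitting into one row per entry would make per-entry updates and deduplication easier at the cost of a slightly more involved parse step.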