license: apache-2.0

SciPhi-SearchAgent-Alpha-7B Model Card

SciPhi-SearchAgent-Alpha-7B is a Large Language Model (LLM) fine-tuned from Mistral-7B-v0.1. It was fine-tuned on a fully synthetic dataset to specialize in retrieval-augmented generation (RAG) over detailed web search results. The goal of this work is an agent that uses search, such as AgentSearch, to generate accurate, well-cited summaries from a range of search results and thereby give more accurate answers to user queries. Please refer to the docs here for more information on how to run the agent end-to-end.

Currently, SciPhi-SearchAgent-Alpha-7B is available via a hosted API at https://www.sciphi.ai.

You can try a demonstration of SearchAgent here.

Model Architecture

Base Model: Mistral-7B-v0.1

Architecture Features:

Transformer-based model
Grouped-Query Attention
Sliding-Window Attention
Byte-fallback BPE tokenizer

Using the Model

It is recommended to use a single search query. The model will return an answer using search results as context.

An example using the AgentSearch package is shown below.

export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
# Use the SciPhi `SearchAgent` for LLM RAG w/ AgentSearch
python -m agent_search.scripts.run_rag run --query="What is Fermat's last theorem?"
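The same command can also be driven from Python. The following is a minimal sketch that only wraps the CLI invocation shown above with the standard-library `subprocess` module. The helper names `build_rag_command` and `run_rag` are hypothetical, and the sketch assumes the script writes its answer to standard output, which the model card does not state.

```python
import os
import subprocess
import sys
from typing import List


def build_rag_command(query: str) -> List[str]:
    """Build the argument list for the run_rag CLI call shown above."""
    return [
        sys.executable,  # run the module with the current interpreter
        "-m",
        "agent_search.scripts.run_rag",
        "run",
        f"--query={query}",
    ]


def run_rag(query: str) -> str:
    """Run the CLI and return whatever it printed (assumed to be the answer)."""
    # The shell example exports SCIPHI_API_KEY first, so fail early without it.
    if not os.environ.get("SCIPHI_API_KEY"):
        raise RuntimeError("Set SCIPHI_API_KEY before calling the agent.")
    result = subprocess.run(
        build_rag_command(query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

With the key exported and the AgentSearch package installed, `run_rag("What is Fermat's last theorem?")` would issue the same single-query request as the shell example.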

Alternatively, you may provide your own search context directly to the model by adhering to the following format:

### Instruction:
Your task is to perform retrieval augmented generation (RAG) over the given query and search results. Return your answer with three sections `My Work`, `My Answer`, and `My Further Considerations`. 

Query:
{query}

Search Results:
{search_results}

Query:
{query}

### Response:
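The format above can be filled in programmatically. Below is a minimal sketch, assuming the search results have already been rendered into a single string. The model card does not specify how individual results should be laid out, nor the exact whitespace between sections, so both are assumptions here, and `build_rag_prompt` is a hypothetical helper name.

```python
# Template text copied from the format above. The blank lines between
# sections and the trailing newline are assumptions.
RAG_PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Your task is to perform retrieval augmented generation (RAG) over the "
    "given query and search results. Return your answer with three sections "
    "`My Work`, `My Answer`, and `My Further Considerations`.\n\n"
    "Query:\n{query}\n\n"
    "Search Results:\n{search_results}\n\n"
    "Query:\n{query}\n\n"
    "### Response:\n"
)


def build_rag_prompt(query: str, search_results: str) -> str:
    """Fill the template with one query and pre-formatted search results."""
    return RAG_PROMPT_TEMPLATE.format(
        query=query.strip(),
        search_results=search_results.strip(),
    )


# Example with placeholder search context (not real search output).
prompt = build_rag_prompt(
    "What is Fermat's last theorem?",
    "1. <title of result one>\n<text of result one>",
)
```

The query appears twice, before and after the search results, exactly as in the format above. The resulting string can then be passed to whatever inference stack serves the model.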

References

Mistral AI. (2023). Model Card for Mistral-7B-v0.1. The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks tested. For full details, please refer to the paper and release blog post. Model architecture: Transformer with Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer.