The model is built by applying [Orchestration of Expert](https://arxiv.org/abs/2401.13979) to the math domain. The dedicated model either generates solutions itself or, when necessary, calls on GPT-4 (or a similarly performing LLM) to fill gaps in its knowledge. Specifically, given an input, the dedicated model first determines whether the question is solvable with the base model. If it is, the orchestrator is detached and token generation is invoked on the base LLM expert. If the problem is difficult and requires a larger model such as GPT-4, it produces the `<GPT4>` token (i.e., token_id=32000).
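Below is a minimal sketch of this routing at inference time, assuming the standard `transformers` causal-LM API; `MODEL_ID` and the `call_gpt4` helper are hypothetical placeholders, not names defined by this repository:

```python
# Minimal routing sketch (MODEL_ID and call_gpt4 are hypothetical placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/orchestrated-math-expert"  # substitute the actual repo id
GPT4_TOKEN_ID = 32000  # id of the special <GPT4> deferral token

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def solve(question: str) -> str:
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Keep only the newly generated tokens.
    new_tokens = output[0, inputs["input_ids"].shape[1]:].tolist()
    if GPT4_TOKEN_ID in new_tokens:
        # The orchestrator deferred: route the question to GPT-4 or a
        # similarly capable LLM (call_gpt4 is a hypothetical helper).
        return call_gpt4(question)
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```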
The Orchestrator is first trained to estimate the knowledge of the base model for any given query, and is then merged into the base model (MetaMath-7B here).
In general, for **any domain**, you can build one by:
- Choose a base LLM expert 🤗
- Train a domain-specific Orchestrator (a labeling sketch for this step follows the list)
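As a rough illustration of the second step, here is a minimal sketch of how orchestrator training labels could be constructed, assuming you score the base expert's answers against reference solutions. This is an assumed procedure, not the paper's exact recipe; the repo id and the exact-match check are placeholders:

```python
# Hypothetical labeling sketch: run the base expert on domain questions and
# record whether it succeeds, yielding (query, label) pairs for the orchestrator.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-math/MetaMath-7B-V1.0"  # assumed repo id for the MetaMath-7B expert

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")

def label_query(question: str, reference_answer: str) -> int:
    """Return 1 if the base expert solves the question (answer locally),
    0 if it fails (defer, i.e. the orchestrator should emit <GPT4>)."""
    inputs = tokenizer(question, return_tensors="pt").to(base.device)
    out = base.generate(**inputs, max_new_tokens=256)
    answer = tokenizer.decode(
        out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return int(reference_answer in answer)  # placeholder exact-match check
```

The resulting binary labels train the Orchestrator to predict, for each query, whether the base expert's knowledge suffices; the trained Orchestrator is then merged into the base expert as described above.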