Commit 6c39993 (1 parent: c3af1bf), committed by mairindubh
Update README.md
README.md CHANGED
@@ -1,10 +1,92 @@
 ---
 title: README
-emoji:
-colorFrom:
-colorTo:
+emoji: 🐶
+colorFrom: purple
+colorTo: indigo
 sdk: static
 pinned: false
 ---
 
-
+### InstructLab
+
+**Project Name**: InstructLab
+
+**Description**:
+[InstructLab](https://instructlab.ai) (based on [the Large-scale Alignment for ChatBots technique](https://arxiv.org/abs/2403.01081))
+is an innovative open-source initiative led by Red Hat and IBM.
+The project aims to enhance the capabilities of Large Language Models
+(LLMs) through a community-driven approach that leverages a novel
+taxonomy-based curation process and synthetic data generation. InstructLab
+provides tools for users to engage with and improve LLMs, contributing skills
+and knowledge to the project’s taxonomy repository.
+
+**Key Features**:
+- **ilab Command-Line Interface (CLI)**: Allows users to interact
+with, train, and fine-tune LLMs using custom taxonomy data. The CLI
+supports various platforms including macOS, Fedora Linux, and Windows.
+- **Synthetic Data Generation**: Enhances LLM training through the
+creation of synthetic datasets.
+- **Taxonomy Repository**: A structured repository where users can
+submit and manage their contributions of skills and knowledge.
+
+**Core Components**:
+1. **ilab CLI Tool**: Facilitates model interaction, training, and
+data generation.
+2. **Taxonomy Tree**: Organizes skills and knowledge contributions for
+model tuning.
+3. **Community Collaboration**: Encourages open-source contributions,
+including new features, bug fixes, and documentation improvements.
+
+**Granite and Merlinite Models**:
+- **Merlinite**: Merlinite is instruct-tuned from the Mistral model,
+providing overall better accuracy than Mistral. It is continuously
+improved using user-submitted data from the taxonomy repository,
+incorporating both skills and knowledge.
+
+
+- **Granite**: [Granite](https://huggingface.co/ibm-granite/granite-7b-base)
+is a base model developed from scratch by IBM Research, trained on 2 trillion
+tokens. The datasets the model was trained on are openly cited in [its
+Hugging Face model card](https://huggingface.co/ibm-granite/granite-7b-base).
+
+**Installation and Usage**:
+- [Detailed instructions are available for setting up the `ilab` CLI
+tool](https://github.com/instructlab/instructlab) on various operating systems. Key steps include installing
+necessary dependencies, creating a virtual environment, and
+initializing the `ilab` tool.
+- The CLI supports commands for chatting with models, generating
+synthetic data, downloading pre-trained models, and training models
+with user-generated data.
+
+**Community and Contribution**:
+- InstructLab welcomes contributions from the open-source community.
+Users can submit pull requests to the taxonomy repository, participate
+in discussions, and contribute to ongoing development.
+- The project maintains [a comprehensive guide for contributors](https://github.com/instructlab/community),
+outlining best practices and governance.
+
+**Getting Started**:
+1. **Install ilab CLI**: Follow the installation instructions specific
+to your operating system.
+2. **Initialize ilab**: Set up the local environment and clone the
+taxonomy repository.
+3. **Contribute**: Create and submit new skills and knowledge to improve LLMs.
+
+**Repository Links**:
+- [InstructLab Main Repository](https://github.com/instructlab/instructlab)
+- [Taxonomy Repository](https://github.com/instructlab/taxonomy)
+- [Community Repository](https://github.com/instructlab/community)
+
+**Contact and Support**:
+- Join the InstructLab community on
+[Slack](https://instruct-lab.slack.com) for support and
+collaboration.
+- Refer to the [documentation](https://github.com/instructlab/instructlab)
+for detailed guides and troubleshooting tips.
+
+**Licenses**:
+- InstructLab is released under the Apache-2.0 license.
+
+For more details and to get involved, visit the [InstructLab GitHub
+page](https://github.com/instructlab).
+
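
The "Contribute" step in the README's Getting Started list can be pictured with a small sketch. The Python below is not part of the commit and is not InstructLab code: it assumes a skill contribution is a set of seed question-and-answer pairs stored at some path in the taxonomy tree. The field names (`task_description`, `created_by`, `seed_examples`), the path, and the `validate_skill` helper are all hypothetical, chosen for illustration; the taxonomy repository defines the actual schema.

```python
from pathlib import PurePosixPath

# Hypothetical skill contribution. The field names are assumptions made for
# illustration; the taxonomy repository defines the real schema.
skill = {
    "task_description": "Answer questions with a short rhyming couplet.",
    "created_by": "example-user",  # hypothetical contributor handle
    "seed_examples": [
        {
            "question": "Describe the sea in a couplet.",
            "answer": "The sea is wide, the sea is deep, / it sings the sailors all to sleep.",
        },
        {
            "question": "Describe rain in a couplet.",
            "answer": "The rain comes down in silver thread, / and drums a song above my bed.",
        },
    ],
}

# Hypothetical location of the contribution inside the taxonomy tree.
skill_path = PurePosixPath("compositional_skills/writing/poetry/couplets/qna.yaml")


def validate_skill(entry):
    """Return a list of problems found in a skill entry (empty if none)."""
    problems = []
    # Every top-level field must be present and non-empty.
    for key in ("task_description", "created_by", "seed_examples"):
        if not entry.get(key):
            problems.append(f"missing or empty field: {key}")
    # Every seed example needs a non-blank question and answer.
    for index, example in enumerate(entry.get("seed_examples") or []):
        for key in ("question", "answer"):
            if not str(example.get(key, "")).strip():
                problems.append(f"seed example {index}: missing {key}")
    return problems


if __name__ == "__main__":
    print(skill_path.parent)
    print(validate_skill(skill) or "no problems found")
```

Checking a contribution locally like this, before opening a pull request, is one way to catch empty fields early; the project's own tooling and review process remain the authority on what a valid submission looks like.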