mairindubh committed on
Commit
6c39993
1 Parent(s): c3af1bf

Update README.md

Files changed (1)
  1. README.md +86 -4
README.md CHANGED
@@ -1,10 +1,92 @@
  ---
  title: README
- emoji: 🐨
- colorFrom: indigo
- colorTo: pink
  sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
 
  ---
  title: README
+ emoji: 🐶
+ colorFrom: purple
+ colorTo: indigo
  sdk: static
  pinned: false
  ---
 
+ ### InstructLab
+
+ **Project Name**: InstructLab
+
+ **Description**:
+ [InstructLab](https://instructlab.ai) (based on [the Large-scale Alignment for ChatBots technique](https://arxiv.org/abs/2403.01081))
+ is an innovative open-source initiative led by Red Hat and IBM.
+ The project aims to enhance the capabilities of Large Language Models
+ (LLMs) through a community-driven approach that leverages a novel
+ taxonomy-based curation process and synthetic data generation. InstructLab
+ provides tools for users to engage with and improve LLMs, contributing skills
+ and knowledge to the project’s taxonomy repository.
+
+ **Key Features**:
+ - **ilab Command-Line Interface (CLI)**: Allows users to interact
+ with, train, and fine-tune LLMs using custom taxonomy data. The CLI
+ supports various platforms including macOS, Fedora Linux, and Windows.
+ - **Synthetic Data Generation**: Enhances LLM training through the
+ creation of synthetic datasets.
+ - **Taxonomy Repository**: A structured repository where users can
+ submit and manage their contributions of skills and knowledge.
+
+ **Core Components**:
+ 1. **ilab CLI Tool**: Facilitates model interaction, training, and
+ data generation.
+ 2. **Taxonomy Tree**: Organizes skills and knowledge contributions for
+ model tuning.
+ 3. **Community Collaboration**: Encourages open-source contributions,
+ including new features, bug fixes, and documentation improvements.
+
+ **Granite and Merlinite Models**:
+ - **Merlinite**: Merlinite is instruct-tuned from the Mistral model,
+ providing overall better accuracy than Mistral. It is continuously
+ improved using user-submitted data from the taxonomy repository,
+ incorporating both skills and knowledge.
+
+
+ - **Granite**: [Granite](https://huggingface.co/ibm-granite/granite-7b-base)
+ is a base model developed from scratch by IBM Research, trained on 2 trillion
+ tokens. The datasets the model was trained on are openly cited in [its
+ Hugging Face model card](https://huggingface.co/ibm-granite/granite-7b-base).
+
+ **Installation and Usage**:
+ - [Detailed instructions are available for setting up the `ilab` CLI
+ tool](https://github.com/instructlab/instructlab) on various operating systems. Key steps include installing
+ necessary dependencies, creating a virtual environment, and
+ initializing the `ilab` tool.
+ - The CLI supports commands for chatting with models, generating
+ synthetic data, downloading pre-trained models, and training models
+ with user-generated data.
+
+ **Community and Contribution**:
+ - InstructLab welcomes contributions from the open-source community.
+ Users can submit pull requests to the taxonomy repository, participate
+ in discussions, and contribute to ongoing development.
+ - The project maintains [a comprehensive guide for contributors](https://github.com/instructlab/community),
+ outlining best practices and governance.
+
+ **Getting Started**:
+ 1. **Install ilab CLI**: Follow the installation instructions specific
+ to your operating system.
+ 2. **Initialize ilab**: Set up the local environment and clone the
+ taxonomy repository.
+ 3. **Contribute**: Create and submit new skills and knowledge to improve LLMs.
+
+ **Repository Links**:
+ - [InstructLab Main Repository](https://github.com/instructlab/instructlab)
+ - [Taxonomy Repository](https://github.com/instructlab/taxonomy)
+ - [Community Repository](https://github.com/instructlab/community)
+
+ **Contact and Support**:
+ - Join the InstructLab community on
+ [Slack](https://instruct-lab.slack.com) for support and
+ collaboration.
+ - Refer to the [documentation](https://github.com/instructlab/instructlab)
+ for detailed guides and troubleshooting tips.
+
+ **Licenses**:
+ - InstructLab is released under the Apache-2.0 license.
+
+ For more details and to get involved, visit the [InstructLab GitHub
+ page](https://github.com/instructlab).
+