Alessamo committed on
Commit
35d27a5
·
verified ·
1 Parent(s): 7805963

Update README.md

Files changed (1)
  1. README.md +67 -3
README.md CHANGED
@@ -1,3 +1,67 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - Qwen/Qwen2.5-7B
7
+ tags:
8
+ - capability-tagging
9
+ - qwen
10
+ - domain
11
+ ---
12
+ # Model Card for CDT-Domain-Tagger
13
+ This model is a component of the **Cognition-Domain-Task (CDT) framework**, a comprehensive capability framework for Large Language Models presented in our paper *CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task*. It is fine-tuned to classify a given instruction into one of the 33 domains defined by the CDT framework.
14
+
15
+ ## Model Details
16
+
17
+ ### Model Description
18
+ This model categorizes any given instruction into one of 33 predefined knowledge domains, pinpointing the subject area of the request.
19
+
20
+ - **Model type:** Qwen2ForCausalLM
21
+ - **Language(s) (NLP):** English
22
+ - **License:** Apache 2.0
23
+ - **Finetuned from model:** Qwen/Qwen2.5-7B
24
+
25
+ ### Model Sources
26
+
27
+ <!-- Provide the basic links for the model. -->
28
+
29
+ - **Repository:** https://github.com/Alessa-mo/CDT
30
+ - **Paper Link:**
31
+
32
+ ### Basic Usage
33
+ Please refer to https://github.com/Alessa-mo/CDT. Run the following script to annotate your data with domain labels.
34
+ ```bash
35
+ cd tag_annotate
36
+ export CUDA_VISIBLE_DEVICES=0
37
+ python annotate.py \
38
+ --data_path path/to/your/data \
39
+ --output_dir path/to/output/dir \
40
+ --model_path CDT-Domain-Tagger \
41
+ --prompt_file ./prompt/annotation_prompt.jsonl \
42
+ --cognition_skill_file ./prompt/cognition.json \
43
+ --domain_skill_file ./prompt/domain.json \
44
+ --task_skill_file ./prompt/task.json \
45
+ --tag_type "Domain" \
46
+ --batch_size 32
47
+ ```
48
+ **Note**: Make sure your data is a JSON file and has the following format:
49
+ ```json
50
+ [
51
+ {
52
+ "messages": [
53
+ {
54
+ "role": "user",
55
+ "content": "xxxx"
56
+ },
57
+ {
58
+ "role": "assistant",
59
+ "content": "xxxx"
60
+ }
61
+ ]
62
+ }
63
+ ]
64
+ ```
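Before running the annotation script, it may help to check that a data file matches the format above. The following is a minimal sketch, not part of the CDT repository: the function names `validate_records` and `load_and_validate` are hypothetical, and the checks cover only what the example shows (a top-level array of objects, each with a `messages` list whose entries carry string `role` and `content` fields).

```python
import json


def validate_records(records):
    """Return a list of problems found; an empty list means the data looks well-formed.

    Hypothetical helper: it checks only the structure shown in the README example.
    """
    if not isinstance(records, list):
        return ["top level must be a JSON array"]

    problems = []
    for i, record in enumerate(records):
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"record {i}: missing or empty 'messages' list")
            continue
        for j, message in enumerate(messages):
            if not isinstance(message, dict):
                problems.append(f"record {i}, message {j}: must be an object")
                continue
            # Assumption: both fields are required and must be strings.
            for key in ("role", "content"):
                if not isinstance(message.get(key), str):
                    problems.append(f"record {i}, message {j}: '{key}' must be a string")
    return problems


def load_and_validate(path):
    """Parse a JSON file and validate it; json.load raises JSONDecodeError on invalid JSON."""
    with open(path, encoding="utf-8") as handle:
        return validate_records(json.load(handle))
```

Calling `load_and_validate("path/to/your/data")` returns an empty list when the file is usable. Note that Python's `json` module follows strict JSON, so a trailing comma after the last array element is reported as a parse error rather than silently accepted.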
65
+ ## Citation
66
+ If you find this model useful, please cite:
67
+ ```bash