Commit 476ec8a by albertan017 (1 parent: ecd84e8)

Create README.md

Files changed (1): README.md (+103 lines)
---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for #Encoder

<!-- Provide a quick summary of what the model is/does. -->

#Encoder is the hashtag-driven encoder from HICL: Hashtag-Driven In-Context Learning for Social Media Natural Language Understanding.

## Model Details

#Encoder was pre-trained on 179M Twitter posts, each containing a hashtag.
Training was based on paired posts: a contrastive objective guides the encoder to learn topic relevance by identifying posts that share the same hashtag.
We randomly noise the hashtags to avoid trivial representations.
Please refer to https://github.com/albertan017/HICL for more details.

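As a rough sketch only (not the authors' exact implementation), hashtag noising and the pairwise contrastive objective could look as follows; the function names, masking probability, and temperature are illustrative assumptions:

```
import random
import re

import torch
import torch.nn.functional as F


def noise_hashtags(text, mask_token="<mask>", p=0.5):
    # Illustrative hashtag noising: randomly replace hashtags with a mask token
    return re.sub(r"#\w+", lambda m: mask_token if random.random() < p else m.group(0), text)


def pairwise_contrastive_loss(anchor_emb, positive_emb, temperature=0.05):
    # In-batch contrastive loss: row i of `anchor_emb` is paired with row i of
    # `positive_emb` (a post sharing the same hashtag); the other rows act as negatives.
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(anchor.size(0))        # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)
```
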
### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Hanzhuo Tan, Department of Computing, The Hong Kong Polytechnic University
- **Model type:** RoBERTa
- **Language(s) (NLP):** English
- **License:** n.a.
- **Finetuned from model [optional]:** BERTweet

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/albertan017/HICL
- **Paper [optional]:** HICL: Hashtag-Driven In-Context Learning for Social Media Natural Language Understanding

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

The encoder can be loaded with the Hugging Face `transformers` library:

```
import torch
from transformers import AutoModel, AutoTokenizer

# Load the pre-trained #Encoder and its tokenizer
hashencoder = AutoModel.from_pretrained("albertan017/hashencoder")
tokenizer = AutoTokenizer.from_pretrained("albertan017/hashencoder")

tweet = "here's a sample tweet for encoding"
input_ids = torch.tensor([tokenizer.encode(tweet)])

with torch.no_grad():
    features = hashencoder(input_ids)  # ModelOutput; features.last_hidden_state holds the token embeddings
```
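
As a usage sketch, assuming mean pooling of the last hidden state gives a reasonable post representation (the pooling strategy is not specified in this card), topic relevance between two posts could be scored with cosine similarity:

```
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albertan017/hashencoder")
encoder = AutoModel.from_pretrained("albertan017/hashencoder")

def embed(text):
    # Mean-pool the last hidden state over non-padding tokens into one vector
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)      # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Higher cosine similarity ~ more topically related posts
sim = F.cosine_similarity(embed("the new phone camera is amazing at night"),
                          embed("low light shots on this phone look great"))
print(sim.item())
```
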
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

We do not enforce semantic similarity: the encoder captures topic relevance (shared hashtags), not semantic equivalence.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

N.A.

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]