doberst committed on
Commit
a8fac66
1 Parent(s): 292bb7e

Upload README.md

  1. README.md +17 -101
README.md CHANGED
@@ -1,14 +1,11 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
-
5
  # Model Card for Model ID
6
 
7
  <!-- Provide a quick summary of what the model is/does. -->
8
- industry-bert-insurance-v0.1 is part of a series of industry-fine-tuned sentence_transformer embedding models.
9
 
10
- BERT-based 768-parameter drop-in substitute for non-industry-specific embeddings model. This model was trained on a wide range of
11
- publicly available materials related to the Insurance industry.
12
 
13
  ## Model Details
14
 
@@ -16,121 +13,41 @@ publicly available materials related to the Insurance industry.
16
 
17
  <!-- Provide a longer summary of what this model is. -->
18
 
 
 
 
19
  - **Developed by:** llmware
20
- - **Shared by [optional]:** Darren Oberst
21
  - **Model type:** BERT-based Industry domain fine-tuned Sentence Transformer architecture
22
  - **Language(s) (NLP):** English
23
  - **License:** Apache 2.0
24
  - **Finetuned from model [optional]:** BERT-based model, fine-tuning methodology described below.
25
 
26
- ### Model Sources [optional]
27
-
28
- <!-- Provide the basic links for the model. -->
29
-
30
- - **Repository:** [More Information Needed]
31
- - **Paper [optional]:** [More Information Needed]
32
- - **Demo [optional]:** [More Information Needed]
33
-
34
- ## Uses
35
-
36
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
37
-
38
- ### Direct Use
39
-
40
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
41
- This model is intended to be used as a sentence embedding model, specifically for the Asset Management and financial industries.
42
-
43
- ### Downstream Use [optional]
44
-
45
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
46
-
47
- [More Information Needed]
48
 
49
- ### Out-of-Scope Use
50
 
51
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
52
 
53
- [More Information Needed]
54
 
55
  ## Bias, Risks, and Limitations
56
 
57
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
58
-
59
- [More Information Needed]
60
-
61
- ### Recommendations
62
-
63
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
64
-
65
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
66
-
67
- ## How to Get Started with the Model
68
-
69
- Use the code below to get started with the model.
70
-
71
- [More Information Needed]
72
-
73
- ## Training Details
74
-
75
- ### Training Data
76
-
77
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
78
-
79
- [More Information Needed]
80
 
81
  ### Training Procedure
82
 
83
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
84
 
85
- This model was fine-tuned using a custom self-supervised procedure that combined contrastive techniques with stochastic injections of
86
- distortions in the samples. The methodology was derived, adapted and inspired primarily from three research papers cited below:
87
- TSDAE (Reimers), DeClutr (Giorgi), and Contrastive Tension (Carlsson).
88
 
89
- #### Summary
90
-
91
-
92
-
93
- ## Model Examination [optional]
94
-
95
- <!-- Relevant interpretability work for the model goes here -->
96
-
97
- [More Information Needed]
98
-
99
- ## Environmental Impact
100
-
101
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
102
-
103
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
104
-
105
- - **Hardware Type:** [More Information Needed]
106
- - **Hours used:** [More Information Needed]
107
- - **Cloud Provider:** [More Information Needed]
108
- - **Compute Region:** [More Information Needed]
109
- - **Carbon Emitted:** [More Information Needed]
110
-
111
- ## Technical Specifications [optional]
112
-
113
- ### Model Architecture and Objective
114
-
115
- [More Information Needed]
116
-
117
- ### Compute Infrastructure
118
-
119
- [More Information Needed]
120
-
121
- #### Hardware
122
-
123
- [More Information Needed]
124
-
125
- #### Software
126
-
127
- [More Information Needed]
128
 
129
  ## Citation [optional]
130
 
131
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
132
-
133
- Custom training protocol used to train the model, which was derived and inspired by the following papers:
134
 
135
  @article{wang-2021-TSDAE,
136
  title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
@@ -162,12 +79,11 @@ Custom training protocol used to train the model, which was derived and inspired
162
  Published: 12 Jan 2021, Last Modified: 05 May 2023
163
  }
164
 
165
- ## Model Card Authors [optional]
166
 
167
- [More Information Needed]
168
 
169
  ## Model Card Contact
170
 
171
- [More Information Needed]
172
 
173
 
 
1
  ---
2
  license: apache-2.0
3
  ---
 
4
  # Model Card for Model ID
5
 
6
  <!-- Provide a quick summary of what the model is/does. -->
 
7
 
8
+ industry-bert-insurance-v0.1 is part of a series of industry-fine-tuned sentence_transformer embedding models.
 
9
 
10
  ## Model Details
11
 
 
13
 
14
  <!-- Provide a longer summary of what this model is. -->
15
 
16
+ industry-bert-insurance-v0.1 is a domain fine-tuned, BERT-based Sentence Transformer model with 768-dimensional embeddings, intended as a "drop-in"
17
+ substitute for embeddings in the insurance industry domain. This model was trained on a wide range of publicly available documents on the insurance industry.
18
+
19
  - **Developed by:** llmware
 
20
  - **Model type:** BERT-based Industry domain fine-tuned Sentence Transformer architecture
21
  - **Language(s) (NLP):** English
22
  - **License:** Apache 2.0
23
  - **Finetuned from model [optional]:** BERT-based model, fine-tuning methodology described below.
24
 
25
+ ## Model Use
 
26
 
27
+ from transformers import AutoTokenizer, AutoModel
28
 
29
+ tokenizer = AutoTokenizer.from_pretrained("llmware/industry-bert-insurance-v0.1")
30
+ model = AutoModel.from_pretrained("llmware/industry-bert-insurance-v0.1")
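The two calls above only load the tokenizer and the model. A minimal sketch of turning text into sentence vectors follows. It assumes mean pooling over the token embeddings, which is a common choice for Sentence Transformer style models but is not stated in this model card. The helper names `mean_pool`, `cosine_similarity`, and `embed` are hypothetical, introduced only for illustration.

```python
import numpy as np

MODEL_NAME = "llmware/industry-bert-insurance-v0.1"


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sentence, ignoring padded positions.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def embed(texts, model_name=MODEL_NAME):
    """Return one embedding per input text.

    Assumption: mean pooling is used here; the pooling the authors intended
    is not documented in the card.
    """
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    return mean_pool(
        output.last_hidden_state.numpy(),
        batch["attention_mask"].numpy(),
    )


if __name__ == "__main__":
    vectors = embed([
        "The policy excludes flood damage.",
        "Water damage from flooding is not covered.",
    ])
    print(vectors.shape)
    print(cosine_similarity(vectors[0], vectors[1]))
```

Masking before averaging matters when sentences of different lengths are batched together: without it, padding tokens would pull every shorter sentence's vector toward the padding embedding.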
31
 
 
32
 
33
  ## Bias, Risks, and Limitations
34
 
35
+ This is a semantic embedding model, fine-tuned on public domain SEC filings and regulatory documents. Results may vary if used outside of this
36
+ domain, and like any embedding model, there is always the potential for anomalies in the vector embedding space. No specific safeguards have
37
+ been put in place for safety or to mitigate potential bias in the dataset.
 
38
 
39
  ### Training Procedure
40
 
41
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
42
 
43
+ This model was fine-tuned using a custom self-supervised procedure and custom dataset that combined contrastive techniques
44
+ with stochastic injections of distortions in the samples. The methodology was derived and adapted from, and primarily inspired by,
45
+ three research papers cited below: TSDAE (Reimers), DeCLUTR (Giorgi), and Contrastive Tension (Carlsson).
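The exact distortion scheme used for this model is not described in the card. As a rough illustration of what "stochastic injections of distortions" can look like, the sketch below applies random token deletion in the style of TSDAE, which reports a deletion ratio of 0.6 as its default. The function name `delete_tokens`, the whitespace tokenization, and the pairing of a distorted sentence with its original are all assumptions made for this example, not the authors' procedure.

```python
import random


def delete_tokens(tokens, ratio=0.6, rng=None):
    """Randomly drop tokens, keeping each one with probability (1 - ratio).

    Hypothetical helper: a TSDAE-style deletion noise, not the model's
    documented training code. At least one token is kept when the input
    is non-empty, so the distorted sample is never blank.
    """
    if rng is None:
        rng = random.Random()
    kept = [t for t in tokens if rng.random() >= ratio]
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the policy covers flood damage to insured property"
    original = sentence.split()  # assumption: simple whitespace tokens
    distorted = delete_tokens(original, ratio=0.6, rng=rng)
    # A (distorted, original) pair could then serve as a positive pair
    # in a contrastive or denoising objective.
    print(" ".join(distorted), "->", sentence)
```

In a contrastive setup, the distorted and original versions of one sentence would be treated as a matching pair, while other sentences in the batch act as negatives.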
46
 
47
 
48
  ## Citation [optional]
49
 
50
+ The model was trained with a custom self-supervised protocol, derived from and inspired by the following papers:
 
 
51
 
52
  @article{wang-2021-TSDAE,
53
  title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
 
79
  Published: 12 Jan 2021, Last Modified: 05 May 2023
80
  }
81
 
82
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
83
 
 
84
 
85
  ## Model Card Contact
86
 
87
+ Darren Oberst @ llmware
88
 
89