llmware/industry-bert-insurance-v0.1

doberst committed on Sep 30, 2023

Commit a8fac66 • 1 Parent(s): 292bb7e

Upload README.md

README.md +17 -101


@@ -1,14 +1,11 @@
 ---
 license: apache-2.0
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
-industry-bert-insurance-v0.1 is part of a series of industry-fine-tuned sentence_transformer embedding models.
-BERT-based 768-dimensional drop-in substitute for a non-industry-specific embeddings model.   This model was trained on a wide range of
-publicly available materials related to the Insurance industry.
 ## Model Details
@@ -16,121 +13,41 @@ publicly available materials related to the Insurance industry.
 <!-- Provide a longer summary of what this model is. -->
 - **Developed by:** llmware
-- **Shared by [optional]:** Darren Oberst
 - **Model type:** BERT-based Industry domain fine-tuned Sentence Transformer architecture
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 - **Finetuned from model [optional]:** BERT-based model, fine-tuning methodology described below.
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-This model is intended to be used as a sentence embedding model, specifically for the Asset Management and financial industries.
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
 ## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-This model was fine-tuned using a custom self-supervised procedure that combined contrastive techniques with stochastic injections of
-distortions in the samples.  The methodology was derived, adapted and inspired primarily from three research papers cited below:
-TSDAE (Reimers), DeCLUTR (Giorgi), and Contrastive Tension (Carlsson).
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
 ## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-Custom training protocol used to train the model, which was derived and inspired by the following papers:
 @article{wang-2021-TSDAE,
     title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
@@ -162,12 +79,11 @@ Custom training protocol used to train the model, which was derived and inspired
     Published: 12 Jan 2021, Last Modified: 05 May 2023
 }
-## Model Card Authors [optional]
-[More Information Needed]
 ## Model Card Contact
-[More Information Needed]

 ---
 license: apache-2.0
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
+industry-bert-insurance-v0.1 is part of a series of industry-fine-tuned sentence_transformer embedding models.
 ## Model Details
 <!-- Provide a longer summary of what this model is. -->
+industry-bert-insurance-v0.1 is a domain fine-tuned BERT-based 768-dimensional Sentence Transformer model, intended as a "drop-in"
+substitute for embeddings in the insurance industry domain.   This model was trained on a wide range of publicly available documents on the insurance industry.
 - **Developed by:** llmware
 - **Model type:** BERT-based Industry domain fine-tuned Sentence Transformer architecture
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 - **Finetuned from model [optional]:** BERT-based model, fine-tuning methodology described below.
+## Model Use
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained("llmware/industry-bert-insurance-v0.1")
+model = AutoModel.from_pretrained("llmware/industry-bert-insurance-v0.1")
 ## Bias, Risks, and Limitations
+This is a semantic embedding model, fine-tuned on public domain SEC filings and regulatory documents.   Results may vary if used outside of this
+domain, and like any embedding model, there is always the potential for anomalies in the vector embedding space.   No specific safeguards have
+been put in place for safety or to mitigate potential bias in the dataset.
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+This model was fine-tuned on a custom dataset using a custom self-supervised procedure that combined contrastive techniques
+with stochastic injections of distortions in the samples.  The methodology was adapted from, and inspired primarily by,
+three research papers cited below:  TSDAE (Reimers), DeCLUTR (Giorgi), and Contrastive Tension (Carlsson).
 ## Citation [optional]
+The model was trained with a custom self-supervised protocol, derived from and inspired by the following papers:
 @article{wang-2021-TSDAE,
     title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
     Published: 12 Jan 2021, Last Modified: 05 May 2023
 }
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 ## Model Card Contact
+Darren Oberst @ llmware
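
The "Model Use" snippet added in this commit loads the tokenizer and model but stops before producing an embedding. The following is a minimal sketch of one way to go from those two objects to sentence vectors. It assumes mean pooling over non-padding tokens, which is a common choice for Sentence Transformer models but is not stated in the README, so the pooling strategy should be treated as an assumption. The helper names `mean_pool`, `cosine_similarity`, and `embed` are hypothetical and not part of any llmware API. The pooling and similarity helpers are written in plain Python so they can be checked without downloading the model.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    Assumption: mean pooling; the README does not say which pooling
    strategy the model was trained with.
    """
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:  # skip padding positions
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / max(count, 1) for t in total]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(texts, model_name="llmware/industry-bert-insurance-v0.1"):
    """Return one pooled vector per input text (hypothetical helper)."""
    # Imported here so the helpers above work without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)

    hidden = output.last_hidden_state.tolist()  # [batch][tokens][hidden]
    masks = batch["attention_mask"].tolist()    # [batch][tokens]
    return [mean_pool(h, m) for h, m in zip(hidden, masks)]
```

Calling `embed` with two insurance-related sentences and passing the two resulting vectors to `cosine_similarity` gives a rough similarity score. For batch workloads, doing the pooling directly on tensors would be faster than converting to lists.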