llmware/industry-bert-contracts-v0.1

doberst committed on Sep 30, 2023

Commit 983cda3 • 1 Parent(s): 836d197

Update README.md

Files changed (1)

README.md +8 -71

@@ -13,46 +13,16 @@ industry-bert-contracts-v0.1 is part of a series of industry-fine-tuned sentence
 <!-- Provide a longer summary of what this model is. -->
-BERT-based 768-parameter drop-in substitute for non-industry-specific embeddings model.   This model was trained on a wide range of
-publicly available commercial contracts, including open source contract datasets.
 - **Developed by:** llmware
-- **Shared by [optional]:** Darren Oberst
 - **Model type:** BERT-based Industry domain fine-tuned Sentence Transformer architecture
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 - **Finetuned from model [optional]:** BERT-based model, fine-tuning methodology described below.
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-This model is intended to be used as a sentence embedding model, specifically for contracts use cases.
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 [More Information Needed]
 ## Bias, Risks, and Limitations
@@ -65,35 +35,14 @@ This model is intended to be used as a sentence embedding model, specifically fo
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-This model was fine-tuned using a custom self-supervised procedure that combined contrastive techniques with stochastic injections of
-distortions in the samples.  The methodology was derived, adapted and inspired primarily from three research papers cited below:
-TSDAE (Reimers), DeClutr (Giorgi), and Contrastive Tension (Carlsson).
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-### Model Architecture and Objective
-[More Information Needed]
 ## Citation [optional]
-Custom training protocol used to train the model, which was derived and inspired by the following papers:
 @article{wang-2021-TSDAE,
     title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
@@ -127,21 +76,9 @@ Custom training protocol used to train the model, which was derived and inspired
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
 ## Model Card Contact
-[More Information Needed]

 <!-- Provide a longer summary of what this model is. -->
+industry-bert-contracts-v0.1 is a domain fine-tuned, BERT-based Sentence Transformer model with 768-dimensional embeddings, intended as a "drop-in"
+substitute for general-purpose embedding models in contractual and legal domains. This model was trained on a wide range of publicly available commercial contracts,
+including open source contract datasets.
 - **Developed by:** llmware
 - **Model type:** BERT-based Industry domain fine-tuned Sentence Transformer architecture
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 - **Finetuned from model [optional]:** BERT-based model, fine-tuning methodology described below.
 [More Information Needed]
 ## Bias, Risks, and Limitations
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+This model was fine-tuned on a custom dataset using a custom self-supervised procedure that combined contrastive techniques
+with stochastic injections of distortions in the samples. The methodology was derived and adapted from, and inspired primarily by,
+three research papers cited below: TSDAE (Reimers), DeCLUTR (Giorgi), and Contrastive Tension (Carlsson).
 ## Citation [optional]
+The model was trained with a custom self-supervised training protocol, which was derived from and inspired by the following papers:
 @article{wang-2021-TSDAE,
     title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 ## Model Card Contact
+Darren Oberst @ llmware