---
language: []
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:800
- loss:TripletLoss
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
widget:
- source_sentence: What is the advice given about the use of color in dataviz?
sentences:
- Don't use color if they communicate nothing.
- Four problems with Pie Charts are detailed in a guide by iCharts.net.
- Always use bright colors for highlighting important data.
- source_sentence: What is the effect of a large sample size on the use of jitter
in a boxplot?
sentences:
- A large sample size will enhance the use of jitter in a boxplot.
- If you have a large sample size, using jitter is not an option anymore since dots
will overlap, making the figure uninterpretable.
- It is a good practice to use small multiples.
- source_sentence: What is a suitable usage of pie charts in data visualization?
sentences:
- If you have a single series to display and all quantitative variables have the
same scale, then use a barplot or a lollipop plot, ranking the variables.
- Pie charts rapidly show parts to a whole better than any other plot. They are
most effective when used to compare parts to the whole.
- Pie charts are a flawed chart which can sometimes be justified if the differences
between groups are large.
- source_sentence: Where can a note on long labels be found?
sentences:
- https://www.data-to-viz.com/caveat/hard_label.html
- A pie chart can tell a story very well; that all the data points as a percentage
of the whole are very similar.
- https://twitter.com/r_graph_gallery?lang=en
- source_sentence: What is the reason pie plots can work as well as bar plots in some
scenarios?
sentences:
- Pie plots can work well for comparing portions a whole or portions one another,
especially when dealing with a single digit count of items.
- https://www.r-graph-gallery.com/line-plot/ and https://python-graph-gallery.com/line-chart/
- Thanks for your comment Tom, I do agree with you.
pipeline_tag: sentence-similarity
---
# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
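As a rough illustration of the similarity function listed above, the following sketch computes cosine similarity in plain Python. The helper name `cosine_similarity` and the toy 3-dimensional vectors are assumptions made for illustration; real embeddings from this model have 768 dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for sentence embeddings (hypothetical values).
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]   # a scaled copy of `query`
orthogonal = [0.0, 0.0, 3.0]       # shares no component with `query`

print(cosine_similarity(query, same_direction))  # close to 1: same direction
print(cosine_similarity(query, orthogonal))      # 0: unrelated directions
```

Cosine similarity ignores vector length, so a scaled copy of a vector scores the same as the vector itself.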
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
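The architecture above applies mean pooling over token embeddings (`pooling_mode_mean_tokens: True`) followed by a `Normalize()` step. The sketch below shows what those two steps compute, using plain Python lists. It is not the library's implementation: the helper names `mean_pool` and `l2_normalize`, the 2-dimensional toy vectors, and the explicit attention mask are assumptions made for illustration.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Two real tokens plus one padding token (hypothetical 2-dimensional values).
tokens = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the two unmasked tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

Because the final vector has unit length, the dot product of two such embeddings equals their cosine similarity.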
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("edubm/vis-sim-triplets-mpnet")
# Run inference
sentences = [
    'What is the reason pie plots can work as well as bar plots in some scenarios?',
    'Pie plots can work well for comparing portions a whole or portions one another, especially when dealing with a single digit count of items.',
    'Thanks for your comment Tom, I do agree with you.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 800 training samples
* Columns: `anchor`, `positive`, and `negative`
* Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:-------|:---------|:---------|
| type | string | string | string |
* Samples:
| anchor | positive | negative |
|:-------|:---------|:---------|
| Did you ever figure out a solution to the error message problem when using your own data? | Yes, a solution was found. You have to add ' group = name ' inside the ' ggplot(aes())' like ggplot(aes(x=year, y=n,group=name)). | I recommend sorting by some feature of the data, instead of in alphabetical order of the names. |
| Why should you consider reordering your data when building a chart? | Reordering your data can help in better visualization. Sometimes the order of groups must be set by their features and not their values. | You should reorder your data to clean it. |
| What is represented on the X-axis of the chart? | The price ranges cut in several 10 euro bins. | The number of apartments per bin. |
* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
```
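To make the parameters above concrete, here is a minimal sketch of a triplet loss with Euclidean distance and a margin of 5, computed for a single triplet as `max(d(anchor, positive) - d(anchor, negative) + margin, 0)`. The helper names and the 2-dimensional toy vectors are assumptions made for illustration; the toy vectors are not unit-normalized, unlike the embeddings this model produces.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

# Hypothetical 2-dimensional embeddings.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the anchor
far_negative = [6.0, 8.0]    # distance 10 from the anchor
near_negative = [0.0, 1.0]   # distance 1 from the anchor

print(triplet_loss(anchor, positive, far_negative))   # margin satisfied
print(triplet_loss(anchor, positive, near_negative))  # margin violated
```

The loss only penalizes triplets in which the negative is not yet far enough from the anchor relative to the positive.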
### Evaluation Dataset
#### Unnamed Dataset
* Size: 200 evaluation samples
* Columns: `anchor`, `positive`, and `negative`
* Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:-------|:---------|:---------|
| type | string | string | string |
* Samples:
| anchor | positive | negative |
|:-------|:---------|:---------|
| What can be inferred about group C and B from the jittered boxplot? | Group C has a small sample size compared to the other groups. Group B seems to have a bimodal distribution with dots distributed in 2 groups: around y=18 and y=13. | Group C has the largest sample size and Group B has dots evenly distributed. |
| What can cause a reduction in computing time and help avoid overplotting when dealing with data? | Plotting only a fraction of your data can cause a reduction in computing time and help avoid overplotting. | Plotting all of your data is the best method to reduce computing time. |
| How can area charts be used for data visualization? | Area charts can be used to give a more general overview of the dataset, especially when used in combination with small multiples. | Area charts make it obvious to spot a particular group in a crowded data visualization. |
* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
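As a back-of-the-envelope check on what these settings imply, the sketch below estimates the number of optimizer steps and warmup steps for the 800-sample training set. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these are stated in this card, and the trainer's exact rounding may differ.

```python
import math

# Values taken from this card.
train_samples = 800
per_device_train_batch_size = 16
num_train_epochs = 1
warmup_ratio = 0.1

# Assumptions (not stated in the card): one device, no gradient accumulation.
num_devices = 1
gradient_accumulation_steps = 1

effective_batch = per_device_train_batch_size * num_devices * gradient_accumulation_steps
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run is short: a single pass of 50 steps, with the learning rate warming up over the first 5.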
#### All Hyperparameters