---
title: Mahalanobis Distance
emoji: 🤗 
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  Compute the Mahalanobis distance.

  The Mahalanobis distance is the distance between a point and a distribution
  (rather than between two distinct points), making it effectively a multivariate
  equivalent of the Euclidean distance. It was introduced by Prof. P. C. Mahalanobis
  in 1936 and has been used in various statistical applications ever since.

  [source: https://www.machinelearningplus.com/statistics/mahalanobis-distance/]
---

# Metric Card for Mahalanobis Distance

## Metric Description
The Mahalanobis distance is the distance between a point and a distribution (as opposed to the distance between two points), making it the multivariate equivalent of the Euclidean distance.

It is often used in multivariate anomaly detection, classification on highly imbalanced datasets, and one-class classification.
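The underlying formula is D²(x) = (x − μ)ᵀ Σ⁻¹ (x − μ), where μ and Σ are the mean and covariance of the reference distribution. The metric's example output below can be reproduced with a minimal NumPy sketch, an illustration rather than the metric's actual implementation; the pseudo-inverse is an assumption here, used because the two-point covariance in the example is singular:

```python
import numpy as np

# Reference distribution and query point from the usage example below
ref = np.array([[0, 1], [1, 0]], dtype=float)
X = np.array([[0, 1]], dtype=float)

mu = ref.mean(axis=0)          # distribution mean
cov = np.cov(ref.T)            # covariance matrix (singular for two points)
inv_cov = np.linalg.pinv(cov)  # pseudo-inverse handles the singularity

delta = X - mu
# (x - mu)^T Sigma^+ (x - mu) for every row of X at once
d2 = np.einsum('ij,jk,ik->i', delta, inv_cov, delta)
print(d2)  # [0.5]
```

Note that this is the squared form of the distance, which is consistent with the `0.5` returned in the documented example.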

## How to Use
At minimum, this metric requires two `list`s of datapoints: 

```python
>>> mahalanobis_metric = evaluate.load("mahalanobis")
>>> results = mahalanobis_metric.compute(reference_distribution=[[0, 1], [1, 0]], X=[[0, 1]])
```

### Inputs
- `X` (`list`): data points to be compared with the `reference_distribution`.
- `reference_distribution` (`list`): data points from the reference distribution that we want to compare to.
                    
### Output Values
`mahalanobis` (`array`): the Mahalanobis distance for each data point in `X`.

```python
>>> print(results)
{'mahalanobis': array([0.5])}
```

#### Values from Popular Papers
*N/A*

### Example

```python
>>> mahalanobis_metric = evaluate.load("mahalanobis")
>>> results = mahalanobis_metric.compute(reference_distribution=[[0, 1], [1, 0]], X=[[0, 1]])
>>> print(results)
{'mahalanobis': array([0.5])}
```
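For the anomaly-detection use case mentioned above, a common pattern is to flag points whose squared distance exceeds a chi-square quantile, since squared Mahalanobis distances of d-dimensional Gaussian data follow a χ²(d) distribution. A hypothetical NumPy sketch; the reference data is synthetic and the threshold value (≈ χ² at the 0.975 quantile with 2 degrees of freedom) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 2))          # synthetic 2-D reference distribution
X = np.array([[0.1, -0.2], [4.0, 4.0]])  # one typical point, one outlier

mu = ref.mean(axis=0)
inv_cov = np.linalg.pinv(np.cov(ref.T))
delta = X - mu
d2 = np.einsum('ij,jk,ik->i', delta, inv_cov, delta)

threshold = 7.38  # assumption: approx. chi2 0.975 quantile, df=2
outliers = d2 > threshold
print(outliers)  # the second point is flagged, the first is not
```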

## Limitations and Bias

The Mahalanobis distance is only able to capture linear relationships between the variables, which means it cannot capture all types of outliers. Mahalanobis distance also fails to faithfully represent data that is highly skewed or multimodal.
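The multimodal failure mode is easy to demonstrate: with a bimodal reference distribution, the empty region between the two modes can score a *lower* distance than points inside either mode. A small illustrative sketch; the data and cluster centers are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Bimodal reference: two tight clusters centered at (-5, 0) and (5, 0)
ref = np.vstack([
    rng.normal([-5, 0], 0.5, size=(200, 2)),
    rng.normal([5, 0], 0.5, size=(200, 2)),
])
mu = ref.mean(axis=0)               # lands near (0, 0), between the clusters
inv_cov = np.linalg.pinv(np.cov(ref.T))

def sq_mahalanobis(x):
    d = x - mu
    return float(d @ inv_cov @ d)

# The low-density midpoint scores lower than a point inside a cluster
midpoint = sq_mahalanobis(np.array([0.0, 0.0]))
in_cluster = sq_mahalanobis(np.array([5.0, 0.0]))
print(midpoint < in_cluster)  # True
```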

## Citation
```bibtex
@inproceedings{mahalanobis1936generalized,
  title={On the generalized distance in statistics},
  author={Mahalanobis, Prasanta Chandra},
  year={1936},
  organization={National Institute of Science of India}
}
```

```bibtex
@article{de2000mahalanobis,
  title={The Mahalanobis distance},
  author={De Maesschalck, Roy and Jouan-Rimbaud, Delphine and Massart, D{\'e}sir{\'e} L},
  journal={Chemometrics and intelligent laboratory systems},
  volume={50},
  number={1},
  pages={1--18},
  year={2000},
  publisher={Elsevier}
}
```

## Further References
- [Wikipedia -- Mahalanobis Distance](https://en.wikipedia.org/wiki/Mahalanobis_distance)
- [Machine Learning Plus -- Mahalanobis Distance](https://www.machinelearningplus.com/statistics/mahalanobis-distance/)