Update Space (evaluate main: eb281894)
- README.md +3 -3
- requirements.txt +1 -1
README.md
CHANGED
@@ -35,7 +35,7 @@ This measurement requires a list of strings as input:
 
 ### Output Values
 - **duplicate_fraction**(`float`): the fraction of duplicates in the input string(s).
-- **
+- **duplicates_dict**(`list`): (optional) a list of tuples with the duplicate strings and the number of times they are repeated.
 
 By default, this measurement outputs a dictionary containing the fraction of duplicates in the input string(s) (`duplicate_fraction`):
 )
@@ -46,7 +46,7 @@ By default, this measurement outputs a dictionary containing the fraction of dup
 With the `list_duplicates=True` option, this measurement will also output a dictionary of tuples with duplicate strings and their counts.
 
 ```python
-{'duplicate_fraction': 0.33333333333333337, '
+{'duplicate_fraction': 0.33333333333333337, 'duplicates_dict': {'hello sun': 2}}
 ```
 
 Warning: the `list_duplicates=True` function can be memory-intensive for large datasets.
@@ -69,7 +69,7 @@ Example with multiple duplicates and `list_duplicates=True`:
 >>> duplicates = evaluate.load("text_duplicates")
 >>> results = duplicates.compute(data=data, list_duplicates=True)
 >>> print(results)
-{'duplicate_fraction': 0.4, '
+{'duplicate_fraction': 0.4, 'duplicates_dict': {'hello sun': 2, 'foo bar': 2}}
 ```
 
 ## Citation(s)
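Note on the README hunks above: the sample outputs show `duplicates_dict` as a dictionary mapping each repeated string to its count. The sketch below is one way such values could be computed. It is an illustrative stand-in, not the code from the `evaluate` library: the function name `text_duplicates_sketch`, the definition of the fraction as one minus the share of unique strings, and the input lists are all assumptions, chosen because they are consistent with the outputs shown in the diff.

```python
from collections import Counter


def text_duplicates_sketch(data, list_duplicates=False):
    """Hypothetical re-implementation for illustration only."""
    if not data:
        raise ValueError("data must contain at least one string")

    # Count how many times each distinct string occurs.
    counts = Counter(data)

    # Assumed definition: the share of entries that repeat an earlier entry.
    result = {"duplicate_fraction": 1 - len(counts) / len(data)}

    if list_duplicates:
        # Keep only strings that occur more than once, with their counts.
        result["duplicates_dict"] = {s: n for s, n in counts.items() if n > 1}
    return result


# Hypothetical inputs; the diff does not show the data that was used.
single = ["hello sun", "hello moon", "hello sun"]
multiple = ["hello sun", "goodbye moon", "hello sun", "foo bar", "foo bar"]

print(text_duplicates_sketch(single, list_duplicates=True))
print(text_duplicates_sketch(multiple, list_duplicates=True))
```

Under these assumptions the two calls should reproduce the values in the added lines (`0.33333333333333337` with `{'hello sun': 2}`, and `0.4` with `{'hello sun': 2, 'foo bar': 2}`). Building the full counter is also what makes the `list_duplicates=True` path memory-hungry on large inputs, in line with the warning in the README.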
requirements.txt
CHANGED
@@ -1 +1 @@
-git+https://github.com/huggingface/evaluate.git@
+git+https://github.com/huggingface/evaluate.git@eb281894ce23f68902c4b12040dd5b1a9cb32f90