adlumal committed
Commit bcb1ccd
Parent: d9171f8

Update README.md

Files changed (1): README.md (+2, -0)
@@ -22,9 +22,11 @@ This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentence
 This model is a fine-tune of [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) using the HCA case law in the [Open Australian Legal Corpus](https://huggingface.co/datasets/umarbutler/open-australian-legal-corpus) by Umar Butler. The PDF/OCR cases were not used.
 
 The cases were split into < 512 context chunks using the bge-small-en tokeniser and [semchunk](https://github.com/umarbutler/semchunk).
+
 [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) was used to generate a legal question for each context chunk.
 
 129,137 context-question pairs were used for training.
+
 14,348 context-question pairs were used for evaluation (see the table below for results).
 
 Using a 10% subset of the val dataset the following hit-rate performance was reached and is compared to the base model and OpenAI's default ada embedding model.
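The chunking step described in the README (splitting cases into < 512-token chunks with the bge-small-en tokeniser and semchunk) could look roughly like the sketch below. This is a minimal illustration assuming semchunk's `chunkerify` API and a Hugging Face tokenizer; it is not the author's actual preprocessing code, and corpus loading is omitted.

```python
# Sketch: split a case's text into chunks of < 512 bge-small-en tokens.
import semchunk
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")

# semchunk.chunkerify accepts a transformers tokenizer and a token budget;
# 512 here mirrors the README's "< 512 context chunks" (special tokens may
# need a small margin in practice).
chunker = semchunk.chunkerify(tokenizer, chunk_size=512)

def chunk_case(case_text: str) -> list[str]:
    # Returns the case split into semantically coherent chunks.
    return chunker(case_text)
```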
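Question generation with Mixtral-8x7B-Instruct-v0.1 might be sketched as follows. The prompt wording, decoding settings, and helper function are assumptions for illustration only; the commit does not publish the prompt actually used.

```python
# Sketch: generate one legal question per context chunk with Mixtral-8x7B-Instruct.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def generate_question(context: str) -> str:
    # Hypothetical prompt; the original prompt is not given in this commit.
    messages = [{
        "role": "user",
        "content": (
            "Write one legal question that the following passage from an "
            "Australian High Court judgment answers.\n\n" + context
        ),
    }]
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)
    # Strip the prompt tokens and return only the generated question.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
```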
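The hit-rate comparison mentioned in the README can be reproduced in spirit with a retrieval check like the one below: for each generated question, retrieve the top-k chunks and count a hit when the chunk the question was generated from appears among them. The model ID and top-k value are placeholders, not the author's exact evaluation setup.

```python
# Sketch: hit-rate evaluation over question/context pairs.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en")  # swap in the fine-tuned checkpoint to compare

def hit_rate(questions: list[str], contexts: list[str], k: int = 2) -> float:
    # questions[i] was generated from contexts[i], so index i is the gold label.
    q_emb = model.encode(questions, convert_to_tensor=True, normalize_embeddings=True)
    c_emb = model.encode(contexts, convert_to_tensor=True, normalize_embeddings=True)
    hits = 0
    for i, result in enumerate(util.semantic_search(q_emb, c_emb, top_k=k)):
        if any(entry["corpus_id"] == i for entry in result):
            hits += 1
    return hits / len(questions)
```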