ilyankou committed · Commit 2a68202 · verified · 1 Parent(s): 365a23c

Update README.md

  1. README.md +15 -4
README.md CHANGED
@@ -22,14 +22,20 @@ base_model: BAAI/bge-small-en-v1.5

  # Spatial Web Search Query Classifier

- A binary [SetFit](https://github.com/huggingface/setfit) classifier that distinguishes spatial from non-spatial web search queries. Trained on a gold-annotated sample of [MS MARCO](https://microsoft.github.io/msmarco/) and used to identify 104,288 spatial queries (10.3%) across the full 1.01M-query corpus.
+ A binary [SetFit](https://github.com/huggingface/setfit) classifier that distinguishes spatial
+ from non-spatial web search queries. Trained on a gold-annotated sample
+ of [MS MARCO](https://microsoft.github.io/msmarco/) and used to identify 104,288 spatial
+ queries (10.3%) across the full 1.01M-query corpus.

  **Accuracy / F1: 0.986** on a held-out balanced test set (76 negative, 72 positive).


  ## What counts as spatial?

- A query is spatial if its answer is geographically variant and requires reasoning about geographic primitives (location, distance, or direction) or topological relationships (adjacency, containment, or connectivity). This includes implicitly spatial queries such as costs and prices in a specific area — not just those containing a toponym.
+ A query is spatial if its answer is geographically variant and requires reasoning
+ about geographic primitives (location, distance, or direction) or topological
+ relationships (adjacency, containment, or connectivity). This includes implicitly
+ spatial queries such as costs and prices in a specific area, not just those containing a toponym.

  ## Model details

@@ -44,10 +50,15 @@ A query is spatial if its answer is geographically variant and requires reasonin
  from setfit import SetFitModel

  model = SetFitModel.from_pretrained("TODO")
- preds = model(["weather in erlanger ky", "what is symptom of bipolar disorder"])
+ preds = model([
+ "weather in erlanger ky",
+ "what is symptom of bipolar disorder"
+ ])
  # => [1, 0]
  ```

  ## Training

- Weak labels were generated by running Llama 3.1 five times per query at temperature 0.2, then manually verified. The SetFit model was trained for one epoch with batch size 64 and learning rate 1e-5, then retrained on the full gold dataset for production inference.
+ Weak labels were generated by running Llama 3.1 five times per query at temperature 0.2,
+ then manually verified. The SetFit model was trained for one epoch with batch size 64
+ and learning rate 1e-5, then retrained on the full gold dataset for production inference.
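
The Training paragraph in this diff says each query was labelled five times by Llama 3.1 before manual verification, but it does not say how the five runs were combined. The sketch below assumes a simple majority vote with an agreement ratio; the function name `aggregate_weak_label` and the example votes are hypothetical and only illustrate one plausible way to do it.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def aggregate_weak_label(votes: Sequence[int]) -> Tuple[int, float]:
    """Return (majority label, agreement ratio) for one query's repeated votes.

    Assumption: labels are binary (1 = spatial, 0 = non-spatial) and the
    runs are combined by majority vote; the README does not confirm this.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    # With an odd number of binary votes (e.g. five) a tie cannot occur.
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


# Hypothetical votes from five runs per query, not real model output.
runs: Dict[str, List[int]] = {
    "weather in erlanger ky": [1, 1, 1, 1, 1],
    "what is symptom of bipolar disorder": [0, 0, 1, 0, 0],
}

weak_labels = {query: aggregate_weak_label(v) for query, v in runs.items()}

# Queries without unanimous agreement could be reviewed first by a human.
needs_review = [q for q, (_, agreement) in weak_labels.items() if agreement < 1.0]
```

Keeping the agreement ratio alongside the label is one way to prioritise the manual verification step: unanimous votes are likely cheap to confirm, while split votes flag ambiguous queries.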