---
base_model: Llama-3.2-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- climate-policy
- query-interpretation
- lora
license: apache-2.0
language:
- en
---
# CLEAR Query Interpreter

This is the official implementation of the query interpretation model from our paper "CLEAR: Climate Policy Retrieval and Summarization Using LLMs" (WWW Companion '25).

## Model Description

The model is a LoRA adapter fine-tuned from Llama-3.2-3B that decomposes natural language queries about climate policies into structured components for precise information retrieval.
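
If the checkpoint is published as a raw LoRA adapter rather than merged weights, it can also be attached to the base model with PEFT. This is a minimal sketch, not the card's canonical loading path; the base-model ID `meta-llama/Llama-3.2-3B-Instruct` is assumed from this card's metadata and may need adjusting.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model first (ID assumed from the card metadata), then attach the adapter.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "oscarwu/Llama-3.2-3B-CLEAR")
tokenizer = AutoTokenizer.from_pretrained("oscarwu/Llama-3.2-3B-CLEAR")
```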

### Task
Query interpretation for climate policy retrieval, decomposing natural language queries into the components below (see the schema sketch after the list):
- Location (L): Geographic identification
- Topics (T): Climate-related themes
- Intent (I): Specific policy inquiries
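
These components correspond, roughly, to the fields of the model's JSON output: location maps to `location`, topics to `topics`, and intent to the `rag_queries` field. A minimal type sketch of that output, mirroring the prompt schema shown under Usage:

```python
from typing import List, Optional, TypedDict

class Location(TypedDict):
    query_suburb: Optional[str]  # suburb name, or None
    query_state: Optional[str]   # state code such as "VIC", or None
    query_lga: Optional[str]     # local government area, or None

class QueryInterpretation(TypedDict):
    rag_queries: List[str]       # 1-3 policy search queries (the intent)
    topics: List[str]            # 1-3 climate/environment topics
    location: Location
```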



### Training Details
- Base Model: Llama-3.2-3B
- Training Data: 330 manually annotated queries
- Annotators: Four Australia-based experts with media communication backgrounds
- Hardware: NVIDIA A100 GPU
- Parameters (see the configuration sketch after this list):
  - Batch size: 6
  - Sequence length: 1024
  - Optimizer: AdamW (weight decay 0.05)
  - Learning rate: 5e-5
  - Epochs: 10
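
A minimal sketch of a training configuration matching the hyperparameters above, using `peft` and `transformers`. The LoRA rank, alpha, and target modules are illustrative assumptions, not values reported in the paper:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings: rank/alpha/target modules are assumed values, not from the paper.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters taken from the list above.
# The 1024-token sequence length would be applied via the tokenizer/trainer configuration.
training_args = TrainingArguments(
    output_dir="clear-query-interpreter",
    per_device_train_batch_size=6,
    learning_rate=5e-5,
    num_train_epochs=10,
    weight_decay=0.05,
    optim="adamw_torch",
)
```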

## Usage 

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "oscarwu/Llama-3.2-3B-CLEAR"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16
).to(device)

# Example query
query = "I live in Burwood (Vic) and want details on renewable energy initiatives. Are solar farms planned?"

# Format prompt
prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Your response must be a valid JSON object, strictly following the requested format.
### Instruction:
Extract location, topics, and search queries from Australian climate policy questions. Your response must be a valid JSON object with the following structure:
{{
 "rag_queries": ["query1", "query2", "query3"],  // 1-3 policy search queries
 "topics": ["topic1", "topic2", "topic3"],       // 1-3 climate/environment topics
 "location": {{                                   
   "query_suburb": "suburb_name or null",
   "query_state": "state_code or null", 
   "query_lga": "lga_name or null"
 }}
}}
### Input:
{query}
### Response (valid JSON only):
"""




# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=220)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```

Expected output for the example query:

```json
{
  "rag_queries": [
    "What renewable energy projects are planned for Burwood?",
    "Are there solar farm initiatives in Burwood Victoria?"
  ],
  "topics": [
    "renewable energy",
    "solar power"
  ],
  "location": {
    "query_suburb": "Burwood",
    "query_state": "VIC",
    "query_lga": null
  }
}
```
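
Because `tokenizer.decode` returns the prompt together with the generated text, the JSON needs to be separated out before use. A minimal post-processing sketch, continuing from the usage example above (it assumes the model emits a single well-formed JSON object after the response marker, which may not hold for every query):

```python
import json

# `result` contains the prompt followed by the generation.
# Keep only the text after the response marker and parse it as JSON.
response_text = result.split("### Response (valid JSON only):")[-1].strip()
parsed = json.loads(response_text)

print(parsed["location"]["query_suburb"])  # "Burwood"
print(parsed["rag_queries"])               # 1-3 policy search queries
```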