---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- finance
- earnings-calls
- evasion-detection
- nlp
- qwen3
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- FutureMa/EvasionBench
---

# Eva-4B-V2

<p align="center">
  <a href="https://huggingface.co/FutureMa/Eva-4B-V2"><img src="https://img.shields.io/badge/🤗-Model-yellow?style=for-the-badge" alt="Model"></a>
  <a href="https://huggingface.co/datasets/FutureMa/EvasionBench"><img src="https://img.shields.io/badge/🤗-Dataset-orange?style=for-the-badge" alt="Dataset"></a>
  <a href="https://github.com/IIIIQIIII/EvasionBench"><img src="https://img.shields.io/badge/GitHub-Repo-blue?style=for-the-badge" alt="GitHub"></a>
  <a href="https://iiiiqiiii.github.io/EvasionBench"><img src="https://img.shields.io/badge/Project-Page-green?style=for-the-badge" alt="Project Page"></a>
</p>

<p align="center">
  <b>A 4B-parameter model fine-tuned to detect evasive answers in earnings call Q&A sessions.</b>
</p>

## Model Description

- **Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Task:** Text Classification (Evasion Detection)
- **Language:** English
- **License:** Apache 2.0

## Performance

Eva-4B-V2 achieves **84.9% Macro-F1** on the EvasionBench evaluation set, outperforming frontier LLMs:

<p align="center">
  <img src="top5_performance.svg" alt="Top 5 Model Performance" width="100%">
</p>

| Rank | Model | Macro-F1 |
|------|-------|----------|
| 1 | **Eva-4B-V2** | **84.9%** |
| 2 | Gemini 3 Flash | 84.6% |
| 3 | Claude Opus 4.5 | 84.4% |
| 4 | GLM-4.7 | 82.9% |
| 5 | GPT-5.2 | 80.9% |

### Per-Class Performance

| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Direct | 90.6% | 75.1% | 82.1% |
| Intermediate | 73.7% | 87.7% | 80.1% |
| Fully Evasive | 93.3% | 91.6% | 92.4% |

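The evaluation script itself is not shipped with this card. As a rough illustration of how Macro-F1 and the per-class numbers above are typically computed, the sketch below uses `scikit-learn` on placeholder prediction/gold lists (the list contents are invented for this card, not the actual EvasionBench evaluation data):

```python
# Illustrative only: how Macro-F1 and per-class metrics of this kind are
# typically computed. `y_true` / `y_pred` are placeholders, not EvasionBench data.
from sklearn.metrics import classification_report, f1_score

labels = ["direct", "intermediate", "fully_evasive"]
y_true = ["direct", "intermediate", "fully_evasive", "direct"]        # gold labels
y_pred = ["direct", "intermediate", "fully_evasive", "intermediate"]  # model predictions

# Macro-F1 is the unweighted mean of the per-class F1 scores
print("Macro-F1:", f1_score(y_true, y_pred, labels=labels, average="macro"))

# Per-class precision / recall / F1, as in the table above
print(classification_report(y_true, y_pred, labels=labels, digits=3))
```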
## Label Definitions

| Label | Definition |
|-------|------------|
| `direct` | The core question is directly and explicitly answered |
| `intermediate` | The response provides related context but sidesteps the core of the question |
| `fully_evasive` | The question is ignored, explicitly refused, or the answer is entirely off-topic |

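For intuition, the snippet below gives one hypothetical Q&A pair per label. These examples are invented for this card and are not drawn from EvasionBench:

```python
# Hypothetical Q&A pairs illustrating the three labels (invented for illustration).
examples = [
    {
        "question": "What is the expected operating margin for Q4?",
        "answer": "We expect roughly 32%, in line with Q3.",
        "label": "direct",  # the core question is explicitly answered
    },
    {
        "question": "What is the expected operating margin for Q4?",
        "answer": "Margins depend on many factors; we stay focused on cost discipline.",
        "label": "intermediate",  # related context, but the requested figure is sidestepped
    },
    {
        "question": "What is the expected operating margin for Q4?",
        "answer": "We don't comment on forward-looking figures. Next question, please.",
        "label": "fully_evasive",  # the question is explicitly refused
    },
]
```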
## Training

### Two-Stage Training Pipeline

```
Qwen3-4B-Instruct-2507
          │
          ▼  Stage 1: 60K consensus data
Eva-4B-Consensus
          │
          ▼  Stage 2: 24K three-judge data
Eva-4B-V2
```

### Training Configuration

| Parameter | Stage 1 | Stage 2 |
|-----------|---------|---------|
| Dataset | 60K consensus | 24K three-judge |
| Epochs | 2 | 2 |
| Learning Rate | 2e-5 | 2e-5 |
| Batch Size | 32 | 32 |
| Max Length (tokens) | 2500 | 2048 |
| Precision | bfloat16 | bfloat16 |

### Hardware

- **Stage 1:** 2x NVIDIA B200 (180GB SXM6)
- **Stage 2:** 4x NVIDIA H100 (80GB SXM5)

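The training code itself is not published in this repository. As a minimal sketch of how the Stage 1 settings in the table above could map onto a standard supervised fine-tuning setup, the snippet below uses TRL's `SFTTrainer`; the choice of TRL, the dataset split, and the per-device batch size / gradient accumulation breakdown are assumptions made for illustration, not the authors' actual pipeline.

```python
# Sketch of a Stage 1 run matching the stated hyperparameters
# (2 epochs, lr 2e-5, effective batch size 32, bfloat16).
# TRL, the split name, and the batch-size breakdown are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed split/column layout; EvasionBench's actual schema may differ.
dataset = load_dataset("FutureMa/EvasionBench", split="train")

config = SFTConfig(
    output_dir="eva-4b-stage1",
    num_train_epochs=2,
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # 4 x 4 x 2 GPUs = effective batch size 32
    bf16=True,
    # The Stage 1 max sequence length was 2500 tokens; the corresponding
    # SFTConfig field name varies across TRL versions, so it is omitted here.
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B-Instruct-2507",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```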
## Usage

### With Transformers

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "FutureMa/Eva-4B-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Prompt template
prompt = """You are a financial analyst. Your task is to Detect Evasive Answers in Financial Q&A

Question: What is the expected margin for Q4?
Answer: We expect it to be 32%.

Response format:
```json
{"label": "direct|intermediate|fully_evasive"}
```

Answer in ```json content, no other text"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
# Output: ```json
# {"label": "direct"}
# ```
````
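The model emits its label inside a fenced `json` block, so downstream code needs a small parsing step. The helper below is written for this card (it is not shipped with the model) and simply extracts and validates the label:

```python
import json
import re

def parse_label(generation: str):
    """Extract the predicted label from the model's fenced JSON output."""
    match = re.search(r"\{.*?\}", generation, flags=re.DOTALL)
    if match is None:
        return None
    try:
        label = json.loads(match.group(0)).get("label")
    except json.JSONDecodeError:
        return None
    return label if label in {"direct", "intermediate", "fully_evasive"} else None

# Using the output shown above:
print(parse_label('```json\n{"label": "direct"}\n```'))  # -> direct
```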
### With vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="FutureMa/Eva-4B-V2")
sampling_params = SamplingParams(temperature=0, max_tokens=64)

# `prompt` is the same prompt template as in the Transformers example above;
# llm.chat applies the model's chat template before generating.
outputs = llm.chat([{"role": "user", "content": prompt}], sampling_params)
print(outputs[0].outputs[0].text)
```
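Because vLLM batches requests efficiently, many Q&A pairs can be scored in a single call by passing one conversation per pair. The sketch below reuses `llm` and `sampling_params` from the snippet above; `build_prompt` is a hypothetical helper written for this card that fills a Q&A pair into the prompt template shown earlier.

```python
# Batch scoring sketch; reuses `llm` and `sampling_params` from above.
def build_prompt(question: str, answer: str) -> str:
    # Hypothetical helper: fills a Q&A pair into the prompt template above.
    return (
        "You are a financial analyst. Your task is to Detect Evasive Answers in Financial Q&A\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n\n"
        'Response format:\n```json\n{"label": "direct|intermediate|fully_evasive"}\n```\n\n'
        "Answer in ```json content, no other text"
    )

qa_pairs = [
    ("What is the expected margin for Q4?", "We expect it to be 32%."),
    ("Will you raise full-year guidance?", "We are proud of the team's execution this quarter."),
]

conversations = [[{"role": "user", "content": build_prompt(q, a)}] for q, a in qa_pairs]
outputs = llm.chat(conversations, sampling_params)
for (question, _), out in zip(qa_pairs, outputs):
    print(question, "->", out.outputs[0].text.strip())
```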
## Links

| Resource | URL |
|----------|-----|
| **Dataset** | [FutureMa/EvasionBench](https://huggingface.co/datasets/FutureMa/EvasionBench) |
| **GitHub** | [IIIIQIIII/EvasionBench](https://github.com/IIIIQIIII/EvasionBench) |

## Citation

```bibtex
@misc{eva4b2025,
  title={Eva-4B: A Fine-tuned Model for Evasion Detection in Earnings Calls},
  author={EvasionBench Team},
  year={2025},
  url={https://github.com/IIIIQIIII/EvasionBench}
}
```

## License

Apache 2.0