---
license: apache-2.0
datasets:
- kejian/ACL-ARC
language:
- en
metrics:
- f1
base_model:
- Qwen/Qwen2.5-14B-Instruct
library_name: transformers
tags:
- scientometrics
- citation_analysis
- citation_intent_classification
pipeline_tag: zero-shot-classification
---
# Qwen2.5-14B-CIC-ACLARC
A fine-tuned model for Citation Intent Classification, based on [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) and trained on the [ACL-ARC](https://huggingface.co/datasets/kejian/ACL-ARC) dataset.
GGUF Version: https://huggingface.co/sknow-lab/Qwen2.5-14B-CIC-ACLARC-GGUF
## ACL-ARC classes
| Class | Description |
| --- | --- |
| Background | The cited paper provides relevant Background information or is part of the body of literature.|
| Motivation | The citing paper is directly motivated by the cited paper. |
| Uses | The citing paper uses the methodology or tools created by the cited paper.|
| Extends | The citing paper extends the methods, tools or data, etc. of the cited paper. |
| Comparison or Contrast | The citing paper expresses similarities or differences to, or disagrees with, the cited paper. |
| Future | The cited paper may be a potential avenue for future work. |
## Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "sknow-lab/Qwen2.5-14B-CIC-ACLARC"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.
########
# OBJECTIVE #
You will be given a sentence containing a citation, you must output the appropriate class as an answer.
########
# CLASS DEFINITIONS #
The six (6) possible classes are the following: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".
The definitions of the classes are:
1 - BACKGROUND: The cited paper provides relevant Background information or is part of the body of literature.
2 - MOTIVATION: The citing paper is directly motivated by the cited paper.
3 - USES: The citing paper uses the methodology or tools created by the cited paper.
4 - EXTENDS: The citing paper extends the methods, tools or data, etc. of the cited paper.
5 - COMPARES_CONTRASTS: The citing paper expresses similarities or differences to, or disagrees with, the cited paper.
6 - FUTURE: The cited paper may be a potential avenue for future work.
########
# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION@@ tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE".
- Do not provide any explanation or elaboration.
"""
test_citing_sentence = "However , the method we are currently using in the ATIS domain ( @@CITATION@@ ) represents our most promising approach to this problem."
user_prompt = f"""
{test_citing_sentence}
### Question: Which is the most likely intent for this citation?
a) BACKGROUND
b) MOTIVATION
c) USES
d) EXTENDS
e) COMPARES_CONTRASTS
f) FUTURE
### Answer:
"""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Response: USES
```
Details about the system prompts and query templates can be found in the [paper](https://arxiv.org/abs/2502.14561).
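To classify more than one sentence, the query template from the Quickstart can be wrapped in a small helper. The sketch below is illustrative only: `LABELS` and `build_user_prompt` are hypothetical names, and the template text is copied from the Quickstart as shown on this page rather than from the paper.

```python
# Hypothetical helper: rebuilds the Quickstart query template for any citing sentence.
LABELS = ["BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE"]


def build_user_prompt(citing_sentence: str) -> str:
    """Return the user prompt for one citing sentence containing @@CITATION@@."""
    if "@@CITATION@@" not in citing_sentence:
        raise ValueError("The citing sentence must contain the @@CITATION@@ tag.")
    # Produces "a) BACKGROUND", "b) MOTIVATION", ... in the same order as the Quickstart.
    options = "\n".join(
        f"{chr(ord('a') + i)}) {label}" for i, label in enumerate(LABELS)
    )
    return (
        f"\n{citing_sentence}\n"
        "### Question: Which is the most likely intent for this citation?\n"
        f"{options}\n"
        "### Answer:\n"
    )
```

The returned string can be passed as the `content` of the user message in place of `user_prompt`.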
You may need a cleanup function to extract the predicted label from the raw output. Ours is available on [GitHub](https://github.com/athenarc/CitationIntentOpenLLM/blob/main/citation_intent_classification_experiments.py).
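As a rough illustration of what such a cleanup step could look like, the sketch below scans the response for the first class name it contains. It is not the authors' implementation (see the linked repository for that); `extract_label` and `LABELS` are hypothetical names, and accepting a space or hyphen in place of the underscore is an assumption about how outputs might vary.

```python
import re
from typing import Optional

LABELS = ("BACKGROUND", "MOTIVATION", "USES", "EXTENDS", "COMPARES_CONTRASTS", "FUTURE")

# One pattern per label; the underscore may also appear as a space or hyphen (assumption),
# and the label must not be embedded in a longer word (e.g. "REUSES").
_PATTERNS = {
    label: re.compile(r"(?<![A-Z])" + label.replace("_", r"[\s_-]?") + r"(?![A-Z])")
    for label in LABELS
}


def extract_label(response: str) -> Optional[str]:
    """Return the first class name mentioned in the response, or None if there is none."""
    text = response.upper()
    best = None  # (position, label) of the earliest match found so far
    for label, pattern in _PATTERNS.items():
        match = pattern.search(text)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), label)
    return best[1] if best else None
```

For example, `extract_label(response)` can be applied to the decoded `response` from the Quickstart; a return value of `None` signals an output that needs manual inspection.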
## Citation
```
@misc{koloveas2025llmspredictcitationintent,
title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs},
author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
year={2025},
eprint={2502.14561},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.14561},
}
```