This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

## 📄 Model Card: `aksw/Bike-name`

### 🧠 Model Overview

`Bike-name` is a fine-tuned medium-sized language model designed to **extract biochemical names from scientific text articles**. It is well suited to Information Retrieval systems based on Biochemical Knowledge Extraction.

---

### 🔍 Intended Use

* **Input**: text extracted from a biochemical scientific article (e.g., from a PDF file)
* **Output**: a **single list** containing the biochemical names found in the text
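For illustration, a hypothetical input/output pair (the article snippet and compound names below are invented examples, not actual model output):

```python
# Hypothetical excerpt of text extracted from a paper:
article_text = (
    "We isolated two flavonoids, quercetin and kaempferol, "
    "from the leaf extract and assessed their activity."
)

# Expected shape of the model's answer: one Python-style list of names.
expected_output = ["quercetin", "kaempferol"]
```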

---

### 🧩 Applications

* Question answering systems over biochemical datasets
* Biochemical knowledge graph exploration tools
* Extraction of biochemical names from scientific text articles

---

### ⚙️ Model Details

* **Base model**: Phi 4 14B (via Unsloth)
* **Training data**: scientific text articles
  * 418 unique names
  * 143 articles
* **Target benchmark**: NatUKE (https://github.com/AKSW/natuke)
* **Frameworks**: Unsloth, Hugging Face Transformers

---

### 📦 Installation

Make sure to install `unsloth`, `torch`, and the CUDA dependencies:

```bash
pip install unsloth torch
```

---

### 🧪 Example: Inference Code

```python
from unsloth import FastLanguageModel
import ast
import torch

class SPARQLQueryGenerator:
    """Wrapper around the fine-tuned model for compound-name extraction."""

    def __init__(self, model_name: str, max_seq_length: int = 32768, load_in_4bit: bool = True):
        self.model, self.tokenizer = FastLanguageModel.from_pretrained(
            model_name=model_name,
            max_seq_length=max_seq_length,
            load_in_4bit=load_in_4bit,
        )
        FastLanguageModel.for_inference(self.model)

    def build_prompt(self, article_text: str) -> list:
        return [
            {"role": "system", "content": (
                "You are a scientist trained in chemistry.\n"
                "You must extract information from scientific papers identifying relevant properties associated with each natural product discussed in the academic publication.\n"
                "For each paper, you have to analyze the content (text) to identify the *Compound name*. It can be more than one compound name.\n"
                "Your output should be a list with the names. Return only the list, without any additional information.\n"
            )},
            {"role": "user", "content": article_text},
        ]

    def generate_query(self, article_text: str, temperature: float = 0.01, max_new_tokens: int = 1024):
        start_tag = "<|im_start|>assistant<|im_sep|>"
        end_tag = "<|im_end|>"
        messages = self.build_prompt(article_text)
        inputs = self.tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to("cuda")
        outputs = self.model.generate(
            inputs, max_new_tokens=max_new_tokens, use_cache=True,
            temperature=temperature, min_p=0.1,
        )
        decoded = self.tokenizer.batch_decode(outputs)[0]
        # Keep only the assistant's reply and strip the chat-template tags.
        answer = decoded[decoded.find(start_tag):].replace(start_tag, "").replace(end_tag, "")
        try:
            # Safer than eval(): only accepts Python literals such as lists.
            result = ast.literal_eval(answer.strip())
        except (ValueError, SyntaxError):
            result = answer
            print("The output is not a list; one more preprocessing step is needed.")
        return result

# --- Using the model ---
if __name__ == "__main__":
    generator = SPARQLQueryGenerator(model_name="aksw/Bike-name")
    text = "Title, Abstract, Introduction, Background, Method, Results, Conclusion, References."
    list_names = generator.generate_query(text)
    print(list_names)
```
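When the model's reply is not a clean Python list literal, a light post-processing step can often still recover the names. A minimal sketch of one such step (the bracket-matching heuristic and the sample reply below are illustrative assumptions, not part of the model's documented behavior):

```python
import ast
import re

def recover_name_list(raw_output: str) -> list:
    """Best-effort recovery of a list of names from a model reply.

    Tries to parse the first [...] span as a Python literal; falls back
    to splitting on commas/newlines if no literal list is found.
    """
    match = re.search(r"\[.*?\]", raw_output, flags=re.DOTALL)
    if match:
        try:
            parsed = ast.literal_eval(match.group(0))
            if isinstance(parsed, list):
                return [str(item).strip() for item in parsed]
        except (ValueError, SyntaxError):
            pass
    # Fallback: treat the reply as a comma- or newline-separated listing.
    parts = re.split(r"[,\n]", raw_output)
    return [p.strip(" -*'\"") for p in parts if p.strip(" -*'\"")]

# Hypothetical malformed reply, used only for illustration:
print(recover_name_list("Sure! ['quercetin', 'kaempferol']"))
```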

---

### 🧪 Evaluation

The model was evaluated using Hits@k on the test sets of the NatUKE benchmark (do Carmo et al., 2023).
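Hits@k counts a test instance as a hit when the gold name appears among the model's top-k predictions. A minimal sketch of the metric (function names, the case-insensitive matching, and the toy data are my own assumptions, not NatUKE's reference implementation):

```python
def hits_at_k(gold: list, predictions: list, k: int) -> float:
    """Fraction of instances whose gold name appears in the top-k
    predicted names (case-insensitive comparison)."""
    hits = 0
    for gold_name, predicted_names in zip(gold, predictions):
        top_k = [name.lower() for name in predicted_names[:k]]
        if gold_name.lower() in top_k:
            hits += 1
    return hits / len(gold)

# Toy example with made-up names:
gold = ["quercetin", "lupeol"]
preds = [["quercetin", "rutin"], ["betulin", "taraxerol"]]
print(hits_at_k(gold, preds, k=2))  # 1 hit out of 2 instances -> 0.5
```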

---

Do Carmo, Paulo Viviurka, et al. "NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature." 2023 IEEE 17th International Conference on Semantic Computing (ICSC). IEEE, 2023.

### 📚 Citation

If you use this model in your work, please cite it as:

```bibtex
@inproceedings{ref:doCarmo2025,
  title={Improving Natural Product Knowledge Extraction from Academic Literature with Enhanced PDF Text Extraction and Large Language Models},
  author={Viviurka do Carmo, Paulo and Silva G{\^o}lo, Marcos Paulo and Gwozdz, Jonas and Marx, Edgard and Marcondes Marcacini, Ricardo},
  booktitle={Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing},
  pages={980--987},
  year={2025}
}
```