deepsodha committed beb5479 (verified) · Parent: f63e0db

Upload 25 files
README.md CHANGED
@@ -1,89 +1,13 @@
- ---
- title: AxionX Digital — AI QA Demo
- sdk: gradio
- app_file: app.py
- emoji: 🧠
- colorFrom: purple
- colorTo: blue
- pinned: false
- license: mit
- ---
-
- # 🧠 AxionX Digital — AI Question Answering Demo
-
- Welcome to **AxionX Digital’s** live demonstration of a fine-tuned **Question Answering Model** built and deployed with [Hugging Face Spaces](https://huggingface.co/spaces).
-
- This public showcase illustrates our model-training, evaluation, and deployment capabilities.
- It runs on pinned dependencies for **1-year guaranteed stability** — perfect for long-term client demos.
-
- ---
-
- ## 🚀 Model Overview

- | Property | Details |
- |-----------|----------|
- | **Base Model** | `distilbert-base-cased-distilled-squad` |
- | **Task** | Extractive Question Answering |
- | **Framework** | Transformers + Gradio |
- | **Deployment** | Hugging Face Spaces (CPU) |
- | **Stability** | Version-pinned for 12 months |

- ---
-
- ## 💡 Try It Yourself
-
- 1. Paste any paragraph into **Context**.
- 2. Ask a natural-language question about it.
- 3. Instantly see the extracted **Answer** with confidence score.
-
- Example Context:
- > AxionX Digital builds model-training tools for AI developers.
- > We fine-tune open-source LLMs for customer-support, finance, and legal domains.
-
- Example Question:
- > What does AxionX Digital build?

  ---

- ## 🧩 Key Features
-
- ⚙️ **End-to-End Training Pipeline** (fine-tuning + evaluation + deployment)
- 🔒 **Privacy-Safe Data Handling** for enterprise use cases
- 🌐 **Hosted Demos & APIs** — deploy anywhere (Spaces, AWS, or on-prem)
- 🧾 **Transparent Metrics** — reproducible and version-controlled
-
- ---
-
- ## 🏢 About AxionX Digital
-
- **AxionX Digital** is a next-generation AI engineering startup specializing in:
-
- Custom LLM training and fine-tuning
- Evaluation and benchmarking frameworks
- Agentic workflow automation
- Scalable model deployment pipelines
-
- 🌍 **Website:** *coming soon*
- 📧 **Contact:** hello@axionxdigital.com
- 📱 **LinkedIn:** [linkedin.com/company/axionxdigital](https://linkedin.com/company/axionxdigital)
-
- ---
-
- ## 🏗 Tech Stack
-
- | Layer | Tools |
- |-------|--------|
- | **Training** | 🤗 Transformers / Datasets |
- | **Serving** | Gradio UI / FastAPI |
- | **Infra** | Hugging Face Spaces / Docker / AWS |
- | **Monitoring** | W&B / Prometheus (optional) |
-
- ---
-
- ## 💬 License
- MIT License — feel free to fork, modify, and explore.
-
- ---
-
- ### 🌟 Built with ❤️ by [AxionX Digital](https://huggingface.co/deepsodha)
-
+ # 🚀 AxionX Digital — Model Training & Evaluation Suite
+
+ A collection of open-source LLM fine-tuning and evaluation demos:
+
+ | Project | Description | Tech |
+ |----------|--------------|------|
+ | 💰 FinanceGPT | Fine-tuned FLAN-T5 for financial Q&A and summarization | LoRA · HF Transformers |
+ | ⚖️ LegalDoc Summarizer | Clause-level summarization of the CUAD contract dataset | FLAN-T5 · PEFT |
+ | 🛍️ RetailGPT Evaluator | Benchmarks retail-QA models and serves a leaderboard UI | Evaluation · Streamlit |
+
  ---
+
+ ## 🧩 Structure
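For reference, this is the layout the commit's 25 files create (reconstructed from the file list in this diff):

```
README.md
streamlit_hub.py          # hub page linking the three demos
financegpt/               # README.md, app.py, config.yaml, dataset_loader.py, evaluate.py, train.py
legaldoc_summarizer/      # same layout as financegpt/
retailgpt_evaluator/      # README.md, app.py, config.yaml, dataset_loader.py, evaluate.py, leaderboard.py
shared/                   # config.yaml, hf_helpers.py, metrics.py, requirements.txt, utils.py
```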
financegpt/README.md ADDED
@@ -0,0 +1,15 @@
+ # 💰 FinanceGPT — AxionX Digital
+
+ **Goal:** Fine-tune a model for financial-report Q&A and summarization.
+
+ ### Features
+ - Fine-tunes FLAN-T5-base on a financial sentence dataset
+ - LoRA configuration for lightweight training
+ - Evaluation (ROUGE / BLEU / factuality)
+ - Streamlit demo interface
+
+ ### Run in a Hugging Face Notebook
+ ```bash
+ !python financegpt/dataset_loader.py
+ !python financegpt/train.py
+ !python financegpt/evaluate.py
+ ```
financegpt/app.py ADDED
@@ -0,0 +1,25 @@
+ import sys
+ from pathlib import Path
+
+ import streamlit as st
+ import yaml
+
+ # Make the repo root importable so `shared` resolves regardless of
+ # where `streamlit run financegpt/app.py` is launched from.
+ sys.path.append(str(Path(__file__).resolve().parents[1]))
+ from shared.hf_helpers import build_pipeline
+
+ st.set_page_config(page_title="FinanceGPT Demo", page_icon="💰", layout="centered")
+ st.title("💰 FinanceGPT — Financial Q&A Demo")
+
+ # Resolve the config next to this file instead of relying on the CWD.
+ cfg = yaml.safe_load(Path(__file__).with_name("config.yaml").read_text())
+
+ model_name = st.selectbox("Select model:", [cfg["base_model"], "models/financegpt"])
+
+ @st.cache_resource
+ def get_pipe(name: str):
+     """Build the generation pipeline once per model and cache it."""
+     return build_pipeline(name)
+
+ pipe = get_pipe(model_name)
+
+ prompt = st.text_area("Enter a financial statement or question:")
+ if st.button("Generate Answer"):
+     if prompt.strip():
+         result = pipe(prompt, max_new_tokens=cfg["demo"]["max_new_tokens"])
+         st.markdown("### 🧠 Answer")
+         st.write(result[0]["generated_text"])
financegpt/config.yaml ADDED
@@ -0,0 +1,14 @@
+ project: "FinanceGPT"
+ base_model: "google/flan-t5-base"
+ dataset_name: "AxionX/financegpt-sec-sample"
+ train:
+   epochs: 3
+   batch_size: 4
+   lr: 2.0e-4  # decimal point needed: PyYAML only parses scientific notation as a float with it
+   lora_r: 8
+   lora_alpha: 16
+   lora_dropout: 0.05
+ evaluate:
+   metrics: ["rouge", "bleu", "factuality"]
+ demo:
+   max_new_tokens: 256
financegpt/dataset_loader.py ADDED
@@ -0,0 +1,22 @@
+ from datasets import load_dataset
+ import pandas as pd
+ import os
+
+ def load_finance_dataset():
+     """
+     Loads a small sample of financial sentiment sentences and turns them
+     into synthetic Q&A pairs for the demo. Replace with your own dataset
+     or HF dataset ID (e.g., real SEC 10-K/10-Q Q&A data).
+     """
+     # financial_phrasebank requires a config name; "sentences_allagree"
+     # keeps only sentences with full annotator agreement.
+     dataset = load_dataset("takala/financial_phrasebank", "sentences_allagree", split="train[:100]")
+     df = pd.DataFrame(dataset)
+     # Create synthetic QA pairs for the demo; map class ids to readable labels.
+     df["question"] = "Summarize this financial statement: " + df["sentence"]
+     df["answer"] = df["label"].map({0: "negative", 1: "neutral", 2: "positive"})
+     records = df[["question", "answer"]].to_dict(orient="records")
+     os.makedirs("datasets", exist_ok=True)
+     pd.DataFrame(records).to_json("datasets/financegpt_sample.jsonl", orient="records", lines=True)
+     print("✅ Saved dataset to datasets/financegpt_sample.jsonl")
+     return records
+
+ if __name__ == "__main__":
+     load_finance_dataset()
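Each line of the resulting `datasets/financegpt_sample.jsonl` is one JSON record consumed by `train.py`; an illustrative record (the sentence is invented for the example):

```json
{"question": "Summarize this financial statement: Operating profit rose to EUR 13.1 mn from EUR 8.7 mn.", "answer": "positive"}
```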
financegpt/evaluate.py ADDED
@@ -0,0 +1,32 @@
+ import json
+ import sys
+ from pathlib import Path
+
+ from datasets import load_dataset
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Make the repo root importable so `shared` resolves when this script
+ # is run as `python financegpt/evaluate.py`.
+ sys.path.append(str(Path(__file__).resolve().parents[1]))
+ from shared.metrics import compute_rouge, compute_bleu, factuality_score
+ from shared.utils import print_banner
+
+ def evaluate_model(model_path="models/financegpt"):
+     print_banner("Evaluating FinanceGPT")
+
+     tokenizer = AutoTokenizer.from_pretrained(model_path)
+     model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
+
+     dataset = load_dataset("json", data_files="datasets/financegpt_sample.jsonl", split="train[:50]")
+
+     preds, refs = [], []
+     for row in dataset:
+         inputs = tokenizer(row["question"], return_tensors="pt", truncation=True)
+         output = model.generate(**inputs, max_new_tokens=64)
+         preds.append(tokenizer.decode(output[0], skip_special_tokens=True))
+         refs.append(row["answer"])
+
+     results = {}
+     results.update(compute_rouge(preds, refs))
+     results.update(compute_bleu(preds, refs))
+     results.update(factuality_score(preds, refs))
+
+     with open("models/financegpt/eval_results.json", "w") as f:
+         json.dump(results, f, indent=2)
+     print("✅ Evaluation complete:", results)
+
+ if __name__ == "__main__":
+     evaluate_model()
financegpt/train.py ADDED
@@ -0,0 +1,52 @@
+ import sys
+ from pathlib import Path
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments
+ from peft import LoraConfig, get_peft_model
+ from datasets import load_dataset
+
+ # Make the repo root importable so `shared` resolves when this script
+ # is run as `python financegpt/train.py`.
+ sys.path.append(str(Path(__file__).resolve().parents[1]))
+ from shared.utils import load_yaml_config, ensure_dir, print_banner
+
+ def main():
+     # Resolve the config next to this file instead of relying on the CWD.
+     cfg = load_yaml_config(str(Path(__file__).with_name("config.yaml")))
+     print_banner("Training FinanceGPT")
+
+     tokenizer = AutoTokenizer.from_pretrained(cfg["base_model"])
+     model = AutoModelForSeq2SeqLM.from_pretrained(cfg["base_model"])
+
+     # LoRA configuration
+     peft_config = LoraConfig(
+         r=cfg["train"]["lora_r"],
+         lora_alpha=cfg["train"]["lora_alpha"],
+         lora_dropout=cfg["train"]["lora_dropout"],
+         bias="none",
+         task_type="SEQ_2_SEQ_LM",
+     )
+     model = get_peft_model(model, peft_config)
+
+     dataset = load_dataset("json", data_files="datasets/financegpt_sample.jsonl", split="train")
+
+     def preprocess(batch):
+         inputs = tokenizer(batch["question"], truncation=True, padding="max_length", max_length=256)
+         labels = tokenizer(batch["answer"], truncation=True, padding="max_length", max_length=256)
+         # Replace label padding with -100 so it is ignored by the cross-entropy loss.
+         inputs["labels"] = [
+             [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
+             for seq in labels["input_ids"]
+         ]
+         return inputs
+
+     tokenized = dataset.map(preprocess, batched=True)
+
+     args = TrainingArguments(
+         output_dir="models/financegpt",
+         per_device_train_batch_size=cfg["train"]["batch_size"],
+         learning_rate=cfg["train"]["lr"],
+         num_train_epochs=cfg["train"]["epochs"],
+         fp16=torch.cuda.is_available(),
+         save_strategy="epoch",
+     )
+
+     trainer = Trainer(model=model, args=args, train_dataset=tokenized)
+     trainer.train()
+
+     ensure_dir("models/financegpt")
+     model.save_pretrained("models/financegpt")  # saves only the LoRA adapter weights
+     tokenizer.save_pretrained("models/financegpt")
+     print("✅ Model saved at models/financegpt")
+
+ if __name__ == "__main__":
+     main()
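Since `save_pretrained` on a PEFT-wrapped model writes only the LoRA adapter, downstream loading of `models/financegpt` leans on transformers' built-in PEFT integration (peft is already in the pinned requirements). A minimal sketch, assuming training has completed:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# With peft installed, from_pretrained on the adapter directory pulls the
# base model recorded in adapter_config.json and applies the LoRA weights.
model = AutoModelForSeq2SeqLM.from_pretrained("models/financegpt")
tokenizer = AutoTokenizer.from_pretrained("models/financegpt")
```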
legaldoc_summarizer/README.md ADDED
@@ -0,0 +1,15 @@
+ # ⚖️ LegalDoc Summarizer — AxionX Digital
+
+ **Purpose:** Summarize long legal clauses and judgments into short, factual summaries.
+
+ ### Key Features
+ - Fine-tunes FLAN-T5 on the CUAD contract dataset
+ - Outputs clause-level summaries with LoRA
+ - Evaluates with ROUGE / BLEU / factual overlap
+ - Streamlit UI for fast testing
+
+ ### Usage
+ ```bash
+ !python legaldoc_summarizer/dataset_loader.py
+ !python legaldoc_summarizer/train.py
+ !python legaldoc_summarizer/evaluate.py
+ ```
legaldoc_summarizer/app.py ADDED
@@ -0,0 +1,26 @@
+ import sys
+ from pathlib import Path
+
+ import streamlit as st
+ import yaml
+
+ # Make the repo root importable so `shared` resolves regardless of
+ # where `streamlit run legaldoc_summarizer/app.py` is launched from.
+ sys.path.append(str(Path(__file__).resolve().parents[1]))
+ from shared.hf_helpers import build_pipeline
+
+ st.set_page_config(page_title="LegalDoc Summarizer", page_icon="⚖️", layout="wide")
+ st.title("⚖️ LegalDoc Summarizer — AxionX Digital")
+
+ # Resolve the config next to this file instead of relying on the CWD.
+ cfg = yaml.safe_load(Path(__file__).with_name("config.yaml").read_text())
+
+ model_name = st.selectbox("Model:", [cfg["base_model"], "models/legaldoc_summarizer"])
+
+ @st.cache_resource
+ def get_pipeline(name: str):
+     return build_pipeline(name)
+
+ pipe = get_pipeline(model_name)
+
+ st.write("Paste a contract clause or judgment text below:")
+ text = st.text_area("Clause or Legal Text", height=250)
+
+ if st.button("Summarize"):
+     if text.strip():
+         result = pipe(text, max_new_tokens=cfg["demo"]["max_new_tokens"])
+         st.markdown("### 🧾 Summary")
+         st.write(result[0]["generated_text"])
legaldoc_summarizer/config.yaml ADDED
@@ -0,0 +1,14 @@
+ project: "LegalDocSummarizer"
+ base_model: "google/flan-t5-base"
+ dataset_name: "cuad"  # Contract Understanding Atticus Dataset
+ train:
+   epochs: 3
+   batch_size: 4
+   lr: 2.0e-4  # decimal point needed so PyYAML parses this as a float
+   lora_r: 8
+   lora_alpha: 16
+   lora_dropout: 0.05
+ evaluate:
+   metrics: ["rouge", "bleu", "factuality"]
+ demo:
+   max_new_tokens: 300
legaldoc_summarizer/dataset_loader.py ADDED
@@ -0,0 +1,22 @@
+ from datasets import load_dataset
+ import pandas as pd, os
+
+ def load_legal_dataset():
+     """
+     Loads a small portion of the CUAD dataset (contract clauses) and
+     converts each QA pair into (question, answer) records.
+     """
+     # "theatticusproject/cuad-qa" is the Hub-hosted, SQuAD-formatted CUAD
+     # release; swap in your preferred CUAD variant if needed.
+     dataset = load_dataset("theatticusproject/cuad-qa", split="train[:200]")
+     df = pd.DataFrame(dataset)
+
+     df["question"] = "Summarize the key legal clause: " + df["question"]
+     # answers use the SQuAD format: {"text": [...], "answer_start": [...]}
+     df["answer"] = df["answers"].apply(lambda a: a["text"][0] if a["text"] else "")
+
+     data = df[["question", "answer"]]
+     os.makedirs("datasets", exist_ok=True)
+     data.to_json("datasets/legal_sample.jsonl", orient="records", lines=True)
+     print("✅ Saved sample dataset to datasets/legal_sample.jsonl")
+     return data
+
+ if __name__ == "__main__":
+     load_legal_dataset()
legaldoc_summarizer/evaluate.py ADDED
@@ -0,0 +1,32 @@
+ import json
+ import sys
+ from pathlib import Path
+
+ from datasets import load_dataset
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Make the repo root importable so `shared` resolves when this script
+ # is run as `python legaldoc_summarizer/evaluate.py`.
+ sys.path.append(str(Path(__file__).resolve().parents[1]))
+ from shared.metrics import compute_rouge, compute_bleu, factuality_score
+ from shared.utils import print_banner
+
+ def evaluate_model(model_path="models/legaldoc_summarizer"):
+     print_banner("Evaluating LegalDoc Summarizer")
+
+     tokenizer = AutoTokenizer.from_pretrained(model_path)
+     model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
+
+     dataset = load_dataset("json", data_files="datasets/legal_sample.jsonl", split="train[:100]")
+
+     preds, refs = [], []
+     for row in dataset:
+         inputs = tokenizer(row["question"], return_tensors="pt", truncation=True)
+         output = model.generate(**inputs, max_new_tokens=256)
+         preds.append(tokenizer.decode(output[0], skip_special_tokens=True))
+         refs.append(row["answer"])
+
+     results = {}
+     results.update(compute_rouge(preds, refs))
+     results.update(compute_bleu(preds, refs))
+     results.update(factuality_score(preds, refs))
+
+     with open("models/legaldoc_summarizer/eval_results.json", "w") as f:
+         json.dump(results, f, indent=2)
+     print("✅ Evaluation complete:", results)
+
+ if __name__ == "__main__":
+     evaluate_model()
legaldoc_summarizer/train.py ADDED
@@ -0,0 +1,50 @@
+ import sys
+ from pathlib import Path
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments
+ from peft import LoraConfig, get_peft_model
+ from datasets import load_dataset
+
+ # Make the repo root importable so `shared` resolves when this script
+ # is run as `python legaldoc_summarizer/train.py`.
+ sys.path.append(str(Path(__file__).resolve().parents[1]))
+ from shared.utils import load_yaml_config, ensure_dir, print_banner
+
+ def main():
+     # Resolve the config next to this file instead of relying on the CWD.
+     cfg = load_yaml_config(str(Path(__file__).with_name("config.yaml")))
+     print_banner("Training LegalDoc Summarizer")
+
+     tokenizer = AutoTokenizer.from_pretrained(cfg["base_model"])
+     model = AutoModelForSeq2SeqLM.from_pretrained(cfg["base_model"])
+
+     peft_config = LoraConfig(
+         r=cfg["train"]["lora_r"],
+         lora_alpha=cfg["train"]["lora_alpha"],
+         lora_dropout=cfg["train"]["lora_dropout"],
+         task_type="SEQ_2_SEQ_LM",
+     )
+     model = get_peft_model(model, peft_config)
+
+     dataset = load_dataset("json", data_files="datasets/legal_sample.jsonl", split="train")
+
+     def preprocess(batch):
+         inputs = tokenizer(batch["question"], truncation=True, padding="max_length", max_length=512)
+         labels = tokenizer(batch["answer"], truncation=True, padding="max_length", max_length=256)
+         # Replace label padding with -100 so it is ignored by the loss.
+         inputs["labels"] = [
+             [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
+             for seq in labels["input_ids"]
+         ]
+         return inputs
+
+     tokenized = dataset.map(preprocess, batched=True)
+
+     args = TrainingArguments(
+         output_dir="models/legaldoc_summarizer",
+         per_device_train_batch_size=cfg["train"]["batch_size"],
+         learning_rate=cfg["train"]["lr"],
+         num_train_epochs=cfg["train"]["epochs"],
+         fp16=torch.cuda.is_available(),
+         save_strategy="epoch",
+     )
+
+     trainer = Trainer(model=model, args=args, train_dataset=tokenized)
+     trainer.train()
+
+     ensure_dir("models/legaldoc_summarizer")
+     model.save_pretrained("models/legaldoc_summarizer")  # adapter weights only
+     tokenizer.save_pretrained("models/legaldoc_summarizer")
+     print("✅ Model saved at models/legaldoc_summarizer")
+
+ if __name__ == "__main__":
+     main()
retailgpt_evaluator/README.md ADDED
@@ -0,0 +1,13 @@
+ # 🛍️ RetailGPT Evaluator — AxionX Digital
+
+ **Purpose:** Evaluate and compare multiple retail QA models on the same dataset.
+
+ ### Includes
+ - `evaluate.py` → runs metrics across multiple models
+ - `leaderboard.py` → aggregates results into a ranking
+ - `app.py` → Streamlit UI with leaderboard + live model chat
+
+ ### Usage
+ ```bash
+ !python retailgpt_evaluator/dataset_loader.py
+ !python retailgpt_evaluator/evaluate.py
+ ```
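`evaluate.py` writes one record per model to `models/retail_eval_results.json`, which `leaderboard.py` then ranks. The file looks roughly like this (metric values are illustrative; the ROUGE keys come from the `evaluate` package):

```json
[
  {"model": "google/flan-t5-base", "rouge1": 0.24, "rougeL": 0.21, "bleu": 0.04, "factuality": 0.18}
]
```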
retailgpt_evaluator/app.py ADDED
@@ -0,0 +1,26 @@
+ import os
+ import sys
+ from pathlib import Path
+
+ import streamlit as st
+ import yaml
+
+ # Make the repo root importable (for `shared`) and this directory
+ # importable (for the sibling `leaderboard` module).
+ HERE = Path(__file__).resolve().parent
+ sys.path.extend([str(HERE.parent), str(HERE)])
+ from shared.hf_helpers import build_pipeline
+ from leaderboard import build_leaderboard
+
+ st.set_page_config(page_title="RetailGPT Evaluator", page_icon="🛍️", layout="wide")
+ st.title("🛍️ RetailGPT Evaluator — AxionX Digital")
+
+ # Resolve the config next to this file instead of relying on the CWD.
+ cfg = yaml.safe_load((HERE / "config.yaml").read_text())
+
+ if os.path.exists("models/retail_eval_results.json"):
+     df = build_leaderboard()
+     st.subheader("📊 Model Leaderboard")
+     st.dataframe(df, use_container_width=True)
+ else:
+     st.warning("Run `evaluate.py` first to generate metrics.")
+
+ model_name = st.selectbox("Choose a model to chat with:", cfg["models"])
+
+ @st.cache_resource
+ def get_pipe(name: str):
+     """Cache the pipeline so it is not rebuilt on every Streamlit rerun."""
+     return build_pipeline(name)
+
+ pipe = get_pipe(model_name)
+
+ query = st.text_area("Customer query:", "I want to return a damaged product.")
+ if st.button("Ask Model"):
+     result = pipe(query, max_new_tokens=cfg["demo"]["max_new_tokens"])
+     st.markdown("### 🧠 Model Response")
+     st.write(result[0]["generated_text"])
retailgpt_evaluator/config.yaml ADDED
@@ -0,0 +1,10 @@
+ project: "RetailGPT_Evaluator"
+ dataset_name: "axionx/retail_chatqa"
+ models:
+   - "google/flan-t5-base"                 # encoder-decoder
+   - "tiiuae/falcon-rw-1b"                 # decoder-only ("falcon-1b" is not a Hub id; this is the 1B checkpoint)
+   - "mistralai/Mistral-7B-Instruct-v0.2"  # decoder-only; 7B is heavy for CPU-only runs
+ evaluate:
+   metrics: ["rouge", "bleu", "factuality"]
+ demo:
+   max_new_tokens: 128
retailgpt_evaluator/dataset_loader.py ADDED
@@ -0,0 +1,20 @@
+ from datasets import load_dataset
+ import pandas as pd, os
+
+ def load_retail_dataset():
+     """
+     Loads a small retail/e-commerce sample from the HF Hub and reshapes
+     it into synthetic (question, answer) pairs for evaluation.
+     """
+     dataset = load_dataset("amazon_polarity", split="train[:200]")
+     df = pd.DataFrame(dataset)
+     df["question"] = "Customer asks about this review: " + df["title"]
+     df["answer"] = df["content"]
+     sample = df[["question", "answer"]]
+     os.makedirs("datasets", exist_ok=True)
+     sample.to_json("datasets/retail_sample.jsonl", orient="records", lines=True)
+     print("✅ Saved datasets/retail_sample.jsonl")
+     return sample
+
+ if __name__ == "__main__":
+     load_retail_dataset()
retailgpt_evaluator/evaluate.py ADDED
@@ -0,0 +1,34 @@
+ import json
+ import os
+ import sys
+ from pathlib import Path
+
+ import torch
+ from datasets import load_dataset
+ from transformers import AutoConfig, AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer
+
+ # Make the repo root importable so `shared` resolves when this script
+ # is run as `python retailgpt_evaluator/evaluate.py`.
+ sys.path.append(str(Path(__file__).resolve().parents[1]))
+ from shared.metrics import compute_rouge, compute_bleu, factuality_score
+ from shared.utils import load_yaml_config, print_banner
+
+ def run_eval_for_model(model_name, dataset):
+     print_banner(f"Evaluating {model_name}")
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+     # The config mixes encoder-decoder (FLAN-T5) and decoder-only (Falcon,
+     # Mistral) models, so pick the matching auto class.
+     if AutoConfig.from_pretrained(model_name).is_encoder_decoder:
+         model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+     else:
+         model = AutoModelForCausalLM.from_pretrained(model_name)
+     preds, refs = [], []
+     for row in dataset:
+         inputs = tokenizer(row["question"], return_tensors="pt", truncation=True)
+         with torch.no_grad():
+             outputs = model.generate(**inputs, max_new_tokens=128)
+         preds.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
+         refs.append(row["answer"])
+     r = compute_rouge(preds, refs)
+     b = compute_bleu(preds, refs)
+     f = factuality_score(preds, refs)
+     return {"model": model_name, **r, **b, **f}
+
+ def evaluate_all():
+     # Resolve the config next to this file instead of relying on the CWD.
+     cfg = load_yaml_config(str(Path(__file__).with_name("config.yaml")))
+     dataset = load_dataset("json", data_files="datasets/retail_sample.jsonl", split="train[:50]")
+     results = [run_eval_for_model(m, dataset) for m in cfg["models"]]
+     os.makedirs("models", exist_ok=True)  # nothing trains in this project, so models/ may not exist yet
+     with open("models/retail_eval_results.json", "w") as f:
+         json.dump(results, f, indent=2)
+     print("✅ Saved results to models/retail_eval_results.json")
+     return results
+
+ if __name__ == "__main__":
+     evaluate_all()
retailgpt_evaluator/leaderboard.py ADDED
@@ -0,0 +1,9 @@
+ import json
+
+ import pandas as pd
+
+ def build_leaderboard(path="models/retail_eval_results.json"):
+     with open(path) as f:
+         data = json.load(f)
+     df = pd.DataFrame(data)
+     # Composite score: unweighted mean of the three metrics.
+     df["score"] = (df["rougeL"] + df["bleu"] + df["factuality"]) / 3
+     df = df.sort_values("score", ascending=False)
+     return df[["model", "rougeL", "bleu", "factuality", "score"]]
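A quick way to inspect the ranking outside the Streamlit app, run from the repo root after `evaluate.py` has produced results:

```python
from retailgpt_evaluator.leaderboard import build_leaderboard

# Prints the same ranked table that app.py renders with st.dataframe.
print(build_leaderboard().to_string(index=False))
```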
shared/config.yaml ADDED
@@ -0,0 +1,11 @@
+ default_model: "google/flan-t5-base"
+ default_dataset_path: "./datasets/sample.jsonl"
+ train:
+   batch_size: 4
+   lr: 2.0e-4  # decimal point needed so PyYAML parses this as a float
+   epochs: 3
+   lora_r: 8
+   lora_alpha: 16
+   lora_dropout: 0.05
+ evaluate:
+   metrics: ["rouge", "bleu", "factuality"]
shared/hf_helpers.py ADDED
@@ -0,0 +1,19 @@
+ from transformers import AutoConfig, AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
+ import torch
+
+ def load_model_and_tokenizer(model_name: str):
+     """Load a seq2seq model and tokenizer for inference."""
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
+     model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+     return model, tokenizer
+
+ def generate_answer(model, tokenizer, prompt: str, max_tokens: int = 256):
+     """Generate text output from a model given a prompt."""
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+     with torch.no_grad():
+         outputs = model.generate(**inputs, max_new_tokens=max_tokens)
+     return tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ def build_pipeline(model_name: str, task=None):
+     """Return a Hugging Face pipeline, inferring the task from the architecture."""
+     if task is None:
+         # FLAN-T5 checkpoints are encoder-decoder; Falcon/Mistral are decoder-only.
+         is_enc_dec = AutoConfig.from_pretrained(model_name).is_encoder_decoder
+         task = "text2text-generation" if is_enc_dec else "text-generation"
+     return pipeline(task, model=model_name)
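A minimal usage sketch for these helpers (using the default model from `shared/config.yaml`; any seq2seq checkpoint works):

```python
from shared.hf_helpers import load_model_and_tokenizer, generate_answer

model, tokenizer = load_model_and_tokenizer("google/flan-t5-base")
print(generate_answer(model, tokenizer, "Summarize: operating profit rose 12% year over year."))
```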
shared/metrics.py ADDED
@@ -0,0 +1,20 @@
+ import evaluate
+ import numpy as np
+
+ def compute_rouge(preds, refs):
+     # datasets.load_metric is deprecated/removed in current `datasets`;
+     # the `evaluate` package (already pinned in requirements) replaces it.
+     rouge = evaluate.load("rouge")
+     return rouge.compute(predictions=preds, references=refs)
+
+ def compute_bleu(preds, refs):
+     bleu = evaluate.load("bleu")
+     refs = [[r] for r in refs]  # bleu expects a list of reference lists
+     result = bleu.compute(predictions=preds, references=refs)
+     return {"bleu": result["bleu"]}
+
+ def factuality_score(preds, refs):
+     """Very simple lexical overlap metric for factual alignment."""
+     scores = []
+     for p, r in zip(preds, refs):
+         p_tokens = set(p.lower().split())
+         r_tokens = set(r.lower().split())
+         scores.append(len(p_tokens & r_tokens) / max(1, len(r_tokens)))
+     return {"factuality": float(np.mean(scores))}
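To make the overlap definition concrete, a small worked check (numbers computed by hand from the formula above):

```python
from shared.metrics import factuality_score

# prediction tokens {revenue, increased, 12%}; reference tokens
# {revenue, increased, by, 12%}: overlap 3 / reference size 4 = 0.75
print(factuality_score(["revenue increased 12%"], ["revenue increased by 12%"]))
# -> {'factuality': 0.75}
```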
shared/requirements.txt ADDED
@@ -0,0 +1,22 @@
+ # Core ML / NLP
+ transformers>=4.44.0
+ datasets>=2.21.0
+ evaluate>=0.4.2
+ rouge_score  # required by evaluate's "rouge" metric
+ peft>=0.12.0
+ bitsandbytes>=0.43.0
+ accelerate>=0.31.0
+ torch>=2.3.0
+ sentencepiece
+ scipy
+ numpy
+ pandas
+
+ # App / Dashboard
+ streamlit>=1.37.0
+ plotly>=5.22.0
+ fastapi>=0.110.0
+ uvicorn>=0.29.0
+
+ # Utility
+ pyyaml
+ tqdm
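All three demos share this single dependency file; install it from the repo root:

```bash
pip install -r shared/requirements.txt
```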
shared/utils.py ADDED
@@ -0,0 +1,17 @@
+ import yaml
+ from pathlib import Path
+
+ def load_yaml_config(path: str):
+     """Load a YAML config file safely."""
+     with open(path, "r") as f:
+         return yaml.safe_load(f)
+
+ def ensure_dir(path: str):
+     """Create a directory if it doesn't exist."""
+     Path(path).mkdir(parents=True, exist_ok=True)
+
+ def print_banner(title: str):
+     print("=" * (len(title) + 8))
+     print(f"=== {title} ===")
+     print("=" * (len(title) + 8))
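For example, `print_banner("Training FinanceGPT")` prints:

```
===========================
=== Training FinanceGPT ===
===========================
```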
streamlit_hub.py ADDED
@@ -0,0 +1,43 @@
+ import streamlit as st
+
+ st.set_page_config(page_title="AxionX Digital Hub", page_icon="🚀", layout="wide")
+ st.title("🚀 AxionX Digital Model Training Suite")
+
+ st.sidebar.title("🧠 Select Demo")
+ demo = st.sidebar.radio(
+     "Choose one:",
+     ("💰 FinanceGPT", "⚖️ LegalDoc Summarizer", "🛍️ RetailGPT Evaluator"),
+ )
+
+ st.sidebar.markdown("---")
+ st.sidebar.markdown("### About AxionX Digital")
+ st.sidebar.info(
+     "AxionX Digital fine-tunes and evaluates language models for finance, law, and retail. "
+     "Each demo runs a real Hugging Face pipeline using open-source models."
+ )
+
+ def run_app(path):
+     # Each demo is a standalone Streamlit script; importing it here would
+     # re-execute its page-level Streamlit calls, so point the user at it instead.
+     st.markdown(
+         f"Launching **{demo}**… run `streamlit run {path}/app.py` "
+         "in a separate terminal to use the full demo."
+     )
+
+ if "FinanceGPT" in demo:
+     st.header("💰 FinanceGPT")
+     st.write("Financial Q&A assistant trained on SEC-style filings.")
+     run_app("financegpt")
+ elif "LegalDoc" in demo:
+     st.header("⚖️ LegalDoc Summarizer")
+     st.write("Clause-level summarization of legal documents.")
+     run_app("legaldoc_summarizer")
+ else:
+     st.header("🛍️ RetailGPT Evaluator")
+     st.write("Benchmark and chat with multiple retail QA models.")
+     run_app("retailgpt_evaluator")
+
+ st.markdown("---")
+ st.caption("© 2025 AxionX Digital — Innovating Tomorrow")
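The hub itself is launched like any other Streamlit script, from the repo root:

```bash
streamlit run streamlit_hub.py
```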