atz21/leadib

atz21 committed on Sep 16, 2025

Commit 1091f22 (verified) · 1 Parent(s): 37398bc

Update app.py

Files changed (1)

app.py +57 -125


@@ -1,5 +1,4 @@
 import os
-import re
 import gradio as gr
 import google.generativeai as genai
 from markdown_pdf import MarkdownPdf, Section
@@ -22,65 +21,59 @@ Your objective is to align three sources per question/sub-question:
 ## Question X [and sub-question if applicable, e.g., ### (b)(ii)]
 *QP:* [Exact question text or [Not found]]
 *MS:* [Relevant markscheme section or [Not found]]
-*AS:* [Final cleaned student answer; use fenced code for mathematics; insert [illegible] or [No response] as required]
 ---
 3. Formatting requirements:
 - Use '##' for main questions, '###' for sub-questions.
-- Maintain section order: QP | MS | AS (always in that sequence).
 - Enclose all mathematical expressions in Markdown fenced code blocks (``` triple backticks).
-- If a diagram/graph is omitted, write [Graph omitted] in its place.
-- For unreadable portions of the student's answer, insert [illegible]; if the answer is wholly unreadable, set AS to [illegible].
-- If a question is skipped or unanswered, AS must be exactly [No response].
 - Keep MS annotations (e.g., M1, A1, R1) verbatim.
-- Diagrams/graphs are not to be recreated.
-- If any QP, MS, or AS content is missing, specify [Not found] for that section.
-- Ensure consistency and determinism in formatting so subsequent models can grade directly from this aligned format.
-- List all main questions and sub-questions in their original order, clearly denoting sub-questions (e.g., '### (b)(i)', '### (b)(ii)').
-After each alignment action, briefly validate that the content for QP, MS, and AS matches expectations and alignments are correct. If validation fails, self-correct or flag the issue.
 ## Example
 ---
 ## Question 1
-*QP:* Expand (1+x)^3
 *MS:* M1 for binomial expansion, A1 for coefficients, A1 for final form
-*AS:* x^3 + 3x^2 + 3x + 1
 ---
-## Output Format
-Generate a single Markdown document. For each (sub-)question, output a structured block exactly in the prescribed format.
 """
     },
-  "GRADING_PROMPT": {
-    "role": "system",
-    "content": """Developer: You are an official examiner. Apply the following grading rules precisely.
 ## Grading Checklist
 - Assess each question part against the provided markscheme.
-- Award marks for correct methods (M), accurate answers (A), and clear reasoning (R) as specified.
-- Use Follow Through (FT) for correctly applied subsequent working that uses a previously incorrect answer.
-- Clearly indicate any lost marks and provide a concise reason for the deduction.
-- Always state both:
-  1. **What was wrong** (explain the error).
-  2. **What is right** (give the correct value, method, or reasoning from the markscheme).
-- Check for alternative valid methods and award marks accordingly.
-- Summarize the total marks and classify the types of errors made by the student.
-- Ensure the final output adheres to the specified Markdown table format.
 ### Abbreviations:
-- **M**: Marks for demonstrating a correct Method.
-- **A**: Marks for providing an accurate Answer.
-- **R**: Marks for clear Reasoning.
-- **AG**: Answer is given in the question—no marks awarded.
-- **FT**: Follow Through; award marks when candidates continue with their own previous (possibly incorrect) answers.
 ---
 ## Grading Instructions
 1. Award marks using official annotations (M1, A1, etc.).
 2. A marks generally require valid M marks.
-3. Allow FT unless results are nonsensical.
 4. Accept valid alternative forms.
-5. Apply accuracy requirements.
 6. Ignore crossed-out work unless requested otherwise.
 7. Mark only the first full solution unless otherwise indicated.
 8. Assume graphs/diagrams are correct if required.
@@ -94,21 +87,17 @@ Produce a GitHub-flavored Markdown table:
 Rules:
 - Each row matches a markable step.
-- For blanks, write “(no answer)” and indicate the lost mark(s).
-- Lost marks: wrap in red with `<span style="color:red">A0</span>` (or M0, R0) and make Reason column red.
 - Awarded marks remain plain text.
-- For partial awards (e.g., M1A0A1), only highlight lost marks.
-- **When a mark is lost, the Reason column must explain both the error AND the correct answer or method from the markscheme.**
-**New Rule (Per-Question Total):**
-- After each question (including all subparts), show marks obtained vs total in square brackets, e.g.:
-  `[2/4]`
 ---
 ### Examiner’s Report
-At the end of the grading, provide a summary report in this format:
-Use codes:
 - A : All Good
 - B : Silly Mistake
 - C : Conceptual Error
@@ -120,18 +109,19 @@ Use codes:
 | 1               | 6/9   | C      |
 | 2               | 7/7   | A      |
 | 3               | 8/14  | D      |
-| ...             | ...   | ...    |
-At the end, display the grand total like:
 `Total: 40/61`
-Optionally, if reasons are available, extend with:
 | Question Number | Marks | Remark | Reason |
 |-----------------|-------|--------|--------|
-⚠️ Do NOT add any "Validation" or meta commentary. End the output after the Examiner’s Report.
 """
 }
 # -------------------- CONFIG --------------------
@@ -146,13 +136,12 @@ def save_as_pdf(text, filename="output.pdf"):
 # ---------- HELPER: Compress PDF ----------
 def compress_pdf(input_path, output_path=None, max_size=20*1024*1024):
-    """Compress PDF using Ghostscript if larger than max_size (default 20MB)."""
     if output_path is None:
         base, ext = os.path.splitext(input_path)
         output_path = f"{base}_compressed{ext}"
     if os.path.getsize(input_path) <= max_size:
-        return input_path  # No compression needed
     try:
         gs_cmd = [
@@ -164,8 +153,10 @@ def compress_pdf(input_path, output_path=None, max_size=20*1024*1024):
         ]
         subprocess.run(gs_cmd, check=True)
         if os.path.getsize(output_path) <= max_size:
             return output_path
         else:
             return input_path
     except Exception as e:
         print(f"⚠️ Compression error: {e}")
@@ -174,114 +165,55 @@ def compress_pdf(input_path, output_path=None, max_size=20*1024*1024):
 # ---------- HELPER: Create Model with Fallback ----------
 def create_model():
     try:
         return genai.GenerativeModel("gemini-2.5-pro", generation_config={"temperature": 0})
     except Exception:
         return genai.GenerativeModel("gemini-2.5-flash", generation_config={"temperature": 0})
-# ---------- NEW: Pretty math conversion ----------
-_SUPER_MAP = {
-    "0": "⁰", "1": "¹", "2": "²", "3": "³", "4": "⁴",
-    "5": "⁵", "6": "⁶", "7": "⁷", "8": "⁸", "9": "⁹",
-    "+": "⁺", "-": "⁻", "(": "⁽", ")": "⁾",
-    "n": "ⁿ", "i": "ⁱ", "a": "ᵃ", "b": "ᵇ", "c": "ᶜ", "d": "ᵈ", "e": "ᵉ",
-    "o": "ᵒ", "r": "ʳ", "t": "ᵗ", "u": "ᵘ", "v": "ᵛ", "w": "ʷ", "x": "ˣ", "y": "ʸ"
-}
-def _to_superscript(s: str) -> str:
-    """Map characters to available Unicode superscripts; fallback to original char if unavailable."""
-    out = []
-    for ch in s:
-        out.append(_SUPER_MAP.get(ch, _SUPER_MAP.get(ch.lower(), ch)))
-    return "".join(out)
-def pretty_math(text: str) -> str:
-    """
-    Convert caret-notation exponents to Unicode superscripts.
-    Handles:
-      x^2, x^{12}, x^(12), 10^4, (3x10^4)^3, and similar patterns.
-    Only characters with known superscripts are converted (digits, +-(), some letters).
-    """
-    if not text:
-        return text
-    new = text
-    # Convert instances like ^{...}
-    new = re.sub(r'\^\{\s*([^}]+)\s*\}', lambda m: _to_superscript(m.group(1)), new)
-    # Convert instances like ^(...)
-    new = re.sub(r'\^\(\s*([^\)]+)\s*\)', lambda m: _to_superscript(m.group(1)), new)
-    # Convert caret followed by a simple integer (e.g., ^12)
-    new = re.sub(r'\^([+-]?\d+)', lambda m: _to_superscript(m.group(1)), new)
-    # Convert caret followed by single non-space token (e.g., x^n)
-    new = re.sub(r'\^([A-Za-z0-9\+\-\(\)]+)', lambda m: _to_superscript(m.group(1)), new)
-    # Replace common scientific notation '3x10^4' -> '3×10⁴' when 'x' is used as multiplication
-    # Only apply when the 'x' sits between digits and '10' (heuristic)
-    new = re.sub(r'(\d)\s*[xX]\s*(10)', r'\1×\2', new)
-    # Also compact spaced forms: '3 x 10^4' -> '3×10⁴' (keeps previously superscripted exponent)
-    new = re.sub(r'(\d)\s*×\s*(10)', r'\1×\2', new)
-    return new
-# -------------------- PIPELINE: ALIGN + GRADE --------------------
 def align_and_grade(qp_file, ms_file, ans_file):
     try:
-        # Step 0: Compress if needed
         qp_file = compress_pdf(qp_file, "qp_compressed.pdf")
         ms_file = compress_pdf(ms_file, "ms_compressed.pdf")
         ans_file = compress_pdf(ans_file, "ans_compressed.pdf")
-        # Step 1: Uploads
         qp_uploaded = genai.upload_file(path=qp_file, display_name="Question Paper")
         ms_uploaded = genai.upload_file(path=ms_file, display_name="Markscheme")
         ans_uploaded = genai.upload_file(path=ans_file, display_name="Answer Sheet")
         model = create_model()
-        # Step 2: Alignment (raw)
         resp = model.generate_content([
             PROMPTS["ALIGNMENT_PROMPT"]["content"],
             qp_uploaded,
             ms_uploaded,
             ans_uploaded
         ])
-        aligned_text_raw = getattr(resp, "text", None)
-        if not aligned_text_raw and getattr(resp, "candidates", None):
-            aligned_text_raw = resp.candidates[0].content.parts[0].text
-        # Pretty version for display/PDF (does NOT affect the raw text used for grading)
-        aligned_text_pretty = pretty_math(aligned_text_raw) if aligned_text_raw else aligned_text_raw
-        aligned_pdf_path = save_as_pdf(aligned_text_pretty or "[No aligned text produced]", "aligned_qp_ms_as.pdf")
-        # Step 3: Grading (use raw aligned text as input to grader)
         response = model.generate_content([
             PROMPTS["GRADING_PROMPT"]["content"],
-            aligned_text_raw or "[No aligned text produced]"
         ])
-        grading_raw = getattr(response, "text", None)
-        if not grading_raw and getattr(response, "candidates", None):
-            grading_raw = response.candidates[0].content.parts[0].text
-        # Pretty version of grading output for display/PDF
-        grading_pretty = pretty_math(grading_raw) if grading_raw else grading_raw
         base_name = os.path.splitext(os.path.basename(ans_file))[0]
-        grading_pdf_path = save_as_pdf(grading_pretty or "[No grading produced]", f"{base_name}_graded.pdf")
-        # Return pretty/display versions (raws remain unmodified for internal processing)
-        return aligned_text_pretty or "", aligned_pdf_path, grading_pretty or "", grading_pdf_path
     except Exception as e:
         return f"❌ Error: {e}", None, None, None
 # ---------- GRADIO APP ----------
 with gr.Blocks(title="LeadIB AI Grading (Alignment + Auto-Grading)") as demo:
-    gr.Markdown("## LeadIB AI Grading\nUpload Question Paper, Markscheme, and Student Answer Sheet.\nThe system will align and grade automatically. Mathematical exponents (e.g. `x^2`) will be converted to Unicode superscripts (e.g. `x²`) in the displayed text and PDFs.")
     with gr.Row():
         qp_file = gr.File(label="Upload Question Paper (PDF)", type="filepath")

 import os
 import gradio as gr
 import google.generativeai as genai
 from markdown_pdf import MarkdownPdf, Section
 ## Question X [and sub-question if applicable, e.g., ### (b)(ii)]
 *QP:* [Exact question text or [Not found]]
 *MS:* [Relevant markscheme section or [Not found]]
+*AS:* [Final cleaned student answer; use fenced code for mathematics with superscripts/subscripts; insert [illegible] or [No response] as required]
 ---
 3. Formatting requirements:
 - Use '##' for main questions, '###' for sub-questions.
+- Maintain section order: QP | MS | AS.
 - Enclose all mathematical expressions in Markdown fenced code blocks (``` triple backticks).
+- Use proper superscripts/subscripts (x² not x^2, H₂O not H2O).
+- If a diagram/graph is omitted, write [Graph omitted].
+- For unreadable portions, insert [illegible].
+- If a question is skipped or unanswered, AS must be [No response].
 - Keep MS annotations (e.g., M1, A1, R1) verbatim.
+- Do not recreate diagrams/graphs.
+- If any QP, MS, or AS content is missing, specify [Not found].
+- List all main questions and sub-questions in original order.
+- After each alignment action, validate that QP, MS, and AS match expectations; if not, self-correct.
 ## Example
 ---
 ## Question 1
+*QP:* Expand (1+x)³
 *MS:* M1 for binomial expansion, A1 for coefficients, A1 for final form
+*AS:* x³ + 3x² + 3x + 1
 ---
 """
     },
+    "GRADING_PROMPT": {
+        "role": "system",
+        "content": """Developer: You are an official examiner. Apply the following grading rules precisely.
 ## Grading Checklist
 - Assess each question part against the provided markscheme.
+- Award marks for correct methods (M), accurate answers (A), and clear reasoning (R).
+- Use Follow Through (FT) for correctly applied subsequent working using a previous error.
+- Always state BOTH:
+  1. What was wrong (the error).
+  2. What is right (the correct method/answer from markscheme).
+- Summarize total marks and classify error types.
+- End with an Examiner’s Report table.
 ### Abbreviations:
+- **M**: Method
+- **A**: Accuracy/Answer
+- **R**: Reasoning
+- **AG**: Answer given (no marks)
+- **FT**: Follow Through
 ---
 ## Grading Instructions
 1. Award marks using official annotations (M1, A1, etc.).
 2. A marks generally require valid M marks.
+3. Allow FT unless result is nonsensical.
 4. Accept valid alternative forms.
+5. Apply accuracy requirements (default 3 s.f. if not stated).
 6. Ignore crossed-out work unless requested otherwise.
 7. Mark only the first full solution unless otherwise indicated.
 8. Assume graphs/diagrams are correct if required.
 Rules:
 - Each row matches a markable step.
+- For blanks, write “(no answer)” and indicate lost mark(s).
+- Lost marks: wrap in red with `<span style="color:red">A0</span>` (or M0, R0) and make Reason column red. Always also show the correct method/answer.
 - Awarded marks remain plain text.
+- For partial awards (M1A0A1), highlight only lost marks.
+- After each question, show total in square brackets: `[2/4]`.
 ---
 ### Examiner’s Report
+At the very end, provide a summary table:
+Codes:
 - A : All Good
 - B : Silly Mistake
 - C : Conceptual Error
 | 1               | 6/9   | C      |
 | 2               | 7/7   | A      |
 | 3               | 8/14  | D      |
+| …               | …     | …      |
+Then show total clearly:
 `Total: 40/61`
+Optionally, if reasons are available, extend with:
 | Question Number | Marks | Remark | Reason |
 |-----------------|-------|--------|--------|
+⚠️ Do NOT add any "Validation" or meta commentary. End the output after Examiner’s Report.
 """
+    }
 }
 # -------------------- CONFIG --------------------
 # ---------- HELPER: Compress PDF ----------
 def compress_pdf(input_path, output_path=None, max_size=20*1024*1024):
     if output_path is None:
         base, ext = os.path.splitext(input_path)
         output_path = f"{base}_compressed{ext}"
     if os.path.getsize(input_path) <= max_size:
+        return input_path
     try:
         gs_cmd = [
         ]
         subprocess.run(gs_cmd, check=True)
         if os.path.getsize(output_path) <= max_size:
+            print(f"✅ Compressed {input_path} → {output_path}")
             return output_path
         else:
+            print(f"⚠️ Compression failed to reduce below {max_size/1024/1024} MB")
             return input_path
     except Exception as e:
         print(f"⚠️ Compression error: {e}")
 # ---------- HELPER: Create Model with Fallback ----------
 def create_model():
     try:
+        print("⚡ Using gemini-2.5-pro model")
         return genai.GenerativeModel("gemini-2.5-pro", generation_config={"temperature": 0})
     except Exception:
+        print("⚡ Falling back to gemini-2.5-flash model")
         return genai.GenerativeModel("gemini-2.5-flash", generation_config={"temperature": 0})
+# ---------- PIPELINE: ALIGN + GRADE ----------
 def align_and_grade(qp_file, ms_file, ans_file):
     try:
         qp_file = compress_pdf(qp_file, "qp_compressed.pdf")
         ms_file = compress_pdf(ms_file, "ms_compressed.pdf")
         ans_file = compress_pdf(ans_file, "ans_compressed.pdf")
         qp_uploaded = genai.upload_file(path=qp_file, display_name="Question Paper")
         ms_uploaded = genai.upload_file(path=ms_file, display_name="Markscheme")
         ans_uploaded = genai.upload_file(path=ans_file, display_name="Answer Sheet")
         model = create_model()
         resp = model.generate_content([
             PROMPTS["ALIGNMENT_PROMPT"]["content"],
             qp_uploaded,
             ms_uploaded,
             ans_uploaded
         ])
+        aligned_text = getattr(resp, "text", None)
+        if not aligned_text and resp.candidates:
+            aligned_text = resp.candidates[0].content.parts[0].text
+        aligned_pdf_path = save_as_pdf(aligned_text, "aligned_qp_ms_as.pdf")
         response = model.generate_content([
             PROMPTS["GRADING_PROMPT"]["content"],
+            aligned_text
         ])
+        grading = getattr(response, "text", None)
+        if not grading and response.candidates:
+            grading = response.candidates[0].content.parts[0].text
         base_name = os.path.splitext(os.path.basename(ans_file))[0]
+        grading_pdf_path = save_as_pdf(grading, f"{base_name}_graded.pdf")
+        return aligned_text, aligned_pdf_path, grading, grading_pdf_path
     except Exception as e:
         return f"❌ Error: {e}", None, None, None
 # ---------- GRADIO APP ----------
 with gr.Blocks(title="LeadIB AI Grading (Alignment + Auto-Grading)") as demo:
+    gr.Markdown("## LeadIB AI Grading\nUpload Question Paper, Markscheme, and Student Answer Sheet.\nThe system will align and grade automatically.")
     with gr.Row():
         qp_file = gr.File(label="Upload Question Paper (PDF)", type="filepath")