<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>mtDNA Tool – System Overview</title>
<style>
.custom-container {
background-color: #ffffff !important;
color: #222222 !important;
font-family: Arial, sans-serif !important;
line-height: 1.6 !important;
padding: 2rem !important;
max-width: 900px !important;
margin: auto !important;
}
.custom-container h1,
.custom-container h2,
.custom-container h3,
.custom-container strong,
.custom-container b,
.custom-container p,
.custom-container li,
.custom-container ol,
.custom-container ul,
.custom-container span {
color: #222222 !important;
font-weight: normal !important;
}
.custom-container h1,
.custom-container h2 {
font-weight: bold !important;
}
.custom-container img {
max-width: 100%;
border: 1px solid #ccc;
padding: 5px;
background: #fff;
}
.custom-container code {
background: none !important;
color: #222 !important;
font-family: inherit !important;
font-size: inherit !important;
padding: 0 !important;
border-radius: 0 !important;
}
.custom-container .highlight {
background: #ffffcc;
padding: 4px 8px;
border-left: 4px solid #ffcc00;
margin: 1rem 0;
color: #333 !important;
}
</style>
</head>
<body>
<div class="custom-container">
<h1>mtDNA Location Classifier – Brief System Pipeline and Usage Guide</h1>
<p>The <strong>mtDNA Tool</strong> is a lightweight pipeline that helps researchers infer metadata for mtDNA GenBank accession numbers. It uses the GenBank records and the publications linked to them to infer geographic origin, sample type (ancient/modern), and optional niche labels (e.g., ethnicity, specific location). It supports batch input and produces structured Excel summaries.</p>
<h2>System Overview Diagram</h2>
<p>The figure below shows the core execution flow, from input accession to final output.</p>
<img src="https://huggingface.co/spaces/VyLala/mtDNALocation/resolve/main/flowchart.png" alt="mtDNA Pipeline Flowchart">
<h2>Key Steps</h2>
<ol>
<li><strong>Input</strong>: One or more GenBank accession numbers are submitted (e.g., through the web UI, a CSV file, or pasted text).</li>
<li><strong>Metadata Collection</strong>: Using <code>fetch_ncbi_metadata</code>, the pipeline retrieves record metadata such as country, isolate, collection date, and reference title. When available, the linked full-text articles and supplementary materials are located via DOI, PubMed, or Google Custom Search and then parsed.</li>
<li><strong>Text Extraction & Preprocessing</strong>:
<ul>
<li>Every available document is parsed and cleaned: tables and paragraphs are extracted, and overlapping sections are deduplicated.</li>
<li>The extracted text is merged into two versions: a shorter <code>chunk</code> and the complete <code>all_output</code>.</li>
</ul>
</li>
<li><strong>LLM-based Inference (Gemini + RAG)</strong>:
<ul>
<li>Chunks are embedded, indexed with FAISS, and stored for reuse.</li>
<li>Gemini then answers targeted queries over the retrieved passages: the predicted country, the sample type, and any niche label the user requests.</li>
</ul>
</li>
<li><strong>Result Structuring</strong>:
<ul>
<li>Each result includes the predicted fields plus explanation text (methods used, supporting quotes, and sources).</li>
<li>Results are summarized and saved with <code>save_to_excel</code>.</li>
</ul>
</li>
</ol>
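<p>Step 1 (Input) accepts accession numbers from several channels, so they need to be normalized before lookup. The following is a minimal sketch, not the tool's own code. It assumes the common GenBank nucleotide shapes (one letter plus five digits, or two letters plus six or eight digits, with an optional version suffix such as <code>.1</code>). RefSeq identifiers like <code>NC_012920</code> are deliberately not matched. <code>parse_accessions</code>, <code>parse_accessions_csv</code>, and the <code>accession</code> column name are hypothetical.</p>

```python
import csv
import io
import re

# Assumed accession shapes: 1 letter + 5 digits, or 2 letters + 6 or 8 digits,
# optionally followed by a version such as ".1". RefSeq IDs (NC_...) are not matched.
ACCESSION_RE = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6}(?:\d{2})?)(?:\.\d+)?\b")


def parse_accessions(raw_text):
    """Return unique accessions found in free text, in first-seen order."""
    matches = ACCESSION_RE.findall(raw_text.upper())
    return list(dict.fromkeys(matches))  # dedupe while keeping input order


def parse_accessions_csv(csv_text, column="accession"):
    """Collect accessions from one CSV column (the column name is hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cells = [row.get(column) or "" for row in reader]
    return parse_accessions(" ".join(cells))
```

<p>Deduplicating with <code>dict.fromkeys</code> keeps the rows in the order the user entered them. It also ensures that repeated accessions are only queried once.</p>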
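<p>In step 2 (Metadata Collection), the GenBank record itself already carries the core fields. The sketch below is a standard-library stand-in, not the tool's <code>fetch_ncbi_metadata</code>. It downloads one record through NCBI E-utilities (<code>efetch</code> with <code>db=nuccore</code>, <code>rettype=gb</code>) and reads the qualifiers with regular expressions. The helper names and the regex-based parsing are assumptions. A production version would more likely use a dedicated GenBank parser such as Biopython's <code>SeqIO</code>, and it should stay within NCBI's request-rate limits (three requests per second without an API key).</p>

```python
import re
import urllib.parse
import urllib.request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def download_genbank_text(accession, timeout=30):
    """Download one nuccore record as a GenBank flat file (requires network access)."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_source_metadata(gb_text):
    """Extract location, isolate, collection date, and the first reference title."""
    qualifiers = {}
    # Feature qualifiers look like /key="value"; values may wrap across lines.
    for key, value in re.findall(r'/(\w+)="([^"]*)"', gb_text):
        qualifiers.setdefault(key, " ".join(value.split()))  # keep first occurrence
    # TITLE text is indented 12 characters, and so are its continuation lines.
    title = re.search(r"^  TITLE {5}(.+(?:\n {12}.+)*)", gb_text, re.MULTILINE)
    return {
        # Newer records may carry /geo_loc_name instead of /country.
        "country": qualifiers.get("country") or qualifiers.get("geo_loc_name"),
        "isolate": qualifiers.get("isolate"),
        "collection_date": qualifiers.get("collection_date"),
        "title": " ".join(title.group(1).split()) if title else None,
    }
```

<p>Keeping the download and the parsing separate lets the parser be tested offline and lets cached flat files be reused across runs.</p>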
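<p>Step 3 (Text Extraction &amp; Preprocessing) does not specify how <code>chunk</code> is selected from the full text. One plausible sketch follows. It assumes the documents arrive as plain text with blank lines between paragraphs, and that <code>chunk</code> keeps only the paragraphs that mention a given keyword, such as the accession or isolate name. The function name, the keyword rule, and the 4,000-character cap are all hypothetical.</p>

```python
import re


def clean_and_merge(documents, keywords, max_chunk_chars=4000):
    """Return (chunk, all_output) built from a list of raw document strings.

    Hypothetical rule: `all_output` keeps every unique paragraph, while `chunk`
    keeps only paragraphs mentioning one of `keywords`, capped in length.
    """
    seen = set()
    paragraphs = []
    for doc in documents:
        for para in re.split(r"\n\s*\n", doc):
            text = " ".join(para.split())  # collapse whitespace and line breaks
            key = text.lower()
            if text and key not in seen:  # drop sections repeated verbatim
                seen.add(key)
                paragraphs.append(text)
    all_output = "\n\n".join(paragraphs)
    lowered = [k.lower() for k in keywords]
    relevant = [p for p in paragraphs if any(k in p.lower() for k in lowered)]
    chunk = "\n\n".join(relevant)[:max_chunk_chars]
    return chunk, all_output
```

<p>Exact-match deduplication only catches sections repeated verbatim, such as the same passage extracted from both an article and its supplement. Near-duplicates would need fuzzier matching.</p>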
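<p>The retrieval half of step 4 can be illustrated without external dependencies. The sketch below swaps the real embedding model and FAISS index for a toy bag-of-words vector and a linear scan over cosine similarity. With a real model, an exact inner-product index such as <code>faiss.IndexFlatIP</code> over L2-normalized embeddings ranks chunks by the same cosine score. The window sizes, helper names, and prompt wording are assumptions, and the Gemini call itself is left out.</p>

```python
import math
import re
from collections import Counter


def split_into_chunks(text, size=200, overlap=50):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def embed(text):
    """Toy embedding: L2-normalized word counts (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    return sum(v * b.get(word, 0.0) for word, v in a.items())


def build_index(chunks):
    """Embed every chunk once so later queries can reuse the vectors."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=3):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble a grounded prompt; the wording is illustrative only."""
    context = "\n---\n".join(passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

<p>Building the index once per document set and querying it several times (country, sample type, niche label) mirrors the "stored for reuse" behaviour described above.</p>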
<h2>Output Format</h2>
<p>The final output is an Excel file with the following fields:</p>
<ul>
<li><code>Sample ID</code></li>
<li><code>Predicted Country</code> and <code>Country Explanation</code></li>
<li><code>Predicted Sample Type</code> and <code>Sample Type Explanation</code></li>
<li><code>Sources</code> (links to articles)</li>
<li><code>Time Cost</code></li>
</ul>
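<p>One way to map each result onto these columns is to use a simple in-memory record, as in the sketch below. <code>SampleResult</code> and <code>write_results</code> are hypothetical, as are the seconds-based <code>Time Cost</code> and the newline-separated <code>Sources</code> cell. The Excel step assumes pandas with the openpyxl engine is installed, and the tool's own <code>save_to_excel</code> may differ.</p>

```python
from dataclasses import dataclass, field

COLUMNS = [
    "Sample ID",
    "Predicted Country",
    "Country Explanation",
    "Predicted Sample Type",
    "Sample Type Explanation",
    "Sources",
    "Time Cost",
]


@dataclass
class SampleResult:
    sample_id: str
    country: str
    country_explanation: str
    sample_type: str
    sample_type_explanation: str
    sources: list = field(default_factory=list)
    seconds: float = 0.0  # hypothetical: processing time in seconds

    def to_row(self):
        """Map one result onto the spreadsheet columns, in order."""
        return {
            "Sample ID": self.sample_id,
            "Predicted Country": self.country,
            "Country Explanation": self.country_explanation,
            "Predicted Sample Type": self.sample_type,
            "Sample Type Explanation": self.sample_type_explanation,
            "Sources": "\n".join(self.sources),  # one link per line in the cell
            "Time Cost": f"{self.seconds:.1f}s",
        }


def write_results(results, path):
    """Write all rows to an .xlsx file (assumes pandas + openpyxl are installed)."""
    import pandas as pd  # imported lazily so to_row() works without pandas

    rows = [r.to_row() for r in results]
    pd.DataFrame(rows, columns=COLUMNS).to_excel(path, index=False)
```
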
<h2>System Highlights</h2>
<ul>
<li>RAG + Gemini integration, so each prediction comes with an explanation and its sources for transparency</li>
<li>Excel export for structured research use</li>
<li>Optional inference of ethnicity, location, or language from isolate names</li>
<li>Quality checks (e.g., a fallback when an explanation is too short or the token count is low)</li>
<li>Report button: after results are displayed, users can flag errors or mismatches in the report text box below the output table</li>
</ul>
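<p>The quality-check fallback can be expressed as a small wrapper. The sketch below assumes <code>ask(question, context)</code> is a hypothetical callable that returns a <code>(prediction, explanation)</code> pair, for instance a wrapper around the Gemini query. Word counts stand in for token counts, and both thresholds are placeholders.</p>

```python
def answer_with_fallback(ask, question, chunk, all_output,
                         min_context_words=50, min_explanation_words=15):
    """Query the short `chunk` first and fall back to `all_output` when results look thin.

    Word counts approximate token counts here; both thresholds are placeholders.
    """
    # Low token count: the focused chunk is too small to answer from.
    use_full_text = len(chunk.split()) < min_context_words
    context = all_output if use_full_text else chunk
    prediction, explanation = ask(question, context)
    # Short explanation: retry once with the full text, if it was not used already.
    if not use_full_text and len(explanation.split()) < min_explanation_words:
        prediction, explanation = ask(question, all_output)
    return prediction, explanation
```

<p>Trying the shorter context first keeps most queries cheap, and the full text is only sent when the first answer looks unreliable.</p>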
<h2>Citation</h2>
<div class="highlight">
Phung, V. (2025). mtDNA Location Classifier. HuggingFace Spaces. https://huggingface.co/spaces/VyLala/mtDNALocation
</div>
<h2>Contact</h2>
<p>If you are a researcher working with historical mtDNA data or edge-case accessions and need scalable inference or logging, reach out through the HuggingFace Space or the email address listed in the repository README.</p>
</div>
</body>
</html>