<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Analytics Modeling Sandbox - User Guide</title>
<style>
:root {
--primary-color: #276749;
--primary-light: #38a169;
--secondary-color: #2c5282;
--accent-color: #ed8936;
--warning-color: #c53030;
--background-color: #f7fafc;
--text-color: #2d3748;
--text-light: #718096;
--border-color: #e2e8f0;
--card-bg: #ffffff;
}
* {
box-sizing: border-box;
}
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
line-height: 1.7;
color: var(--text-color);
max-width: 900px;
margin: 0 auto;
padding: 20px 40px;
background-color: var(--background-color);
}
h1 {
color: var(--primary-color);
border-bottom: 3px solid var(--primary-light);
padding-bottom: 15px;
margin-top: 40px;
}
h2 {
color: var(--primary-color);
border-bottom: 2px solid var(--border-color);
padding-bottom: 10px;
margin-top: 50px;
}
h3 {
color: var(--primary-light);
margin-top: 30px;
}
.header-section {
text-align: center;
border-bottom: 2px solid var(--border-color);
background: linear-gradient(135deg, var(--primary-color) 0%, var(--primary-light) 100%);
margin: -20px -40px 40px -40px;
padding: 60px 40px;
color: white;
}
.header-section h1 {
border: none;
margin: 0;
font-size: 2.5em;
color: white;
}
.subtitle {
color: rgba(255,255,255,0.9);
font-size: 1.2em;
margin-top: 10px;
}
table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
background: white;
box-shadow: 0 1px 3px rgba(0,0,0,0.1);
}
th, td {
padding: 12px 15px;
text-align: left;
border: 1px solid var(--border-color);
}
th {
background-color: var(--primary-color);
color: white;
font-weight: 600;
}
tr:nth-child(even) {
background-color: #f8f9fa;
}
blockquote {
border-left: 4px solid var(--primary-light);
margin: 25px 0;
padding: 15px 25px;
background-color: #f0fff4;
font-style: italic;
}
.warning-box {
background-color: #fff5f5;
border: 1px solid #fc8181;
border-left: 4px solid var(--warning-color);
border-radius: 5px;
padding: 20px;
margin: 25px 0;
}
.warning-box h4 {
color: var(--warning-color);
margin-top: 0;
}
.info-box {
background-color: #ebf8ff;
border: 1px solid #90cdf4;
border-left: 4px solid var(--secondary-color);
border-radius: 5px;
padding: 20px;
margin: 25px 0;
}
.step-box {
background: white;
border: 1px solid var(--border-color);
border-radius: 10px;
padding: 25px;
margin: 20px 0;
box-shadow: 0 2px 4px rgba(0,0,0,0.05);
border-left: 5px solid var(--primary-light);
}
.step-box h3 {
margin-top: 0;
display: flex;
align-items: center;
}
.step-number {
display: inline-flex;
align-items: center;
justify-content: center;
width: 35px;
height: 35px;
background-color: var(--primary-light);
color: white;
border-radius: 50%;
font-weight: bold;
margin-right: 12px;
flex-shrink: 0;
}
.output-section {
background: #f8f9fa;
border-radius: 8px;
padding: 20px;
margin: 20px 0;
}
.output-section h4 {
color: var(--primary-color);
margin-top: 0;
}
.trap-warning {
background: #fffaf0;
border-left: 4px solid var(--accent-color);
padding: 15px 20px;
margin: 15px 0;
border-radius: 0 8px 8px 0;
font-size: 0.95em;
}
.trap-warning strong {
color: var(--accent-color);
}
.two-column {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 20px;
margin: 25px 0;
}
.column {
background: white;
padding: 20px;
border-radius: 8px;
border: 1px solid var(--border-color);
}
.do-column {
border-left: 4px solid var(--primary-light);
}
.dont-column {
border-left: 4px solid var(--warning-color);
}
.column h4 {
margin-top: 0;
}
.do-column h4 {
color: var(--primary-light);
}
.dont-column h4 {
color: var(--warning-color);
}
.faq-item {
background: white;
border: 1px solid var(--border-color);
border-radius: 8px;
padding: 20px;
margin: 15px 0;
}
.faq-item h4 {
color: var(--secondary-color);
margin-top: 0;
margin-bottom: 10px;
}
.checklist-table td:first-child {
width: 30%;
font-weight: 600;
color: var(--primary-color);
}
.comparison-table th:nth-child(1) {
background-color: var(--primary-color);
}
.comparison-table th:nth-child(2) {
background-color: var(--secondary-color);
}
.final-reminder {
background: linear-gradient(135deg, var(--primary-color), var(--primary-light));
color: white;
padding: 30px;
border-radius: 10px;
margin: 40px 0;
text-align: center;
}
.final-reminder blockquote {
background: rgba(255,255,255,0.15);
border-left-color: white;
color: white;
}
hr {
border: none;
border-top: 1px solid var(--border-color);
margin: 40px 0;
}
footer {
text-align: center;
padding: 30px;
color: #666;
border-top: 1px solid var(--border-color);
margin-top: 50px;
}
code {
background-color: #edf2f7;
padding: 2px 6px;
border-radius: 4px;
font-family: 'Consolas', 'Monaco', monospace;
font-size: 0.9em;
}
ul, ol {
margin: 15px 0;
padding-left: 25px;
}
li {
margin: 8px 0;
}
@media (max-width: 768px) {
body {
padding: 15px 20px;
}
.header-section {
margin: -15px -20px 30px -20px;
padding: 40px 20px;
}
.two-column {
grid-template-columns: 1fr;
}
table {
font-size: 0.9em;
}
th, td {
padding: 8px 10px;
}
}
</style>
</head>
<body>
<div class="header-section">
<h1>Analytics Modeling Sandbox</h1>
<p class="subtitle">User Guide</p>
</div>
<h2>What Is the Analytics Modeling Sandbox?</h2>
<p>The Analytics Modeling Sandbox is a practical analytics tool designed for users who have learned analytical concepts from the <em>Analytics for Managers</em> book and want to apply those techniques to their own data.</p>
<p>Unlike the Analytics Reasoning Companion (which focuses on developing reasoning skills using curated datasets), the Sandbox is built for <strong>doing real analysis</strong> — running regression, classification, and clustering on data you provide.</p>
<h3>What It Does</h3>
<ul>
<li><strong>Executes analyses</strong> on your uploaded data (CSV, Excel)</li>
<li><strong>Shows code</strong> so you can see exactly what's being done</li>
<li><strong>Produces outputs</strong> including coefficients, metrics, and visualizations</li>
<li><strong>Provides interpretation guidance</strong> to prevent common analytical mistakes</li>
<li><strong>Warns about traps</strong> like accuracy illusions, threshold fallacies, and omitted variable bias</li>
</ul>
<h3>What It Does NOT Do</h3>
<ul>
<li><strong>Make decisions for you</strong> — it provides evidence, you decide</li>
<li><strong>Certify models as "good"</strong> — it shows you results, not approval stamps</li>
<li><strong>Establish causation</strong> — all findings are associations unless you have experimental data</li>
<li><strong>Store your data</strong> — nothing is retained between sessions</li>
<li><strong>Replace professional judgment</strong> — this is an educational tool, not professional services</li>
</ul>
<hr>
<h2>Important Notices</h2>
<div class="warning-box">
<h4>Data Privacy</h4>
<p><strong>You are responsible for ensuring you have proper authorization to analyze the data you upload.</strong></p>
<p>Do not upload:</p>
<ul>
<li>Personally identifiable information (PII) without consent</li>
<li>Protected health information (PHI)</li>
<li>Confidential business data you're not authorized to share</li>
<li>Data subject to regulatory restrictions (GDPR, HIPAA, etc.)</li>
</ul>
<p>The Sandbox does not store your data between sessions, but you remain responsible for compliance with applicable privacy laws and organizational policies.</p>
</div>
<div class="info-box">
<h4>Disclaimer</h4>
<p>The Analytics Modeling Sandbox provides analytical assistance for educational purposes. Outputs are statistical estimates based on the data you provide. They do not constitute predictions, guarantees, or professional advice.</p>
<p>All findings describe patterns and associations. They do not establish causal relationships unless derived from controlled experiments.</p>
<p>Consult qualified professionals before making significant business, financial, legal, or operational decisions based on these results.</p>
</div>
<hr>
<h2>Getting Started</h2>
<h3>Step 1: Access the Sandbox</h3>
<p>Visit the Sandbox at: <strong>[Link to be provided]</strong></p>
<h3>Step 2: Prepare Your Data</h3>
<p>Before uploading, ensure your data:</p>
<ul>
<li>Is in CSV or Excel format (.csv, .xlsx, or .xls)</li>
<li>Is under 5MB (recommended)</li>
<li>Has clear column headers</li>
<li>Has a defined outcome variable (for regression/classification)</li>
</ul>
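<p>The checks above can be run locally before uploading. The sketch below is one possible pre-upload check for CSV files, using only the Python standard library. The function name <code>check_csv_before_upload</code>, the sample file, and the column names are hypothetical and are not part of the Sandbox.</p>

```python
import csv
import os
import tempfile

MAX_BYTES = 5 * 1024 * 1024  # the guide's recommended 5MB ceiling


def check_csv_before_upload(path, outcome_column=None):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if os.path.splitext(path)[1].lower() != ".csv":
        # Excel files would need a different reader; this sketch covers CSV only.
        return ["this sketch only inspects .csv files"]
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than the recommended 5MB")
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    if not header or any(not name.strip() for name in header):
        problems.append("one or more column headers are blank")
    if len(set(header)) != len(header):
        problems.append("column headers are not unique")
    if outcome_column is not None and outcome_column not in header:
        problems.append(f"outcome column {outcome_column!r} not found in the header row")
    return problems


# Illustration with a tiny made-up file.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "customers.csv")
    with open(sample_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("age,income,churned\n30,50000,0\n")
    clean_result = check_csv_before_upload(sample_path, outcome_column="churned")
    missing_outcome_result = check_csv_before_upload(sample_path, outcome_column="revenue")

print(clean_result)
print(missing_outcome_result)
```

<p>An empty list means none of these particular checks failed. It does not mean the data is clean or that you are authorized to upload it.</p>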
<h3>Step 3: Upload and Describe</h3>
<p>When you upload your file, tell the Sandbox:</p>
<ul>
<li>What decision this analysis will inform</li>
<li>Which column is your outcome variable</li>
<li>What type of analysis you want (regression, classification, or clustering)</li>
</ul>
<hr>
<h2>The 7-Step Workflow</h2>
<p>The Sandbox suggests a structured workflow but allows you to skip steps if needed. Skipping steps increases interpretation risk — the Sandbox will warn you but won't block you.</p>
<div class="step-box">
<h3><span class="step-number">1</span> Business Context</h3>
<p><strong>Purpose:</strong> Establish what decision this analysis informs.</p>
<p><strong>What happens:</strong> The Sandbox asks about your goals before diving into data.</p>
<p><strong>Why it matters:</strong> Analysis without context produces technically correct but practically useless results.</p>
<p><strong>If you skip:</strong> <em>"Proceeding without clear goals increases interpretation risk."</em></p>
</div>
<div class="step-box">
<h3><span class="step-number">2</span> Data Overview</h3>
<p><strong>Purpose:</strong> Understand what you're working with before modeling.</p>
<p><strong>What happens:</strong> The Sandbox shows dataset shape, column types, missing value summary, and basic distributions.</p>
<p><strong>Key question:</strong> <em>"Who might be excluded from this dataset? Could they differ systematically?"</em></p>
</div>
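<p>A data overview of the kind described in step 2 can be approximated with pandas. This is a minimal sketch, not the Sandbox's own code; the helper <code>summarize</code> and the small dataset are invented for illustration.</p>

```python
import pandas as pd


def summarize(df):
    """One row per column: dtype, missing count, missing share, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # distinct non-missing values
    })


# Tiny made-up dataset, for illustration only.
df = pd.DataFrame({
    "tenure_months": [3, 12, None, 40],
    "plan": ["basic", "pro", "pro", None],
    "churned": [1, 0, 0, 0],
})

print(df.shape)        # (rows, columns)
summary = summarize(df)
print(summary)
print(df.describe())   # basic distributions for numeric columns
```

<p>A summary like this shows what is in the file. It cannot show who never made it into the file, which is why the key question above still has to be answered by you.</p>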
<div class="step-box">
<h3><span class="step-number">3</span> Data Preparation</h3>
<p><strong>Purpose:</strong> Handle missing values, encode categories, scale features.</p>
<p><strong>What happens:</strong> The Sandbox shows what preparation steps are applied, why, and the trade-offs involved.</p>
<p><strong>Transparency:</strong> You'll see the code so you know exactly what's being done.</p>
</div>
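<p>The three preparation tasks in step 3 are commonly combined in a single scikit-learn pipeline. The sketch below assumes median imputation, most-frequent imputation, one-hot encoding, and standard scaling; the guide does not say which strategies the Sandbox applies, and the column names are hypothetical.</p>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_columns = ["tenure_months", "monthly_spend"]
categorical_columns = ["plan"]

# Numeric columns: fill gaps with the median, then scale to mean 0 and variance 1.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the most common value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preparation = ColumnTransformer([
    ("numeric", numeric_steps, numeric_columns),
    ("categorical", categorical_steps, categorical_columns),
])

# Tiny made-up dataset with one gap in each column.
features = pd.DataFrame({
    "tenure_months": [3, 12, np.nan, 40],
    "monthly_spend": [20.0, 55.0, 35.0, np.nan],
    "plan": ["basic", "pro", np.nan, "pro"],
}).astype({"plan": object})  # keep the text column as plain Python objects

prepared = preparation.fit_transform(features)
print(prepared.shape)
```

<p>Each choice carries a trade-off. Median imputation, for example, hides the fact that a value was missing, which matters if values are missing for a systematic reason.</p>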
<div class="step-box">
<h3><span class="step-number">4</span> Analysis</h3>
<p><strong>Purpose:</strong> Run the model.</p>
<p><strong>What happens:</strong> The Sandbox executes regression, classification, or clustering using the scikit-learn (<code>sklearn</code>) library.</p>
<p><strong>Defaults shown explicitly:</strong></p>
<ul>
<li>Train/test split: 70/30</li>
<li>Random state: 42</li>
<li>Classification threshold: 0.5 (with alternatives shown)</li>
<li>Clustering: K values from 3 through 6 are tested</li>
</ul>
</div>
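<p>The split, random state, and threshold defaults from step 4 could look like this in scikit-learn. Logistic regression and the synthetic data from <code>make_classification</code> are assumptions made for this sketch; the guide does not name the estimator the Sandbox uses.</p>

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

TEST_SIZE = 0.30     # 70/30 train/test split
RANDOM_STATE = 42    # fixed seed so the split is reproducible
THRESHOLD = 0.5      # default cut-off for calling a case "positive"

# Synthetic stand-in for an uploaded dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=RANDOM_STATE)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score the held-out rows, then apply the threshold explicitly.
probabilities = model.predict_proba(X_test)[:, 1]
predictions = (probabilities >= THRESHOLD).astype(int)

print(len(X_train), len(X_test))
```

<p>Writing the threshold as a named constant, instead of calling <code>model.predict</code>, makes it visible that 0.5 is a choice you can change.</p>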
<div class="step-box">
<h3><span class="step-number">5</span> Results</h3>
<p><strong>Purpose:</strong> Present outputs with context.</p>
<p><strong>For Regression:</strong> Coefficients, R-squared, MAE, RMSE, residual plots</p>
<p><strong>For Classification:</strong> Confusion matrix, Precision/Recall/F1/AUC, threshold table</p>
<p><strong>For Clustering:</strong> Cluster sizes, feature means, silhouette scores, elbow plot</p>
<p>Interpretation notes are embedded with each output.</p>
</div>
<div class="step-box">
<h3><span class="step-number">6</span> Interpretation Check</h3>
<p><strong>Purpose:</strong> Ensure you're not over-interpreting.</p>
<p><strong>What happens:</strong> The Sandbox prompts:</p>
<ul>
<li>"What assumptions must hold for these results to be actionable?"</li>
<li>"What could mislead us here?"</li>
<li>"Who might be missing from this data?"</li>
</ul>
</div>
<div class="step-box">
<h3><span class="step-number">7</span> Limitations &amp; Next Steps</h3>
<p><strong>Purpose:</strong> Acknowledge what the analysis cannot tell you.</p>
<p><strong>What happens:</strong> The Sandbox helps you articulate what remains uncertain, what additional data would help, and what tests would increase confidence.</p>
</div>
<hr>
<h2>Understanding Your Outputs</h2>
<div class="output-section">
<h4>Regression Outputs</h4>
<p><strong>Coefficients Table:</strong></p>
<table>
<tr><th>Feature</th><th>Coefficient</th></tr>
<tr><td>Feature_A</td><td>2.34</td></tr>
<tr><td>Feature_B</td><td>-1.56</td></tr>
<tr><td>Feature_C</td><td>0.89</td></tr>
</table>
<p><strong>How to read:</strong> A coefficient of 2.34 means: among otherwise similar cases in your data, a one-unit increase in Feature_A is associated with a 2.34-unit increase in the outcome, on average.</p>
<div class="trap-warning">
<strong>Caution:</strong> This is an association, not a causal effect. Unobserved factors might influence both the feature and the outcome.
</div>
<p><strong>Metrics:</strong></p>
<ul>
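<li><strong>R-squared:</strong> Proportion of outcome variance the model explains. For a linear regression with an intercept it lies between 0 and 1 on the training data, but it can be negative on held-out data when the model predicts worse than the outcome mean. Higher isn't always better: a very high value can signal overfitting.</li>
<li><strong>MAE:</strong> Mean absolute error, the average size of a prediction error in outcome units.</li>
<li><strong>RMSE:</strong> Root mean squared error. Also in outcome units, but it penalizes large errors more heavily than MAE.</li>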
</ul>
</div>
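<p>The three regression metrics can be written out by hand to see what each one measures. The sketch below uses plain Python and four made-up predictions; the function name <code>regression_metrics</code> is hypothetical.</p>

```python
import math


def regression_metrics(actual, predicted):
    """Compute R-squared, MAE, and RMSE from paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total variation
    r_squared = 1 - ss_res / ss_tot
    return {"r_squared": r_squared, "mae": mae, "rmse": rmse}


# Made-up values: three small errors and one large one.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 12.0]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

<p>For these values MAE is 1.25 while RMSE is about 1.66. The gap comes from the single error of 3 units, which RMSE weights more heavily. R-squared is about 0.45.</p>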
<div class="output-section">
<h4>Classification Outputs</h4>
<p><strong>Confusion Matrix:</strong></p>
<table>
<tr><th></th><th>Predicted: No</th><th>Predicted: Yes</th></tr>
<tr><td><strong>Actual: No</strong></td><td>True Negative</td><td>False Positive</td></tr>
<tr><td><strong>Actual: Yes</strong></td><td>False Negative</td><td>True Positive</td></tr>
</table>
<p><strong>Metrics:</strong></p>
<ul>
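<li><strong>Accuracy:</strong> Share of all predictions that are correct. Can be misleading with imbalanced classes.</li>
<li><strong>Precision:</strong> Of those predicted positive, how many are correct?</li>
<li><strong>Recall:</strong> Of actual positives, how many did we catch?</li>
<li><strong>F1:</strong> Harmonic mean of precision and recall. It is high only when both are high.</li>
<li><strong>ROC AUC:</strong> Probability that the model scores a randomly chosen positive above a randomly chosen negative. A value of 0.5 is no better than chance; 1.0 is perfect ranking.</li>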
</ul>
<div class="trap-warning">
<strong>Threshold Table:</strong> Shows how precision and recall change at different thresholds. Use this to choose a threshold that matches your cost trade-offs — don't just accept 0.5.
</div>
</div>
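<p>A threshold table can be built directly from predicted probabilities. The sketch below is written in plain Python with eight made-up cases; the function name <code>threshold_table</code> and the thresholds chosen are illustrative, not the Sandbox's output format.</p>

```python
def threshold_table(actual, probabilities, thresholds=(0.3, 0.5, 0.7)):
    """Return (threshold, tp, fp, fn, precision, recall) for each threshold."""
    rows = []
    for threshold in thresholds:
        predicted = [1 if p >= threshold else 0 for p in probabilities]
        pairs = list(zip(actual, predicted))
        tp = sum(1 for a, p in pairs if a == 1 and p == 1)
        fp = sum(1 for a, p in pairs if a == 0 and p == 1)
        fn = sum(1 for a, p in pairs if a == 1 and p == 0)
        precision = tp / (tp + fp) if (tp + fp) else float("nan")
        recall = tp / (tp + fn) if (tp + fn) else float("nan")
        rows.append((threshold, tp, fp, fn, precision, recall))
    return rows


# Made-up outcomes (1 = yes) and model scores.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
probabilities = [0.9, 0.8, 0.4, 0.6, 0.35, 0.2, 0.1, 0.55]

table = threshold_table(actual, probabilities)
for threshold, tp, fp, fn, precision, recall in table:
    print(f"{threshold:.1f}  TP={tp} FP={fp} FN={fn}  "
          f"precision={precision:.2f} recall={recall:.2f}")
```

<p>In this example, raising the threshold from 0.3 to 0.7 lifts precision from 0.67 to 1.00 while recall falls from 1.00 to 0.50. Which end of that trade-off is preferable depends on what a false positive and a false negative each cost you.</p>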
<div class="output-section">
<h4>Clustering Outputs</h4>
<p><strong>Cluster Profiles:</strong></p>
<table>
<tr><th>Cluster</th><th>Size</th><th>Feature_A (mean)</th><th>Feature_B (mean)</th></tr>
<tr><td>0</td><td>150</td><td>2.3</td><td>-0.5</td></tr>
<tr><td>1</td><td>200</td><td>-1.1</td><td>0.8</td></tr>
<tr><td>2</td><td>100</td><td>0.5</td><td>1.2</td></tr>
</table>
<p><strong>How to read:</strong> Each row shows average feature values for cases in that cluster. Use these to develop descriptive labels.</p>
<div class="trap-warning">
<strong>Caution:</strong> Clusters are analytical groupings, not inherent types. Different features or scaling would produce different segments.
</div>
</div>
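<p>Cluster sizes, feature means, and silhouette scores for K from 3 through 6 can be produced along the following lines. The sketch assumes K-means on standardized features and synthetic data from <code>make_blobs</code>. Picking the K with the highest silhouette score is a simplification made here, not a rule from the guide.</p>

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an uploaded dataset with two numeric features.
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Fit one model per candidate K and record its silhouette score.
scores = {}
for k in range(3, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_scaled)

# Cluster profile: size and mean of each (standardized) feature per cluster.
profile = pd.DataFrame(X_scaled, columns=["Feature_A", "Feature_B"])
profile["cluster"] = labels
cluster_sizes = profile["cluster"].value_counts().sort_index()
cluster_means = profile.groupby("cluster").mean()

print(scores)
print(cluster_sizes)
print(cluster_means)
```

<p>Because the features are standardized, the means are in standard-deviation units, so negative values indicate a cluster that sits below the overall average on that feature.</p>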
<hr>
<h2>Embedded Trap Warnings</h2>
<p>The Sandbox automatically includes warnings after outputs to prevent common mistakes.</p>
<div class="trap-warning">
<strong>After Regression:</strong> "Coefficients describe associations, not causal effects. Consider what unobserved factors might influence both predictor and outcome. Large effects may be driven by outliers—check residual plots."
</div>
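<p>The regression warning can be illustrated with simulated data in which an unobserved factor drives both the feature and the outcome. In the sketch below the feature has no direct effect at all, yet a regression that omits the confounder reports a clearly positive coefficient. All variable names and the data-generating process are invented for this example.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

confounder = rng.normal(size=n)                    # unobserved factor
feature = confounder + rng.normal(size=n)          # partly driven by the confounder
outcome = 2.0 * confounder + rng.normal(size=n)    # the feature has NO direct effect


def ols_coefficients(columns, target):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefficients


# Coefficient on the feature when the confounder is omitted vs. included.
naive = ols_coefficients([feature], outcome)[1]
adjusted = ols_coefficients([feature, confounder], outcome)[1]

print(round(float(naive), 2), round(float(adjusted), 2))
```

<p>With this setup the omitted-variable coefficient lands near 1, while the coefficient with the confounder included lands near 0. In real data the confounder is usually not available to include, which is why the coefficient should be read as an association.</p>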
<div class="trap-warning">
<strong>After Classification:</strong> "Accuracy can mislead with imbalanced classes. Check: what would accuracy be predicting the majority class always? The 0.5 threshold is arbitrary—consider the relative costs of false positives vs. false negatives."
</div>
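<p>The majority-class check in the classification warning takes only a few lines. The sketch below uses made-up labels with 95 negatives and 5 positives; <code>majority_baseline_accuracy</code> is a hypothetical helper name.</p>

```python
from collections import Counter


def majority_baseline_accuracy(actual):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(actual)
    return counts.most_common(1)[0][1] / len(actual)


# Made-up imbalanced outcome: 95 negatives, 5 positives.
actual = [0] * 95 + [1] * 5
# Hypothetical model output: catches only one of the five positives.
predicted = [0] * 99 + [1]

baseline = majority_baseline_accuracy(actual)
model_accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
recall = true_positives / sum(actual)

print(baseline, model_accuracy, recall)
```

<p>Here the model reaches 96% accuracy, only one point above the 95% baseline, and finds just 20% of the actual positives. Accuracy alone would hide that.</p>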
<div class="trap-warning">
<strong>After Clustering:</strong> "Clusters depend on feature selection and scaling. Different choices produce different segments. These are analytical groupings, not fixed types—validate stability before building strategy."
</div>
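<p>One way to check cluster stability is to refit with different random seeds and measure how much the assignments agree. The sketch below uses the adjusted Rand index from scikit-learn on synthetic data. The number of seeds and the choice of this index are assumptions for illustration; the guide does not prescribe a stability test.</p>

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for an uploaded dataset.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Reference clustering to compare the other runs against.
reference = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Refit with several seeds; an index of 1.0 means identical groupings
# (cluster numbering may differ), values near 0 mean chance-level agreement.
agreement = []
for seed in range(1, 6):
    labels = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    agreement.append(adjusted_rand_score(reference, labels))

print([round(a, 3) for a in agreement])
```

<p>Agreement across seeds is a weak test. Refitting on resampled rows or with a different feature set probes stability more directly, since those are the choices the warning says the segments depend on.</p>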
<div class="trap-warning">
<strong>For All Analyses:</strong> "Selection Bias Check: Who might be missing from this data? Could excluded cases differ systematically from those included?"
</div>
<hr>
<h2>Tips for Effective Use</h2>
<div class="two-column">
<div class="column do-column">
<h4>Do:</h4>
<ol>
<li><strong>Start with clear goals.</strong> Know what decision the analysis will inform.</li>
<li><strong>Review the data summary.</strong> Check for issues before modeling.</li>
<li><strong>Examine the code.</strong> Understanding what's done helps interpretation.</li>
<li><strong>Use the threshold table</strong> (classification). Choose based on your costs.</li>
<li><strong>Check cluster stability</strong> (clustering). Be cautious if results vary.</li>
<li><strong>Read the interpretation notes.</strong> They prevent common mistakes.</li>
<li><strong>Acknowledge limitations.</strong> Stating them is a sign of rigor.</li>
</ol>
</div>
<div class="column dont-column">
<h4>Don't:</h4>
<ol>
<li><strong>Don't upload sensitive data</strong> without authorization.</li>
<li><strong>Don't skip business context.</strong> Analysis without purpose is just math.</li>
<li><strong>Don't treat coefficients as causal.</strong> Association ≠ causation.</li>
<li><strong>Don't celebrate accuracy alone.</strong> Check against the naive baseline.</li>
<li><strong>Don't reify clusters.</strong> They're groupings, not fixed types.</li>
<li><strong>Don't ignore who's missing.</strong> Selection bias can invalidate analysis.</li>
</ol>
</div>
</div>
<hr>
<h2>When to Use the Reasoning Companion Instead</h2>
<p>The Sandbox is for <strong>doing analysis</strong>. The Reasoning Companion is for <strong>developing judgment</strong>.</p>
<table class="comparison-table">
<thead>
<tr>
<th>Use the Sandbox when...</th>
<th>Use the Reasoning Companion when...</th>
</tr>
</thead>
<tbody>
<tr>
<td>You have your own data to analyze</td>
<td>You're learning concepts from the book</td>
</tr>
<tr>
<td>You need actual outputs and code</td>
<td>You want structured reasoning practice</td>
</tr>
<tr>
<td>You're a practitioner applying techniques</td>
<td>You're a student building fundamentals</td>
</tr>
<tr>
<td>You want efficiency with guidance</td>
<td>You want Socratic questioning</td>
</tr>
</tbody>
</table>
<p><strong>Handoff:</strong> After running analysis in the Sandbox, consider working through similar analyses in the Reasoning Companion using the book's curated datasets. The structured critique will strengthen your interpretation skills.</p>
<hr>
<h2>Frequently Asked Questions</h2>
<div class="faq-item">
<h4>Q: What file formats can I upload?</h4>
<p><strong>A:</strong> CSV and Excel files (.csv, .xlsx, .xls). Keep files under 5MB for best performance.</p>
</div>
<div class="faq-item">
<h4>Q: Does the Sandbox store my data?</h4>
<p><strong>A:</strong> No. Data is processed during your session only and is not retained afterward.</p>
</div>
<div class="faq-item">
<h4>Q: Can I run advanced models like XGBoost or neural networks?</h4>
<p><strong>A:</strong> The Sandbox defaults to interpretable models. You can request advanced models, but the Sandbox will note that complexity often reduces interpretability.</p>
</div>
<div class="faq-item">
<h4>Q: Why does the Sandbox show me code?</h4>
<p><strong>A:</strong> Transparency. Seeing the code helps you understand exactly what's being done, catch issues, and reproduce the analysis elsewhere.</p>
</div>
<div class="faq-item">
<h4>Q: The Sandbox warned me about something. Did I do something wrong?</h4>
<p><strong>A:</strong> Not necessarily. Warnings are educational — they flag potential interpretation risks. Consider them, but you decide whether to proceed.</p>
</div>
<div class="faq-item">
<h4>Q: Why doesn't the Sandbox tell me which model is "best"?</h4>
<p><strong>A:</strong> Because "best" depends on your goals, costs, and context — things the Sandbox can't know. It provides evidence; you make the judgment.</p>
</div>
<hr>
<h2>Quick Reference: Output Checklist</h2>
<p>Before acting on any Sandbox output, verify:</p>
<table class="checklist-table">
<tr><td>Business Context</td><td>Does this analysis answer the right question?</td></tr>
<tr><td>Data Quality</td><td>Were there missing values, outliers, or anomalies?</td></tr>
<tr><td>Selection Bias</td><td>Who might be excluded from this data?</td></tr>
<tr><td>Causation</td><td>Am I treating associations as causal levers?</td></tr>
<tr><td>Baseline Comparison</td><td>How does this model compare to a naive baseline?</td></tr>
<tr><td>Threshold Choice</td><td>(Classification) Is 0.5 the right threshold for my costs?</td></tr>
<tr><td>Feature Dominance</td><td>(Clustering) Which features are driving similarity?</td></tr>
<tr><td>Stability</td><td>Would results hold with different data or settings?</td></tr>
<tr><td>Limitations</td><td>What can this analysis NOT tell me?</td></tr>
</table>
<hr>
<div class="final-reminder">
<blockquote>
<p>"These results describe patterns in your data. Before acting, consider: (1) what assumptions must hold, (2) who might be excluded from this data, and (3) what additional evidence would increase confidence."</p>
</blockquote>
<p>The Sandbox gives you analytical power. <strong>Use it with discipline.</strong></p>
</div>
<footer>
<p><em>Analytics Modeling Sandbox — A companion to "Analytics for Managers"</em></p>
</footer>
</body>
</html>