	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Analytics Modeling Sandbox - User Guide</title>
	<style>
	:root {
	--primary-color: #276749;
	--primary-light: #38a169;
	--secondary-color: #2c5282;
	--accent-color: #ed8936;
	--warning-color: #c53030;
	--background-color: #f7fafc;
	--text-color: #2d3748;
	--text-light: #718096;
	--border-color: #e2e8f0;
	--card-bg: #ffffff;
	}

	* {
	box-sizing: border-box;
	}

	body {
	font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
	line-height: 1.7;
	color: var(--text-color);
	max-width: 900px;
	margin: 0 auto;
	padding: 20px 40px;
	background-color: var(--background-color);
	}

	h1 {
	color: var(--primary-color);
	border-bottom: 3px solid var(--primary-light);
	padding-bottom: 15px;
	margin-top: 40px;
	}

	h2 {
	color: var(--primary-color);
	border-bottom: 2px solid var(--border-color);
	padding-bottom: 10px;
	margin-top: 50px;
	}

	h3 {
	color: var(--primary-light);
	margin-top: 30px;
	}

	.header-section {
	text-align: center;
	border-bottom: 2px solid var(--border-color);
	background: linear-gradient(135deg, var(--primary-color) 0%, var(--primary-light) 100%);
	margin: -20px -40px 40px -40px;
	padding: 60px 40px;
	color: white;
	}

	.header-section h1 {
	border: none;
	margin: 0;
	font-size: 2.5em;
	color: white;
	}

	.subtitle {
	color: rgba(255,255,255,0.9);
	font-size: 1.2em;
	margin-top: 10px;
	}

	table {
	width: 100%;
	border-collapse: collapse;
	margin: 20px 0;
	background: white;
	box-shadow: 0 1px 3px rgba(0,0,0,0.1);
	}

	th, td {
	padding: 12px 15px;
	text-align: left;
	border: 1px solid var(--border-color);
	}

	th {
	background-color: var(--primary-color);
	color: white;
	font-weight: 600;
	}

	tr:nth-child(even) {
	background-color: #f8f9fa;
	}

	blockquote {
	border-left: 4px solid var(--primary-light);
	margin: 25px 0;
	padding: 15px 25px;
	background-color: #f0fff4;
	font-style: italic;
	}

	.warning-box {
	background-color: #fff5f5;
	border: 1px solid #fc8181;
	border-left: 4px solid var(--warning-color);
	border-radius: 5px;
	padding: 20px;
	margin: 25px 0;
	}

	.warning-box h4 {
	color: var(--warning-color);
	margin-top: 0;
	}

	.info-box {
	background-color: #ebf8ff;
	border: 1px solid #90cdf4;
	border-left: 4px solid var(--secondary-color);
	border-radius: 5px;
	padding: 20px;
	margin: 25px 0;
	}

	.step-box {
	background: white;
	border: 1px solid var(--border-color);
	border-radius: 10px;
	padding: 25px;
	margin: 20px 0;
	box-shadow: 0 2px 4px rgba(0,0,0,0.05);
	border-left: 5px solid var(--primary-light);
	}

	.step-box h3 {
	margin-top: 0;
	display: flex;
	align-items: center;
	}

	.step-number {
	display: inline-flex;
	align-items: center;
	justify-content: center;
	width: 35px;
	height: 35px;
	background-color: var(--primary-light);
	color: white;
	border-radius: 50%;
	font-weight: bold;
	margin-right: 12px;
	flex-shrink: 0;
	}

	.output-section {
	background: #f8f9fa;
	border-radius: 8px;
	padding: 20px;
	margin: 20px 0;
	}

	.output-section h4 {
	color: var(--primary-color);
	margin-top: 0;
	}

	.trap-warning {
	background: #fffaf0;
	border-left: 4px solid var(--accent-color);
	padding: 15px 20px;
	margin: 15px 0;
	border-radius: 0 8px 8px 0;
	font-size: 0.95em;
	}

	.trap-warning strong {
	color: var(--accent-color);
	}

	.two-column {
	display: grid;
	grid-template-columns: 1fr 1fr;
	gap: 20px;
	margin: 25px 0;
	}

	.column {
	background: white;
	padding: 20px;
	border-radius: 8px;
	border: 1px solid var(--border-color);
	}

	.do-column {
	border-left: 4px solid var(--primary-light);
	}

	.dont-column {
	border-left: 4px solid var(--warning-color);
	}

	.column h4 {
	margin-top: 0;
	}

	.do-column h4 {
	color: var(--primary-light);
	}

	.dont-column h4 {
	color: var(--warning-color);
	}

	.faq-item {
	background: white;
	border: 1px solid var(--border-color);
	border-radius: 8px;
	padding: 20px;
	margin: 15px 0;
	}

	.faq-item h4 {
	color: var(--secondary-color);
	margin-top: 0;
	margin-bottom: 10px;
	}

	.checklist-table td:first-child {
	width: 30%;
	font-weight: 600;
	color: var(--primary-color);
	}

	.comparison-table th:nth-child(1) {
	background-color: var(--primary-color);
	}

	.comparison-table th:nth-child(2) {
	background-color: var(--secondary-color);
	}

	.final-reminder {
	background: linear-gradient(135deg, var(--primary-color), var(--primary-light));
	color: white;
	padding: 30px;
	border-radius: 10px;
	margin: 40px 0;
	text-align: center;
	}

	.final-reminder blockquote {
	background: rgba(255,255,255,0.15);
	border-left-color: white;
	color: white;
	}

	hr {
	border: none;
	border-top: 1px solid var(--border-color);
	margin: 40px 0;
	}

	footer {
	text-align: center;
	padding: 30px;
	color: #666;
	border-top: 1px solid var(--border-color);
	margin-top: 50px;
	}

	code {
	background-color: #edf2f7;
	padding: 2px 6px;
	border-radius: 4px;
	font-family: 'Consolas', 'Monaco', monospace;
	font-size: 0.9em;
	}

	ul, ol {
	margin: 15px 0;
	padding-left: 25px;
	}

	li {
	margin: 8px 0;
	}

	@media (max-width: 768px) {
	body {
	padding: 15px 20px;
	}

	.header-section {
	margin: -15px -20px 30px -20px;
	padding: 40px 20px;
	}

	.two-column {
	grid-template-columns: 1fr;
	}

	table {
	font-size: 0.9em;
	}

	th, td {
	padding: 8px 10px;
	}
	}
	</style>
	</head>
	<body>

	<div class="header-section">
	<h1>Analytics Modeling Sandbox</h1>
	<p class="subtitle">User Guide</p>
	</div>

	<h2>What Is the Analytics Modeling Sandbox?</h2>

	<p>The Analytics Modeling Sandbox is a practical analytics tool designed for users who have learned analytical concepts from the <em>Analytics for Managers</em> book and want to apply those techniques to their own data.</p>

	<p>Unlike the Analytics Reasoning Companion (which focuses on developing reasoning skills using curated datasets), the Sandbox is built for <strong>doing real analysis</strong> — running regression, classification, and clustering on data you provide.</p>

	<h3>What It Does</h3>
	<ul>
	<li><strong>Executes analyses</strong> on your uploaded data (CSV, Excel)</li>
	<li><strong>Shows code</strong> so you can see exactly what's being done</li>
	<li><strong>Produces outputs</strong> including coefficients, metrics, and visualizations</li>
	<li><strong>Provides interpretation guidance</strong> to prevent common analytical mistakes</li>
	<li><strong>Warns about traps</strong> like accuracy illusions, threshold fallacies, and omitted variable bias</li>
	</ul>

	<h3>What It Does NOT Do</h3>
	<ul>
	<li><strong>Make decisions for you</strong> — it provides evidence, you decide</li>
	<li><strong>Certify models as "good"</strong> — it shows you results, not approval stamps</li>
	<li><strong>Establish causation</strong> — all findings are associations unless you have experimental data</li>
	<li><strong>Store your data</strong> — nothing is retained between sessions</li>
	<li><strong>Replace professional judgment</strong> — this is an educational tool, not professional services</li>
	</ul>

	<hr>

	<h2>Important Notices</h2>

	<div class="warning-box">
	<h4>Data Privacy</h4>
	<p><strong>You are responsible for ensuring you have proper authorization to analyze the data you upload.</strong></p>
	<p>Do not upload:</p>
	<ul>
	<li>Personally identifiable information (PII) without consent</li>
	<li>Protected health information (PHI)</li>
	<li>Confidential business data you're not authorized to share</li>
	<li>Data subject to regulatory restrictions (GDPR, HIPAA, etc.)</li>
	</ul>
	<p>The Sandbox does not store your data between sessions, but you remain responsible for compliance with applicable privacy laws and organizational policies.</p>
	</div>

	<div class="info-box">
	<h4>Disclaimer</h4>
	<p>The Analytics Modeling Sandbox provides analytical assistance for educational purposes. Outputs are statistical estimates based on the data you provide. They do not constitute predictions, guarantees, or professional advice.</p>
	<p>All findings describe patterns and associations. They do not establish causal relationships unless derived from controlled experiments.</p>
	<p>Consult qualified professionals before making significant business, financial, legal, or operational decisions based on these results.</p>
	</div>

	<hr>

	<h2>Getting Started</h2>

	<h3>Step 1: Access the Sandbox</h3>
	<p>Visit the Sandbox at: <strong>[Link to be provided]</strong></p>

	<h3>Step 2: Prepare Your Data</h3>
	<p>Before uploading, ensure your data:</p>
	<ul>
	<li>Is in CSV or Excel format</li>
	<li>Is under 5 MB (recommended)</li>
	<li>Has clear column headers</li>
	<li>Has a defined outcome variable (for regression/classification)</li>
	</ul>
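<p>The checks above can be run locally before uploading. The following is a minimal sketch, not the Sandbox's own validation: the function name <code>check_upload</code> is hypothetical, "5 MB" is assumed to mean 5 &times; 1024 &times; 1024 bytes, and only CSV headers are inspected (Excel files would need a reader library).</p>

```python
import csv
import os
import tempfile

MAX_BYTES = 5 * 1024 * 1024  # assumed meaning of the recommended 5 MB ceiling
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls"}


def check_upload(path):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than the recommended 5 MB")
    if extension == ".csv":
        # Only CSV headers are inspected here; Excel files need a reader library.
        with open(path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        names = [name.strip() for name in header]
        if not names or any(name == "" for name in names):
            problems.append("one or more column headers are blank")
        if len(set(names)) != len(names):
            problems.append("column headers are not unique")
    return problems


# Demo on a small temporary file whose second column header is blank.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("region,,sales\nNorth,1,200\n")
demo_problems = check_upload(tmp.name)
os.remove(tmp.name)
print(demo_problems)
```

<p>An empty list means the file passed these particular checks. It says nothing about whether the outcome variable is well defined, which still needs your judgment.</p>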

	<h3>Step 3: Upload and Describe</h3>
	<p>When you upload your file, tell the Sandbox:</p>
	<ul>
	<li>What decision this analysis will inform</li>
	<li>Which column is your outcome variable</li>
	<li>What type of analysis you want (regression, classification, or clustering)</li>
	</ul>

	<hr>

	<h2>The 7-Step Workflow</h2>

	<p>The Sandbox suggests a structured workflow but allows you to skip steps if needed. Skipping steps increases interpretation risk — the Sandbox will warn you but won't block you.</p>

	<div class="step-box">
	<h3><span class="step-number">1</span> Business Context</h3>
	<p><strong>Purpose:</strong> Establish what decision this analysis informs.</p>
	<p><strong>What happens:</strong> The Sandbox asks about your goals before diving into data.</p>
	<p><strong>Why it matters:</strong> Analysis without context produces technically correct but practically useless results.</p>
	<p><strong>If you skip:</strong> <em>"Proceeding without clear goals increases interpretation risk."</em></p>
	</div>

	<div class="step-box">
	<h3><span class="step-number">2</span> Data Overview</h3>
	<p><strong>Purpose:</strong> Understand what you're working with before modeling.</p>
	<p><strong>What happens:</strong> The Sandbox shows dataset shape, column types, missing value summary, and basic distributions.</p>
	<p><strong>Key question:</strong> <em>"Who might be excluded from this dataset? Could they differ systematically?"</em></p>
	</div>
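<p>A data overview of this kind can be reproduced with pandas. The sketch below is an assumption about how such a summary might be built, not the Sandbox's actual code, and the small DataFrame is a hypothetical stand-in for an uploaded file.</p>

```python
import pandas as pd

# Hypothetical stand-in for an uploaded file; normally pd.read_csv(path).
df = pd.DataFrame(
    {
        "region": ["North", "South", "South", None],
        "units": [10, 12, None, 9],
        "revenue": [200.0, 260.5, 240.0, 180.0],
    }
)

n_rows, n_cols = df.shape          # dataset shape
column_types = df.dtypes           # column types as pandas infers them
missing_counts = df.isna().sum()   # missing values per column
numeric_summary = df.describe()    # count, mean, std, quartiles for numeric columns

print(f"{n_rows} rows x {n_cols} columns")
print(column_types)
print(missing_counts)
print(numeric_summary)
```

<p>None of these summaries can answer the key question about who is excluded from the dataset. That depends on how the data was collected, not on what is in the file.</p>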

	<div class="step-box">
	<h3><span class="step-number">3</span> Data Preparation</h3>
	<p><strong>Purpose:</strong> Handle missing values, encode categories, scale features.</p>
	<p><strong>What happens:</strong> The Sandbox shows what preparation steps are applied, why, and the trade-offs involved.</p>
	<p><strong>Transparency:</strong> You'll see the code so you know exactly what's being done.</p>
	</div>
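<p>As an illustration of the three preparation tasks named above (missing values, category encoding, feature scaling), here is a minimal scikit-learn sketch. The specific choices (median and most-frequent imputation, one-hot encoding, standard scaling) and the column names are assumptions for the example. The guide does not say which strategies the Sandbox applies.</p>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with one numeric gap and one categorical gap.
df = pd.DataFrame(
    {
        "tenure_months": [3.0, 24.0, np.nan, 12.0],
        "plan": ["basic", "premium", "basic", np.nan],
    }
)

numeric_steps = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),  # fill gaps with the median
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ]
)
categorical_steps = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill gaps with the mode
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # one column per category
    ]
)
preparation = ColumnTransformer(
    [
        ("numeric", numeric_steps, ["tenure_months"]),
        ("categorical", categorical_steps, ["plan"]),
    ]
)

prepared = preparation.fit_transform(df)
print(prepared.shape)  # one scaled numeric column plus one column per plan value
```

<p>Each choice carries a trade-off. Median imputation, for example, keeps every row but understates the variability of the imputed column.</p>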

	<div class="step-box">
	<h3><span class="step-number">4</span> Analysis</h3>
	<p><strong>Purpose:</strong> Run the model.</p>
	<p><strong>What happens:</strong> The Sandbox executes regression, classification, or clustering using the scikit-learn library (<code>sklearn</code>).</p>
	<p><strong>Defaults shown explicitly:</strong></p>
	<ul>
	<li>Train/test split: 70/30</li>
	<li>Random state: 42</li>
	<li>Classification threshold: 0.5 (with alternatives shown)</li>
	<li>Clustering: each K (number of clusters) from 3 to 6 is tested</li>
	</ul>
	</div>
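<p>The split, seed, and threshold defaults listed above map onto scikit-learn roughly as follows. This is a sketch under assumptions: the synthetic dataset and the choice of logistic regression are illustrative, and the Sandbox's generated code may differ.</p>

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for uploaded data (hypothetical; 200 rows, 4 features).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 70/30 split with a fixed seed so the same rows land in each set on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(class == 1).
probabilities = model.predict_proba(X_test)[:, 1]
threshold = 0.5
predictions = (probabilities >= threshold).astype(int)

print(len(X_train), len(X_test))
```

<p>Keeping <code>threshold</code> as an explicit variable, rather than calling <code>model.predict</code>, makes it easy to try the alternative thresholds the Sandbox shows.</p>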

	<div class="step-box">
	<h3><span class="step-number">5</span> Results</h3>
	<p><strong>Purpose:</strong> Present outputs with context.</p>
	<p><strong>For Regression:</strong> Coefficients, R-squared, MAE, RMSE, residual plots</p>
	<p><strong>For Classification:</strong> Confusion matrix, Precision/Recall/F1/AUC, threshold table</p>
	<p><strong>For Clustering:</strong> Cluster sizes, feature means, silhouette scores, elbow plot</p>
	<p>Interpretation notes are embedded with each output.</p>
	</div>

	<div class="step-box">
	<h3><span class="step-number">6</span> Interpretation Check</h3>
	<p><strong>Purpose:</strong> Ensure you're not over-interpreting.</p>
	<p><strong>What happens:</strong> The Sandbox prompts:</p>
	<ul>
	<li>"What assumptions must hold for these results to be actionable?"</li>
	<li>"What could mislead us here?"</li>
	<li>"Who might be missing from this data?"</li>
	</ul>
	</div>

	<div class="step-box">
	<h3><span class="step-number">7</span> Limitations &amp; Next Steps</h3>
	<p><strong>Purpose:</strong> Acknowledge what the analysis cannot tell you.</p>
	<p><strong>What happens:</strong> The Sandbox helps you articulate what remains uncertain, what additional data would help, and what tests would increase confidence.</p>
	</div>

	<hr>

	<h2>Understanding Your Outputs</h2>

	<div class="output-section">
	<h4>Regression Outputs</h4>

	<p><strong>Coefficients Table:</strong></p>
	<table>
	<tr><th>Feature</th><th>Coefficient</th></tr>
	<tr><td>Feature_A</td><td>2.34</td></tr>
	<tr><td>Feature_B</td><td>-1.56</td></tr>
	<tr><td>Feature_C</td><td>0.89</td></tr>
	</table>

	<p><strong>How to read:</strong> A coefficient of 2.34 means: among otherwise similar cases in your data, a one-unit increase in Feature_A is associated with a 2.34-unit increase in the outcome, on average.</p>

	<div class="trap-warning">
	<strong>Caution:</strong> This is an association, not a causal effect. Unobserved factors might influence both the feature and the outcome.
	</div>

	<p><strong>Metrics:</strong></p>
	<ul>
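	<li><strong>R-squared:</strong> Proportion of outcome variance the model explains. For an ordinary least-squares fit it falls between 0 and 1 on the training data, but it can be negative on held-out data when the model predicts worse than the outcome mean. Higher isn't always better: adding features never lowers training R-squared, even when they add no real predictive value.</li>
	<li><strong>MAE:</strong> Mean absolute error, the average size of the prediction errors in outcome units, ignoring their sign.</li>
	<li><strong>RMSE:</strong> Root mean squared error. Also in outcome units, but it squares errors before averaging, so large errors count more heavily than in MAE.</li>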
	</ul>
	</div>
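<p>The three regression metrics can be worked by hand on a small example. The numbers below are hypothetical and chosen only to make the arithmetic easy to follow.</p>

```python
import math

actual = [10.0, 12.0, 15.0, 11.0]      # hypothetical observed outcomes
predicted = [11.0, 12.0, 13.0, 12.0]   # hypothetical model predictions

errors = [p - a for p, a in zip(predicted, actual)]   # 1, 0, -2, 1
n = len(errors)

mae = sum(abs(e) for e in errors) / n                 # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / n)      # root mean squared error

mean_actual = sum(actual) / n
ss_residual = sum(e * e for e in errors)              # unexplained variation
ss_total = sum((a - mean_actual) ** 2 for a in actual)  # total variation
r_squared = 1 - ss_residual / ss_total

print(mae, rmse, r_squared)
```

<p>Here MAE is 1.0 while RMSE is about 1.22. The gap comes from the single error of size 2, which RMSE weights more heavily. R-squared is 1 &minus; 6/14, roughly 0.57.</p>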

	<div class="output-section">
	<h4>Classification Outputs</h4>

	<p><strong>Confusion Matrix:</strong></p>
	<table>
	<tr><th></th><th>Predicted: No</th><th>Predicted: Yes</th></tr>
	<tr><td><strong>Actual: No</strong></td><td>True Negative</td><td>False Positive</td></tr>
	<tr><td><strong>Actual: Yes</strong></td><td>False Negative</td><td>True Positive</td></tr>
	</table>

	<p><strong>Metrics:</strong></p>
	<ul>
	<li><strong>Accuracy:</strong> Share of all predictions that are correct. Can be misleading with imbalanced classes.</li>
	<li><strong>Precision:</strong> Of those predicted positive, how many are correct?</li>
	<li><strong>Recall:</strong> Of actual positives, how many did we catch?</li>
	<li><strong>F1:</strong> Harmonic mean of precision and recall. High only when both are high.</li>
	<li><strong>ROC AUC:</strong> Model's ability to rank positives above negatives</li>
	</ul>

	<div class="trap-warning">
	<strong>Threshold Table:</strong> Shows how precision and recall change at different thresholds. Use this to choose a threshold that matches your cost trade-offs — don't just accept 0.5.
	</div>
	</div>
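<p>To make the threshold table concrete, the sketch below recomputes precision and recall at three thresholds. The labels, scores, and the helper name <code>precision_recall_at</code> are hypothetical. The Sandbox's own table may use different thresholds and columns.</p>

```python
# Hypothetical true labels (1 = positive) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.55]


def precision_recall_at(threshold):
    """Precision and recall when scores at or above the threshold count as positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


threshold_table = {t: precision_recall_at(t) for t in (0.3, 0.5, 0.7)}
for t, (precision, recall) in threshold_table.items():
    print(f"threshold={t:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

<p>In this toy data, raising the threshold from 0.3 to 0.7 lifts precision from about 0.67 to 1.00 while recall drops from 1.00 to 0.50. Which end of that trade-off is preferable depends on what a false positive and a false negative each cost you.</p>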

	<div class="output-section">
	<h4>Clustering Outputs</h4>

	<p><strong>Cluster Profiles:</strong></p>
	<table>
	<tr><th>Cluster</th><th>Size</th><th>Feature_A (mean)</th><th>Feature_B (mean)</th></tr>
	<tr><td>0</td><td>150</td><td>2.3</td><td>-0.5</td></tr>
	<tr><td>1</td><td>200</td><td>-1.1</td><td>0.8</td></tr>
	<tr><td>2</td><td>100</td><td>0.5</td><td>1.2</td></tr>
	</table>

	<p><strong>How to read:</strong> Each row shows average feature values for cases in that cluster. Use these to develop descriptive labels.</p>

	<div class="trap-warning">
	<strong>Caution:</strong> Clusters are analytical groupings, not inherent types. Different features or scaling would produce different segments.
	</div>
	</div>
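<p>A cluster profile table like the one above can be produced along these lines. This is a sketch under assumptions: the data is synthetic, K-means is assumed as the clustering method, and picking the K with the highest silhouette score is only one heuristic, used here for illustration.</p>

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for uploaded data: three well-separated groups (hypothetical).
X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [8, 8], [-8, 8]], cluster_std=1.0, random_state=0
)
features = pd.DataFrame(X, columns=["Feature_A", "Feature_B"])
scaled = StandardScaler().fit_transform(features)

# Try each K from 3 to 6 and record the silhouette score for each.
silhouette_by_k = {}
for k in range(3, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(scaled)
    silhouette_by_k[k] = silhouette_score(scaled, labels)

# Illustrative heuristic only: take the K with the highest silhouette score.
chosen_k = max(silhouette_by_k, key=silhouette_by_k.get)

# Cluster profile: size plus the mean of each (unscaled) feature per cluster.
final_labels = KMeans(n_clusters=chosen_k, n_init=10, random_state=42).fit_predict(scaled)
profile = features.groupby(final_labels).mean()
profile.insert(0, "Size", pd.Series(final_labels).value_counts().sort_index())

print(silhouette_by_k)
print(profile)
```

<p>The clustering runs on scaled features, while the profile reports means in the original units so the rows are easier to label. Changing the scaler or the feature list would change the segments.</p>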

	<hr>

	<h2>Embedded Trap Warnings</h2>

	<p>The Sandbox automatically includes warnings after outputs to prevent common mistakes.</p>

	<div class="trap-warning">
	<strong>After Regression:</strong> "Coefficients describe associations, not causal effects. Consider what unobserved factors might influence both predictor and outcome. Large effects may be driven by outliers—check residual plots."
	</div>

	<div class="trap-warning">
	<strong>After Classification:</strong> "Accuracy can mislead with imbalanced classes. Check: what would accuracy be predicting the majority class always? The 0.5 threshold is arbitrary—consider the relative costs of false positives vs. false negatives."
	</div>
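<p>The majority-class check in the classification warning takes only a few lines. The labels and predictions below are hypothetical, built to show a model whose accuracy looks strong yet falls short of the naive baseline.</p>

```python
from collections import Counter

# Hypothetical test-set labels: 90 negatives followed by 10 positives.
y_true = [0] * 90 + [1] * 10
# Hypothetical predictions: 3 false alarms among negatives, 2 of 10 positives caught.
y_pred = [0] * 87 + [1] * 3 + [0] * 8 + [1] * 2

model_accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

# Baseline: always predict whichever class is most common.
majority_class, majority_count = Counter(y_true).most_common(1)[0]
baseline_accuracy = majority_count / len(y_true)

print(f"model accuracy:    {model_accuracy:.2f}")
print(f"baseline accuracy: {baseline_accuracy:.2f} (always predict {majority_class})")
```

<p>The model scores 0.89, which sounds good until it is set beside the 0.90 reached by always predicting the majority class.</p>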

	<div class="trap-warning">
	<strong>After Clustering:</strong> "Clusters depend on feature selection and scaling. Different choices produce different segments. These are analytical groupings, not fixed types—validate stability before building strategy."
	</div>
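<p>One way to check cluster stability is to rerun the clustering with different random seeds and measure how closely the assignments agree. The sketch below uses the adjusted Rand index for that comparison. The data is synthetic and the approach is an assumption for illustration, not a description of what the Sandbox runs.</p>

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic, well-separated data (hypothetical stand-in for uploaded features).
X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [8, 8], [-8, 8]], cluster_std=1.0, random_state=0
)

reference = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Refit with other seeds; the adjusted Rand index is 1.0 when two labelings
# group the rows identically, regardless of how the clusters are numbered.
agreement = []
for seed in range(1, 6):
    labels = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    agreement.append(adjusted_rand_score(reference, labels))

print(min(agreement))
```

<p>Agreement near 1.0 across seeds suggests the segments do not hinge on initialization. Real data is rarely this clean, and low agreement is a signal to be cautious before building strategy on the clusters.</p>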

	<div class="trap-warning">
	<strong>For All Analyses:</strong> "Selection Bias Check: Who might be missing from this data? Could excluded cases differ systematically from those included?"
	</div>

	<hr>

	<h2>Tips for Effective Use</h2>

	<div class="two-column">
	<div class="column do-column">
	<h4>Do:</h4>
	<ol>
	<li><strong>Start with clear goals.</strong> Know what decision the analysis will inform.</li>
	<li><strong>Review the data summary.</strong> Check for issues before modeling.</li>
	<li><strong>Examine the code.</strong> Understanding what's done helps interpretation.</li>
	<li><strong>Use the threshold table</strong> (classification). Choose based on your costs.</li>
	<li><strong>Check cluster stability</strong> (clustering). Be cautious if results vary.</li>
	<li><strong>Read the interpretation notes.</strong> They prevent common mistakes.</li>
	<li><strong>Acknowledge limitations.</strong> Stating them is a sign of rigor.</li>
	</ol>
	</div>
	<div class="column dont-column">
	<h4>Don't:</h4>
	<ol>
	<li><strong>Don't upload sensitive data</strong> without authorization.</li>
	<li><strong>Don't skip business context.</strong> Analysis without purpose is just math.</li>
	<li><strong>Don't treat coefficients as causal.</strong> Association ≠ causation.</li>
	<li><strong>Don't celebrate accuracy alone.</strong> Check against the naive baseline.</li>
	<li><strong>Don't reify clusters.</strong> They're groupings, not fixed types.</li>
	<li><strong>Don't ignore who's missing.</strong> Selection bias can invalidate analysis.</li>
	</ol>
	</div>
	</div>

	<hr>

	<h2>When to Use the Reasoning Companion Instead</h2>

	<p>The Sandbox is for <strong>doing analysis</strong>. The Reasoning Companion is for <strong>developing judgment</strong>.</p>

	<table class="comparison-table">
	<thead>
	<tr>
	<th>Use the Sandbox when...</th>
	<th>Use the Reasoning Companion when...</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>You have your own data to analyze</td>
	<td>You're learning concepts from the book</td>
	</tr>
	<tr>
	<td>You need actual outputs and code</td>
	<td>You want structured reasoning practice</td>
	</tr>
	<tr>
	<td>You're a practitioner applying techniques</td>
	<td>You're a student building fundamentals</td>
	</tr>
	<tr>
	<td>You want efficiency with guidance</td>
	<td>You want Socratic questioning</td>
	</tr>
	</tbody>
	</table>

	<p><strong>Handoff:</strong> After running analysis in the Sandbox, consider working through similar analyses in the Reasoning Companion using the book's curated datasets. The structured critique will strengthen your interpretation skills.</p>

	<hr>

	<h2>Frequently Asked Questions</h2>

	<div class="faq-item">
	<h4>Q: What file formats can I upload?</h4>
	<p><strong>A:</strong> CSV and Excel files (.csv, .xlsx, .xls). Keep files under 5 MB for best performance.</p>
	</div>

	<div class="faq-item">
	<h4>Q: Does the Sandbox store my data?</h4>
	<p><strong>A:</strong> No. Data is processed during your session only and is not retained afterward.</p>
	</div>

	<div class="faq-item">
	<h4>Q: Can I run advanced models like XGBoost or neural networks?</h4>
	<p><strong>A:</strong> The Sandbox defaults to interpretable models. You can request advanced models, but the Sandbox will note that complexity often reduces interpretability.</p>
	</div>

	<div class="faq-item">
	<h4>Q: Why does the Sandbox show me code?</h4>
	<p><strong>A:</strong> Transparency. Seeing the code helps you understand exactly what's being done, catch issues, and reproduce the analysis elsewhere.</p>
	</div>

	<div class="faq-item">
	<h4>Q: The Sandbox warned me about something. Did I do something wrong?</h4>
	<p><strong>A:</strong> Not necessarily. Warnings are educational — they flag potential interpretation risks. Consider them, but you decide whether to proceed.</p>
	</div>

	<div class="faq-item">
	<h4>Q: Why doesn't the Sandbox tell me which model is "best"?</h4>
	<p><strong>A:</strong> Because "best" depends on your goals, costs, and context — things the Sandbox can't know. It provides evidence; you make the judgment.</p>
	</div>

	<hr>

	<h2>Quick Reference: Output Checklist</h2>

	<p>Before acting on any Sandbox output, verify:</p>

	<table class="checklist-table">
	<tr><td>Business Context</td><td>Does this analysis answer the right question?</td></tr>
	<tr><td>Data Quality</td><td>Were there missing values, outliers, or anomalies?</td></tr>
	<tr><td>Selection Bias</td><td>Who might be excluded from this data?</td></tr>
	<tr><td>Causation</td><td>Am I treating associations as causal levers?</td></tr>
	<tr><td>Baseline Comparison</td><td>How does this model compare to a naive baseline?</td></tr>
	<tr><td>Threshold Choice</td><td>(Classification) Is 0.5 the right threshold for my costs?</td></tr>
	<tr><td>Feature Dominance</td><td>(Clustering) Which features are driving similarity?</td></tr>
	<tr><td>Stability</td><td>Would results hold with different data or settings?</td></tr>
	<tr><td>Limitations</td><td>What can this analysis NOT tell me?</td></tr>
	</table>

	<hr>

	<div class="final-reminder">
	<blockquote>
	<p>"These results describe patterns in your data. Before acting, consider: (1) what assumptions must hold, (2) who might be excluded from this data, and (3) what additional evidence would increase confidence."</p>
	</blockquote>
	<p>The Sandbox gives you analytical power. <strong>Use it with discipline.</strong></p>
	</div>

	<footer>
	<p><em>Analytics Modeling Sandbox — A companion to "Analytics for Managers"</em></p>
	</footer>

	</body>
	</html>