Week 2: Data, ML, and How Models Learn
Evaluation, Leakage, and GDPR Boundaries
A bad evaluation pipeline can make a useless model look great.
Objective
Describe data leakage, basic evaluation metrics, and why sensitive data handling matters in training flows. The lesson is public. The pressure loop lives inside the app where submissions, revision, and review happen.
Deliverable
A simple ML pipeline with evaluation and a leakage audit. Each lesson contributes to a week-level artifact and eventually to the shipped AI-native SaaS.
What This Is
This lesson teaches you how to distrust a flattering metric until the evaluation design has earned your trust.
Why This Matters in Production
Leaky evaluation produces false confidence, which is one of the fastest ways to launch a bad model with executive approval. Privacy mistakes add legal and reputational cost on top.
Mental Model
Evaluation is a claim about future usefulness. Leakage and privacy failures invalidate that claim by corrupting either the data boundary or the legal boundary.
Deep Dive
Leakage happens when information from outside the legitimate training context slips into features, preprocessing, or label construction. It often hides in time-aware data, aggregated statistics, or human-generated features. At the same time, privacy boundaries matter because model training can easily absorb identifiers that should have been removed, masked, or minimized. Maturity means treating metrics and privacy controls as one coherent quality system.
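Preprocessing leakage is the most common version of this: statistics computed over the full dataset let the test set influence the training features. A minimal sketch with illustrative numbers (no real data):

```python
# Minimal sketch of preprocessing leakage: centering with a mean computed
# over train + test lets an extreme test point shift the training features.

def mean(xs):
    return sum(xs) / len(xs)

train = [1.0, 2.0, 3.0, 4.0]
test = [100.0]  # an extreme test point

# Leaky: compute the centering statistic on the combined data.
leaky_mu = mean(train + test)            # 22.0 -- dragged up by the test point
leaky_train = [x - leaky_mu for x in train]

# Correct: compute statistics on the training split only.
clean_mu = mean(train)                   # 2.5
clean_train = [x - clean_mu for x in train]

print(leaky_mu, clean_mu)
```

The gap between the two means is the leak: information about the held-out data has flowed into the features the model trains on.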
Worked Example
A support-ticket model uses a field that is only filled after escalation, but the target is escalation itself. Accuracy looks excellent. In reality, the model learned to detect a future artifact. That is leakage, not intelligence.
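A toy version of this failure, with hypothetical field names (`resolution_code` is only populated after escalation, so it encodes the label instead of predicting it):

```python
# Hypothetical support tickets: `resolution_code` is filled in *after* a
# ticket escalates, so any rule built on it scores perfectly on history
# while being useless at prediction time, when the field is still empty.
tickets = [
    {"priority": "low",  "resolution_code": None,   "escalated": False},
    {"priority": "high", "resolution_code": "ESC1", "escalated": True},
    {"priority": "low",  "resolution_code": "ESC2", "escalated": True},
    {"priority": "high", "resolution_code": None,   "escalated": False},
]

def leaky_predict(ticket):
    # The "learned" rule: escalation iff the post-escalation field is present.
    return ticket["resolution_code"] is not None

accuracy = sum(leaky_predict(t) == t["escalated"] for t in tickets) / len(tickets)
print(accuracy)  # 1.0 on historical data
```

The audit question to ask of every feature: would this value exist, with this content, at the moment the prediction is made?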
Common Failure Modes
Frequent failures include splitting after feature engineering, using test-set information in normalization, and assuming anonymization happened because someone said “the data is safe.”
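One defensive pattern against the first two failures: split first (chronologically, for time-aware data), then derive every aggregate feature from the training window only. A sketch with illustrative field names and values:

```python
# Split chronologically first, then compute aggregates on the training
# window only -- the later surge in `amount` never touches the features.
rows = [
    {"day": d, "amount": float(a)}
    for d, a in enumerate([10, 12, 11, 13, 50, 55])
]

split_day = 4  # train on days 0-3, evaluate on days 4-5
train = [r for r in rows if r["day"] < split_day]
test = [r for r in rows if r["day"] >= split_day]

# Aggregate from the training window only.
train_avg = sum(r["amount"] for r in train) / len(train)   # 11.5

def featurize(row):
    return {"above_train_avg": row["amount"] > train_avg}

features = [featurize(r) for r in test]
print(train_avg, features)
```

Had the average been computed over all six days, the threshold would have absorbed the test-period surge, which is exactly the test-set-in-normalization failure named above.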
References
- Official documentation: metric definitions and tradeoffs.
- Legal reference (GDPR): anchor privacy handling in a real legal principle.
- Official documentation: a practical framing of leakage risk.