Why Linear Regression Actually Works: Simulating the Central Limit Theorem
In Linear Regression, we almost always use Mean Squared Error (MSE) as our loss function. When we apply Gradient Descent, we assume this function forms a nice, convex shape. We find the bottom of this bowl, minimize the error, and assume our parameters are optimal.

But there is a hidden assumption here: we assume the noise (error) in our data follows a Normal Distribution.

In the real world, data is messy. It rarely follows a perfect Bell Curve. It might be uniform, skewed, or exponential. This leads to a critical question: why does minimizing squared errors work even when the original data is NOT normal?

The answer lies in a statistical law so powerful it feels like a cheat code: the Central Limit Theorem (CLT).

What is the Central Limit Theorem?

The Central Limit Theorem states that if you extract sufficiently large samples (ideally n ≥ 30) fr...
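The claim above can be checked directly with a small simulation. The sketch below is one possible version, using NumPy; the choice of an exponential population (clearly skewed, not normal), a sample size of 30, and 5,000 repetitions are illustrative assumptions, not part of the theorem itself. The idea is simply: draw many samples from a non-normal population, take the mean of each, and see that the means themselves cluster around the population mean with a spread close to the CLT's prediction.

```python
import numpy as np

rng = np.random.default_rng(42)

# A population that is clearly NOT normal: an exponential distribution
# (right-skewed, population mean = 1, population std = 1).
population = rng.exponential(scale=1.0, size=100_000)

n = 30              # sample size (the "n >= 30" rule of thumb)
num_samples = 5_000  # how many samples we draw

# Draw many samples and record the mean of each one.
sample_means = np.array([
    rng.choice(population, size=n).mean() for _ in range(num_samples)
])

# CLT prediction: the sample means are approximately normal, centered
# at the population mean, with std ≈ population std / sqrt(n).
print(f"population mean:       {population.mean():.3f}")
print(f"mean of sample means:  {sample_means.mean():.3f}")
print(f"predicted std of means: {population.std() / np.sqrt(n):.3f}")
print(f"observed std of means:  {sample_means.std():.3f}")
```

Plotting a histogram of `sample_means` makes the point visually: even though the population is heavily skewed, the distribution of the means looks like a bell curve.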