Posts

Why we use Mean Squared Error: Maximum Likelihood Estimation (Python Simulation)

Why We Use Mean Squared Error: Decoding Maximum Likelihood Estimation In our previous exploration of the Central Limit Theorem, we established that the error in Linear Regression follows a Normal Distribution. This is a huge win—it gives us the shape of the error. But there is no single distribution that fits every dataset. There are infinite Bell Curves—some wide, some narrow, some shifted to the left or right. This raises a critical question: How do we find the specific distribution parameters that best fit our data? In simple words, we need to find the parameters (like Mean or Slope) that minimize the error. We do this using a technique called Maximum Likelihood Estimation (MLE). Likelihood vs. Probability: What's the Difference? In English, "Likelihood" and "Probability" are synonyms. In Mathematics, they are opposites. Probability is the art of estimating an event (or data) o...
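
To make the idea concrete, here is a minimal sketch (not the post's exact code; the grid of candidate means, the assumed true mean of 5.0, and the `neg_log_likelihood` helper are illustrative): maximizing the likelihood of a Normal model recovers the sample mean.

```python
# Minimal MLE sketch: pick the mean that makes the observed data most likely.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # "true" mean = 5.0 (assumed for illustration)

def neg_log_likelihood(mu, sigma, x):
    # Negative log-likelihood of a Normal(mu, sigma) model for data x
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + len(x) * np.log(sigma * np.sqrt(2 * np.pi))

candidate_means = np.linspace(3.0, 7.0, 401)
nll = [neg_log_likelihood(mu, 2.0, data) for mu in candidate_means]

best_mu = candidate_means[np.argmin(nll)]
print(f"MLE of the mean: {best_mu:.3f}")      # lands close to 5.0
print(f"Sample mean:     {data.mean():.3f}")  # for a Normal model, the MLE is the sample mean
```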

Why Linear Regression Actually Works: Simulating the Central Limit Theorem

Why Linear Regression Actually Works: Simulating the Central Limit Theorem In Linear Regression, we almost always use Mean Squared Error (MSE) as our loss function. When we apply Gradient Descent, we assume this function forms a nice, convex shape. We find the bottom of this bowl, minimize the error, and assume our parameters are optimal. But there is a hidden assumption here: We assume the noise (error) in our data follows a Normal Distribution. In the real world, data is messy. It rarely follows a perfect Bell Curve. It might be uniform, skewed, or exponential. This leads to a critical question: Why does minimizing squared errors work even when the original data is NOT normal? The answer lies in a statistical law so powerful it feels like a cheat code: The Central Limit Theorem (CLT). What is the Central Limit Theorem? The Central Limit Theorem states that if you extract sufficiently large samples (ideally n ≥ 30) fr...
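
As a rough illustration of the theorem (an assumed setup, not the post's exact simulation): draw samples of size n = 30 from a skewed Exponential distribution and the sample means land on a Normal curve with the mean and spread the CLT predicts.

```python
# Minimal CLT sketch: non-normal data, yet the sample means behave normally.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 30, 10_000                      # n >= 30, per the usual rule of thumb
samples = rng.exponential(scale=2.0, size=(trials, n))  # skewed source distribution
sample_means = samples.mean(axis=1)

# CLT prediction: means ~ Normal(mu, sigma / sqrt(n)), here mu = 2.0, sigma = 2.0
print(f"Mean of sample means: {sample_means.mean():.3f} (theory: 2.000)")
print(f"Std of sample means:  {sample_means.std():.3f} (theory: {2.0 / np.sqrt(n):.3f})")
```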

Probability isn't about Coin Flips: The Math of Uncertainty (For AI)

Probability isn't about Coin Flips: The Math of Uncertainty (For AI) When we talk about Probability, most people think of gambling. Flipping coins, rolling dice, or picking cards. This is the Frequentist view: the idea that if you repeat an event infinite times, the frequency converges to a number (e.g., 50% Heads). But in Artificial Intelligence, we don't care about flipping coins. When a self-driving car sees a shadow on the road, it can't replay that scenario a million times to see if it crashes. It has to make a decision NOW, based on incomplete information. In Machine Learning, Probability isn't about frequency. It is a Calculus of Belief. It is a way to quantify how confident the machine is in its own view of the world. Welcome to Season 2 of Decoding Complexities. We have mastered the Geometry of Data (Linear Algebra); now we master the Logic of Uncertainty. The Core Shift: Fixed Data vs. Uncertain ...
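
As a toy illustration of belief as a number (all values below are invented, not from the post), Bayes' rule turns a prior belief plus one piece of evidence into an updated belief:

```python
# Minimal "calculus of belief" sketch: update confidence in light of evidence.
prior_obstacle = 0.01           # belief an obstacle is present, before looking
p_shadow_given_obstacle = 0.90  # how often an obstacle casts this kind of shadow
p_shadow_given_clear = 0.10     # how often a clear road still shows such a shadow

# Bayes' rule: P(obstacle | shadow) = P(shadow | obstacle) P(obstacle) / P(shadow)
p_shadow = (p_shadow_given_obstacle * prior_obstacle
            + p_shadow_given_clear * (1 - prior_obstacle))
posterior_obstacle = p_shadow_given_obstacle * prior_obstacle / p_shadow

print(f"Belief before seeing the shadow: {prior_obstacle:.2%}")
print(f"Belief after seeing the shadow:  {posterior_obstacle:.2%}")
```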

The "Fundamental Theorem" of Linear Algebra: SVD Explained

The Fundamental Theorem of Linear Algebra: SVD (Singular Value Decomposition) We have spent the last few weeks mastering Eigenvectors and PCA. We learned how to find the hidden axes of data. But there was always a catch: Eigenvectors only work on Square Matrices. But look at the real world. A dataset of Users vs. Movies is a rectangular matrix. An image is a rectangular matrix of pixels. A spreadsheet of stock prices is rectangular. If you try to calculate the Eigenvectors of a rectangular matrix, the math breaks. So, how do we find the hidden structure of any matrix, of any shape? The answer is the Singular Value Decomposition (SVD). It is arguably the most important theorem in all of Linear Algebra. The Intuition: Breaking Down the Rectangle SVD states that any matrix A can be broken down into three clean components: A = U Σ Vᵀ Let's use a "Netflix" analogy where Matrix A represents Use...
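
A minimal sketch of the decomposition in NumPy (the small ratings matrix is invented for illustration): factor a rectangular matrix and rebuild it from U, Σ, and Vᵀ.

```python
# Minimal SVD sketch on a rectangular users-by-movies matrix.
import numpy as np

A = np.array([[5, 4, 0, 1],     # 3 users x 4 movies, rectangular on purpose
              [4, 5, 1, 0],
              [0, 1, 5, 4]], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)

A_rebuilt = U @ np.diag(S) @ Vt          # A = U Sigma V^T
print("Singular values:", np.round(S, 3))
print("Reconstruction matches A:", np.allclose(A, A_rebuilt))
```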

How to Crush Big Data: The Math of PCA (Principal Component Analysis)

How to Crush Big Data: The Math of PCA (Principal Component Analysis) In the real world, data is massive. A single image has millions of pixels. A financial model has thousands of market indicators. We call this the Curse of Dimensionality. When you have 10,000 features, your data becomes sparse, distance calculations break down, and models become incredibly slow. Visualization? Impossible. So, how do we fix this? How do we take a massive, high-dimensional monster and squash it down to just 2 or 3 dimensions without losing the important information? The answer is Principal Component Analysis (PCA). In this post, we will strip away the complexity and build PCA from scratch in Python using a 5-step linear algebra recipe. The Intuition: Information = Variance Before the math, we need to define our goal. What does it mean to "keep information"? In Data Science, Information is Variance (Spread). If data...
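
Here is a minimal from-scratch sketch, assuming the usual five-step recipe (center, covariance, eigen-decompose, sort, project) on random data rather than the post's dataset:

```python
# Minimal PCA sketch: 10 features squashed down to the 2 highest-variance directions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                        # 200 samples, 10 features

X_centered = X - X.mean(axis=0)                       # 1. center each feature
cov = np.cov(X_centered, rowvar=False)                # 2. covariance matrix (10 x 10)
eigenvalues, eigenvectors = np.linalg.eigh(cov)       # 3. eigen-decomposition (symmetric)
order = np.argsort(eigenvalues)[::-1]                 # 4. sort by explained variance
top2 = eigenvectors[:, order[:2]]                     #    keep the 2 strongest directions
X_reduced = X_centered @ top2                         # 5. project onto the new axes

print("Reduced shape:", X_reduced.shape)              # (200, 2)
print("Variance kept:", eigenvalues[order[:2]].sum() / eigenvalues.sum())
```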

The Hidden "DNA" of Matrices: Eigenvalues & Eigenvectors

The Hidden "DNA" of Matrices: A Developer's Guide to Eigenvectors Matrices are machines. They transform space. They stretch it, shear it, and rotate it. If you take a typical vector and multiply it by a matrix, it gets knocked off course. It changes its direction completely. But amidst all this chaos, there are special, rare vectors that refuse to be knocked off course. When the matrix transforms them, they don't rotate. They stay locked on their original span, only getting longer or shorter. These are the Eigenvectors. They represent the "Internal Axes" or the "DNA" of the matrix. Finding them is the key to unlocking the hidden structure of data, and it is the mathematical engine behind algorithms like Principal Component Analysis (PCA). The Intuition: Av = λv Let's decode the famous equation. "Eigen" is German for "own" or "characteristic." These...
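
A minimal sketch of the equation in NumPy (the 2×2 matrix is arbitrary, chosen for illustration): compute the eigenvectors and check that each one satisfies Av = λv.

```python
# Minimal eigenvector sketch: verify A v == lambda v for each eigenpair.
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, v in zip(eigenvalues, eigenvectors.T):   # columns of the result are the eigenvectors
    print(f"lambda = {lam:.1f}, A v == lambda v:", np.allclose(A @ v, lam * v))
```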

AI isn't "Smart." It's Just Rational. (The Math of Agents)

AI isn't "Smart." It's Just Rational. (The Math of Agents) We often think of Artificial Intelligence as an attempt to mimic the human mind. We use words like "thinking," "understanding," or "smart." But to an engineer, these words are distractions. In the strict mathematical sense, AI doesn't need to be smart. It just needs to be Rational. But what does "Rational" mean? It doesn't mean "sane" or "logical" in the human sense. It means one specific, programmable thing: Maximizing Expected Utility. In this post, we will decode the concept of the Rational Agent. We’ll strip away the sci-fi hype and look at the mathematical framework—Sensors, Actuators, and Performance Measures—that defines everything from a Roomba to ChatGPT. The Agent Function: f(P) -> A Let's define our terms. In AI, an Agent is simply anything that perceive...
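
A toy sketch of that framework (the percept, actions, probabilities, and utilities below are all invented for illustration): the agent function maps a percept to whichever action maximizes expected utility.

```python
# Minimal rational-agent sketch: f(percept) -> action with the highest expected utility.
def rational_agent(percept, outcome_probs, utility):
    """Pick the action that maximizes expected utility given the current percept."""
    def expected_utility(action):
        return sum(p * utility[outcome]
                   for outcome, p in outcome_probs[percept][action].items())
    return max(outcome_probs[percept], key=expected_utility)

# Hypothetical self-driving-car scenario: a shadow on the road might be an obstacle.
outcome_probs = {"shadow_ahead": {"brake": {"safe_stop": 0.95, "rear_ended": 0.05},
                                  "ignore": {"no_harm": 0.80, "collision": 0.20}}}
utility = {"safe_stop": -1, "rear_ended": -50, "no_harm": 0, "collision": -1000}

print(rational_agent("shadow_ahead", outcome_probs, utility))  # -> "brake"
```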