The Gradient Vector: The One Mathematical Object That Trains Every Neural Network
If you are standing on a mountain and want to reach the bottom, you check the slope under your feet and start walking downhill. How does a machine do the same thing? If the input data contains just one feature, the model has a single weight to adjust in order to minimize the loss, so it only has to move left or right depending on the sign of the slope.

What if there is more than one feature? GPT-3, the model family that ChatGPT grew out of, has 175 billion parameters. Training it means searching for the lowest point of the loss in a 175-billion-dimensional space. That is the mathematical problem everyone who trains a model faces, and there is a solution for it.

The Single Parameter Case

If we have just a single parameter, the problem is easy: we calculate the slope and move in the direction that lowers the loss. The moment we add another parameter, a single slope is no longer enough. Imagine standing on a mountainside or somewhere inside a bowl: you have infinitely many directions to move in, and choosing the right one is the key. We can't c...