A Developer's Guide to Vector Embeddings: The Secret to "Meaning" in AI
How can a computer, a machine that only understands numbers, possibly solve an analogy like King - Man + Woman = Queen? It seems like magic. It's not. It's the result of one of the most powerful and important concepts in modern machine learning: Vector Embeddings.
Embeddings are the engine behind everything from ChatGPT's language understanding to your Netflix recommendations.
In this post, we are going to decode embeddings from first principles. We'll build the developer's mental model for what they are, how they are created, and why they are so fundamental to modern AI systems.
The Big Idea: Translating Meaning into Geometry
At its core, a vector embedding is a translator. It takes a discrete, non-mathematical item (like a word, a movie, or a product) and translates it into a list of numbers—a vector—in a high-dimensional space.
The Developer's Mental Model: The Universal Map
Imagine a giant, multi-dimensional map. An embedding's job is to assign a unique coordinate (a vector) to every single item you care about. The revolutionary insight is that the process for creating these embeddings forces semantically similar items to be placed close to each other on the map.
- The vector for "Dog" will have a high cosine similarity (a small angle) to the vector for "Puppy."
- The vector for "France" will be geometrically close to the vector for "Paris."
- Conversely, the vector for "Dog" will be very far from (or nearly orthogonal to) the vector for "Astrophysics."
This transformation of 'semantic similarity' into 'geometric proximity' is the key. It turns abstract meaning into something a computer can measure, compare, and manipulate using the tools of linear algebra.
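To make that geometry concrete, here is a minimal sketch of how similarity is actually measured. The three-dimensional vectors below are made up purely for illustration (real embeddings have hundreds of dimensions), but the cosine similarity calculation is exactly the same:
import numpy as np

# Toy "embeddings" -- invented for illustration only
dog = np.array([0.9, 0.8, 0.1])
puppy = np.array([0.85, 0.75, 0.2])
astrophysics = np.array([0.05, 0.1, 0.95])

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors: near 1 = similar, near 0 = unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"dog vs. puppy:        {cosine_similarity(dog, puppy):.4f}")
print(f"dog vs. astrophysics: {cosine_similarity(dog, astrophysics):.4f}")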
How Are Embeddings Created? The Language Teacher Analogy
We don't design these complex maps by hand. We force a neural network to learn them for us by giving it a simple, self-supervised task. The classic example is the Word2Vec model (specifically, the Skip-gram architecture).
The Developer's Mental Model: The Neural Network as a Language Teacher
Imagine you give a neural network a simple objective: predict the context words that are likely to appear near a given input word. For example, if you give it the word "brown," you want it to predict words like "quick," "fox," "dog," "jumps," etc. The network's architecture is surprisingly simple:
- An input layer that takes a single word (e.g., "fox").
- A single hidden layer, which is our embedding layer. The size of this layer (e.g., 100 or 300 neurons) defines the dimension of our vector embeddings. The weights of this layer are the embeddings.
- An output layer that predicts the probability of every other word in the vocabulary appearing nearby.
The network is then trained on a huge text corpus (billions of words, such as Wikipedia or news articles). For every word, it tries to predict its neighbors, compares those predictions against the words that actually appeared nearby, and uses backpropagation to slightly adjust its weights so the error shrinks.
The crucial insight is this: the embedding is a byproduct of the training process. The network isn't trying to learn a good "map"; it's just trying to get good at predicting context words. To do this, it is forced to learn that words that appear in similar contexts (like "dog" and "puppy") must have similar internal representations. This process naturally nudges their embedding vectors closer together in the high-dimensional space.
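To make this less abstract, here is a minimal, self-contained Skip-gram sketch in PyTorch. It is not the real Word2Vec implementation (which adds efficiency tricks like negative sampling), and the toy corpus, window size, and hyperparameters are invented purely for illustration; the point is simply that the weight matrix of the embedding layer is where the word vectors come from.
import torch
import torch.nn as nn

# Tiny toy corpus and vocabulary (made up purely for illustration)
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word_to_idx = {w: i for i, w in enumerate(vocab)}

# Build (center, context) training pairs with a context window of 1
pairs = []
for i, center in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            pairs.append((word_to_idx[center], word_to_idx[corpus[j]]))

centers = torch.tensor([c for c, _ in pairs])
contexts = torch.tensor([c for _, c in pairs])

embedding_dim = 10  # size of the hidden layer = dimension of our vectors

class SkipGram(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, dim)  # hidden layer: its weights are the embeddings
        self.output = nn.Linear(dim, vocab_size)         # predicts context-word probabilities

    def forward(self, center_ids):
        return self.output(self.embeddings(center_ids))  # logits over the whole vocabulary

model = SkipGram(len(vocab), embedding_dim)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(centers), contexts)  # how badly we predicted the neighbors
    loss.backward()                           # backpropagation nudges the weights
    optimizer.step()

# The trained weight matrix of the embedding layer *is* the set of word vectors
word_vectors = model.embeddings.weight.detach()
print(word_vectors[word_to_idx['fox']])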
Embeddings in Action: Python Code Examples
Theory is great, but let's see this in practice. We can use pre-trained models to explore these learned vector spaces.
Example 1: Word Embeddings with Gensim
The gensim library makes it easy to load and play with pre-trained word embeddings.
# You'll need to install the gensim library: pip install gensim
import gensim.downloader as api
# Load a pre-trained model (100-dimensional GloVe vectors trained on Wikipedia + Gigaword)
word_vectors = api.load('glove-wiki-gigaword-100')
# --- Find similar words ---
print("Similar to 'car':", word_vectors.most_similar('car', topn=5))
# --- Solve the famous analogy ---
result = word_vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)
print(f"king - man + woman = {result[0][0]}")
# --- Measure similarity ---
similarity = word_vectors.similarity('cat', 'dog')
print(f"Similarity between 'cat' and 'dog': {similarity:.4f}")
Example 2: Image Embeddings with PyTorch and Timm
The same concept applies to images. We can use a powerful pre-trained Convolutional Neural Network (CNN), like ResNet, as a feature extractor. By removing its final classification layer, we can access the rich, dense vector that represents the network's internal "understanding" of an image.
# You'll need to install the libraries: pip install torch torchvision timm Pillow numpy
import numpy as np
import torch
import timm
from PIL import Image
from torchvision import transforms
# 1. Load a pre-trained model (e.g., ResNet50)
# We use it as a feature extractor, not for classification.
model = timm.create_model('resnet50', pretrained=True, num_classes=0)
model.eval()
# 2. Create a standard image transformation pipeline
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# 3. Helper function to get an embedding from an image file
def get_image_embedding(image_path, model):
    img = Image.open(image_path).convert('RGB')
    img_t = transform(img)
    batch_t = torch.unsqueeze(img_t, 0)  # add a batch dimension
    with torch.no_grad():
        embedding = model(batch_t)
    return embedding.numpy().flatten()
# 4. Get embeddings for your images (replace with your own file paths)
# For this to work, you'll need three local images: cat1.jpg, cat2.jpg, and car1.jpg
embedding_cat1 = get_image_embedding('cat1.jpg', model)
embedding_cat2 = get_image_embedding('cat2.jpg', model)
embedding_car1 = get_image_embedding('car1.jpg', model)
# 5. Calculate Cosine Similarity
def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
sim_cats = cosine_similarity(embedding_cat1, embedding_cat2)
sim_cat_car = cosine_similarity(embedding_cat1, embedding_car1)
print(f"Similarity between two cat images: {sim_cats:.4f}")
print(f"Similarity between a cat and a car: {sim_cat_car:.4f}")
You should see that the similarity score between the two cat images is noticeably higher than the similarity between the cat and the car. This demonstrates that the network has learned to place visually similar images close together in its embedding space.
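Once you have embeddings, similarity search is just a ranking over cosine scores. The sketch below reuses the model, get_image_embedding, and cosine_similarity defined above; the file names (including cat_query.jpg) are placeholders you would swap for your own images.
# Assumes `model`, `get_image_embedding`, and `cosine_similarity` from the example above
# are already defined, and that these placeholder image files exist locally.
library_paths = ['cat1.jpg', 'cat2.jpg', 'car1.jpg']
library = {path: get_image_embedding(path, model) for path in library_paths}

# Hypothetical query image -- replace with your own
query = get_image_embedding('cat_query.jpg', model)

# Rank the "library" by cosine similarity to the query (highest first)
ranked = sorted(library.items(),
                key=lambda item: cosine_similarity(query, item[1]),
                reverse=True)

for path, emb in ranked:
    print(f"{path}: {cosine_similarity(query, emb):.4f}")
At scale you would replace this linear scan with an approximate nearest-neighbor index, but the underlying idea, ranking by cosine similarity in the embedding space, stays the same.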
Conclusion: From Magic to Engineering
Vector embeddings are not magic. They are a powerful engineering solution to the problem of representing meaning. By translating abstract concepts into a geometric space, they allow us to apply the rigorous and predictable tools of linear algebra to solve problems that once seemed impossible. They are the secret ingredient that makes modern AI feel so intelligent.
Your Turn...
What is the most surprising or interesting application of vector embeddings you've encountered? Share your thoughts or any questions in the comments below!
This post is part of the "Linear Algebra: The Core of Machine Learning" series. For the previous part, check out: Linear Independence & Dependence Explained (The Key to Feature Engineering).