We often say, “Large Language Models are very smart.”

They can write articles, answer questions, code, and even perform some logical reasoning.

But in reality, the principle behind them isn’t mysterious:

  • 👉 It simply turns a piece of text into a list of numbers,
  • 👉 and then “calculates distances” in a high-dimensional space.

This might sound a bit abstract, but with a small example, you’ll understand it right away. 👇


1. Language Can Also Become “Coordinates”

For example, take this sentence:

How did Cao Cao die?

The model doesn’t “understand” this sentence like a human. Instead, it first converts it into a fixed-length vector, for example, a 1536-dimensional coordinate:

[ 0.01234, -0.01891, 0.00023, …, 0.07112 ]

👉 No matter how long your original input sentence is, it’s converted into 1536 numbers.

This means every sentence has an “address” in the semantic space.

This step is called Text Embedding, and it’s the first step a modern large language model takes toward understanding language.
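
If you want to see this step concretely, here is a minimal sketch using the OpenAI Python SDK (v1+). The model name text-embedding-3-small is just one example of a model that returns 1536-dimensional vectors; other providers offer similar APIs:

```python
# Minimal sketch: turning a sentence into a 1536-dimensional vector.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",  # one example of a 1536-dim embedding model
    input="How did Cao Cao die?",
)

vector = response.data[0].embedding
print(len(vector))   # 1536 -- always the same length, whatever the input
print(vector[:3])    # e.g. [0.01234, -0.01891, 0.00023]
```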


2. Using Beijing and Shanghai as an Example 🏙️

Imagine the surface of the Earth as a two-dimensional plane.

Every city has its corresponding latitude and longitude coordinates.

Let’s assume Beijing’s “coordinate” is (1, 1) and Shanghai’s “coordinate” is (4, 5).

We can then use the Euclidean distance formula to calculate the “distance” between them. 👇

d = sqrt((4 - 1)^2 + (5 - 1)^2)
  = sqrt(3^2 + 4^2)
  = sqrt(9 + 16)
  = sqrt(25)
  = 5

  • 👉 The closer the distance, the more similar the semantics.
  • 👉 The farther the distance, the greater the semantic difference.

This is the basic idea of “vector distance.”

Real embedding models use 1536 dimensions (and sometimes more), but the mathematical principle is exactly the same.

It’s just that the two-dimensional “Beijing-Shanghai” example is easier to understand intuitively.
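
The same calculation in code. This tiny function works for any number of dimensions, so the exact routine that measures Beijing-to-Shanghai here would also measure the distance between two 1536-dimensional embeddings:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

beijing = (1, 1)
shanghai = (4, 5)
print(euclidean_distance(beijing, shanghai))  # 5.0

# The same call works unchanged on two 1536-dimensional vectors.
```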


3. RAG: The External “Knowledge Base” 🧠📚

Many people mistakenly believe:

Large Model = Knowledge Base

That’s actually incorrect!

  • Large Language Model: Responsible for understanding language and generating answers.
  • Vector Database: Responsible for storing and retrieving information.

For example, if you ask:

How did Cao Cao die?

The AI’s workflow is actually:

  1. Vectorize your question.
  2. Find the content with the closest “semantic distance” to this question in the vector database (e.g., “Cao Cao died of illness in Luoyang”).
  3. Pass this information + your question to the large model.
  4. The large model generates a natural language answer.

This technique is called RAG (Retrieval-Augmented Generation). Its advantage is 👉 you can make the AI “aware” of your local knowledge without retraining the model.

For example, corporate knowledge bases, professional documents, and historical archives can all be integrated this way.
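
Here is a toy sketch of that four-step workflow. The 3-dimensional vectors are made up purely for illustration (a real system would store 1536-dimensional embeddings in a vector database), and the final LLM call is left as a prompt string:

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Offline step: documents stored together with their (made-up) embeddings.
knowledge_base = [
    ("Cao Cao died of illness in Luoyang.",       [0.9, 0.1, 0.2]),
    ("Shanghai is a city on China's east coast.", [0.1, 0.8, 0.3]),
]

def rag_prompt(question, question_vector):
    # Steps 1-2: the question vector is matched against the knowledge base.
    best_text, _ = min(knowledge_base,
                       key=lambda item: distance(item[1], question_vector))
    # Step 3: retrieved context + question are combined into one prompt.
    return f"Context: {best_text}\nQuestion: {question}\nAnswer:"

# Step 4 would send this prompt to the large model for the final answer.
print(rag_prompt("How did Cao Cao die?", [0.88, 0.15, 0.25]))
```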


4. Transformer: The Model’s “Thinking” Layer 🧮

What happens inside a large language model after it receives input isn’t “magic.” It consists of layer upon layer of Transformer blocks (usually 20 or more).

Each layer refines and abstracts semantics, much like the human brain continuously “processes” information.

Ultimately, the model finds the “knowledge point” in the 1536-dimensional semantic space that is closest to your question and then converts it into a natural language output.
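
As a rough illustration of the “layer after layer” idea, here is a toy PyTorch sketch. The layer count and sizes are arbitrary choices for the example, and real decoder-style LLMs add many details (causal masking, positional encodings, and so on) that are omitted here:

```python
import torch
import torch.nn as nn

# One Transformer layer, then a stack of 24 of them (an illustrative count).
layer = nn.TransformerEncoderLayer(d_model=1536, nhead=12, batch_first=True)
stack = nn.TransformerEncoder(layer, num_layers=24)

tokens = torch.randn(1, 10, 1536)  # 1 sentence, 10 tokens, 1536-dim each
refined = stack(tokens)            # each layer refines the representation
print(refined.shape)               # torch.Size([1, 10, 1536])
```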


5. Why 1536 Dimensions? 🤔

A two-dimensional space can represent the geographical locations of Beijing and Shanghai;

but language is far more complex than geographical information.

A single piece of text can simultaneously contain:

  • Time
  • Place
  • Subject
  • Emotion
  • Grammatical structure
  • Implied relationships

Two dimensions are simply not enough, so the model opts for a high-dimensional space, like 1536 dimensions.

This allows for a more precise depiction of semantic differences.

  • Closer distance → More similar semantics
  • Farther distance → Greater difference in meaning

This is the essence of “Semantic Embedding”.
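
In practice, embedding similarity is often measured with cosine similarity (the angle between two vectors) rather than raw Euclidean distance, but both capture the same “closer means more similar” idea. A minimal sketch with made-up 4-dimensional vectors:

```python
import math

def cosine_similarity(a, b):
    """1.0 = same direction (very similar), near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional vectors; real embeddings would have 1536 numbers.
cat = [0.8, 0.1, 0.3, 0.0]
dog = [0.7, 0.2, 0.3, 0.1]
tax = [0.0, 0.9, 0.1, 0.8]

print(cosine_similarity(cat, dog))  # high: related meanings
print(cosine_similarity(cat, tax))  # low: unrelated meanings
```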


6. Summary 📝

  • 🧭 The model first converts text into vectors.
  • 📏 Semantic similarity = close vector distance.
  • 📚 Vector databases are responsible for fast retrieval.
  • 🧠 RAG technology gives the model an “external knowledge base”.
  • 🧮 Transformers are responsible for semantic understanding and generation.

📌 So, when you’re chatting with an AI, it’s finding the “closest point” to your question in a 1536-dimensional space and then expressing it in natural language.