AI used to be split into two worlds. Images and “unstructured” data? Deep learning nailed that. Spreadsheets and databases? Traditional algorithms owned that space.
Then someone figured out how to make neural networks treat words like pixels. That breakthrough was embeddings. I’ll explain what that means in a second—just keep reading.
How Language Models Actually Work
The NLP breakthrough wasn’t grammar rules. It was a simple game: predict the next word.
That’s literally all ChatGPT does. No magic—it’s just really good at guessing what comes next.
Train a model on Wikipedia to guess the next word and it picks up patterns as a side effect. “The capital of France is…” probably means “Paris” comes next. It’s all statistics.
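A toy sketch of the “predict the next word” idea, using plain bigram counts on a made-up corpus. Real language models use neural networks over far larger contexts, but the game is the same: look at what came before, guess what comes next.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration only.
corpus = "the capital of france is paris . the capital of italy is rome .".split()

# Count which word follows which.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    # "Predict" by returning the most frequent follower seen in the corpus.
    return followers[word].most_common(1)[0][0]

print(predict_next("capital"))  # → "of"
```

That’s the whole trick, minus a few billion parameters: the prediction is just statistics over what the model has seen.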
I used to think AI was some mysterious intelligence. Now I see it’s just math—really sophisticated pattern matching. Not Skynet, just calculus on steroids.
This is called a pretrained language model. Once it understands language patterns, you can fine-tune it for specific tasks with way less data. Fine-tuning is like taking a generalist and making them a specialist—same brain, focused knowledge.
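Here is a minimal sketch of the fine-tuning idea with made-up numbers: keep a “pretrained” feature extractor frozen and train only a small task-specific head on top. In practice the frozen part would be a real language model; this just shows the shape of generalist-plus-specialist.

```python
# Stand-in for a frozen pretrained network: maps an input to useful features.
def pretrained_features(x):
    return [x, x * x]

# Trainable head: one weight per feature, learned with plain gradient descent.
weights = [0.0, 0.0]

# Tiny made-up dataset where the target is x + x^2.
data = [(1.0, 2.0), (2.0, 6.0), (3.0, 12.0)]

lr = 0.01
for _ in range(2000):
    for x, y in data:
        feats = pretrained_features(x)       # frozen: never updated
        pred = sum(w * f for w, f in zip(weights, feats))
        err = pred - y
        weights = [w - lr * err * f for w, f in zip(weights, feats)]

# The head converges to roughly [1.0, 1.0] -- it only had to learn
# how to combine features the "pretrained" part already provides.
```

Because only the head is trained, far less data is needed than training the whole thing from scratch, which is exactly the fine-tuning payoff.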
Embeddings: Making Words Mathematical
How do you tell a computer that “Monday” and “Tuesday” are similar, but “Monday” and “Blueberry” aren’t? Computers only understand numbers.
The old way (one-hot encoding) made every category equally different. Monday vs Tuesday? Same distance as Monday vs Blueberry. Useless.
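You can see the problem in a few lines. With one-hot vectors, every pair of distinct categories sits at exactly the same distance:

```python
import math

categories = ["Monday", "Tuesday", "Blueberry"]

# One-hot: each category gets its own axis, a 1 in its slot and 0s elsewhere.
one_hot = {c: [1.0 if i == j else 0.0 for j in range(len(categories))]
           for i, c in enumerate(categories)}

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance(one_hot["Monday"], one_hot["Tuesday"]))    # 1.414...
print(distance(one_hot["Monday"], one_hot["Blueberry"]))  # 1.414... (identical)
```

Every pair is √2 apart, so the encoding carries zero information about which categories are actually related.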
Embeddings fix this by turning words into numbers that actually capture meaning.
Embeddings map words into a multi-dimensional space where:
- Monday and Tuesday are close together
- King minus Man plus Woman equals Queen (seriously)
- Categories and numbers can work together in the same network
The space is huge—hundreds of dimensions capturing all sorts of relationships.
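To make the geometry concrete, here are hand-picked toy vectors in just 3 dimensions (real embeddings are learned from data and have hundreds of dimensions, as above). Cosine similarity does the comparing:

```python
import math

# Hand-picked toy vectors purely for illustration; real embeddings are learned.
emb = {
    "monday":    [0.90, 0.80, 0.10],
    "tuesday":   [0.85, 0.82, 0.15],
    "blueberry": [0.10, 0.20, 0.90],
}

def cosine(a, b):
    # 1.0 means "pointing the same way", near 0 means "unrelated".
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(emb["monday"], emb["tuesday"]))    # close to 1: similar
print(cosine(emb["monday"], emb["blueberry"]))  # much lower: unrelated
```

Unlike one-hot vectors, these distances mean something: Monday and Tuesday end up neighbors, Blueberry ends up somewhere else entirely.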
Deep Learning for Boring Data
Everyone thinks deep learning is for images and self-driving cars. But embeddings let neural networks crush “boring” spreadsheet data—sales forecasting, credit risk, all that stuff.
The trick: treat every category (store ID, zip code, product type) like a word. The network learns what these “words” mean and finds patterns humans miss. I’ve used this to spot data leakage that manual analysis missed.
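A sketch of how that wiring looks, with made-up column names and a made-up embedding size: each categorical value gets a small learnable vector, which is concatenated with the numeric columns to form the network’s input.

```python
import random

random.seed(0)

# Made-up setup: 3 stores, each mapped to a small trainable vector.
# During training, gradients flow into these vectors like any other weight.
EMB_DIM = 4
stores = ["store_1", "store_2", "store_3"]
store_emb = {s: [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
             for s in stores}

def featurize(row):
    # row = (store_id, numeric features...): look up the store's embedding
    # and append the numeric columns to build the network's input vector.
    store_id, *numeric = row
    return store_emb[store_id] + list(numeric)

x = featurize(("store_2", 3.5, 120.0))
print(len(x))  # EMB_DIM + 2 numeric features = 6
```

After training, stores that behave alike end up with similar vectors, which is where those “patterns humans miss” show up.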
The Bottom Line
The model architecture matters less than how you represent the data. Pixels, words, spreadsheet rows—turn them into good embeddings and the network handles the rest.
But there are traps. Next time I’ll write about realistic training and how to actually tell if your model works—or if it’s just cheating.