Artificial Mathematics: The Language of Generative Models

When we write a prompt in ChatGPT or any other generative model, we use words, phrases, and human linguistic structures. However, the model does not understand letters, meanings, or context as we do. Instead, it operates purely mathematically: each word, phrase, or idea is transformed into numbers and positions in a high-dimensional mathematical space.

From Letters to Numbers
Language processing in models like ChatGPT is based on a fundamental principle: representing text as numbers. This is done through a process called tokenization, where words or word fragments are converted into sequences of numerical tokens. These tokens are associated with vectors in a mathematical space, allowing the model to manipulate them as coordinates in a vast network of meanings.
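To make this concrete, here is a minimal Python sketch of the idea, using a hypothetical four-word vocabulary and randomly initialized vectors; real systems learn subword vocabularies with tens of thousands of entries and much higher-dimensional embeddings:

```python
# A minimal sketch of tokenization and embedding lookup, assuming a
# hand-built toy vocabulary. Real models learn subword vocabularies
# and vectors with hundreds or thousands of dimensions.
import numpy as np

vocab = {"the": 0, "sky": 1, "is": 2, "blue": 3}   # hypothetical vocabulary
embedding_dim = 4                                  # real models use far more
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), embedding_dim))  # one vector per token

def tokenize(text: str) -> list[int]:
    """Convert each word into its numerical token id."""
    return [vocab[word] for word in text.lower().split()]

tokens = tokenize("The sky is")
vectors = embeddings[tokens]   # the tokens' coordinates in the model's vector space
print(tokens)                  # [0, 1, 2]
print(vectors.shape)           # (3, 4): three tokens, four dimensions each
```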

Learning as Mathematical Positioning
Training a generative model is essentially about assigning and adjusting the positions of these tokens in a multidimensional space. In essence, a model learns probability distributions: how likely it is for one word to follow another, based on vast amounts of previous data. There is no real understanding, only a sophisticated mathematical correlation between word sequences.
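As a toy illustration of learning as probability estimation, the sketch below builds a bigram model: it counts, over an invented miniature corpus, how often each word follows another and normalizes those counts into probabilities. Real models learn far richer distributions over long contexts, but the principle is the same:

```python
# A toy bigram model: "learning" reduced to counting which word follows
# which, then turning the counts into probabilities. The corpus is
# invented for illustration.
from collections import Counter, defaultdict

corpus = "the sky is blue . the sky is gray . the sea is blue .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1          # tally each observed word pair

def next_word_probs(word: str) -> dict[str, float]:
    """Probability distribution over the next word, given the current one."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probs("is"))        # {'blue': 0.67, 'gray': 0.33} (approximately)
```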

Text Generation: A Numerical Prediction
When we write a prompt, the model calculates the most probable next word based on its prior training. It uses techniques such as transformers and embeddings to evaluate the relationships between the words in the prompt and the vector space it learned during training. The output is a sequence of numbers that, when decoded, becomes a human-readable response.
In other words, what appears to be a fluent conversation is actually the sequential selection of numbers optimized by probability functions and neural networks.
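As a sketch of that selection step, the snippet below turns a set of invented network scores (logits) into a probability distribution with the softmax function and then greedily picks the most probable word; the vocabulary and scores are made up for illustration:

```python
# From raw network scores to a chosen word: softmax converts logits
# into probabilities, and decoding maps the winning token id back to
# text. Vocabulary and logits are invented for illustration.
import numpy as np

vocab = ["blue", "gray", "red"]        # hypothetical tiny vocabulary
logits = np.array([4.0, 1.9, 1.2])     # invented scores for the prompt "The sky is"

def softmax(x: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(x - x.max())            # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(logits)
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.0%}")          # blue: 85%, gray: 10%, red: 5%

next_id = int(probs.argmax())          # greedy decoding: take the most probable token
print("The sky is", vocab[next_id])    # The sky is blue
```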

Visualization of the Generation Process
A generative model follows a probability-based generation process. We can represent it graphically as follows:
Input: "The sky is"
Neural network analyzes probabilities:
("blue" - 85%) | ("gray" - 10%) | ("red" - 5%)
Generated output: "The sky is blue"
Each generated word is the result of statistical calculations based on millions of previously analyzed words.
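The same example can also be run as a weighted random draw. Instead of always taking the most probable word, models typically sample from the distribution, which is why the same prompt can yield different outputs; the probabilities below are taken from the diagram above:

```python
# Sampling instead of greedy selection: each completion is a weighted
# random draw from the probability distribution shown above.
import random

words = ["blue", "gray", "red"]
probs = [0.85, 0.10, 0.05]

random.seed(42)                  # fixed seed so the example is reproducible
completions = [random.choices(words, weights=probs)[0] for _ in range(10)]
print(completions)               # mostly 'blue', with the occasional 'gray' or 'red'
```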

Limitations and Challenges of Generative Models
Despite their impressive capabilities, generative models have limitations:
- Lack of real understanding: They do not comprehend meaning, only manipulate numerical patterns.
- Dependence on training data: If the data contains biases, the model will replicate them.
- Difficulty handling long-term context: Although improving, they can still lose coherence in long responses.
- Hallucinations: They can produce fluent, plausible responses that are factually wrong.

Conclusion: Mathematics in Action
"Artificial mathematics" is the foundation of generative artificial intelligence. There is no interpretation of meanings in human terms, only mathematical calculations that determine patterns and predictions. Every word you see in a ChatGPT response is nothing more than the manifestation of numerical operations on a model trained with millions of previous texts.
Ultimately, generative AI does not understand, it calculates; it does not think, it predicts. What is language to us is pure artificial mathematics to it.
