In Generative AI, are tokens converted in real time and kept in a cache, or has Microsoft converted and saved all of them?

Rajeev Gera 0 Reputation points
2025-01-08T23:57:14.4666667+00:00

When training text is converted into tokens, where are the tokens saved? Is the conversion done in real time for every query? Are the tokens saved in a cache? What if someone uses the same text for a query on a different computer? What happens when you run a new query? What happens when you close the browser and open the AI page again?

Has Microsoft, or any encoder or decoder, already converted all text and saved it as tokens?

This question is related to the following Learning Module

Azure Training

1 answer

  1. Rakesh Gurram 11,385 Reputation points Microsoft Vendor
    2025-01-09T08:58:55.49+00:00

    Hi Rajeev Gera,

    Thank you for reaching out to us on the Microsoft Q&A forum.

    In Generative AI systems like GPT, tokenization is the process of breaking text into smaller pieces called tokens, which the model uses to generate responses. This happens in real-time whenever you enter text. The tokenizer follows a fixed method to create tokens, but it doesn’t pre-process or save tokens for all possible inputs because that would require too much storage. Instead, tokenization is done only when needed.
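    As a minimal sketch of what happens on each request, here is how the tokenization step can be reproduced locally with the open-source tiktoken library. This assumes the cl100k_base encoding used by GPT-3.5/GPT-4-family models; the exact tokenizer behind a given deployment may differ, but the idea is the same: token IDs are computed on the fly from fixed rules, not looked up from a pre-saved store.

    ```python
    import tiktoken  # pip install tiktoken

    # Load a fixed set of vocabulary and merge rules (nothing is fetched per query).
    enc = tiktoken.get_encoding("cl100k_base")

    text = "Tokenization happens in real time."
    token_ids = enc.encode(text)   # text -> list of integer token IDs, computed now
    print(token_ids)               # prints a list of integers

    # Decoding reverses the mapping; the model itself never stores these IDs.
    print(enc.decode(token_ids))   # "Tokenization happens in real time."
    ```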

    During training, the model uses tokenized text to learn patterns and relationships. However, these tokens are not stored after training; the knowledge is kept in the model's learned weights, not as saved text or tokens. Each time you make a query, the text is tokenized again on the spot and processed right away.

    Sometimes, tokens may be temporarily cached during a session to speed things up, but they are not saved permanently. If you use the same text on a different device, it is simply tokenized again in real time. The tokenizer is deterministic: for a given tokenizer version, the same input always produces the same token IDs, no matter which machine runs it.
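    Because the rules are fixed, you can verify this determinism yourself with the same tiktoken sketch as above (again assuming the cl100k_base encoding):

    ```python
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = "The same text tokenizes identically everywhere."

    # Encoding twice (or on two different machines with the same tokenizer
    # version) yields identical token IDs, because the vocabulary and merge
    # rules are fixed, not derived from any session state.
    assert enc.encode(text) == enc.encode(text)

    # Round-trip: decoding the IDs recovers the original text exactly.
    assert enc.decode(enc.encode(text)) == text
    ```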

    If you close your browser and reopen it, any temporary tokens are cleared, and the tokenization process starts fresh for new queries. It’s not practical to pre-convert and save tokens for every possible text because there are endless possibilities. Instead, the model processes text dynamically as you use it, ensuring it can handle a wide variety of inputs efficiently.

    Please reach out to us if you have any other queries.

    If the information is helpful, please Accept Answer & Upvote so that it would be helpful to other community members.

