@Mohan, Tejaswi (Fairfield, OH) When your input is longer than the token limit, you will need to split it into chunks and fetch a separate encoding for each chunk. The chunk size is up to you and should be chosen based on your scenario.
Here is some sample code showing how you can chunk your input:
import tiktoken

# Tokenizer for the cl100k_base encoding
tokenizer = tiktoken.get_encoding("cl100k_base")

def split_text(text, max_tokens):
    """Split text into chunks of at most max_tokens tokens each."""
    tokens = tokenizer.encode(text)
    chunks = []
    start_index = 0
    while start_index < len(tokens):
        end_index = start_index + max_tokens
        chunks.append(tokens[start_index:end_index])
        start_index = end_index
    # Decode each token chunk back into text
    return [tokenizer.decode(chunk) for chunk in chunks]

input_text = "input text"
max_tokens = 4000
chunks = split_text(input_text, max_tokens)
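From there, you would fetch an encoding for each chunk separately. Below is a minimal sketch assuming your goal is to fetch embeddings with the OpenAI Python SDK (openai>=1.0); the model name "text-embedding-ada-002" and the client setup are assumptions, so substitute your own configuration and model or deployment name as appropriate.

from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment; adjust for your setup
client = OpenAI()

# Request one embedding per chunk (model name is an assumption)
embeddings = []
for chunk in chunks:
    response = client.embeddings.create(model="text-embedding-ada-002", input=chunk)
    embeddings.append(response.data[0].embedding)

Each entry in embeddings then corresponds to one chunk of the original input, and you can store or compare them per chunk depending on your scenario.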