Tokenization in Transformers v5: Simpler, Clearer, and More Modular
Community
I think it depends on the nature of tools
Article author Dec 23, 2025
What depends on the nature of tools? 😳
Thanks for doing this! I had to train some tokenizers with v4, and it was indeed not straightforward to understand the behavior.
I had two questions:
- You said that older model implementations may rely on Python-specific behavior. I'm curious whether you have an example.
- You sometimes say "fast" (in quotes). Is that just to refer to the fast tokenizers backend, or can the implementation actually be slower than the Python implementation because of some kind of Rust overhead?
Article author Jan 12
Glad that this was useful to you.
- All the classes that extend `PreTrainedTokenizer` (which is an alias for `PythonBackend`) can serve as examples. (GitHub Search)
- The Rust backend is faster than the other implementations.