Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Models mentioned in this article 2


Community


FHSEOHub

Dec 20, 2025

I think it depends on the nature of tools


ariG23498

Article author Dec 23, 2025

What depends on the nature of tools? 😳


Sifal

Jan 10

Thanks for doing this! I had to train some tokenizers with v4, and it was indeed not straightforward to understand the behavior.

I had two questions:

  • You said that older model implementations may rely on Python-specific behavior. I'm curious whether you have an example.

  • You sometimes say "fast" (in quotes). Is that just to refer to the fast tokenizers backend, or can the implementation actually be slower than the Python implementation because of some kind of Rust overhead?


ariG23498

Article author Jan 12

Glad that this was useful to you.

  1. All the classes that extend PreTrainedTokenizer (which is an alias of PythonBackend) will serve as your examples. (GitHub Search)
  2. The Rust backend is faster than the other implementations.
