Parquet Content-Defined Chunking

Datasets mentioned in this article 1

More Articles from our Blog

  • [Streaming datasets: 100x More Efficient](https://huggingface.co/blog/streaming-datasets) · October 27, 2025
  • [Improving Parquet Dedupe on Hugging Face Hub](https://huggingface.co/blog/improve_parquet_dedupe) · October 5, 2024

Community

sfkeller · Jul 28, 2025

Does Hugging Face Xet Storage also work on top of (self-hosted) MinIO?

jsulz · Jul 28, 2025

@sfkeller Not today, but the underlying technology is open source and we're in the process of documenting the backend! We plan to release a Xet protocol later this year, which would open up the possibility to build for other backends as well. cc @rajatarya

Gheni · Aug 3, 2025

This is great! I look forward to the release of the Xet protocol.

wilderbit · Aug 19, 2025

I hope it is written in Rust :)

rajatarya · Aug 19, 2025 · reply to wilderbit

The protocol and format will be published as documentation, but there is already the hf-xet implementation in Rust, in xet-core.
