`LeRobotDataset:v3.0`: Bringing large-scale datasets to `lerobot`
Any speed benchmarks? Typically what really matters is parallel decoding of videos, since mp4 decoding will bottleneck things immensely.
Article author
ciao there 👋 We are about to release an addendum to this blog post where we benchmarked the throughput fairly extensively, in particular against the streaming version of datasets. We are still polishing it up a bit before releasing, but if you're interested you can check out this PR to read more: https://github.com/huggingface/blog/pull/3084
Re: your point on decoding, that's precisely the case! However, torchcodec really does wonders here, so we're able to keep decoding from bottlenecking the datasets :)