Introducing the Ettin Reranker Family

Community

This is a fantastic read, thank you for the hard work and for sharing the recipes!

Article author

5 days ago

Gladly! I'm glad you enjoyed it 🤗

Great work! Really learnt a lot from this blog.

Article author

4 days ago

Happy to hear it! 😄

Great work! Love that you used stratified sampling. It's great to see its power on cross-encoders, too!

Article author

3 days ago

Definitely! I tried a few variants, and in my tests a mix of top and stratified sampling worked best. I sampled from 2048 docs, though, so with fully stratified sampling the 2nd doc is already pretty far from the most similar one.
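
To make the trade-off in the reply above concrete, here is a minimal sketch of mixing top and stratified sampling over a ranked candidate list. It is an illustration only, not the article's training code: the function name `sample_top_and_stratified`, its parameters, and the counts used in the example are all assumptions.

```python
import random


def sample_top_and_stratified(ranked_ids, num_top, num_stratified, seed=0):
    """Pick candidates from a list sorted from most to least similar.

    Hypothetical helper: keeps the first `num_top` items as-is, then splits
    the remainder into `num_stratified` equal-width rank bands (strata) and
    draws one item uniformly at random from each band.
    """
    rng = random.Random(seed)
    top = list(ranked_ids[:num_top])
    rest = ranked_ids[num_top:]
    if num_stratified == 0 or not rest:
        return top

    # Never ask for more strata than there are remaining candidates.
    num_stratified = min(num_stratified, len(rest))
    picks = []
    for i in range(num_stratified):
        start = i * len(rest) // num_stratified
        end = (i + 1) * len(rest) // num_stratified
        picks.append(rest[rng.randrange(start, end)])
    return top + picks


# Assumed setup: 2048 candidates, identified here by their rank (0 = most similar).
ranked = list(range(2048))
fully_stratified = sample_top_and_stratified(ranked, num_top=0, num_stratified=8)
mixed = sample_top_and_stratified(ranked, num_top=4, num_stratified=4)
```

With 8 fully stratified picks from 2048 candidates, each band is 256 ranks wide, so the second pick can be no closer than rank 256. The mixed variant keeps the nearest few candidates and still spreads the rest across the ranking.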

Article author

2 days ago

Thanks Maxime! You know, I bet the training script would apply pretty cleanly to LFM models as well 👀 Although maybe the model prefers a generative architecture with a chat template that asks the model to generate either "yes" or "no", where the difference in raw logit scores for those two tokens is used as the prediction. That's what's used for the teacher model's Sentence Transformers integration, and it works quite well: https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2
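
The "yes"/"no" scoring idea in the reply above can be sketched in a few lines. This is not the teacher model's actual code: the function names, the toy logits, and the token ids are hypothetical, and a real setup would take the next-token logits from a causal language model after applying its chat template.

```python
import math


def yes_no_score(next_token_logits, yes_id, no_id):
    """Relevance score as the raw logit gap between the "yes" and "no" tokens."""
    return next_token_logits[yes_id] - next_token_logits[no_id]


def to_probability(score):
    """Sigmoid of the logit gap.

    This equals the softmax probability of "yes" when the softmax is
    restricted to just the "yes" and "no" tokens.
    """
    return 1.0 / (1.0 + math.exp(-score))


# Toy next-token logits over a 4-token vocabulary (assumed values).
logits = [0.1, 2.0, -1.0, 0.5]
YES_ID, NO_ID = 1, 2  # hypothetical token ids

score = yes_no_score(logits, YES_ID, NO_ID)
probability = to_probability(score)
```

Ranking documents by the raw gap or by its sigmoid gives the same order, since the sigmoid is monotonic. The probability form is mainly convenient when a bounded score is needed.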

Fantastic work! In information retrieval, rerankers are often used in real-time pipelines, so a compact reranker with strong performance is exactly what many of us have been waiting for. Thank you for the great work!

Article author

2 days ago

Yess, exactly. In my opinion, rerankers have been getting too big to stay usable. Hopefully this'll inspire others to make smaller models as well. Plus, the Ettin base models are just amazing, which helps a lot.
