Training and Finetuning Sparse Embedding Models with Sentence Transformers

Here's a Python package you can use to index, query, and rank your documents with SPLADE models from sentence-transformers.

splade-index: https://github.com/rasyosef/splade-index

SPLADE-Index⚡

SPLADE-Index is an ultrafast index for SPLADE sparse retrieval models, implemented in pure Python and powered by SciPy sparse matrices. It is built on top of the BM25s library.
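A SPLADE model maps each text to a sparse vector of vocabulary-term weights. A document's relevance to a query is the dot product of the two vectors, so only the terms they share contribute. The sketch below is a simplified, pure-Python illustration of that scoring over an inverted index. It is not splade-index's implementation, which stores the vectors in SciPy sparse matrices. The token weights are hand-written placeholders rather than real model output, and `build_inverted_index` and `score_query` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical hand-written sparse vectors (token -> weight).
# In practice these weights would come from a SPLADE model's encoder.
DOC_VECTORS = [
    {"cat": 2.1, "feline": 1.7, "purr": 1.5},
    {"dog": 2.0, "friend": 1.2, "play": 0.9},
    {"fish": 2.2, "water": 1.4, "swim": 1.1},
]


def build_inverted_index(doc_vectors):
    """Map each token to a list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_vectors):
        for token, weight in vec.items():
            index[token].append((doc_id, weight))
    return index


def score_query(index, query_vector, n_docs):
    """Dot product between the query vector and every document vector.

    Only the postings of tokens present in the query are visited, which is
    what keeps sparse retrieval cheap.
    """
    scores = [0.0] * n_docs
    for token, q_weight in query_vector.items():
        for doc_id, d_weight in index.get(token, []):
            scores[doc_id] += q_weight * d_weight
    return scores


index = build_inverted_index(DOC_VECTORS)
query = {"fish": 1.0, "purr": 0.8, "cat": 0.6}  # hypothetical query weights
scores = score_query(index, query, len(DOC_VECTORS))
print(scores)
```

Here the query shares "purr" and "cat" with the first document and "fish" with the third. The second document therefore scores 0 without any of its terms being visited.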

Installation

You can install splade-index with pip:

pip install splade-index

Recommended (but optional) dependencies:

# To speed up the top-k selection process, you can install `jax`
pip install "jax[cpu]"
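Top-k selection means picking the k highest scores for each query without fully sorting every document's score. The following is a rough, stdlib-only illustration of that step. It is not how splade-index or its optional `jax` backend implement it, and `top_k` is a hypothetical helper. It uses `heapq.nlargest`, which keeps a heap of size k instead of sorting all n scores.

```python
import heapq


def top_k(scores, k):
    """Return (doc_ids, scores) for the k highest scores, best first."""
    # nlargest tracks only the k best (doc_id, score) pairs seen so far
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    doc_ids = [doc_id for doc_id, _ in best]
    top_scores = [score for _, score in best]
    return doc_ids, top_scores


print(top_k([0.3, 2.46, 0.0, 2.2, 1.1], k=2))  # → ([1, 3], [2.46, 2.2])
```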

Quickstart

Here is a simple example of how to use splade-index:

from sentence_transformers import SparseEncoder
from splade_index import SPLADE

# Download a SPLADE model from the 🤗 Hub
model = SparseEncoder("rasyosef/splade-tiny")

# Create your corpus here
corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
    "a fish is a creature that lives in water and swims",
]

# Create the SPLADE retriever and index the corpus
retriever = SPLADE()
retriever.index(model=model, documents=corpus)

# Query the corpus
queries = ["does the fish purr like a cat?"]

# Get the top-k results: doc ids, documents, and scores, each an array of shape (n_queries, k).
results = retriever.retrieve(queries, k=2)
doc_ids, result_docs, scores = results.doc_ids, results.documents, results.scores

# Print the ranked results for the first (and only) query
for i in range(doc_ids.shape[1]):
    doc_id, doc, score = doc_ids[0, i], result_docs[0, i], scores[0, i]
    print(f"Rank {i+1} (score: {score:.2f}) (doc_id: {doc_id}): {doc}")

# You can save the index to a directory
retriever.save("animal_index_splade")

# ...and load it when you need it, passing the same model used for indexing
reloaded_retriever = SPLADE.load("animal_index_splade", model=model)