Gab-Menezes/simdphrase

Simd Phrase Search

Extremely fast phrase search implementation.

Overview

This implementation follows some of the ideas proposed in this blog post by Doug Turnbull. The full explanation of how the internals work can be found here.

This crate uses the log crate for logging during indexing.

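It's highly recommended to compile this crate with -C llvm-args=-align-all-functions=6 (for example, by setting it in the RUSTFLAGS environment variable). The LLVM option takes a log2 value, so 6 aligns every function to a 64-byte boundary.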

Usage

use phrase_search::{CommonTokens, Indexer, SimdIntersect};

// Creates a new, reusable indexer. It indexes 300_000 documents per batch
// and uses the 50 most common tokens, merging them to speed up the search.
let indexer = Indexer::new(Some(300_000), Some(CommonTokens::FixedNum(50)));

let docs = vec![
    ("look at my beautiful cat", 0),
    ("this is a document", 50),
    ("look at my dog", 25),
    ("look at my beautiful hamster", 35),
];
let index_name = "./index";
let db_size = 1024 * 1024;

// Indexes the documents in `docs`.
// The index will be created at `index_name` with the given `db_size`.
let (searcher, num_indexed_documents) = indexer.index(docs, index_name, db_size)?;

// Search for the phrase "at my beautiful"
let result = searcher.search::<SimdIntersect>("at my beautiful")?;
// This should return `[0, 35]`
let documents = result.get_documents()?;
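
To make the expected result concrete, here is a minimal reference sketch of what "phrase search" means for the documents above. It is not how this crate works internally: it is a naive, hypothetical helper (`naive_phrase_search` is not part of the crate's API) that assumes whitespace tokenization and `u32` document IDs, and simply checks whether the query tokens appear as a contiguous run in each document.

```rust
/// Naive reference check, NOT the crate's algorithm: returns the IDs of the
/// documents whose whitespace-separated tokens contain `phrase` as a
/// contiguous run. Tokenization by whitespace and `u32` IDs are assumptions.
fn naive_phrase_search(docs: &[(&str, u32)], phrase: &str) -> Vec<u32> {
    let query: Vec<&str> = phrase.split_whitespace().collect();
    // An empty query matches nothing (and `windows(0)` would panic).
    if query.is_empty() {
        return Vec::new();
    }
    docs.iter()
        .filter(|(text, _)| {
            let tokens: Vec<&str> = text.split_whitespace().collect();
            // Slide a window of the query's length over the document tokens.
            tokens.windows(query.len()).any(|w| w == query.as_slice())
        })
        .map(|&(_, id)| id)
        .collect()
}
```

Calling this helper with the four documents from the usage example and the query "at my beautiful" selects the "cat" and "hamster" documents, which is why the indexed search is expected to return their IDs. A linear scan like this is only useful as a correctness baseline for small inputs.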
