From ebf3ddf86813520c04886b6cad19471dcce6a397 Mon Sep 17 00:00:00 2001
From: Andrew Valencik
Date: Sat, 6 Apr 2024 15:01:24 -0400
Subject: [PATCH] Organize docs into directories

---
 README.md                                    | 13 ++++-
 docs/01-about-protosearch/01-features.md     | 13 +++++
 docs/01-about-protosearch/02-design-goals.md | 26 +++++++++
 docs/01-about-protosearch/directory.conf     |  1 +
 docs/02-tutorial/01-indexing.md              | 54 +++++++++++++++++++
 .../02-querying.md}                          | 46 ++++------------
 docs/02-tutorial/directory.conf              |  1 +
 docs/index.md                                | 34 +++---------
 8 files changed, 122 insertions(+), 66 deletions(-)
 mode change 120000 => 100644 README.md
 create mode 100644 docs/01-about-protosearch/01-features.md
 create mode 100644 docs/01-about-protosearch/02-design-goals.md
 create mode 100644 docs/01-about-protosearch/directory.conf
 create mode 100644 docs/02-tutorial/01-indexing.md
 rename docs/{queries.md => 02-tutorial/02-querying.md} (73%)
 create mode 100644 docs/02-tutorial/directory.conf

diff --git a/README.md b/README.md
deleted file mode 120000
index e8923303..00000000
--- a/README.md
+++ /dev/null
@@ -1 +0,0 @@
-docs/index.md
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 00000000..aac34ec8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,12 @@
+Protosearch
+===========
+
+Protosearch is a prototype search library under active development (hence the "proto").
+We're currently focussing on end-to-end functionality, and not yet worrying too much about API stability or performance throughout.
+Protosearch is pre-release software; do not use in production.
+
+[Check out the site to learn more.][site]
+
+
+
+[site]: https://cozydev-pink.github.io/protosearch/
diff --git a/docs/01-about-protosearch/01-features.md b/docs/01-about-protosearch/01-features.md
new file mode 100644
index 00000000..aa547a51
--- /dev/null
+++ b/docs/01-about-protosearch/01-features.md
@@ -0,0 +1,13 @@
+Features
+========
+
+Protosearch is a prototype search library aimed at providing advanced querying features and supporting multiple platforms (JVM, JS, Native).
+
+It supports full-text search features like keyword search, phrase search, multiple fields, boolean queries, and regular expressions.
+
+It currently targets static index scenarios such as:
+
+- Powering site documentation search
+- In memory search over immutable collections
+
+
diff --git a/docs/01-about-protosearch/02-design-goals.md b/docs/01-about-protosearch/02-design-goals.md
new file mode 100644
index 00000000..c169994f
--- /dev/null
+++ b/docs/01-about-protosearch/02-design-goals.md
@@ -0,0 +1,26 @@
+Design Goals
+============
+
+## Goals
+
+- Provide building blocks for search on Typelevel sites
+- Enable indexing on JVM, searching in browser JS
+- Cross compile to JVM / JS / Native
+- Support full Lucene query syntax
+- Be safe, functional, and performant
+
+## Non Goals
+
+- Competing with or somehow surpassing Lucene
+- Being a distributed search like Elasticsearch
+- Heavy write workloads
+
+
+## Lucene Inspired
+
+It's worth calling out how [Lucene][lucene] inspired this library is.
+Lucene is an absolutely incredible piece of software.
+It has been optimized and extended by a large community for well over 20 years.
+If you are looking for very performant search, with a wide range of language support, flexibility and features, you won't find anything better than Lucene on the JVM.
+
+[lucene]: https://lucene.apache.org/
diff --git a/docs/01-about-protosearch/directory.conf b/docs/01-about-protosearch/directory.conf
new file mode 100644
index 00000000..79990779
--- /dev/null
+++ b/docs/01-about-protosearch/directory.conf
@@ -0,0 +1 @@
+laika.title = About Protosearch
diff --git a/docs/02-tutorial/01-indexing.md b/docs/02-tutorial/01-indexing.md
new file mode 100644
index 00000000..986a343d
--- /dev/null
+++ b/docs/02-tutorial/01-indexing.md
@@ -0,0 +1,54 @@
+Indexing Tutorial
+=================
+
+Let's set up a collection of books to search over:
+
+```scala mdoc:silent
+case class Book(author: String, title: String)
+
+val books: List[Book] = List(
+  Book("Beatrix Potter", "The Tale of Peter Rabbit"),
+  Book("Beatrix Potter", "The Tale of Two Bad Mice"),
+  Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
+  Book("Dr. Seuss", "Green Eggs and Ham"),
+)
+```
+
+In order to index our domain type `Book`, we'll need a few things:
+- An `Analyzer` to convert strings of text into tokens.
+- `Field`s to tell the index what kind of data we want to store
+- A way to get the values for each of the fields for a given `Book`
+
+We'll pass all these things to an `IndexBuilder`:
+
+```scala mdoc:silent
+import pink.cozydev.protosearch.{Field, IndexBuilder}
+import pink.cozydev.protosearch.analysis.Analyzer
+
+val analyzer = Analyzer.default.withLowerCasing
+val indexBldr = IndexBuilder.of[Book](
+  (Field("title", analyzer, stored=true, indexed=true, positions=true), _.title),
+  (Field("author", analyzer, stored=true, indexed=true, positions=false), _.author),
+)
+```
+
+And then we can finally index our `books` using the builder:
+
+```scala mdoc:silent
+val index = indexBldr.fromList(books)
+```
+
+Finally, we'll need a `search` function to test things out.
+We use a `queryAnalyzer` with the same default field here to make sure our queries get the same analysis as our documents did at indexing time.
+
+
+```scala mdoc:silent
+val qAnalyzer = index.queryAnalyzer
+
+def search(q: String): List[Book] =
+  index.search(q)
+    .map(hits => hits.map(h => books(h.id)))
+    .fold(_ => Nil, identity)
+```
+
+Now we can use our `search` function to explore some different query types!
\ No newline at end of file
diff --git a/docs/queries.md b/docs/02-tutorial/02-querying.md
similarity index 73%
rename from docs/queries.md
rename to docs/02-tutorial/02-querying.md
index 7efd8b53..9eeff288 100644
--- a/docs/queries.md
+++ b/docs/02-tutorial/02-querying.md
@@ -1,52 +1,24 @@
-# Queries
+Querying
+========
 
-Protosearch supports queries using boolean logic and a variety of advanced term queries.
-
-## Setup
-
-Let's setup a collection of books to search over:
+We'll quickly set up the same index from the [Indexing Tutorial]:
 
 ```scala mdoc:silent
-case class Book(author: String, title: String)
+import pink.cozydev.protosearch.{Field, IndexBuilder}
+import pink.cozydev.protosearch.analysis.Analyzer
 
+case class Book(author: String, title: String)
 val books: List[Book] = List(
   Book("Beatrix Potter", "The Tale of Peter Rabbit"),
   Book("Beatrix Potter", "The Tale of Two Bad Mice"),
   Book("Dr. Seuss", "One Fish, Two Fish, Red Fish, Blue Fish"),
-  Book("Dr. Seuss", "Green Eggs and Ham"),
-)
-```
-
-In order to index our domain type `Book`, we'll need a few things:
-- An `Analyzer` to convert strings of text into tokens.
-- `Field`s to tell the index what kind of data we want to store
-- A way to get the values for each of the fields for a given `Book`
-
-We'll pass all these things to an `IndexBuilder`:
-
-```scala mdoc:silent
-import pink.cozydev.protosearch.{Field, IndexBuilder}
-import pink.cozydev.protosearch.analysis.Analyzer
+  Book("Dr. Seuss", "Green Eggs and Ham"))
 
 val analyzer = Analyzer.default.withLowerCasing
-val indexBldr = IndexBuilder.of[Book](
+val index = IndexBuilder.of[Book](
   (Field("title", analyzer, stored=true, indexed=true, positions=true), _.title),
   (Field("author", analyzer, stored=true, indexed=true, positions=false), _.author),
-)
-```
-
-And then we can finally index our `books` using the builder:
-
-```scala mdoc:silent
-val index = indexBldr.fromList(books)
-```
-
-Finally we'll then need a `search` function to test out.
-We use a `queryAnalyzer` with the same default field here to make sure our queries get the same analysis as our documents did at indexing time.
-
-
-```scala mdoc:silent
-val qAnalyzer = index.queryAnalyzer
+).fromList(books)
 
 def search(q: String): List[Book] =
   index.search(q)
diff --git a/docs/02-tutorial/directory.conf b/docs/02-tutorial/directory.conf
new file mode 100644
index 00000000..2c67f8bc
--- /dev/null
+++ b/docs/02-tutorial/directory.conf
@@ -0,0 +1 @@
+laika.title = Tutorial
diff --git a/docs/index.md b/docs/index.md
index 19442c20..3cd989ae 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,30 +1,8 @@
-# Protosearch
+Protosearch
+===========
 
-Protosearch is pre-alpha software, do not use in production.
+Protosearch is a prototype search library under active development (hence the "proto").
+We're currently focussing on end-to-end functionality, and not yet worrying too much about API stability or performance throughout.
 
-Protosearch is a prototype of a [Lucene][lucene] style search library in pure scala.
-
-
-## Goals
-
-- Provide building blocks for search on Typelevel sites
-- Enable indexing on JVM, searching in browser JS
-- Cross compile to JVM / JS / Native
-- Support full Lucene query syntax
-- Be safe, functional, and performant
-
-## Non Goals
-
-- Competing with or somehow surpassing Lucene
-- Being a distributed search like Elasticsearch
-- Heavy write workloads
-
-
-## Lucene Inspired
-
-It's worth calling out how [Lucene][lucene] inspired this library is.
-Lucene is an absolutely incredible piece of software.
-It has been optimized and extended by a large community for well over 20 years.
-If you are looking for very performant search, with a wide range of language support, flexibility and features, you won't find anything better than Lucene on the JVM.
-
-[lucene]: https://lucene.apache.org/
+Learn more about Protosearch by reading about our [Features] or [Design Goals].
+Additionally, you can follow the [Indexing Tutorial] to get up and running.
\ No newline at end of file