= Quickstart Guide
:experimental:
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

Here's a quickstart guide to start Solr, add some documents, and perform some searches.

== Starting Solr

Start a Solr node in cluster mode (SolrCloud mode):

[source,subs="verbatim,attributes+"]
----
$ bin/solr -c

Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=34942). Happy searching!
----

To start another Solr node and have it join the cluster alongside the first node:

[source,subs="verbatim,attributes+"]
----
$ bin/solr -c -z localhost:9983 -p 8984
----

When the first node started, it also started an embedded instance of ZooKeeper, the cluster coordination service, on port 9983 (the Solr port plus 1000). The second node joins the cluster by connecting to it with `-z localhost:9983`. To run ZooKeeper separately, please refer to XXXX.

== Creating a collection

Just as a database system holds data in tables, Solr holds data in collections. A collection can be created as follows:

[source,subs="verbatim,attributes+"]
----
$ curl --request POST \
  --url http://localhost:8983/api/collections \
  --header 'Content-Type: application/json' \
  --data '{
    "create": {
      "name": "techproducts",
      "numShards": 1,
      "replicationFactor": 1
    }
  }'
----

Because the request does not name a configset, the collection is created with Solr's `_default` configset.
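To confirm that the collection was created, you can list the collections in the cluster. This is a minimal sketch, assuming the node started above listens on `http://localhost:8983` and that `GET /api/collections` returns JSON with the collection names in a `collections` array. The helpers `list_collections` and `collection_exists` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Base URL of the Solr node (assumption: default port used in this guide).
SOLR_URL="${SOLR_URL:-http://localhost:8983}"

# Hypothetical helper: prints the JSON response listing all collections.
list_collections() {
  curl --silent --request GET --url "${SOLR_URL}/api/collections"
}

# Hypothetical helper: exits with status 0 if the listing contains the
# given name as a quoted JSON string, and non-zero otherwise.
collection_exists() {
  list_collections | grep -q "\"$1\""
}
```

For example, `collection_exists techproducts && echo "techproducts is ready"`. Matching a quoted string is a coarse check that avoids needing a JSON parser; a tool such as `jq` would be stricter.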

== Indexing documents

A single document can be indexed as follows:

[source,subs="verbatim,attributes+"]
----
$ curl --request POST \
  --url 'http://localhost:8983/api/collections/techproducts/update' \
  --header 'Content-Type: application/json' \
  --data '{
    "id" : "978-0641723445",
    "cat" : ["book","hardcover"],
    "name" : "The Lightning Thief",
    "author" : "Rick Riordan",
    "series_t" : "Percy Jackson and the Olympians",
    "sequence_i" : 1,
    "genre_s" : "fantasy",
    "inStock" : true,
    "price" : 12.50,
    "pages_i" : 384
  }'
----

Multiple documents can be indexed in the same request by sending them as a JSON array:

[source,subs="verbatim,attributes+"]
----
$ curl --request POST \
  --url 'http://localhost:8983/api/collections/techproducts/update' \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "id" : "978-0641723445",
      "cat" : ["book","hardcover"],
      "name" : "The Lightning Thief",
      "author" : "Rick Riordan",
      "series_t" : "Percy Jackson and the Olympians",
      "sequence_i" : 1,
      "genre_s" : "fantasy",
      "inStock" : true,
      "price" : 12.50,
      "pages_i" : 384
    },
    {
      "id" : "978-1423103349",
      "cat" : ["book","paperback"],
      "name" : "The Sea of Monsters",
      "author" : "Rick Riordan",
      "series_t" : "Percy Jackson and the Olympians",
      "sequence_i" : 2,
      "genre_s" : "fantasy",
      "inStock" : true,
      "price" : 6.49,
      "pages_i" : 304
    }
  ]'
----

A file containing documents can be indexed as follows. Run this from the Solr installation directory, which contains the sample file `example/exampledocs/books.json`:

[source,subs="verbatim,attributes+"]
----
$ curl -X POST -H 'Content-Type: application/json' --data-binary @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update
----
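The same approach works for your own data: write the documents to a file as a JSON array, then post the file with a JSON `Content-Type` header. This is a minimal sketch under that assumption; the helpers `write_sample_docs` and `index_file` are hypothetical. It uses `--data-binary` because curl sends the file exactly as written, whereas plain `-d @file` strips newlines from file contents.

```shell
#!/usr/bin/env bash
SOLR_URL="${SOLR_URL:-http://localhost:8983}"
COLLECTION="${COLLECTION:-techproducts}"

# Hypothetical helper: writes a JSON array holding one document to the given path.
write_sample_docs() {
  cat > "$1" <<'EOF'
[
  {
    "id" : "978-1423103349",
    "cat" : ["book","paperback"],
    "name" : "The Sea of Monsters",
    "author" : "Rick Riordan",
    "inStock" : true,
    "price" : 6.49
  }
]
EOF
}

# Hypothetical helper: posts a JSON file to the collection's update endpoint.
index_file() {
  curl --silent --request POST \
    --url "${SOLR_URL}/api/collections/${COLLECTION}/update" \
    --header 'Content-Type: application/json' \
    --data-binary "@$1"
}
```

For example, `write_sample_docs my-books.json && index_file my-books.json`, where `my-books.json` is an illustrative file name.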

== Commit

Documents indexed into a collection are not immediately available for searching. To make them searchable, a commit is needed (the equivalent of a `refresh` in search engines such as OpenSearch). Commits can be scheduled at periodic intervals using auto-commits. In the `_default` configset, the periodic hard auto-commit does not open a new searcher, so on its own it does not make new documents visible; a soft auto-commit does. The following request enables a soft auto-commit every 15 seconds:

[source,subs="verbatim,attributes+"]
----
$ curl -X POST -H 'Content-type: application/json' -d '{"set-property":{"updateHandler.autoSoftCommit.maxTime":15000}}' http://localhost:8983/api/collections/techproducts/config
----
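Soft auto-commits are controlled by the Config API property `updateHandler.autoSoftCommit.maxTime`, in milliseconds. If you want a different interval, it can help to build the payload from a number: shorter intervals make new documents visible sooner, at the cost of opening new searchers more often. This is a minimal sketch assuming the same node and collection as above; `soft_commit_payload` and `set_soft_commit_interval` are hypothetical helpers.

```shell
#!/usr/bin/env bash
SOLR_URL="${SOLR_URL:-http://localhost:8983}"
COLLECTION="${COLLECTION:-techproducts}"

# Hypothetical helper: prints the set-property payload for a soft
# auto-commit interval given in milliseconds.
soft_commit_payload() {
  printf '{"set-property":{"updateHandler.autoSoftCommit.maxTime":%d}}' "$1"
}

# Hypothetical helper: sends that payload to the collection's Config API.
set_soft_commit_interval() {
  curl --silent --request POST \
    --url "${SOLR_URL}/api/collections/${COLLECTION}/config" \
    --header 'Content-Type: application/json' \
    --data "$(soft_commit_payload "$1")"
}
```

For example, `set_soft_commit_interval 5000` would request a soft auto-commit every 5 seconds.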

Alternatively, `commit=true` can be passed to the `/update` handler in any of the indexing requests above to commit immediately after indexing. Committing after every document (or after every small batch of documents) is not recommended, because each commit is relatively expensive. A commit can also be sent on its own:

[source,subs="verbatim,attributes+"]
----
$ curl -X POST 'http://localhost:8983/api/collections/techproducts/update?commit=true'
----
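After a commit, you can check that the documents are visible by asking only for the number of matches: `q=*:*` matches every document and `rows=0` returns none of them. This sketch assumes the standard `/select` handler at the v1 path `/solr/techproducts/select`, and extracts `numFound` with `grep` and `cut` rather than a JSON parser. The helpers `match_all_query` and `doc_count` are hypothetical.

```shell
#!/usr/bin/env bash
SOLR_URL="${SOLR_URL:-http://localhost:8983}"
COLLECTION="${COLLECTION:-techproducts}"

# Hypothetical helper: runs a match-all query that returns no rows.
# --get appends the --data-urlencode values to the URL as query parameters.
match_all_query() {
  curl --silent --get "${SOLR_URL}/solr/${COLLECTION}/select" \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'rows=0'
}

# Hypothetical helper: prints the first numFound value in the JSON response.
doc_count() {
  match_all_query | grep -o '"numFound":[0-9]*' | head -n 1 | cut -d: -f2
}
```

For example, `echo "techproducts holds $(doc_count) searchable documents"`. Before a commit, the count would not yet include newly indexed documents.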

== Basic search queries

... TODO