Indexing a document to a custom search index
In this guide, we will show you how to add a new document to your custom search index. We will create a script that indexes movie data, and we will discuss the objects and methods that the script uses to perform the indexing. For simplicity's sake, the data we're going to index will be entered manually via service inputs.
Stuff you need to know...
This guide assumes that you have gone through the process of creating a custom Solr core or collection, and you already know how to create services in Gloop or Groovy.
Get the code!
The scripts mentioned in this guide are available in the examples package. As a bonus, you can find other services in the examples package that demonstrate the use of functions from the SolrMethods class, as well as other Solr-related functionality.
Preparation
Before we get to indexing documents, we must ensure that our custom Solr core is already set up and connected to the Martini package we're going to use.
Here's an outline of our setup:
- Our package is called examples. This is where our scripts will reside.
- Our target Solr core is embedded and named movie-core. As a result, the directory structure of the examples package is:

```
examples
├── classes
├── code
├── conf
├── web
└── solr
    └── movie-core
        ├── core.properties
        └── conf
            ├── schema.xml
            └── solrconfig.xml
```

- The examples package's package.xml file has already been edited to make the embedded Solr core known:

```xml
<package>
    <!-- ... -->
    <solr-cores>
        <solr-core name="movie-core" enabled="true" />
        <!-- ... -->
    </solr-cores>
</package>
```
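If you want to double-check this layout before going further, the sketch below does so in plain Java. It is a minimal illustration that assumes you know the package's home directory on disk; EmbeddedCoreLayoutCheck and missingFiles are hypothetical helpers, not part of Martini, and the check covers only the core files listed in the directory tree.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks that an embedded Solr core has the files
// this guide expects under <package home>/solr/<core name>/.
class EmbeddedCoreLayoutCheck {

    // Files that must exist relative to the core's directory.
    static final String[][] REQUIRED = {
        {"core.properties"},
        {"conf", "schema.xml"},
        {"conf", "solrconfig.xml"},
    };

    // Returns the expected files that are missing (an empty list means the layout is complete).
    static List<Path> missingFiles(Path packageHome, String coreName) {
        Path coreDir = packageHome.resolve("solr").resolve(coreName);
        List<Path> missing = new ArrayList<>();
        for (String[] parts : REQUIRED) {
            Path file = coreDir;
            for (String part : parts) {
                file = file.resolve(part);
            }
            if (!Files.isRegularFile(file)) {
                missing.add(file);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the package home directory is passed as the first argument.
        Path home = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> missing = missingFiles(home, "movie-core");
        if (missing.isEmpty()) {
            System.out.println("movie-core layout looks complete");
        } else {
            missing.forEach(p -> System.out.println("Missing: " + p));
        }
    }
}
```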
Creating the model
We need to create a model that can hold the data we want to index. In this case, we need to create a model for holding movie data.
You can create your Gloop model manually from scratch, or you can extract the fields defined in the schema.xml file and create a model based on them. In our case, we will do the latter using the SchemaToGloopModelGenerator service.
We have placed this script in the examples package's code directory, under solr.customSolrCore.model. You should be able to use it to parse your own schema.xml file, although depending on your setup, you may need to tweak it a little. Here's a breakdown of the Gloop steps it contains:
- In Line 1, we have a map step that calls GroovyMethods.getPackage() to get the Martini package where the script resides. The return value is then stored in a variable called martiniPackage.
- In Line 2, we have another map step that declares and initializes a Path variable, movieCorePath, that points to schema.xml's location. We use martiniPackage#getHome() as the base path and, from there, traverse to schema.xml's actual location, like so:

```groovy
Paths.get(martiniPackage.getHome(), 'solr', 'movie-core', 'conf', 'schema.xml')
```
-
In Line 3, we have added a third map step but this time, we use it to declare and initialize a
String
variable containingschema.xml
's content. We did that this way:1
Files.readAllBytes(movieCorePath);
Gloop conversion
You may have noticed that Files.readAllBytes(...) returns a byte array, yet the variable it initializes is a String. This is possible thanks to the GloopObjectToCharSequenceConverter.
- In Line 4, we create an invoke step that calls SolrMethods.solrSchemaToGloopModel(String, String, String, String, List<GloopModel>). This method creates the Gloop model MovieDocument in solr.customSolrCore.model, based on the schema.xml file:

```groovy
SolrMethods.solrSchemaToGloopModel("MovieDocument", schemaContent, null, "solr.customSolrCore.model", null)
```
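Outside of Gloop, the map steps in Lines 2 and 3 boil down to ordinary java.nio.file calls. The following plain-Java sketch is hypothetical (SchemaLoader and its methods are illustrative names); it assumes the package home is available as a String and that schema.xml is UTF-8 encoded. Unlike the Gloop step, where the converter takes care of turning the byte array into a String, Java needs the bytes decoded explicitly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical plain-Java equivalent of the map steps in Lines 2 and 3.
class SchemaLoader {

    // Mirrors Paths.get(martiniPackage.getHome(), 'solr', 'movie-core', 'conf', 'schema.xml'),
    // except that the package home is passed in directly as a String.
    static Path schemaPath(String packageHome, String coreName) {
        return Paths.get(packageHome, "solr", coreName, "conf", "schema.xml");
    }

    // Files.readAllBytes returns byte[]; here the bytes are decoded explicitly,
    // assuming the file is UTF-8 encoded.
    static String readSchema(Path schemaPath) throws IOException {
        byte[] bytes = Files.readAllBytes(schemaPath);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Inside Martini, the resulting String is what the Line 4 invoke step passes to SolrMethods.solrSchemaToGloopModel as schemaContent.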
All you have to do now is run the service, and voilà! You now have your schema.xml-based Gloop model. If you're following along with our example, this will produce MovieDocument.model in solr.customSolrCore.model. We'll use this model later.
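To get a feel for what a schema-to-model conversion involves, here is a rough Java sketch that reads the field elements of a schema.xml string and turns them into field-name/type pairs, mapping multiValued fields to arrays. It is not how solrSchemaToGloopModel is implemented. The type mapping is an assumption based on field type names from Solr's default configsets, only the multiValued attribute on the field itself is considered (not one inherited from its field type), and the caller decides which Solr-managed fields to skip.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Rough illustration of deriving model fields from schema.xml;
// this is NOT Martini's solrSchemaToGloopModel implementation.
class SchemaFieldExtractor {

    // Assumed mapping from common Solr field type names to Java-style types.
    static String javaType(String solrType) {
        switch (solrType) {
            case "pint": case "int": return "Integer";
            case "plong": case "long": return "Long";
            case "pdouble": case "double": return "Double";
            case "boolean": return "Boolean";
            default: return "String"; // string, text_general, and anything unrecognised
        }
    }

    // Returns field name -> type for every field element whose name is not in skip,
    // in the order the fields appear in the schema.
    static Map<String, String> modelFields(String schemaXml, Set<String> skip)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        NodeList fields = doc.getElementsByTagName("field");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String name = field.getAttribute("name");
            if (skip.contains(name)) {
                continue; // e.g. Solr-managed fields such as _version_ and text
            }
            String type = javaType(field.getAttribute("type"));
            boolean multiValued = "true".equals(field.getAttribute("multiValued"));
            result.put(name, multiValued ? type + "[]" : type);
        }
        return result;
    }
}
```

Skipping _version_ and text, which Solr fills in itself, leaves only the fields a client is expected to supply.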
The MovieDocument model has the following fields:
- id (String)
- movieTitle (String)
- director (String)
- cast (String[])
Alternatively, if you work in Groovy, the Groovy bean class MovieDocument.groovy will hold the movie data we want to index. We'll place it under the solr.customSolrCore.model package.
The @Field annotations indicate which fields we want to index.
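In SolrJ, the @Field annotation (org.apache.solr.client.solrj.beans.Field) is what marks a bean property for binding to a Solr document. The sketch below illustrates the principle without requiring SolrJ on the classpath: Indexed is a hypothetical stand-in annotation, MovieDocumentBean is a hypothetical bean, and IndexedFields uses reflection to list the annotated fields, which is roughly how an annotation-driven binder decides what to index.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SolrJ's @Field, declared locally so this compiles without SolrJ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Indexed {}

// Hypothetical bean: only the annotated fields are meant to be indexed.
class MovieDocumentBean {
    @Indexed String id;
    @Indexed String movieTitle;
    @Indexed String director;
    @Indexed String[] cast;

    // Not annotated, so an annotation-driven binder would ignore it.
    String localNote;
}

class IndexedFields {
    // Lists the names of fields carrying @Indexed (reflection does not guarantee their order).
    static List<String> of(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Indexed.class)) {
                names.add(f.getName());
            }
        }
        return names;
    }
}
```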
Fields defined in the schema
If you take a look at movie-core's schema.xml file, you will notice that its documents are defined with six fields: id, movieTitle, director, cast, _version_, and text.
- id is the identifier for our documents; its value is generated automatically by Solr thanks to the UpdateRequestProcessorChain configuration in solrconfig.xml.
- _version_ is another field whose value is supplied automatically by Solr. It is an internal field used by partial updates, the update log, and SolrCloud, and it is required to perform optimistic concurrency.
- text is a compilation of copied fields, and is used as the default search field when clients run their queries.
The other fields are provided by the client.
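The optimistic concurrency mentioned for _version_ works by comparing the _version_ value sent with an update against the one stored in the index. As a rough illustration, the hypothetical VersionCheck class below restates the rules described in the Solr Reference Guide. Solr applies them server-side and rejects a failing update with a version conflict (HTTP 409), so this is only a way to see the behaviour, not something you would implement yourself.

```java
// Hypothetical restatement of Solr's optimistic-concurrency rules for _version_.
class VersionCheck {

    /**
     * @param requested the _version_ value sent by the client with the update
     * @param current   the _version_ currently stored, or null if the document does not exist
     * @return true if the update would be accepted, false if it would be rejected
     */
    static boolean accepts(long requested, Long current) {
        if (requested > 1) {
            return current != null && current == requested; // versions must match exactly
        }
        if (requested == 1) {
            return current != null;                          // document must already exist
        }
        if (requested < 0) {
            return current == null;                          // document must not exist yet
        }
        return true;                                         // 0: no version check at all
    }
}
```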
Indexing the model's data
Now that our model is ready, we can create a service that takes the model's data and indexes it. To keep things simple, we'll populate the model manually.
Insert in bulk
You can use the SolrMethods.insertMany(...) functions to insert documents in bulk.
The MovieIndexer service will be responsible for indexing our MovieDocument data.
MovieIndexer's sole input parameter is called movieDocument and is based on the MovieDocument Gloop model we created earlier. Because of this, we will be prompted to enter four fields when we run the service: id, movieTitle, director, and cast.
Martini will build the movieDocument parameter from our inputs, and from there, we can index movieDocument via SolrMethods.index(String, String, GloopModel).
The bullet points below explain each step in the service:
- In Line 1, we have a try-catch block step. This allows Gloop to mirror Java's try-catch: the code that could throw an exception is wrapped in a try block, and the catch block performs the "rescue".
- In Line 3, under the try block, we have an invoke step that calls SolrMethods.index(String, String, GloopModel). This is where the actual indexing happens: it indexes movieDocument so that it is available for querying in the examples package's movie-core Solr core later.
- In Line 5, we have another invoke step that calls LoggerMethods.error(String), this time under the catch block. It simply logs the exception if anything goes wrong while indexing.
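For readers more at home in Java, the three steps map onto an ordinary try-catch. In this sketch, Indexer is a hypothetical interface standing in for SolrMethods.index(String, String, GloopModel), the document is simplified to a String, and java.util.logging takes the place of LoggerMethods.error(String).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical plain-Java mirror of MovieIndexer's try-catch steps.
class MovieIndexerSketch {

    private static final Logger LOG = Logger.getLogger(MovieIndexerSketch.class.getName());

    // Hypothetical stand-in for SolrMethods.index(String, String, GloopModel).
    interface Indexer {
        void index(String document) throws Exception;
    }

    // Returns true if indexing succeeded, false if an exception was caught and logged.
    static boolean indexMovie(Indexer indexer, String movieDocument) {
        try {
            // Line 3 equivalent: the actual indexing call.
            indexer.index(movieDocument);
            return true;
        } catch (Exception e) {
            // Line 5 equivalent: log the failure instead of propagating it.
            LOG.log(Level.SEVERE, "Failed to index movie document: " + e.getMessage(), e);
            return false;
        }
    }
}
```

Passing an Indexer that throws exercises the catch path: the error is logged and indexMovie returns false rather than letting the exception escape.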
Running the service will prompt you to populate the movieDocument input, which is based on the MovieDocument model. You can enter whatever values you want to index. If the service is invoked successfully, it returns a response and the catch block logs nothing.
Alternatively, in Groovy, we'll create an endpoint whose parameters are mapped to the MovieDocument bean's fields. Calling this Spring-based endpoint is all it takes for the indexing to happen.
Simply create a Groovy file named MovieSolrAPI in solr.customSolr and edit it so that it contains the endpoint code.
As you may notice, in the endpoint's code:
- In Line 25, we construct a MovieDocument object (the document variable) from the parameters of our request.
- In Line 26, we use the SolrMethods.index(String, String, GloopModel) function to index the data for us. We then call the GloopModel#toString() method so that our endpoint's response is the indexed MovieDocument model.
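Spring's data binding performs the construction in Line 25 for you. To show roughly what that involves, the hypothetical sketch below builds a simplified MovieDocument from a map of request parameter names to values, treating a repeated cast parameter as the elements of the array. The class and method names are illustrative and are not taken from the endpoint's code.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of building a MovieDocument from request parameters.
class RequestToMovieDocument {

    // Simplified stand-in for the MovieDocument bean.
    static final class MovieDocument {
        final String id;
        final String movieTitle;
        final String director;
        final String[] cast;

        MovieDocument(String id, String movieTitle, String director, String[] cast) {
            this.id = id;
            this.movieTitle = movieTitle;
            this.director = director;
            this.cast = cast;
        }
    }

    // params maps each request parameter name to all values sent for it.
    static MovieDocument fromParams(Map<String, List<String>> params) {
        String[] cast = params.getOrDefault("cast", Collections.emptyList()).toArray(new String[0]);
        return new MovieDocument(first(params, "id"), first(params, "movieTitle"),
                first(params, "director"), cast);
    }

    // Returns the first value of a parameter, or null if it was not sent.
    private static String first(Map<String, List<String>> params, String name) {
        List<String> values = params.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }
}
```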
With that in place, a call to the endpoint will trigger the indexing of your movie data.
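As an illustration of such a call, the sketch below builds a request URI with the four fields as URL-encoded query parameters and sends it with Java 11's HttpClient. The base URL, the GET method, and the use of query parameters are all assumptions; adjust them to however MovieSolrAPI is actually mapped. MovieIndexRequest is a hypothetical helper.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Hypothetical client for the MovieSolrAPI endpoint (Java 11+). The base URL,
// HTTP method and parameter style are assumptions, not taken from the endpoint.
class MovieIndexRequest {

    // Builds <baseUrl>?id=...&movieTitle=...&director=...&cast=...&cast=...
    static URI build(String baseUrl, String id, String movieTitle, String director, String... cast) {
        StringJoiner query = new StringJoiner("&");
        add(query, "id", id);
        add(query, "movieTitle", movieTitle);
        add(query, "director", director);
        for (String member : cast) {
            add(query, "cast", member); // one repeated parameter per cast member
        }
        return URI.create(baseUrl + "?" + query);
    }

    // URL-encodes the value (spaces become '+') and appends name=value.
    private static void add(StringJoiner query, String name, String value) {
        query.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
    }

    // Sends the request and returns the response body, i.e. the indexed MovieDocument model.
    static String send(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

For example, build("http://localhost:8080/api/movies", "1", "Sample Movie", "Some Director", "Actor A", "Actor B"), with a hypothetical base URL, yields the query string id=1&movieTitle=Sample+Movie&director=Some+Director&cast=Actor+A&cast=Actor+B.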
Try out the service via the service invoker
You can click the run button shown at the beginning of a method's signature to run that method.