Indexing Data

Defining a Schema

Overview

Before creating an index of your data, you must define the data type (string, integer, etc.) of each field you want to index. Defining a schema helps ensure that the data stored in an index is consistent.

Each record of a collection must adhere to the defined schema for it to be added to the index.
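To make the idea of a record adhering to a schema concrete, here is a minimal sketch in Python. It is not the Sajari API: the SCHEMA mapping, the field names, and the adheres_to_schema helper are all hypothetical, and the choice to reject undeclared fields is an assumption made for illustration.

```python
# Hypothetical schema: field name -> expected Python type (not the Sajari API).
SCHEMA = {"title": str, "price": float, "in_stock": bool}


def adheres_to_schema(record: dict, schema: dict) -> bool:
    """Return True if every field in the record is declared with a matching type."""
    for name, value in record.items():
        expected = schema.get(name)
        if expected is None:
            # Assumption: a field that is not declared in the schema is rejected.
            return False
        # bool is a subclass of int in Python, so compare exact types.
        if type(value) is not expected:
            return False
    return True
```

In this sketch, a record such as {"title": "Mug", "price": 9.5, "in_stock": True} passes, while {"title": 42} fails because the value is not a string.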

From the schema section in the Console, you can:

  • View the schema fields
  • Create a schema field

When you create a Site Search collection, a pre-built website schema template is used by default.

You can create new schema fields if you want to index more data. For example, if you are indexing a website that includes documents with specific metadata or content (e.g. "Document number"), you can add a schema field named document_number. The next time our crawler crawls the website, it will also index the document number, provided the pages are tagged correctly. For more information on how to add custom tags for indexing, view Adding Custom Fields.

Schema for an Ecommerce or a Custom Collection

If you created an ecommerce or a custom collection, you need to define a schema from scratch. You can start by creating fields based on your data from the console.

If you want to generate a schema from a CSV file or connect it to your database via an API, get in touch via support@sajari.com.

Understanding Schema Fields

Each schema field has the following properties.

Property         Description
Name             The name of the field. This uniquely identifies the field.
Type             The type of data stored in the field. See the Field Types table below for a description of each type.
Repeated (List)  The field can contain a single value or multiple values of the specified type. Also referred to as "List" in the Console.
Indexed          This field will be matched when performing a search.
Required         This field should be set for all records in your collection.
Unique           The value of this field should be different for every record in your collection.
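The properties above can be modelled as a small data structure. The following Python sketch is illustrative only: SchemaField and find_violations are hypothetical names, not part of any Sajari client library, and the checks cover just the Required and Unique properties as described in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical model of one schema field and its properties."""
    name: str
    type: str               # e.g. "STRING", "INTEGER"
    repeated: bool = False  # shown as "List" in the Console
    indexed: bool = False
    required: bool = False
    unique: bool = False


def find_violations(records, fields):
    """Report records that break the Required or Unique properties."""
    violations = []
    for field in fields:
        seen = set()
        for i, record in enumerate(records):
            value = record.get(field.name)
            if value is None:
                if field.required:
                    violations.append(f"record {i}: missing required field '{field.name}'")
                continue
            if field.unique:
                # Lists are unhashable, so repeated values are compared as tuples.
                key = tuple(value) if isinstance(value, list) else value
                if key in seen:
                    violations.append(f"record {i}: duplicate value for unique field '{field.name}'")
                seen.add(key)
    return violations
```

For example, with a field declared as SchemaField("sku", "STRING", required=True, unique=True), two records sharing the same sku produce a duplicate-value violation, and a record without a sku produces a missing-field violation.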

Field Types

The following field types can be used on the records in your Collection.

Name       Description                         Example
BOOLEAN    Two possible values: true or false  true
INTEGER    Whole numbers                       42
FLOAT      Numbers with a fractional value     12.34
STRING     String of text                      "I'm the walrus"
TIMESTAMP  Date/time value                     1234567890 (UNIX timestamp) or 2009-02-13T23:31:30+00:00 (RFC3339)
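The two TIMESTAMP examples in the table describe the same instant. The short Python sketch below uses only the standard library to show this; the variable names are chosen for illustration and it says nothing about how the index stores timestamps internally.

```python
from datetime import datetime, timezone

# UNIX timestamp: seconds since 1970-01-01T00:00:00 UTC.
unix_form = datetime.fromtimestamp(1234567890, tz=timezone.utc)

# RFC3339 string with an explicit UTC offset.
rfc3339_form = datetime.fromisoformat("2009-02-13T23:31:30+00:00")

# Both forms resolve to the same timezone-aware datetime.
same_instant = unix_form == rfc3339_form
```

Parsing both forms into timezone-aware values before comparing avoids mismatches caused by local time zones.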