Optimize Search Results

Reinforcement learning

Background

Reinforcement learning uses feedback from your past searches to improve the results of future searches.

The learning process is statistically driven and includes deliberate randomization. As a result, the order of results for a given search will move around before settling toward an optimal ordering over time.
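
As a rough sketch of the idea: each result accumulates interaction counts (for example, clicks and impressions) for a query, and at query time a score is sampled from a probability distribution over each result's performance. The snippet below is a minimal illustration of this sampling behaviour, assuming a Beta distribution over click-through rate; the names and data structures are ours for illustration, not the production implementation.

import random

# Illustrative only: each result carries click/impression counts
# from past searches for this query.
results = [
    {"id": "a", "clicks": 30, "impressions": 100},
    {"id": "b", "clicks": 5, "impressions": 10},
    {"id": "c", "clicks": 1, "impressions": 50},
]

def sampled_score(result):
    # Draw from a Beta posterior over the click-through rate.
    # Few observations -> wide distribution -> more reordering;
    # many observations -> narrow distribution -> stable ordering.
    alpha = 1 + result["clicks"]
    beta = 1 + result["impressions"] - result["clicks"]
    return random.betavariate(alpha, beta)

ranked = sorted(results, key=sampled_score, reverse=True)
print([r["id"] for r in ranked])  # order varies from run to run

Because each draw is random, results with little data occasionally rank highly, which is how the system keeps exploring alternatives.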

See our blog post on how reinforcement learning in search works for details of the implementation.

Benefits

The main benefit of this process is improved search performance. This typically shows up as a higher click-through rate (CTR) on search results, but the same mechanism can also be used to increase purchases, signups, self-service transactions, or any other event you choose to track.

Considerations

Result ordering will change from query to query. The change is largest when the volume of performance data is smallest, which corresponds to the greatest uncertainty; ordering stabilizes faster for high-volume queries.
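
To see why low-volume queries move around more, compare the spread of the sampled scores at two data volumes. This continues the illustrative Beta model sketched above; the numbers are examples, not product behaviour.

import math

def beta_stddev(clicks, impressions):
    # Standard deviation of Beta(1 + clicks, 1 + impressions - clicks).
    a = 1 + clicks
    b = 1 + impressions - clicks
    n = a + b
    return math.sqrt(a * b / (n * n * (n + 1)))

# The same 20% CTR at very different volumes:
print(beta_stddev(2, 10))      # ~0.12  -> sampled scores jump around
print(beta_stddev(200, 1000))  # ~0.013 -> ordering settles down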

The impact of other algorithm components is harder to isolate when reinforcement learning is used. However, we strongly advise focusing on outcomes over output: measure metrics such as purchases or self-service transactions rather than manually trying to optimize the search result order.

How to implement reinforcement learning

There are two main pieces, but both are relatively simple to get up and running.

  1. Implement either Click or PosNeg tracking (a hypothetical event report is sketched after this list)
  2. Add the index-text-score-instance-boost step to your query pipeline
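
For the tracking half, your application reports an interaction event back after a user clicks a result (or, for PosNeg tracking, sends positive and negative signals). The exact call depends on the SDK or API you integrate with; the sketch below is purely hypothetical, and the endpoint and field names are placeholders, not the real tracking API.

import json
import urllib.request

def send_click_event(query_id, result_id):
    # Hypothetical event report: tells the learning step that a
    # result was clicked for a given query. The URL and payload
    # shape are placeholders, not the actual tracking API.
    payload = json.dumps({
        "queryId": query_id,
        "resultId": result_id,
        "type": "click",  # PosNeg tracking would send pos/neg events instead
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://example.com/track",  # placeholder endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)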

Configuring the learning step

The learning step has several components. By default, it looks something like this in the YAML pipeline config:

- id: index-text-score-instance-boost
  consts:
    minCount:
      - value: 20
    threshold:
      - value: 0.4

minCount sets the minimum number of times a query must be seen before reinforcement learning is applied to it.

threshold is a target threshold on the Beta distribution mean score. In practice it is applied as min(threshold, beta), where beta is the actual score. In simple terms, all items scoring above the threshold are treated as equally good.
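
In code terms, the clamp behaves like the snippet below, which is a simplified illustration of the min(threshold, beta) rule rather than the pipeline internals:

THRESHOLD = 0.4  # matches the pipeline config above

def effective_score(beta_mean):
    # Scores are capped at the threshold, so every result whose
    # Beta mean score reaches 0.4 is treated as equally good.
    return min(THRESHOLD, beta_mean)

print(effective_score(0.25))  # 0.25 -> still differentiated
print(effective_score(0.55))  # 0.4  -> capped
print(effective_score(0.90))  # 0.4  -> capped, same as 0.55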

Notes:

  • A very low minCount will be noisy, as there is not enough data to estimate a good probability distribution.
  • A very high threshold can end up penalizing all results, causing extensive experimentation and rotation of results.