Thursday, August 14, 2014

Clarifying elasticsearch TopChildren, "factor" & "estimated hits size"

I found the TopChildren documentation less than clear, so here is my clarification.

The "estimated hits size" (also referred to in the documentation as "hits expected") refers to the number of child document hits, that is, how many child documents the query on the child docs will look for.

The child documents found this way are then aggregated into parents.

If you ask for 10 parents (query size=10), elasticsearch uses the default factor value of 5 and searches for 50 child documents (the "hits expected" mentioned above). The documents found are then aggregated into parent documents.
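To make the arithmetic concrete, here is a minimal Python sketch. The helper name expected_child_hits, the child type "comment" and the field "text" are made up for illustration; the request body only shows where factor and incremental_factor sit inside a top_children query.

```python
def expected_child_hits(size, factor=5):
    """Number of child hits the first pass of the child query asks for."""
    return size * factor


# Hypothetical search body: the child type and the field are placeholders.
search_body = {
    "size": 10,  # number of parents requested
    "query": {
        "top_children": {
            "type": "comment",
            "query": {"term": {"text": "elasticsearch"}},
            "factor": 5,              # default value, written out explicitly
            "incremental_factor": 2,  # used only if the search must be widened
        }
    },
}

child_hits = expected_child_hits(
    search_body["size"],
    search_body["query"]["top_children"]["factor"],
)
print(child_hits)  # 10 parents * factor 5 = 50
```

Raising factor to 10 in this sketch would make the first pass look for 100 child documents instead of 50.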

If several child docs belong to the same parent, the aggregation may yield fewer parents than you asked for. In that case, if there are more child documents to query, elasticsearch widens the search to include more child docs, multiplying the number of child hits by the incremental_factor parameter (default 2).
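The widening behaviour can be imitated with a toy simulation. This is a simplified model of the idea, not elasticsearch's actual implementation: the function simulate_top_children is hypothetical, and the input is just a list holding the parent ID of each matching child, in score order.

```python
def simulate_top_children(child_parent_ids, size, factor=5, incremental_factor=2):
    """Toy model: widen the child search until enough parents are found.

    Returns (parents, child_hits): the parents found (at most `size`) and
    the child hit count used by the final pass.
    """
    child_hits = size * factor
    while True:
        # Aggregate the top child hits into distinct parents, keeping order.
        parents = []
        for parent_id in child_parent_ids[:child_hits]:
            if parent_id not in parents:
                parents.append(parent_id)
        enough_parents = len(parents) >= size
        no_more_children = child_hits >= len(child_parent_ids)
        if enough_parents or no_more_children:
            return parents[:size], child_hits
        # Not enough parents yet and more children remain: widen the search.
        child_hits *= incremental_factor


# 50 children of the same parent, followed by one child each of two others.
children = ["p1"] * 50 + ["p2", "p3"]
parents, hits_used = simulate_top_children(children, size=2)
print(parents, hits_used)
```

With size=2 the first pass looks at 10 child hits and finds only p1, so the search is widened to 20, 40 and finally 80 hits, at which point all 52 children are covered and two parents are available.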

The total_hits in the response will not be accurate if the "estimated hits size" is smaller than the number of child documents that actually match the query. The larger the "estimated hits size" (controlled by the factor parameter), the larger the potential total_hits, but this of course hurts performance.

One more point to be aware of: the number of parent documents discussed here is the number returned by the TopChildren query itself. It may be further reduced by adjacent or higher-level queries/filters.

If this short explanation clarified things for you, please leave a comment and let me know :)

The writer is R&D team leader at Niloosoft Hunter HRMS
