-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support HitsLoggerPlugin #695
Conversation
@@ -363,6 +364,7 @@ private void initExtendableComponents( | |||
RescorerCreator.initialize(configuration, plugins); | |||
ScriptService.initialize(configuration, plugins); | |||
SimilarityCreator.initialize(configuration, plugins); | |||
HitsLoggerCreator.initialize(configuration, plugins); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this block should be in alphabetical order
private void register(HitsLoggerProvider<?> hitsLoggerProvider) { | ||
if (this.hitsLoggerProvider != null) { | ||
throw new IllegalArgumentException("Hits logger already exists"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This limitation would mean only one implementation of a logger can be active on a cluster at one time. That is pretty restrictive.
Is there a reason not use a mapping on name -> implementation, like we do for most the other extendable components? The query would then specify the implementation to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I actually thought about the two solutions, but thinking about it again, I don't think being very restrictive here makes sense anymore. I thought that a single logger active on a cluster would be enough but being flexible and allowing multiple logger implementations is more beneficial, especially for clusters with multiple index. I will update the solution to support multiple loggers.
public class HitsLoggerFetchTask implements FetchTask { | ||
private final HitsLogger hitsLogger; | ||
|
||
public HitsLoggerFetchTask(HitsLoggerCreator hitsLoggerCreator, LoggingHits loggingHits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to pass the HitsLoggerCreator
, it is a static singleton
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I was following what has been done in HighlightFetchTask here and missed that I can just call HitsLoggerCreator.getInstance().createHitsLogger(loggingHits) instead
// number of hits to log. For example, totalHits = 20 and hitsCount = 50 means that 50 documents will be logged | ||
// while 20 documents will be returned in the Search Response | ||
int32 hitsCount = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I see this value being used anywhere. Also, what happens when hitsCount
is less than topHits
? Is there a default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added hitsCount to have LoggingHits already defined but I will use hitsCount in an upcoming PR. On a second thought, it is better to include hitsCount in the following PR with tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
*/ | ||
public interface HitsLoggerPlugin { | ||
default Map<String, HitsLoggerProvider<? extends HitsLogger>> getHitsLoggers() { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to return an empty map instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree. I missed this change, thanks for pointing this out!
@@ -614,6 +614,8 @@ message SearchRequest { | |||
map<string, InnerHit> inner_hits = 26; | |||
// Defines runtime fields for this query | |||
repeated RuntimeField runtimeFields = 27; | |||
// Any custom logging that should log hits after ranking | |||
LoggingHits loggingHits = 28; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this field id 29
, 28 is used in the v1 branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Just one minor comment for future work.
LoggingHits is defined in nrtsearch but it seems it will only work with plugins now. It might be good to have a simple default HitLogger which logs response to console. But this does not affect the ongoing task and could be in future prs
@taoyyu I like your suggestion! I will keep it in mind and potentially add in an upcoming PR. |
Adding a new plugin interface: HitsLoggerPlugin which will enable NRTSearch to log hits during the FetchTask phase (Next step: add implementation for hitsCount which allows the option to return a smaller amount of hits compared to the number of hits being logged)
General request flow is very similar to the HighligherPlugin