Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support HitsLoggerPlugin #695

Merged
merged 14 commits into from
Aug 21, 2024
Merged

Support HitsLoggerPlugin #695

merged 14 commits into from
Aug 21, 2024

Conversation

fragosoluana
Copy link
Member

Adding a new plugin interface: HitsLoggerPlugin which will enable NRTSearch to log hits during the FetchTask phase (Next step: add implementation for hitsCount which allows the option to return a smaller amount of hits compared to the number of hits being logged)

General request flow is very similar to the HighligherPlugin

  • added a new creator - HitsLoggerCreator to initialize the plugins at the startup time and register them, so that later on, the search request processor can select the corresponding HitsLogger out of the registry
  • defined the HitsLogger interface and the HitsLoggerPlugin interface to provide the custom hits logger. Each hits logger plugin must implement getHitsLogger() to log hits

@fragosoluana fragosoluana changed the title Added HitsLoggerPlugin Support HitsLoggerPlugin Aug 16, 2024
@@ -363,6 +364,7 @@ private void initExtendableComponents(
RescorerCreator.initialize(configuration, plugins);
ScriptService.initialize(configuration, plugins);
SimilarityCreator.initialize(configuration, plugins);
HitsLoggerCreator.initialize(configuration, plugins);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this block should be in alphabetical order

Comment on lines 36 to 39
private void register(HitsLoggerProvider<?> hitsLoggerProvider) {
if (this.hitsLoggerProvider != null) {
throw new IllegalArgumentException("Hits logger already exists");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limitation would mean only one implementation of a logger can be active on a cluster at one time. That is pretty restrictive.

Is there a reason not use a mapping on name -> implementation, like we do for most the other extendable components? The query would then specify the implementation to use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I actually thought about the two solutions, but thinking about it again, I don't think being very restrictive here makes sense anymore. I thought that a single logger active on a cluster would be enough but being flexible and allowing multiple logger implementations is more beneficial, especially for clusters with multiple index. I will update the solution to support multiple loggers.

public class HitsLoggerFetchTask implements FetchTask {
private final HitsLogger hitsLogger;

public HitsLoggerFetchTask(HitsLoggerCreator hitsLoggerCreator, LoggingHits loggingHits) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to pass the HitsLoggerCreator, it is a static singleton

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I was following what has been done in HighlightFetchTask here and missed that I can just call HitsLoggerCreator.getInstance().createHitsLogger(loggingHits) instead

Comment on lines 1071 to 1073
// number of hits to log. For example, totalHits = 20 and hitsCount = 50 means that 50 documents will be logged
// while 20 documents will be returned in the Search Response
int32 hitsCount = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I see this value being used anywhere. Also, what happens when hitsCount is less than topHits? Is there a default value?

Copy link
Member Author

@fragosoluana fragosoluana Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added hitsCount to have LoggingHits already defined but I will use hitsCount in an upcoming PR. On a second thought, it is better to include hitsCount in the following PR with tests.

Copy link
Contributor

@aprudhomme aprudhomme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

*/
public interface HitsLoggerPlugin {
default Map<String, HitsLoggerProvider<? extends HitsLogger>> getHitsLoggers() {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to return an empty map instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree. I missed this change, thanks for pointing this out!

@@ -614,6 +614,8 @@ message SearchRequest {
map<string, InnerHit> inner_hits = 26;
// Defines runtime fields for this query
repeated RuntimeField runtimeFields = 27;
// Any custom logging that should log hits after ranking
LoggingHits loggingHits = 28;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this field id 29, 28 is used in the v1 branch

Copy link
Contributor

@taoyyu taoyyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Just one minor comment for future work.
LoggingHits is defined in nrtsearch but it seems it will only work with plugins now. It might be good to have a simple default HitLogger which logs response to console. But this does not affect the ongoing task and could be in future prs

@fragosoluana
Copy link
Member Author

@taoyyu I like your suggestion! I will keep it in mind and potentially add in an upcoming PR.

@fragosoluana fragosoluana merged commit 67f2893 into master Aug 21, 2024
1 check passed
@fragosoluana fragosoluana deleted the luanafragoso_hits_logger branch August 21, 2024 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants