Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-6045] Separate Thrift-related code from zeppelin-interpreter to new zeppelin-protocol module #4783

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jhl8109
Copy link

@jhl8109 jhl8109 commented Aug 6, 2024

What is this PR for?

This PR is for creating a new zeppelin-protocol module and moving Thrift-related code from the zeppelin-interpretermodule to improve modularity and maintainability.
Additionally, it includes updates to imports and dependencies required by the module separation.

What type of PR is it?

Improvement

Todos

  • - Create zeppelin-protocol module
  • - Move Thrift related code to zeppelin-protocol
  • - Update imports in various modules
  • - Update pom.xml files to include new dependencies for zeppelin-protocol

What is the Jira issue?

How should this be tested?

  • CI
  • zeppelin-protocol Maven Build

Questions:

  • Does the license files need to update? YES
  • Is there breaking changes for older versions? NO
  • Does this needs documentation? NO

@jhl8109 jhl8109 changed the title [ZEPPELIN-6045] Create zeppelin-protocol module and move Thrift-related code from zeppelin-interpreter [ZEPPELIN-6045] [ZEPPELIN-6045] Separate Thrift-related code from zeppelin-interpreter to new zeppelin-protocol module Aug 6, 2024
@jhl8109 jhl8109 changed the title [ZEPPELIN-6045] [ZEPPELIN-6045] Separate Thrift-related code from zeppelin-interpreter to new zeppelin-protocol module [ZEPPELIN-6045] Separate Thrift-related code from zeppelin-interpreter to new zeppelin-protocol module Aug 6, 2024
Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to keep the communication protocol between Zeppelin Server <-> Zeppelin Interpreter interchangeable, then spinning off the Thrift protocol into a separate module is the right approach.
In my opinion, however, the module should then have thrift in its name so that a future grpc module can contain grpc in its name.
If you provide both protocols in one module, the ballast of both protocols is present.
What do you think?

@@ -284,6 +284,11 @@
<artifactId>commons-compress</artifactId>
<version>${commons.compress.version}</version>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to include this transitive dependency from the module zeppelin-interpreter, then this should be in the upper part of the pom.xml.

@@ -41,7 +41,7 @@
import org.apache.zeppelin.interpreter.InterpreterContext;
import org.apache.zeppelin.interpreter.InterpreterResult;
import org.apache.zeppelin.interpreter.InterpreterResult.Code;
import org.apache.zeppelin.interpreter.thrift.InterpreterCompletion;
import org.apache.zeppelin.protocol.thrift.InterpreterCompletion;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this break 3rd interpreters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so.

@pan3793
Copy link
Member

pan3793 commented Aug 7, 2024

... to improve modularity and maintainability.

I would like to see the actual benefits before pushing this change. for now, I only see breaking changes.

@jhl8109
Copy link
Author

jhl8109 commented Aug 7, 2024

@pan3793

Problem

  • Each module uses a different protocol, and the files are scattered across various locations. This makes management difficult and maintaining consistency challenging.
  • For example, Thrift is managed in zeppelin-interpreter and proto in zeppelin-jupyter-interpreter.

(Therefore, instead of creating a separate zeppelin-protocol module, another alternative could be to manage zeppelin.interpreter.thrift and zeppelin.interpreter.proto within zeppelin-interpreter to avoid unforeseen side effects due to InterpreterCompletion.)

Improvement Goal

This PR aims to improve protocol management by starting with separating Thrift files into a zeppelin-protocol module.

  • zeppelin-protocol will manage files generated from .thrift and .grpc files (.java, .py, etc.) collectively. This ensures consistent communication methods among interpreters via zeppelin-protocol. Additionally, using pom.xml to generate protocol-related files helps maintain build integrity without running shell scripts manually.

  • Interpreters needing communication can add zeppelin-protocol as a dependency. For instance, it can be difficult to know if proto communication is needed by zeppelin-interpreter or zeppelin-jupyter-interpreter. Adding zeppelin-protocol as a dependency makes accessing necessary protocol dependencies straightforward.

  • When new interpreters are added, the zeppelin-protocol module facilitates the easy expansion of communication features. For example, if Zeppelin introduces a new language interpreter, it can utilize the required protocol files from zeppelin-protocol. Additionally, if Thrift or gRPC features are expanded, updating zeppelin-protocol ensures that all related interpreters can benefit from these enhancements. Moreover, even if new protocol files or shell scripts are needed, they are managed within a single module, preventing an increase in scattered management points.

Lastly, I understand there may be concerns regarding this PR and am open to discussing alternatives. The primary goal is to efficiently manage scattered protocol-related shell scripts and generated files. Your feedback is crucial, so please feel free to share your thoughts.

Thanks

@jhl8109
Copy link
Author

jhl8109 commented Aug 7, 2024

@Reamer
Thank you for the feedback. Initially, I planned to use the zeppelin-protocol module, but for clarity and maintainability, it seems better to separate Thrift and gRPC.

Currently, we are uncertain about the potential breaking changes and this PR's benefits. Therefore, if I continue with this PR, I will update it.

@jongyoul
Copy link
Member

jongyoul commented Aug 8, 2024

I would like to see the actual benefits before pushing this change. for now, I only see breaking changes.

We need to mitigate zeppelin-interpreter dependency. Currently, zeppelin-interpreter is quite big and has a lot of other dependencies unrelated to interpreters. To clean it up, dividing possible features with another module would be better. My goal is to merge zeppelin-interpreter, zeppelin-interpreter-parent and zeppelin-interpreter-shaded. :-) (I know the history but I believe that we can find a better way to manage them)

Moreover, I agree that we change the name to zeppelin-thrift and so on as the first step. We also would better keep the package name to avoid breaking changes. we can do it later but we don't have to do it in the PR now.

@pan3793
Copy link
Member

pan3793 commented Aug 14, 2024

We need to mitigate zeppelin-interpreter dependency. Currently, zeppelin-interpreter is quite big and has a lot of other dependencies unrelated to interpreters.

I fully agree with that, and I also noticed the packaging time for zeppelin-interpreter-shaded is pretty slow due to a lot of classes needing to be relocated, and some of those deps come from ZEPPELIN-3471, I have raised a thread to remove it in the dev mailing list, please leave your comments to help the community make the decision.

Interpreters needing communication can add zeppelin-protocol as a dependency.

To address the above issue, I think we should separate the optional dependencies into individual modules, then we can eliminate them in some interpreters, but is zeppelin-protocol that case? I think it is required by all interpreters.

@jongyoul
Copy link
Member

To address the above issue, I think we should separate the optional dependencies into individual modules, then we can eliminate them in some interpreters, but is zeppelin-protocol that case? I think it is required by all interpreters.

Yes, you're right. It should be mandatory for all interpreters for now. However, it was because we only provide all functions within the zeppelin-interpreter module. If we provided them as a zeppelin-protocol including definition, I believe that we might have different python interpreter implementation like pure python module. :-) I want to give some chance to utilize thrift definition as contributors want to use. It's the first step. I don't know the future yet but I believe that it gives more choice to the community.

I fully agree with that, and I also noticed the packaging time for zeppelin-interpreter-shaded is pretty slow due to a lot of classes needing to be relocated, and some of those deps come from ZEPPELIN-3471, I have raised a thread to remove it in the dev mailing list, please leave your comments to help the community make the decision.

For the raft issue, let's move to the next step 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants