Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-6060] Integrate Thrift code generation with Maven build and enhance genthrift.sh #4797

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jhl8109
Copy link

@jhl8109 jhl8109 commented Aug 17, 2024

What is this PR for?

This PR aims to enhance the build process in Zeppelin by integrating Thrift Java file generation into the Maven build workflow. It also updates the genthrift.sh script to automatically handle Thrift installation. These changes are intended to reduce manual errors and ensure consistent build environments.

What type of PR is it?

Improvement

Todos

  • - Add exec-maven-plugin to pom.xml to automate the generation of Java files from Thrift IDL during the Maven build process.
  • - Enhance genthrift.sh to support automatic Thrift installation, including improved checks for Thrift being installed.

What is the Jira issue?

How should this be tested?

  1. Verify Thrift installation
  2. Confirm Thrift .java file generation through Maven

Questions:

  • Does the license files need to update? NO
  • Is there breaking changes for older versions? NO
  • Does this needs documentation? NO

@Reamer
Copy link
Contributor

Reamer commented Aug 19, 2024

Installation of packages via yum or apt-get requires root rights.
Maven runs without root rights.

I also looked at the problem with thrift some time ago and hoped that the Java classes could be generated with a Maven plugin. Similar to GRPC. Unfortunately, I have not found anything.

My goal was just to have the thrift files in git and generate the thrift classes each time. But due to the installation process I don't see a way to do this. How do other Thrift projects implement it?

@JooHyukKim
Copy link
Contributor

Did you come up with genthrift.sh script all by yourself, or is there reference(or similar script) you referred to and share with us? Like @Reamer says, it would be good to refer to how other projects handle situations like this.

install_thrift_macos() {
echo "Installing Thrift ${THRIFT_VERSION} on macOS..."
if command -v brew &> /dev/null; then
brew install [email protected]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not work. and formulae with the suffix name@<version> won't be added into PATH automaticlly.

$ brew install [email protected]
Warning: No available formula with the name "[email protected]". Did you mean [email protected]?
==> Searching for similarly named formulae and casks...
==> Formulae
[email protected]

To install [email protected], run:
  brew install [email protected]

@pan3793
Copy link
Member

pan3793 commented Aug 19, 2024

integrating thrift codegen into the regular build process makes the developer hard to set up the environment, because thrift community does not publish the convenient thrift binary on releases, the downstream developers have to build thrift from source or get from the platform package managers like apt/brew, neither ways are convenient due to additional native deps requirements or only a few of versions providing.

given the generated thrift code won't change frequently, tracking the generated thrift code and providing a script to regenerate it is a common way. I think we can make some improvements in genthrift.sh, for example, make it working dir agnostic, allow passing env THRIFT_EXEC to use a custom thrift

@jongyoul
Copy link
Member

Thank you for the discussion. My first goal is to ask him to divide thrift definition and generated files with zeppelin-intepreter module to minimize itself and clean up the module. During this job, I thought it would be better to generate thrift files easier. By the way, in my past experience, I provided all binaries for thrift compilers into a repository. Can we choose a similar way? You also can use docker to build the thrift files. Current implementation looks good personally but as others said it might be a quite difficult. WDYT?

@@ -17,176 +17,25 @@
# * limitations under the License.
# */

# Determine the base directory dynamically (directory where the script is located)
THRIFT_EXEC="$(dirname "$(realpath "$0")")"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan3793 It seems feasible to implement the configuration to operate independently of the working directory using THRIFT_EXEC="$(dirname "$(realpath "$0")")". This approach allows for determining the location of the thrift script file itself and executing the thrift operations from that location.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your approach. However, the path is not the Thrift.exe, but the Zeppelin Thrift folder. Please change the variable name, it has just confused me a lot.

Copy link
Member

@pan3793 pan3793 Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THRIFT_EXEC should point to the thrift command directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why I prefer THRIFT_EXEC instead of THRIFT_HOME?

the thrift binary has no extra dependency and can be copied to any folder directly, for example, I put the thrift binaries under $HOME/bin

$ ll bin | grep thrift
lrwxr-xr-x@ 1 chengpan  staff    13B Aug 19 19:35 thrift -> thrift-0.16.0
-rwxr-xr-x@ 1 chengpan  staff   3.4M Aug 19 11:13 thrift-0.14.1
-r-xr-xr-x@ 1 chengpan  staff   3.3M Aug 19 18:35 thrift-0.16.0
-r-xr-xr-x@ 1 chengpan  staff   3.6M Aug  1 10:15 thrift-0.20.0
-rwxr-xr-x@ 1 chengpan  staff   2.9M Aug 19 11:12 thrift-0.9.3

@jhl8109
Copy link
Author

jhl8109 commented Sep 2, 2024

@jongyoul Installing thrift and helping with its setup could be quite complicated and challenging. Therefore, I considered using Docker, but the current Docker thrift image has been deprecated since version 0.12.0, and there is no support for the current version, 0.13.0.

However, through @Reamer's review, I found an alternative approach by exploring the thrift-related code here: https://github.com/apache/hbase/blob/master/hbase-thrift/pom.xml

Looking at this code, instead of supporting the installation directly, we could consider an approach where we simply check if thrift is installed and proceed only if it is. This method could reduce the build burden for users who don't have thrift installed while maintaining build consistency for those who do.
However, this approach might only be feasible because, in the example above, the thrift code is separated into its own module.

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your changes in genthrift.sh, especially the comments. I think the changes in pom.xml are unnecessary.

@@ -17,176 +17,25 @@
# * limitations under the License.
# */

# Determine the base directory dynamically (directory where the script is located)
THRIFT_EXEC="$(dirname "$(realpath "$0")")"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your approach. However, the path is not the Thrift.exe, but the Zeppelin Thrift folder. Please change the variable name, it has just confused me a lot.

<version>3.0.0</version>
<executions>
<execution>
<!-- This execution runs during the compile phase to generate Java files from Thrift files -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thrift is not installed in the CI pipeline, so these steps can only be executed using profiles similar to hbase-thrift.

<executable>bash</executable>
<arguments>
<argument>${basedir}/src/main/thrift/genthrift.sh</argument>
<argument>install</argument>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genthrift.sh does not support arguments, so the execution can be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants