
[ZEPPELIN-6086] Remove Spark Shims and unofficial support for Spark 4.0 #4825

Merged — 11 commits merged into apache:master from Kimahriman/spark-4.0 on Oct 1, 2024

Conversation

@Kimahriman (Contributor) commented Sep 11, 2024

What is this PR for?

Add support for Spark 4 by removing the Spark shims. There is now only one version of the shims, and it is compatible with the current Spark 4.0 preview releases, so we can simply remove the shim concept.

What type of PR is it?

Improvement

Todos

  • Disable 4.0 tests in CI and re-enable the supported-version check

What is the Jira issue?

ZEPPELIN-6086

How should this be tested?

A CI stage is added for Spark 4.0 tests, but it will need to be disabled until 4.0 is officially supported.

Screenshots (if appropriate)

Questions:

  • Do the license files need to be updated? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? Maybe

Comment on lines 432 to 436
- name: run spark-4.0 tests with scala-2.13 and python-${{ matrix.python }}
  if: matrix.java >= '17'
  run: |
    rm -rf spark/interpreter/metastore_db
    ./mvnw verify -pl spark-submit,spark/interpreter -am -Dtest=org/apache/zeppelin/spark/* -Pspark-4.0 -Pspark-scala-2.13 -Phadoop3 -Pintegration -DfailIfNoTests=false ${MAVEN_ARGS}
Contributor Author
The plan would be to comment this out until 4.0 is officially released and the max supported version can be bumped.

@@ -87,7 +87,7 @@ public SparkInterpreter(Properties properties) {
}

     this.enableSupportedVersionCheck = java.lang.Boolean.parseBoolean(
-        properties.getProperty("zeppelin.spark.enableSupportedVersionCheck", "true"));
+        properties.getProperty("zeppelin.spark.enableSupportedVersionCheck", "false"));
Contributor Author
Will re-enable this, but need to disable this for tests to run for 4.0
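
For context, a hypothetical sketch of how such a flag gates a version check; the isSupportedVersion helper and the thrown exception are placeholders, not Zeppelin's actual implementation:

```java
// Hypothetical sketch, not Zeppelin's actual code: the property only controls whether the
// supported-version check runs, so flipping the default lets 4.0 preview tests start.
import java.util.Properties;

public final class VersionCheckSketch {

  // Placeholder bound for illustration; the real supported range lives in SparkInterpreter.
  static boolean isSupportedVersion(int sparkMajorVersion) {
    return sparkMajorVersion == 3;
  }

  public static void main(String[] args) {
    Properties properties = new Properties();
    boolean enableSupportedVersionCheck = Boolean.parseBoolean(
        properties.getProperty("zeppelin.spark.enableSupportedVersionCheck", "true"));

    int sparkMajorVersion = 4;  // e.g. a 4.0 preview release
    if (enableSupportedVersionCheck && !isSupportedVersion(sparkMajorVersion)) {
      throw new IllegalStateException("Unsupported Spark major version: " + sparkMajorVersion);
    }
  }
}
```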

Comment on lines 61 to 63
    if (sparkMajorVersion == 3 || sparkMajorVersion == 4) {
      LOGGER.info("Initializing shims for Spark 3.x");
      sparkShimsClass = Class.forName("org.apache.zeppelin.spark.Spark3Shims");
Contributor Author
The current shims appear to still work fine. I'm not sure whether we want to rename the shim or remove it entirely, since version-specific functionality isn't needed right now.

Member
Yes, I think we should remove the shim layer to simplify the codebase, given that the Spark API is pretty stable in recent versions; the corner compatibility cases could be resolved by using reflection directly.
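
For illustration, a minimal sketch of that reflection approach; the class and method names here are placeholders rather than actual Spark or Zeppelin APIs:

```java
// Illustrative sketch only: handle a version-specific API via reflection instead of a
// compile-time shim. The method name passed in is a placeholder for whatever API differs
// between Spark versions.
import java.lang.reflect.Method;

public final class ReflectiveCompat {

  // Invoke a no-arg method if the running Spark version provides it; otherwise return null.
  public static Object invokeIfPresent(Object target, String methodName) {
    try {
      Method method = target.getClass().getMethod(methodName);
      return method.invoke(target);
    } catch (NoSuchMethodException e) {
      // The running version does not have this API; the caller falls back to older behavior.
      return null;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Reflective call to " + methodName + " failed", e);
    }
  }
}
```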

Contributor Author
Sounds good, I'll work on that

Contributor Author
I've been trying to remove the shims and consolidate things, but I keep running into class-loading errors and I'm not really sure what to do about them. I moved most of the shim into a SparkInterpreterUtils class in a spark-common module, but I keep running into

IllegalAccess failed to access class org.apache.zeppelin.spark.SparkInterpreterUtils from class org.apache.zeppelin.spark.SparkScala213Interpreter (org.apache.zeppelin.spark.SparkInterpreterUtils is in unnamed module of loader 'app'; org.apache.zeppelin.spark.SparkScala213Interpreter is in unnamed module of loader java.net.URLClassLoader @2c768ada)

Member
The error indicates they are loaded by different classloaders... classloader issues are always quirky, and I may not be able to give more advice.
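
As a debugging aid, a small sketch (assuming it runs inside the interpreter JVM where both classes are on the classpath) that prints which classloader defined each class; an IllegalAccessError like the one above typically means the accessed class or member is not public and the two classes, despite sharing a package name, end up in different runtime packages because they were defined by different classloaders:

```java
// Debugging sketch, assuming it runs inside the interpreter process where both classes exist:
// print the defining classloader of each class to confirm they differ at runtime.
public final class ClassLoaderCheck {
  public static void main(String[] args) throws ClassNotFoundException {
    Class<?> utils = Class.forName("org.apache.zeppelin.spark.SparkInterpreterUtils");
    Class<?> scala213 = Class.forName("org.apache.zeppelin.spark.SparkScala213Interpreter");
    System.out.println("SparkInterpreterUtils loaded by:    " + utils.getClassLoader());
    System.out.println("SparkScala213Interpreter loaded by: " + scala213.getClassLoader());
    // If the loaders differ, the usual ways around the error are making the accessed class and
    // members public, or arranging for both classes to be loaded by the same classloader.
  }
}
```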

Member
The Spark community proposes postponing the 4.0.0 release ETA to 2025. Could you please make a dedicated PR for the shims removal? We can merge that earlier.

Contributor Author
It would still be nice to be able to use the preview releases in Zeppelin, which this enables.

Contributor Author
I mean the only things actually related to Spark 4 in this are:

  • A Maven profile, which will be helpful for running manual tests against the preview release (see the example command below)
  • A CI stage for testing, which will have to be removed or commented out since it's not an officially supported version

So if/when you're good with everything else, I can just remove the CI stage (preferably keeping the profile in), and it will mostly be a shim-removal PR that just happens to remove the requirement that the major version is 3.
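
For reference, a manual run against a 4.0 preview using that profile would mirror the CI command shown earlier in this thread, e.g. `./mvnw verify -pl spark-submit,spark/interpreter -am -Dtest=org/apache/zeppelin/spark/* -Pspark-4.0 -Pspark-scala-2.13 -Phadoop3 -Pintegration -DfailIfNoTests=false`.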

Member
sgtm

Contributor Author
Changed those back and updated the title and description.

@Kimahriman changed the title from "[ZEPPELIN-6086] Unofficial support for Spark 4.0" to "[ZEPPELIN-6086] Remove Spark Shims and unofficial support for Spark 4.0" on Sep 30, 2024
@pan3793 merged commit 79f30d5 into apache:master on Oct 1, 2024
28 checks passed
pan3793 pushed a commit that referenced this pull request Oct 1, 2024
Closes #4825 from Kimahriman/spark-4.0.

Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit 79f30d5)
Signed-off-by: Cheng Pan <[email protected]>
@pan3793 (Member) commented Oct 1, 2024

Thanks, merged to master/0.12
