-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ZEPPELIN-6086] Remove Spark Shims and unofficial support for Spark 4.0 #4825
Conversation
.github/workflows/core.yml
Outdated
- name: run spark-4.0 tests with scala-2.13 and python-${{ matrix.python }} | ||
if: matrix.java >= '17' | ||
run: | | ||
rm -rf spark/interpreter/metastore_db | ||
./mvnw verify -pl spark-submit,spark/interpreter -am -Dtest=org/apache/zeppelin/spark/* -Pspark-4.0 -Pspark-scala-2.13 -Phadoop3 -Pintegration -DfailIfNoTests=false ${MAVEN_ARGS} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plan would be to comment this out until 4.0 is officially out and the max supported version can be bumped
@@ -87,7 +87,7 @@ public SparkInterpreter(Properties properties) { | |||
} | |||
|
|||
this.enableSupportedVersionCheck = java.lang.Boolean.parseBoolean( | |||
properties.getProperty("zeppelin.spark.enableSupportedVersionCheck", "true")); | |||
properties.getProperty("zeppelin.spark.enableSupportedVersionCheck", "false")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will re-enable this, but need to disable this for tests to run for 4.0
if (sparkMajorVersion == 3 || sparkMajorVersion == 4) { | ||
LOGGER.info("Initializing shims for Spark 3.x"); | ||
sparkShimsClass = Class.forName("org.apache.zeppelin.spark.Spark3Shims"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current shims appear to still work fine. I'm not sure if we want to rename the shim, or remove the shim since different functionality isn't needed right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should remove the shim layer to simplify the codebase given the Spark API is pretty stable in recent versions, the corner compatibility cases could be resolved by using reflection directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll work on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to remove the shims and consolidate things but I keep running into class loading errors and I'm not really sure what to do about it. I moved most from the shim into a SparkInterpreterUtils
class in a spark-common
module, but just keep running into
IllegalAccess failed to access class org.apache.zeppelin.spark.SparkInterpreterUtils from class org.apache.zeppelin.spark.SparkScala213Interpreter (org.apache.zeppelin.spark.SparkInterpreterUtils is in unnamed module of loader 'app'; org.apache.zeppelin.spark.SparkScala213Interpreter is in unnamed module of loader java.net.URLClassLoader @2c768ada)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error indicates they are loaded by different classloaders ... classloader issues are always quirky, and I may not be able to give more advice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark community proposes postponing the 4.0.0 release ETA to 2025, could you please make a dedicated PR for shims remove? we can merge that earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would still be nice to be able to use the preview releases in Zeppelin which this enables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the only things actually related to Spark 4 in this are:
- Maven profile which will be helpful to be able to run manual tests against the preview release
- CI stage for testing that will have to be removed or commented out since it's not an officially supported version
So if/when you're good with everything else, I can just remove the CI stage (preferably keep the profile in), and it will mostly be a shim removal PR that just happens to remove the requirement that the major version is 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed those back and update the title and description
### What is this PR for? Add support for Spark 4 by removing the spark shims. There is only one version of shims now, and they are compatible with the current Spark 4.0 preview releases, so we can just remove the shim concept. ### What type of PR is it? Improvement ### Todos * [x] - Disable 4.0 tests in CI and re-enable supported version check ### What is the Jira issue? ZEPPELIN-6086 ### How should this be tested? A CI stage is added for Spark 4.0 tests, but these will need to be disabled until 4.0 is officially supported. ### Screenshots (if appropriate) ### Questions: * Does the license files need to update? No * Is there breaking changes for older versions? No * Does this needs documentation? Maybe Closes #4825 from Kimahriman/spark-4.0. Signed-off-by: Cheng Pan <[email protected]> (cherry picked from commit 79f30d5) Signed-off-by: Cheng Pan <[email protected]>
Thanks, merged to master/0.12 |
What is this PR for?
Add support for Spark 4 by removing the spark shims. There is only one version of shims now, and they are compatible with the current Spark 4.0 preview releases, so we can just remove the shim concept.
What type of PR is it?
Improvement
Todos
What is the Jira issue?
ZEPPELIN-6086
How should this be tested?
A CI stage is added for Spark 4.0 tests, but these will need to be disabled until 4.0 is officially supported.
Screenshots (if appropriate)
Questions: