Support for Dataframe mode with parquet #25

Merged: 12 commits, Dec 16, 2024

sfc-gh-dbabbjimenez
Contributor

Motivation & Context

JIRA: SNOW-1854994

Description

This pull request updates Demos/demo_pyspark_pipeline.py and Demos/demo_snowpark_pipeline.py, focusing on how schema and data checkpoints are handled, and makes minor changes to snow_connection.py and summary_stats_collector.py.

Changes to schema and data checkpoints:

  • Demos/demo_pyspark_pipeline.py: Added an import for CheckpointMode and updated the collect_dataframe_checkpoint call to pass a new mode parameter.
  • Demos/demo_snowpark_pipeline.py: Updated the import statements to include CheckpointMode and validate_dataframe_checkpoint. Replaced check_dataframe_schema_file with validate_dataframe_checkpoint and added a new mode parameter.
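
For readers unfamiliar with the mode parameter, here is a minimal, self-contained sketch of the idea. It is not the library's implementation: the CheckpointMode members (SCHEMA, DATAFRAME), the function signature, and the returned dictionary are all assumptions made for illustration, and the stand-in works on plain Python rows rather than a Spark or Snowpark DataFrame.

```python
from enum import Enum
from typing import Any, Dict, List


class CheckpointMode(Enum):
    """Stand-in for the CheckpointMode named in this PR (members are assumed)."""
    SCHEMA = 1
    DATAFRAME = 2


def collect_dataframe_checkpoint(
    rows: List[Dict[str, Any]],
    checkpoint_name: str,
    mode: CheckpointMode = CheckpointMode.SCHEMA,
) -> Dict[str, Any]:
    """Hypothetical collector: returns what would be persisted for a checkpoint."""
    if mode is CheckpointMode.SCHEMA:
        # Schema mode: keep only column names and the first Python type seen.
        columns: Dict[str, str] = {}
        for row in rows:
            for name, value in row.items():
                columns.setdefault(name, type(value).__name__)
        return {"name": checkpoint_name, "mode": mode.name, "columns": columns}
    if mode is CheckpointMode.DATAFRAME:
        # DataFrame mode: keep the full data. The PR stores it as parquet;
        # this sketch only copies the rows in memory.
        copied = [dict(row) for row in rows]
        return {"name": checkpoint_name, "mode": mode.name, "rows": copied}
    raise ValueError(f"Unsupported mode: {mode!r}")
```

In this sketch, the caller chooses between a lightweight schema summary and a full data snapshot by passing mode, which is the shape of the change described for both demo pipelines.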

Commenting out specific data types:

Minor changes:

How Has This Been Tested?

Checklist

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Data correction (data quality issue originating from upstream source or dataset)
  • Cleanup and optimization (improvement that does not alter the data returned by a model)
  • Other (please specify)
  • I attest that this change meets the bar for low risk without security requirements as defined in the Accelerated Risk Assessment Criteria and I have taken the Risk Assessment Training in Workday.
    • Checking this checkbox is mandatory if using the Accelerated Risk Assessment to risk assess the changes in this Pull Request.
    • If this change does not meet the bar for low risk without security requirements (as confirmed by the peer reviewers of this pull request) then a formal Risk Assessment must be completed. Please note that a formal Risk Assessment will require you to spend extra time performing a security review for this change. Please account for this extra time earlier rather than later to avoid unnecessary delays in the release process.

Contributor

sfc-gh-fgonzalezmendez left a comment

Please take a look at the comments I left.

sfc-gh-dbabbjimenez merged commit 2b40973 into main on Dec 16, 2024
32 checks passed
sfc-gh-dbabbjimenez deleted the feature/dbabb/SNOW-1854994 branch on December 16, 2024 20:53