BUG(string dtype): Empty sum produces incorrect result #60936

Open
wants to merge 2 commits into
base: main

rhshadrach
Member

@rhshadrach rhshadrach added Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Strings String extension data type and string data labels Feb 15, 2025
@rhshadrach
Member Author

Friendly ping @WillAyd @jorisvandenbossche

@rhshadrach rhshadrach added this to the 2.3 milestone Feb 19, 2025
Member

@WillAyd WillAyd left a comment

Looks pretty good - minor question

if op.how == "sum":
    # https://github.com/pandas-dev/pandas/issues/60229
    # All NA should result in the empty string.
    assert "skipna" in kwargs
Member

What is the need for assert here?

Member Author

We should always have added skipna to kwargs by the time we get here. If that ever stops happening (e.g. after a future refactor), the assert helps with debugging: a failing assert indicates a clear violation of an assumption, whereas a KeyError might not. If it did somehow end up in a release and raise in user code, an assert also tells the user "this is clearly a bug in pandas".
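To illustrate the difference, here is a minimal standalone sketch, not the pandas code itself. The helper names (`read_skipna_with_assert`, `read_skipna_without_assert`, `failure_type`) are hypothetical and exist only for this comparison; the point is which exception type surfaces when the `skipna` key is missing.

```python
def read_skipna_with_assert(**kwargs):
    # Hypothetical helper: state the invariant explicitly before using the key.
    assert "skipna" in kwargs, "skipna should have been set by the caller"
    return kwargs["skipna"]


def read_skipna_without_assert(**kwargs):
    # Hypothetical helper: rely on the dict lookup alone.
    return kwargs["skipna"]


def failure_type(func, **kwargs):
    # Return the name of the exception raised by func, or None if it succeeds.
    try:
        func(**kwargs)
    except Exception as exc:
        return type(exc).__name__
    return None


print(failure_type(read_skipna_with_assert))               # invariant violated
print(failure_type(read_skipna_without_assert))            # plain lookup failure
print(failure_type(read_skipna_with_assert, skipna=True))  # invariant holds
```

With the assert, a missing key surfaces as an `AssertionError` carrying a message about the broken assumption; without it, the same mistake surfaces as a bare `KeyError`. Note that Python strips `assert` statements when run with `-O`, so this kind of check documents an internal invariant rather than validating user input.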

Labels
Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG/API: sum of a string column with all-NaN or empty
2 participants