ENH: pandas.api.interchange.from_dataframe now uses the Arrow PyCapsule Interface if available, only falling back to the Dataframe Interchange Protocol if that fails #60739

Merged 3 commits into pandas-dev:main on Jan 21, 2025

Conversation

Member

@MarcoGorelli commented Jan 20, 2025

There's some related discussion of the interchange protocol in #56732

Regardless of whether it gets deprecated, what we can already do is prefer the PyCapsule Interface if it's available. I'd discussed this informally with @WillAyd

pandas.api.interchange.from_dataframe is still used in some places (Seaborn, Plotly<6.0, Altair<5.4) where people may want to upgrade pandas whilst keeping some other deps pinned, so this should make the transition to the PyCapsule Interface seamless for them
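As a rough illustration of the "prefer PyCapsule, fall back to the interchange protocol" idea described above, here is a minimal sketch. It is not the pandas implementation: `from_dataframe_sketch`, `via_arrow` and `via_interchange` are hypothetical names, and treating a missing optional dependency (`ImportError`) as the trigger for the fallback is an assumption.

```python
from typing import Any, Callable


def from_dataframe_sketch(
    df: Any,
    via_arrow: Callable[[Any], Any],
    via_interchange: Callable[[Any], Any],
) -> Any:
    """Hypothetical dispatcher: try the Arrow PyCapsule path first."""
    # Preferred path: the object exports an Arrow C stream via a PyCapsule.
    if hasattr(df, "__arrow_c_stream__"):
        try:
            return via_arrow(df)
        except ImportError:
            # Assumption: the Arrow path relies on an optional dependency;
            # if that dependency is missing, fall through to the protocol below.
            pass

    # Fallback path: the Dataframe Interchange Protocol.
    if not hasattr(df, "__dataframe__"):
        raise ValueError("`df` does not support __dataframe__")
    return via_interchange(df)
```

With this shape, a library object that implements both dunder methods takes the Arrow path, and one that only implements `__dataframe__` keeps working as before.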

@MarcoGorelli added the Interchange (Dataframe Interchange Protocol) label Jan 20, 2025
- tm.assert_frame_equal(result, expected)
+ tm.assert_frame_equal(result, expected, check_column_type=False)
Member Author

The difference is

(Pdb) p result.columns
Index([], dtype='object')
(Pdb) p expected.columns
RangeIndex(start=0, stop=0, step=1)

Given that this only affects dataframes with zero columns, are we OK with this difference? I am

Member

Ideally RangeIndex would be better, but OK to punt

@MarcoGorelli marked this pull request as ready for review January 20, 2025 14:25
Member

@WillAyd left a comment

lgtm

RuntimeError,
match="To join chunks a copy is required which is "
"forbidden by allow_copy=False",
pa.ArrowInvalid,
Member

I think we have generally been trying to catch the pyarrow error internally and raise something more generic from pandas, but @mroeschke would know best

Member

Yes, ideally this would be the case, but we're definitely not consistent about it

Member Author

sure, I'm catching it to re-raise as RuntimeError (which is what currently gets raised)
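The catch-and-re-raise pattern discussed in this thread can be sketched roughly as follows. `ArrowInvalidStandIn` and `convert_or_runtime_error` are hypothetical names used only so the example runs without pyarrow installed; real code would catch `pa.ArrowInvalid` instead.

```python
from typing import Any, Callable


class ArrowInvalidStandIn(ValueError):
    """Hypothetical stand-in for pyarrow's ArrowInvalid exception."""


def convert_or_runtime_error(convert: Callable[[], Any]) -> Any:
    """Run `convert`, translating the library-specific error to RuntimeError."""
    try:
        return convert()
    except ArrowInvalidStandIn as err:
        # Chain with `from` so the original error stays visible in the
        # traceback while callers only need to handle RuntimeError.
        raise RuntimeError(str(err)) from err
```

Callers that already expect `RuntimeError` keep working, and the underlying error remains available through `__cause__`.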

@@ -30,6 +30,7 @@ Other enhancements
^^^^^^^^^^^^^^^^^^
- :class:`pandas.api.typing.FrozenList` is available for typing the outputs of :attr:`MultiIndex.names`, :attr:`MultiIndex.codes` and :attr:`MultiIndex.levels` (:issue:`58237`)
- :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
- :meth:`pandas.api.interchange.from_dataframe` now uses the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) if available, only falling back to the Dataframe Interchange Protocol if that fails (:issue:`60739`)
Member

You'll probably need to use the rst syntax for the Arrow PyCapsule Interface hyperlink

Member Author

thanks, been too long since i made a PR 🤦 😳

@mroeschke added this to the 3.0 milestone Jan 21, 2025
@mroeschke merged commit bbd6526 into pandas-dev:main Jan 21, 2025
45 of 51 checks passed
@mroeschke
Member

Thanks @MarcoGorelli
