ENH: pandas.api.interchange.from_dataframe now uses the Arrow PyCapsule Interface if available, only falling back to the Dataframe Interchange Protocol if that fails #60739
-tm.assert_frame_equal(result, expected)
+tm.assert_frame_equal(result, expected, check_column_type=False)
The difference is:
(Pdb) p result.columns
Index([], dtype='object')
(Pdb) p expected.columns
RangeIndex(start=0, stop=0, step=1)
Given that this only affects dataframes with zero columns, are we OK with this difference? I am
Ideally RangeIndex would be better, but OK to punt
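To make the zero-column difference concrete, here is a minimal sketch, assuming pandas 2.x or later and using the public `pandas.testing.assert_frame_equal` (the test itself goes through the internal `tm` alias). The two frames are hypothetical stand-ins built to reproduce the `(Pdb)` output above; they are not the frames from the actual test.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-ins for the two zero-column frames: one whose columns are
# an empty object-dtype Index, one whose columns are an empty RangeIndex.
result = pd.DataFrame(index=pd.RangeIndex(0), columns=pd.Index([], dtype="object"))
expected = pd.DataFrame(index=pd.RangeIndex(0), columns=pd.RangeIndex(0))

print(repr(result.columns))    # Index([], dtype='object')
print(repr(expected.columns))  # RangeIndex(start=0, stop=0, step=1)

# check_column_type=False skips the class/dtype check on the column index,
# so these two empty column indexes of different types compare as equal.
assert_frame_equal(result, expected, check_column_type=False)
```

Passing `check_column_type=False` relaxes only the column-index type check; the row index and the (here empty) column values are still compared.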
lgtm
-RuntimeError,
-match="To join chunks a copy is required which is "
-"forbidden by allow_copy=False",
+pa.ArrowInvalid,
I think we have generally been trying to catch the pyarrow error internally and raise something more generic from pandas, but @mroeschke would know best
Yes, ideally this would be the case, but we're definitely not consistent about it
Sure, I'm catching it to re-raise RuntimeError (which is what currently gets raised)
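The pattern agreed on in this thread, catching the backend-specific exception and re-raising the `RuntimeError` users already get, can be sketched in plain Python. This is a hypothetical illustration, not the PR's code: `BackendError` stands in for `pa.ArrowInvalid` so the sketch runs without pyarrow, and `convert` / `_backend_convert` are invented names.

```python
class BackendError(ValueError):
    """Hypothetical stand-in for a backend-specific error such as pa.ArrowInvalid."""


def _backend_convert(n_chunks: int, allow_copy: bool) -> str:
    # Hypothetical backend call: combining several chunks requires a copy.
    if n_chunks > 1 and not allow_copy:
        raise BackendError("zero-copy conversion not possible")
    return f"converted {n_chunks} chunk(s)"


def convert(n_chunks: int, allow_copy: bool = True) -> str:
    """Call the backend, translating its error into the RuntimeError callers expect."""
    try:
        return _backend_convert(n_chunks, allow_copy)
    except BackendError as err:
        # `from err` keeps the backend error available as __cause__ for debugging.
        raise RuntimeError(
            "To join chunks a copy is required which is "
            "forbidden by allow_copy=False"
        ) from err
```

Chaining with `from err` keeps the user-facing exception type stable while still exposing the original backend error in the traceback.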
doc/source/whatsnew/v3.0.0.rst
@@ -30,6 +30,7 @@ Other enhancements
^^^^^^^^^^^^^^^^^^
- :class:`pandas.api.typing.FrozenList` is available for typing the outputs of :attr:`MultiIndex.names`, :attr:`MultiIndex.codes` and :attr:`MultiIndex.levels` (:issue:`58237`)
- :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
+- :meth:`pandas.api.interchange.from_dataframe` now uses the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) if available, only falling back to the Dataframe Interchange Protocol if that fails (:issue:`60739`)
You'll probably need to use the rst syntax for making Arrow PyCapsule Interface a hyperlink
Thanks, it's been too long since I made a PR 🤦 😳
Thanks @MarcoGorelli
There's some related discussion of the interchange protocol in #56732
Regardless of whether it gets deprecated, what we can already do is prefer the PyCapsule Interface if it's available. I'd discussed this informally with @WillAyd.
pandas.api.interchange.from_dataframe is still used in some places (Seaborn, Plotly<6.0, Altair<5.4) where people may want to upgrade pandas whilst keeping some other deps pinned, so this should make the transition to the PyCapsule Interface seamless for them
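A minimal sketch of the dispatch order described in the PR title, written as a hypothetical helper rather than the actual pandas implementation. `__arrow_c_stream__` and `__dataframe__` are the method names defined by the Arrow PyCapsule Interface and the Dataframe Interchange Protocol; everything else (`from_dataframe_sketch`, the converter callables, the toy frame classes, and the choice to fall back only on `ImportError`) is assumed for illustration.

```python
from typing import Any, Callable


def from_dataframe_sketch(
    df: Any,
    via_pycapsule: Callable[[Any], Any],
    via_interchange: Callable[[Any], Any],
) -> Any:
    """Hypothetical dispatcher: prefer the Arrow PyCapsule Interface,
    falling back to the Dataframe Interchange Protocol."""
    if hasattr(df, "__arrow_c_stream__"):
        try:
            return via_pycapsule(df)
        except ImportError:
            # Assumption: the capsule path is unusable (e.g. its converter
            # library is not installed), so fall through to the protocol.
            pass
    if hasattr(df, "__dataframe__"):
        return via_interchange(df)
    raise TypeError("df implements neither __arrow_c_stream__ nor __dataframe__")


# Toy objects standing in for third-party dataframes (hypothetical).
class CapsuleFrame:
    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError  # a real object would return a PyCapsule

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        raise NotImplementedError  # a real object would return a protocol object


class ProtocolOnlyFrame:
    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        raise NotImplementedError


# Hypothetical converters that just report which path was taken.
def use_capsule(df: Any) -> str:
    return "pycapsule"


def use_protocol(df: Any) -> str:
    return "interchange"


def capsule_unavailable(df: Any) -> str:
    raise ImportError("converter unavailable")
```

With these stubs, an object exposing both methods goes through the capsule path, while one exposing only `__dataframe__` (or whose capsule converter raises `ImportError`) falls back to the interchange protocol, which is what lets pinned consumers keep calling `from_dataframe` unchanged.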