ENH: pandas.api.interchange.from_dataframe now uses the Arrow PyCapsule Interface if available, only falling back to the Dataframe Interchange Protocol if that fails #60739
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -278,7 +278,7 @@ def test_empty_pyarrow(data): | |
expected = pd.DataFrame(data) | ||
arrow_df = pa_from_dataframe(expected) | ||
result = from_dataframe(arrow_df) | ||
tm.assert_frame_equal(result, expected) | ||
tm.assert_frame_equal(result, expected, check_column_type=False) | ||
Review comment: The difference is
(Pdb) p result.columns
Index([], dtype='object')
(Pdb) p expected.columns
RangeIndex(start=0, stop=0, step=1)
Given that this only affects dataframes with zero columns, are we OK with this difference? I am
Reply: Ideally
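To make the difference in this thread concrete, here is a minimal sketch (not part of the PR) that builds two zero-column frames by hand, one whose columns are an object-dtype `Index` and one whose columns are a `RangeIndex`, and compares them with and without `check_column_type=False`. It uses the public `pd.testing.assert_frame_equal` rather than the `tm` alias from the test suite; the helper `frames_equal` and the variable names are made up for illustration.

```python
import pandas as pd

# Two zero-column frames whose column indexes differ only in type.
object_cols = pd.DataFrame(columns=pd.Index([], dtype="object"))
range_cols = pd.DataFrame(columns=pd.RangeIndex(0))


def frames_equal(left, right, **kwargs):
    """Return True if pd.testing.assert_frame_equal accepts the pair (illustrative helper)."""
    try:
        pd.testing.assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


# Default comparison also checks the class and dtype of the columns index.
strict = frames_equal(object_cols, range_cols)

# Relaxed comparison ignores the columns index type, as in the updated test.
relaxed = frames_equal(object_cols, range_cols, check_column_type=False)

print("strict:", strict)
print("relaxed:", relaxed)
```

The strict comparison should be rejected and the relaxed one accepted, which is the behaviour the updated assertion in the diff relies on.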
||
|
||
|
||
def test_multi_chunk_pyarrow() -> None: | ||
|
@@ -287,9 +287,8 @@ def test_multi_chunk_pyarrow() -> None: | |
names = ["n_legs"] | ||
table = pa.table([n_legs], names=names) | ||
with pytest.raises( | ||
RuntimeError, | ||
match="To join chunks a copy is required which is " | ||
"forbidden by allow_copy=False", | ||
pa.ArrowInvalid, | ||
Review comment: I think we have generally been trying to catch the pyarrow error internally and raise something more generic from pandas but @mroeschke would know best
Reply: Yes ideally this would be the case, but we're definitely not consistent about it
Reply: sure, i'm catching to re-raise RuntimeError (which is what currently gets raised)
||
match="Cannot do zero copy conversion into multi-column DataFrame block", | ||
): | ||
pd.api.interchange.from_dataframe(table, allow_copy=False) | ||
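The pattern discussed in the review thread on this test (catch the backend-specific error internally and re-raise something more generic) can be sketched without pyarrow. In this sketch `BackendError` stands in for `pa.ArrowInvalid`, and `backend_convert` and `convert` are hypothetical names, not pandas or pyarrow APIs; the actual change in the PR may look different.

```python
class BackendError(Exception):
    """Hypothetical stand-in for a backend-specific error such as pa.ArrowInvalid."""


def backend_convert(chunks, allow_copy):
    """Hypothetical backend call: joining several chunks requires a copy."""
    if len(chunks) > 1 and not allow_copy:
        raise BackendError("Cannot do zero copy conversion")
    # Flatten the chunks into a single list (this is the "copy").
    return [value for chunk in chunks for value in chunk]


def convert(chunks, allow_copy=True):
    """Public entry point: hide the backend error behind a generic RuntimeError."""
    try:
        return backend_convert(chunks, allow_copy)
    except BackendError as err:
        # Chain with `from err` so the original backend error stays in the traceback.
        raise RuntimeError(str(err)) from err


if __name__ == "__main__":
    print(convert([[2, 4], [100]]))
    try:
        convert([[2, 4], [100]], allow_copy=False)
    except RuntimeError as err:
        print(type(err).__name__, "caused by", type(err.__cause__).__name__)
```

Chaining the exceptions keeps the generic error type stable for callers and tests while preserving the backend message for debugging.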
|
||
|
@@ -641,3 +640,12 @@ def test_buffer_dtype_categorical( | |
col = dfi.get_column_by_name("data") | ||
assert col.dtype == expected_dtype | ||
assert col.get_buffers()["data"][1] == expected_buffer_dtype | ||
|
||
|
||
def test_from_dataframe_list_dtype(): | ||
pa = pytest.importorskip("pyarrow", "14.0.0") | ||
data = {"a": [[1, 2], [4, 5, 6]]} | ||
tbl = pa.table(data) | ||
result = from_dataframe(tbl) | ||
expected = pd.DataFrame(data) | ||
tm.assert_frame_equal(result, expected) |
Review comment: You'll probably need to use the rst version of making the "Arrow PyCapsule Interface" hyperlink
Reply: thanks, been too long since i made a PR 🤦 😳