-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement String/Binary View Support #340
Conversation
@skyth540 if you get the chance to test this out on your end that would be very helpful |
Hmm the polars behavior looks strange - opened pola-rs/polars#18909 upstream to take a closer look |
How do I get the changes? Last time I built from main, nothing was different from the normal pip release |
d0bfd5b
to
bec34af
Compare
You can pip install directly from the branch:
However, probably worth tracking the aforementioned issue before diving too far in. I think there are some bugs upstream in polars that might need to be fixed first |
It is erroring Building wheels for collected packages: pantab × Building wheel for pantab (pyproject.toml) did not run successfully.
cannot convert argument 1 from 'ArrowBufferViewData' to 'std::basic_string<char,std::char_traits,std::allocator> &&'
note: This error originates from a subprocess, and is likely not a problem with pip. |
bec34af
to
f55a340
Compare
Sorry about that. Should be working now if you'd like to give it another go |
Looks like something is still breaking... I successfully installed it from the branch, but using from_to_hyper is crashing my kernel. Here are the jupyter logs: Visual Studio Code (1.93.1, undefined, desktop) |
Do you have a small code sample that is crashing? |
|
Thanks, but I can't do much with that, especially not having access to a windows machine. Can you try and identify a subset of the data that can be used as a fully reproducible code sample? |
Here is basically my issue:
|
Great thanks. I'm surprised you aren't getting a better error message, but I am getting So there must be some kind of issue with error handling on Windows, but more generally the issue is that we don't support dictionary types, which I assume polars is using for storage on the categorical type. If you cast that to string or remove from the dataset it should work. Can you open a separate issue to request dictionary support? It ultimately will get you to the same place as strings (since Hyper does not support such a feature), but I can see that as a nice usability perk to write that data type (assuming strings are being held) |
That's a bummer... I switched from strings to categoricals and had a huge performance increase, as well as making some of what I wanted to do, possible, with RAM limitations (categoricals are more efficient). But if hyper doesn't support them then I'll switch back to strings |
To be clear we can theoretically write them and keep the memory savings there. However, Hyper cannot store strings encoded in that manner (at least not today), so when you read the same data back it will come back as a string |
😅 Switching to strings still errors with the code sample I have above schema = { |
I think this PR in its current state does what we need to do for string / binary views, so merging as is. |
closes #333 and #316