-
-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/763 improve type annotations for DataFrameModel.validate #1905
Conversation
Signed-off-by: Matt Richards <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1905 +/- ##
==========================================
- Coverage 94.28% 93.36% -0.92%
==========================================
Files 91 121 +30
Lines 7013 9362 +2349
==========================================
+ Hits 6612 8741 +2129
- Misses 401 621 +220 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Matt Richards <[email protected]>
inplace: bool = False, | ||
) -> DataFrame[Self]: | ||
"""%(validate_doc)s""" | ||
return cast( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could potentially also all be in an if typing.TYPE_CHECKING block if that's cleaner than providing an implementation the same as the parent. I also don't know if it's preferable to use super() instead?
Edit: I also briefly looked at seeing if an overload on recieving a GeoDataFrame is possible. I had a solution working in my standalone test case, but it was pretty ugly with conditional overloads, and pre-commit mypy was not very happy with that.
Signed-off-by: Matt Richards <[email protected]>
|
||
@classmethod | ||
@overload | ||
def validate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With mypy in pre-commit, this gives:
pandera\api\polars\model.py:130: error: Overloaded function signature 2 will never be matched: signature 1's parameter type(s) are the same or broader [misc]
but that doesn't seem to be the case. I have this as a test case:
import typing
from typing import reveal_type
import pandera as pda
import pandera.polars as pla
import pandas as pd
import polars as pl
from pandera.typing import Series
from pandera.typing.geopandas import GeoSeries
import geopandas as gpd
class SchemaPandas(pda.DataFrameModel):
col1: Series[int]
col2: Series[int]
class SchemaGeoPandas(pda.DataFrameModel):
col1: Series[int]
col2: GeoSeries
class SchemaPolars(pla.DataFrameModel):
col1: Series[int]
col2: Series[int]
pandas_df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1, 2, 3]})
gdf = gpd.GeoDataFrame(pandas_df.assign(col2=gpd.GeoSeries.from_xy(x=[1,1,1], y=[2,2,2]))).set_geometry("col2")
polars_df = pl.from_pandas(pandas_df)
lazyframe = polars_df.lazy()
reveal_type(pandas_df)
print("pd.DataFrame")
result = SchemaPandas.validate(pandas_df)
# test operations for methods accepting pandas dataframes
result.to_csv("test")
pd.concat([result, result])
reveal_type(result)
print("gpd.GeoDataFrame")
reveal_type(SchemaGeoPandas.validate(gdf))
print("pl.DataFrame")
result2 = SchemaPolars.validate(polars_df)
reveal_type(result2)
print("pl.LazyFrame")
reveal_type(SchemaPolars.validate(lazyframe))
if typing.TYPE_CHECKING:
# this should fail mypy, I don't want it crashing my test script though
SchemaPolars.validate(pandas_df) # should fail
when then shows
fork\tester.py:34: note: Revealed type is "pandas.core.frame.DataFrame"
fork\tester.py:41: note: Revealed type is "pandera.typing.pandas.DataFrame[tester.SchemaPandas]"
fork\tester.py:43: note: Revealed type is "pandera.typing.pandas.DataFrame[tester.SchemaGeoPandas]"
fork\tester.py:48: note: Revealed type is "pandera.typing.polars.DataFrame[tester.SchemaPolars]"
fork\tester.py:50: note: Revealed type is "pandera.typing.polars.LazyFrame[tester.SchemaPolars]"
fork\tester.py:53: error: No overload variant of "validate" of "DataFrameModel" matches argument type "DataFrame" [call-overload]
under mypy. I thought at first this might be because pl.LazyFrame and pl.DataFrame are resolving to Any in the pre-commit environment, but I've tried adding polars there explicitly and it still shows the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @cosmicBboy I had another look at this and I think it's ready for another look if you have a chance. I worked out how to resolve the
pandera\api\polars\model.py:130: error: Overloaded function signature 2 will never be matched: signature 1's parameter type(s) are the same or broader [misc]
I had above. This was happening for two reasons. Polars wasn't installed in the mypy environment, so pl.DataFrame and pl.LazyFrame were resolving to Any. The second reason was the global mypi config settingfollow_imports=skip
. I swapped this to be a per module follow_imports = skip, and then relaxed the required cases so that this import would get treated properly.
I did however notice that adding polars to the mypy environment resulted in a lot of new errors so I've temporarily supressed some errors with more rules in the mypy.ini. I was hoping I would just be able to fix them, but they probably need a bit of input (particularly around how the PolarsData namedtuple works), so I've left that over in #1911 and kept only the minimal mypy changes here.
I guess question, are you happy with mypy changes in this PR, or would be better to revert to the previous state and I just type:ignore the error I quoted up above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this approach @m-richards. It does add a boilerplate method for each dataframe model type, but I think it's acceptable in order to make the type linters happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did however notice that adding polars to the mypy environment resulted in a lot of new errors so I've temporarily supressed some errors with more rules in the mypy.ini. I was hoping I would just be able to fix them, but they probably need a bit of input (particularly around how the PolarsData namedtuple works), so I've left that over in #1911 and kept only the minimal mypy changes here.
Thanks for opening up #1911! Let me know if you need any additional help address the mypy errors: I'm okay with type: ignore
'ing them and slowly chipping away at the errors.
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
Signed-off-by: Matt Richards <[email protected]>
…nai-oss#1905) Signed-off-by: Matt Richards <[email protected]> * trial type annotations Signed-off-by: Matt Richards <[email protected]> * changes in individual api files Signed-off-by: Matt Richards <[email protected]> * pl.dataframe working in local test Signed-off-by: Matt Richards <[email protected]> * older python union compat Signed-off-by: Matt Richards <[email protected]> * try polars in the mypy env on ci Signed-off-by: Matt Richards <[email protected]> * translate toplevel mypy skip into module specific skips Signed-off-by: Matt Richards <[email protected]> * mypy passes Signed-off-by: Matt Richards <[email protected]> * missing line continuation Signed-off-by: Matt Richards <[email protected]> * python 3.8 Signed-off-by: Matt Richards <[email protected]> --------- Signed-off-by: Matt Richards <[email protected]>
…nai-oss#1905) Signed-off-by: Matt Richards <[email protected]> * trial type annotations Signed-off-by: Matt Richards <[email protected]> * changes in individual api files Signed-off-by: Matt Richards <[email protected]> * pl.dataframe working in local test Signed-off-by: Matt Richards <[email protected]> * older python union compat Signed-off-by: Matt Richards <[email protected]> * try polars in the mypy env on ci Signed-off-by: Matt Richards <[email protected]> * translate toplevel mypy skip into module specific skips Signed-off-by: Matt Richards <[email protected]> * mypy passes Signed-off-by: Matt Richards <[email protected]> * missing line continuation Signed-off-by: Matt Richards <[email protected]> * python 3.8 Signed-off-by: Matt Richards <[email protected]> --------- Signed-off-by: Matt Richards <[email protected]>
…nai-oss#1905) Signed-off-by: Matt Richards <[email protected]> * trial type annotations Signed-off-by: Matt Richards <[email protected]> * changes in individual api files Signed-off-by: Matt Richards <[email protected]> * pl.dataframe working in local test Signed-off-by: Matt Richards <[email protected]> * older python union compat Signed-off-by: Matt Richards <[email protected]> * try polars in the mypy env on ci Signed-off-by: Matt Richards <[email protected]> * translate toplevel mypy skip into module specific skips Signed-off-by: Matt Richards <[email protected]> * mypy passes Signed-off-by: Matt Richards <[email protected]> * missing line continuation Signed-off-by: Matt Richards <[email protected]> * python 3.8 Signed-off-by: Matt Richards <[email protected]> --------- Signed-off-by: Matt Richards <[email protected]>
…nai-oss#1905) Signed-off-by: Matt Richards <[email protected]> * trial type annotations Signed-off-by: Matt Richards <[email protected]> * changes in individual api files Signed-off-by: Matt Richards <[email protected]> * pl.dataframe working in local test Signed-off-by: Matt Richards <[email protected]> * older python union compat Signed-off-by: Matt Richards <[email protected]> * try polars in the mypy env on ci Signed-off-by: Matt Richards <[email protected]> * translate toplevel mypy skip into module specific skips Signed-off-by: Matt Richards <[email protected]> * mypy passes Signed-off-by: Matt Richards <[email protected]> * missing line continuation Signed-off-by: Matt Richards <[email protected]> * python 3.8 Signed-off-by: Matt Richards <[email protected]> --------- Signed-off-by: Matt Richards <[email protected]>
This aims to close #763, but is currently a partial WIP step towards that.
I did come across that #1450 already exists, but it doesn't deal with different dataframe libraries, so it seemed easier to start from scratch, and focus specifically on the validate method to get some initial feedback too.
In the current state there's a few issues to resolve:
pandera\api\dataframe\model.py:298: error: Overloaded function signature 2 will never be matched: signature 1's parameter type(s) are the same or broader [misc]
which I gather is because without a stub library (and perhaps due to some mypy configuration) the revealed type of both pd.DataFrame and pl.LazyFrame is Any, and so the overloads are in conflict with one another.
Another possible approach would be to overload
validate
in all the dataframe library implementations, purely to update type annotations (avoiding the need to deal with optional imports) but this might not be worth the mess it creats.I'd welcome any feedback, it's been quite a while since I last submitted a PR to pandera, didn't think it would be such a long gap.