Optional class and protocol fields and methods #601

Open
srittau opened this issue Dec 17, 2018 · 28 comments
Labels
topic: feature Discussions about new features for Python's type annotations


@srittau
Collaborator

srittau commented Dec 17, 2018

Sometimes, implementations check for the existence of a method on an object before using it. Take this example from Python 2.7's urllib:

class addbase:
    def __init__(self, fp):
        self.fp = fp
        self.read = self.fp.read
        self.readline = self.fp.readline
        if hasattr(self.fp, "readlines"): self.readlines = self.fp.readlines
        if hasattr(self.fp, "fileno"):
            self.fileno = self.fp.fileno
        else:
            self.fileno = lambda: None
        if hasattr(self.fp, "__iter__"):
            self.__iter__ = self.fp.__iter__
            if hasattr(self.fp, "next"):
                self.next = self.fp.next

Currently this is best modeled by leaving these methods out of a protocol. But this means that their signatures are unknown, and mypy will complain about the nonexistent attribute. It would be useful to be able to model this somehow, for example by using a decorator for such methods.
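A rough Python 3 sketch of the status quo described above. Only the always-present methods go into the protocol, and the optional one is looked up with a three-argument getattr, so the type checker knows nothing about its signature. `Readable`, `AddBase`, and `BareReader` are hypothetical names, not urllib's.

```python
from typing import Any, Callable, Optional, Protocol


class Readable(Protocol):
    # Only the members every wrapped object is guaranteed to have.
    def read(self, size: int = -1) -> bytes: ...

    def readline(self, size: int = -1) -> bytes: ...


class AddBase:
    def __init__(self, fp: Readable) -> None:
        self.fp = fp
        self.read = fp.read
        self.readline = fp.readline
        # fileno is optional, so it cannot be declared in Readable; getattr
        # with a default hides its signature from the type checker.
        fileno: Optional[Callable[[], Any]] = getattr(fp, "fileno", None)
        self.fileno: Callable[[], Any] = fileno if fileno is not None else (lambda: None)


class BareReader:
    """Hypothetical reader that has no fileno method."""

    def read(self, size: int = -1) -> bytes:
        return b"data"

    def readline(self, size: int = -1) -> bytes:
        return b"line\n"


wrapped = AddBase(BareReader())
print(wrapped.read())  # b'data'
print(wrapped.fileno())  # None: the fallback lambda was used
```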

@ilevkivskyi
Member

This was explicitly deferred in PEP 544. On the other hand, this would not be hard to implement now. Also, another structural type in mypy (TypedDict) supports this. So maybe we should also support total=False for protocols?

@JukkaL what do you think?

@srittau
Collaborator Author

srittau commented Dec 18, 2018

I don't think a total=False attribute would be sufficient. In the example above - in fact in all cases where I have encountered this so far - there are required members as well as optional ones. A per-attribute flag (a decorator?) would serve these cases much better.

@ilevkivskyi
Member

I don't think a total=False attribute would be sufficient

It is always sufficient. Whether it is convenient is a different question.
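For comparison, TypedDict already shows how a class-wide total=False flag can express per-key optionality. Required keys go in a total base class and optional keys go in a total=False subclass. The sketch below uses made-up class names and illustrates why total=False is sufficient but not especially convenient.

```python
from typing import TypedDict


class _MovieRequired(TypedDict):
    # Keys declared in a total (default) TypedDict are required.
    title: str


class Movie(_MovieRequired, total=False):
    # Keys declared under total=False are optional.
    year: int


m: Movie = {"title": "Blade Runner"}  # OK: "year" may be omitted
```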

@srittau srittau added topic: feature Discussions about new features for Python's type annotations and removed enhancement labels Nov 4, 2021
@srittau
Collaborator Author

srittau commented Nov 4, 2021

I mentioned this again on typing-sig. It could reuse PEP 655's NotRequired class:

class MyProto(Protocol):
    x: NotRequired[int]
    @not_required
    def foo(self) -> None: ...

It would also make sense for non-protocols:

class Foo:
    x: NotRequired[int]
    def setup(self) -> None:
        self.x = some_calc()

(Not a design I would recommend, but I have seen this from time to time in the wild.)

@srittau srittau changed the title Protocols: Optional methods Optional class and protocol fields and methods Nov 4, 2021
@vnmabus

vnmabus commented Oct 27, 2023

I think this could be helpful for solving #1498 by marking _my_property as a non-required attribute. What do you think?

@ottokruse

It's 2024 and I need this for a scenario where I want to type classes that may optionally implement a method.

My function takes in instances that should match a protocol, where one method (reset()) is optional. If they have defined it, my function will call it; if not, it proceeds straight to the other logic.

from typing import Protocol

class MyClassProtocol(Protocol):
    pass
    # Can't put this in, or that would make it mandatory:
    # def reset(self) -> None: ...

def my_func(i: MyClassProtocol) -> None:
    if hasattr(i, "reset"):
        i.reset()  # fails type checking now
    # other logic

@erictraut
Collaborator

@ottokruse, here's a solution that might work for you:


from typing import Protocol, runtime_checkable

@runtime_checkable
class MyClassProtocol1(Protocol):
    def required_method(self) -> None:
        ...

@runtime_checkable
class MyClassProtocol2(MyClassProtocol1, Protocol):
    def reset(self) -> None:
        ...

type MyClassProtocol = MyClassProtocol1 | MyClassProtocol2

def my_func(i: MyClassProtocol):
    if isinstance(i, MyClassProtocol2):
        i.reset()

# Test with some concrete classes
class Foo:
    def required_method(self) -> None:
        ...

class Bar(Foo):
    def reset(self) -> None:
        ...

my_func(Foo())
my_func(Bar())
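The `type` statement above needs Python 3.12 or later. On older versions, the same approach can be sketched with a plain Union alias. The `calls` list and the extra `required_method()` call are additions for illustration only; they show which branch runs at runtime.

```python
from typing import List, Protocol, Union, runtime_checkable

calls: List[str] = []


@runtime_checkable
class MyClassProtocol1(Protocol):
    def required_method(self) -> None: ...


@runtime_checkable
class MyClassProtocol2(MyClassProtocol1, Protocol):
    def reset(self) -> None: ...


MyClassProtocol = Union[MyClassProtocol1, MyClassProtocol2]


def my_func(i: MyClassProtocol) -> None:
    # For runtime_checkable protocols, isinstance() only checks that the
    # members exist; it does not check their signatures.
    if isinstance(i, MyClassProtocol2):
        i.reset()
    i.required_method()


class Foo:
    def required_method(self) -> None:
        calls.append("required_method")


class Bar(Foo):
    def reset(self) -> None:
        calls.append("reset")
```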

@ottokruse

Thank you Eric!

Also found your typeguard PEP now. Nice work

@yangdanny97
Contributor

I did some number crunching here: https://github.com/yangdanny97/type_coverage_py/blob/hasattr_getattr/package_report.json

and it looks like 65% of the top 2000 packages on PyPI have calls to hasattr or three-argument calls to getattr in their source code, which suggests that this is a common pattern (unsurprising since these are builtins).

IMO it's common enough in existing code to warrant a PEP for this.
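The linked script is not reproduced here. As a rough sketch of the idea, counting hasattr calls and three-argument getattr calls with the standard ast module might look like the following. `count_dynamic_attr_calls` is a hypothetical helper, not the script's actual code.

```python
import ast
from typing import Dict


def count_dynamic_attr_calls(source: str) -> Dict[str, int]:
    """Count hasattr(...) calls and getattr(...) calls that pass a default."""
    counts = {"hasattr": 0, "getattr_with_default": 0}
    for node in ast.walk(ast.parse(source)):
        # Only bare-name calls are counted (e.g. not builtins.getattr).
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id == "hasattr":
            counts["hasattr"] += 1
        elif node.func.id == "getattr" and len(node.args) == 3:
            counts["getattr_with_default"] += 1
    return counts


sample = '''
if hasattr(fp, "fileno"):
    fd = fp.fileno()
close = getattr(fp, "close", None)
name = getattr(fp, "name")
'''
print(count_dynamic_attr_calls(sample))  # {'hasattr': 1, 'getattr_with_default': 1}
```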

It would also make sense for non-protocols:

class Foo:
    x: NotRequired[int]
    def setup(self) -> None:
        self.x = some_calc()

(Not a design I would recommend, but I have seen this from time to time in the wild.)

I'd advocate for NotRequired to be allowed in classes directly, so that it can be adopted with minimal effort by more people.

If this were protocol-only, I get the sense that maintainers would grumble about writing extra classes just to satisfy the type checker, and I worry it would be like Callable protocols, which are kind of verbose and relatively harder to adopt.

@yangdanny97
Contributor

yangdanny97 commented Jan 2, 2025

I came up with some proposed semantics for this; feedback would be greatly appreciated.

Semantics

The NotRequired qualifier will be allowed for attribute annotations in classes and protocols. It won't be possible to annotate methods with a decorator equivalent, since there is no way to declare a method without it being bound in the class.

Structural Typechecking

If a protocol has a NotRequired attribute, structural typechecking will depend on finality:

  • If the class is final, it can omit the attribute, declare it normally, or declare it with NotRequired.
  • If the class is not final, it must declare the attribute normally or with NotRequired; the attribute may NOT be omitted.

from typing import NotRequired, Protocol, final

class Proto(Protocol):
    x: NotRequired[int]

class Class1:
    pass

@final
class Class2:
    pass

class Class3:
    x: NotRequired[int]

class Class4:
    x: int

a: Proto = Class1()  # not OK
b: Proto = Class2()  # OK
c: Proto = Class3()  # OK
d: Proto = Class4()  # OK
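The finality rule can be written down as a small decision function. The sketch below is a toy model of the proposed check, not how any type checker implements it. Members are represented as a hypothetical mapping from name to "required" or "not_required".

```python
from typing import Dict, Literal

Kind = Literal["required", "not_required"]


def implements(proto: Dict[str, Kind], cls: Dict[str, Kind], is_final: bool) -> bool:
    """Toy model of the proposed rule for matching a class against a protocol."""
    for name, kind in proto.items():
        if name not in cls:
            # Omitting a member is only allowed for NotRequired members,
            # and only when the class is final (no subclass can add it later).
            if kind == "required" or not is_final:
                return False
        elif kind == "required" and cls[name] == "not_required":
            # Assumption: a possibly-absent attribute cannot satisfy a
            # required protocol member (not spelled out in the proposal).
            return False
    return True


proto: Dict[str, Kind] = {"x": "not_required"}
print(implements(proto, {}, is_final=False))  # Class1: False
print(implements(proto, {}, is_final=True))  # Class2: True
print(implements(proto, {"x": "not_required"}, is_final=False))  # Class3: True
print(implements(proto, {"x": "required"}, is_final=False))  # Class4: True
```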

The requirement that implementing classes must declare the attribute as NotRequired (instead of allowing it to be omitted) is motivated by this example from @JelleZijlstra

from typing import NotRequired, Protocol

class P(Protocol):
    a: int
    b: NotRequired[str]

def user(p: P) -> None:
    if hasattr(p, 'b'):
        print(len(p.b))

class A:
    a: int
    def __init__(self) -> None:
        self.a = 3

class B(A):
    b: int
    def __init__(self) -> None:
        super().__init__()
        self.b = 1

def accept_a(a: A) -> None:
    user(a) # OK, A has an attribute a of the right type and doesn't have an attribute b

accept_a(B()) # boom

@runtime_checkable

Attributes annotated with NotRequired in a runtime_checkable protocol will be skipped when checking an object at runtime. The current runtime checkable behavior only checks for the presence of an attribute with the right name without checking the type.
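The existing presence-only behavior can be observed directly. Under the proposal, a NotRequired member would presumably just be left out of this check. The class names below are illustrative.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasX(Protocol):
    x: int


class WrongType:
    x = "not an int"  # right name, wrong type


class Missing:
    pass


print(isinstance(WrongType(), HasX))  # True: only presence is checked
print(isinstance(Missing(), HasX))  # False: the attribute is absent
```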

Overrides

Subclasses may only remove the NotRequired qualifier from an overridden attribute.

Final

NotRequired will not be compatible with the Final qualifier in the same type annotation, since attributes with the latter are required to be initialized, so NotRequired wouldn't do anything.

ReadOnly

PEP 767 may introduce read-only attributes. Subclasses will be allowed to override ReadOnly attributes to remove NotRequired (making a non-required attribute required).

class Class1:
    x: ReadOnly[NotRequired[int]]
class Class2(Class1):
    x: ReadOnly[int]  # OK
class Class3(Class1):
    x: ReadOnly[NotRequired[int]]  # OK

Pattern Matching

Not-required attributes are not allowed to be used with __match_args__, and may not be matched in general.

Assignment/Deletion

There will be no changes to assignment or deletion behavior at runtime. For the purposes of typechecking, assignment to NotRequired attributes will work the same as attributes annotated without the qualifier.

Currently, despite all attributes being “required”, none of the major typecheckers prevent attributes from being deleted. This behavior will stay the same, and both regular and NotRequired attributes will be deletable.

Access

There will be no changes to access behavior at runtime. Typecheckers may error if the attribute is accessed without being narrowed (using a hasattr call or assigning to it) or if getattr is used without a default.

This would be similar to emitting errors for accessing non-required TypedDict keys, and narrowing would work the same way (Mypy and Pyre don’t support this kind of error/narrowing, but Pyright does).

Given the lack of standardization of the equivalent typechecking behavior for TypedDicts, I think we probably want to make this behavior optional for now.

class Class1:
    x: NotRequired[int]
c = Class1()
if hasattr(c, "x"):
    c.x  # OK
c.x  # not OK (optional)
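The runtime side is unchanged today. On an instance where the attribute was never assigned, plain access raises, getattr with a default does not, and assignment makes hasattr true. In this sketch, a plain int annotation stands in for the proposed NotRequired[int].

```python
class Class1:
    x: int  # stand-in for the proposed NotRequired[int]


c = Class1()
print(hasattr(c, "x"))  # False: the annotation alone creates no attribute
print(getattr(c, "x", -1))  # -1: the default is returned instead of raising
try:
    print(c.x)
except AttributeError:
    print("AttributeError")  # unguarded access fails at runtime
c.x = 5
print(hasattr(c, "x"))  # True: assignment makes the attribute present
```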

Uninitialized Attribute Checks

Typecheckers including Mypy, Pyright, and Pyre can check for uninitialized attributes, though this is generally a best-effort check and opt-in/experimental in some cases. Typecheckers should not raise an error if the uninitialized attribute is annotated with NotRequired.

Effect for Users

When implementing all the required proposed features as described above (but none of the optional ones), this is what changes.

Compared to leaving the attribute unannotated:

  • We know what type the attribute is when accessing/assigning (instead of giving an unknown attribute error/defaulting to Any)
  • We can restrict the type of the attribute in a subclass

Compared to annotating the attribute with an unqualified type:

  • No typechecker complaints on uninitialized attributes
  • The presence of the qualifier documents that the attribute may be absent

Things the proposed semantics do NOT guarantee:

  • Absence of NotRequired means that the attribute is present
  • Attribute accesses to NotRequired attributes must be gated
  • Banning deletion of required attributes

cc @migeed-z @stroxler @samwgoldman @rchen152 @grievejia

@GalaxySnail

  • If the class is not final, it must declare the attribute normally or with NotRequired; the attribute may NOT be omitted

How does it work with dataclasses? I suspect that dataclasses should inspect the type and check if it's NotRequired, just like how dataclasses handle typing.ClassVar.

@ottokruse

So to declare an optional function signature you would do this?:

from typing import Any, Callable, Concatenate, NotRequired, Protocol

class Class1(Protocol):
    fn: NotRequired[Callable[Concatenate["Class1", ...], Any]]

Ugly because you have to type self, but if it works I'll take it.

@samwgoldman

samwgoldman commented Jan 6, 2025

@ottokruse

So to declare an optional function signature you would do this?:

Interesting question. Could you give a motivating example of a non-required method? I think this would help us understand the use case and appropriate solutions.

[Edit: snipped a comment that isn't relevant to the original question. I missed that Class1 was a Protocol originally.]

@JelleZijlstra
Member

Interesting question. Could you give a motivating example of a non-required method? I think this would help us understand the use case and appropriate solutions.

Here's an example in typeshed: https://github.com/python/typeshed/blob/a51dd6d6d86bf1bdf87c22cbacfe2fda231418dc/stdlib/bz2.pyi#L17

File objects must have a write method with a particular signature. They may or may not have fileno and close methods, but if they exist they must have a particular signature.

@samwgoldman

Here's an example in typeshed

Thanks! That's a great example. I took a peek at the bz2 implementation, and I am not sure that fileno or close are actually optional here. I could be misunderstanding, so please point out any mistakes!

  1. The BZ2File constructor uses hasattr tests for "read" and "write" and sets _fp if those tests pass.
  2. If you call fileno on the BZ2File instance, it will forward that call to the underlying _fp without performing a hasattr check.
  3. If you call close on the BZ2File instance, we will not call close on the underlying _fp object, because _closefp will not be set.

Based on that reading, I'd say that fileno is in fact required and close has no constraint in the non-str/bytes/pathlike case.

@JelleZijlstra
Member

I think you're right about this particular case, but there are a number of similar ones elsewhere. Grep for # def in the typeshed source.

Another example I found was the rollback method in the PEP 249 database API (https://peps.python.org/pep-0249/#rollback, https://github.com/python/typeshed/blob/a51dd6d6d86bf1bdf87c22cbacfe2fda231418dc/stdlib/_typeshed/dbapi.pyi#L17).

@ostr00000

I would like to add another example, because in typeshed the class with the most optional methods has only 4 of them (usually there are 2).

My example is a "control" class in FreeCAD. If a method is defined in the provided class, it will be called from C++.
What makes this example different from the previous ones is the number of optional methods: there are about 19 optional methods in one class.
I believe that the solution from Eric's example is no longer practical for performance reasons (2**19 possible combinations).
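For scale: covering every subset of 19 optional methods with its own protocol, as in the union-of-protocols approach from Eric's example, needs one protocol per subset. That is a sum of binomial coefficients, which equals 2**19.

```python
from math import comb

n_optional = 19
# One protocol per subset of the optional methods: sum of C(n, k) over k.
n_protocols = sum(comb(n_optional, k) for k in range(n_optional + 1))
print(n_protocols)  # 524288, i.e. 2**19
```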


Also, I would like to point out that Callable cannot distinguish positional-only from keyword arguments.
But maybe this can be bypassed?

class DoubleClickedMethod(Protocol):
    def __call__(self: ControlClass, viewObj: FreeCADGui.ViewProviderDocumentObject, /) -> bool: ...

class ControlClass(Protocol):
    doubleClicked: NotRequired[DoubleClickedMethod]

But annotating self with a different class (self: ControlClass) looks very suspicious to me.
On the other hand, pyright and mypy allow using a protocol to annotate self, but pyre fails.
Is annotating self with something other than Self or the current class defined by the typing spec?
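On the positional-only point: a plain Callable[[X], bool] says nothing about whether the argument may be passed by keyword, but a callback protocol with a `/` marker does. The self-contained sketch below leaves out the `self` question and replaces the FreeCAD type with a hypothetical `ViewObject` class.

```python
from typing import Protocol


class ViewObject:
    """Hypothetical stand-in for FreeCADGui.ViewProviderDocumentObject."""


class DoubleClickedMethod(Protocol):
    # The "/" makes view_obj positional-only in the expected signature.
    def __call__(self, view_obj: ViewObject, /) -> bool: ...


def on_double_click(view_obj: ViewObject, /) -> bool:
    return True


handler: DoubleClickedMethod = on_double_click  # matches the protocol
print(handler(ViewObject()))  # True
```

Calling `handler(view_obj=ViewObject())` raises TypeError at runtime, which is the distinction a bare Callable cannot express.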


@yangdanny97, could you explain why your example is a problem?

accept_a(B()) # boom

Or maybe there should be this code?:

user(B()) # boom

@yangdanny97
Contributor

yangdanny97 commented Jan 7, 2025

Regarding expressing possibly-present methods as Callable attributes: @stroxler mentioned that methods can be accessed both from an instance and from the class, and due to the former case being a bound method they would accept different numbers of arguments.

The behavior of existing type checkers doesn't appear to be consistent here, and it appears to be unspecified:

That said, I don't view standardizing/specifying this behavior as vital to the proposal or as blocking it; I think that could be done separately.
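A runtime illustration of the binding point: the same function behaves differently depending on whether it is stored on the instance or on the class. That is why a Callable-typed attribute is ambiguous. The names are illustrative.

```python
from typing import Callable


def shout(name: str) -> str:
    return name.upper()


class OnInstance:
    fn: Callable[[str], str]

    def __init__(self) -> None:
        self.fn = shout  # stored on the instance: not bound


class OnClass:
    fn = shout  # stored on the class: bound when read through an instance


print(OnInstance().fn("a"))  # A
print(OnClass.fn("a"))  # A: a plain function when read from the class
# OnClass().fn("a") raises TypeError: the instance is passed as `name`,
# and the extra "a" has nowhere to go.
```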

@yangdanny97
Contributor

@ostr00000

Sorry, I messed up an indentation in the example. I've fixed it above, but the changed section is here:

def accept_a(a: A) -> None:
    user(a) # OK, A has an attribute a of the right type and doesn't have an attribute b

accept_a(B())  # boom

  • B extends A, so it should be allowed as an argument to accept_a.
  • A implements P, so it should be allowed as an argument to user.
  • But B adds an attribute in a way that's incompatible with P, causing it to blow up.

The proposed solution is to require A to declare the attribute as NotRequired in order to implement P, preventing incompatible overrides in subclasses.

class A:
    a: int
    b: NotRequired[str]
    def __init__(self) -> None:
        self.a = 3

class B(A):
    b: int  # type-check error

@ottokruse

That said I don't view standardizing/specifying this behavior as vital/blocking to the proposal, I think that could be done separately.

It would be good to have data points: how often is this needed for attributes, and how often for methods? Maybe it's simple enough to do with the number-crunching script you ran before, the one that checks all the getattr calls (which was super helpful)?

@yangdanny97
Contributor

@GalaxySnail

How does it work with dataclasses? I suspect that dataclasses should inspect the type and check if it's NotRequired, just like how dataclasses handle typing.ClassVar.

I can think of two options here, others can chime in if there are more ideas.

The first/easiest option is just to ban this type qualifier in dataclasses.

The second option is to have some special handling based on inspecting the type. @migeed-z suggested creating a new type UninitializedType that's inhabited by a single value (similar to NoneType or EllipsisType). It wouldn't have a use outside of dataclasses, but if it's passed as an argument to the constructor or used as a default for a dataclass then the attribute could be left uninitialized.

@yangdanny97
Contributor

yangdanny97 commented Jan 20, 2025

@ottokruse

Would be good to have datapoints, how often is this needed for attributes, how often for methods

So this is a bit tricky because the script I used only looks at the AST nodes, and it doesn't have the ability to look up the attr referenced in a hasattr or getattr call to see if it's a method or attribute.

I had an idea to adapt the script to search for getattr calls where the default value is a lambda, which I think may be a rough lower bound of how many getattr calls reference methods.

The results from looking at the top 2k packages are as follows:

  • 81 packages (4%) contain at least one getattr callsite w/ a lambda as the default, compared to ~50% that have getattr w/ any default
  • 2.4% of getattr callsites w/ defaults have a lambda as the default
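The lambda-default heuristic can be sketched with the standard ast module. `getattr_default_stats` is a hypothetical helper for illustration, not the script that produced the numbers above.

```python
import ast
from typing import Tuple


def getattr_default_stats(source: str) -> Tuple[int, int]:
    """Return (getattr calls with a default, those whose default is a lambda)."""
    with_default = with_lambda = 0
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "getattr"
            and len(node.args) == 3
        ):
            with_default += 1
            # A lambda default hints that the attribute is a method.
            if isinstance(node.args[2], ast.Lambda):
                with_lambda += 1
    return with_default, with_lambda


sample = 'close = getattr(fp, "close", lambda: None)\nname = getattr(fp, "name", "")\n'
print(getattr_default_stats(sample))  # (2, 1)
```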

@yangdanny97
Contributor

I discussed with @samwgoldman re: dataclasses

If we do go with using a special uninitialized/undefined value for dataclasses as proposed above, we would probably want the values to be optional in the generated constructor, like so:

@dataclass
class Foo:
    x: NotRequired[int]

# generated __init__ method would look like this

def __init__(self, x: int | dataclasses.UninitializedType = dataclasses.Uninitialized): ...

This would require placing all NotRequired fields at the end, similar to fields with default values. We might also want to make them keyword-only.
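Below is a hand-written sketch of what such a generated constructor could do. It assumes a hypothetical single-member enum as the sentinel type; neither `UninitializedType` nor `UNINITIALIZED` exists in dataclasses today. Following the keyword-only suggestion above, the parameter is keyword-only, and the attribute is assigned only when a real value is passed.

```python
import enum
from typing import Union


class UninitializedType(enum.Enum):
    # A single-member enum is a common way to give a sentinel its own type.
    UNINITIALIZED = enum.auto()


UNINITIALIZED = UninitializedType.UNINITIALIZED


class Foo:
    x: int  # would be NotRequired[int] under the proposal

    def __init__(self, *, x: Union[int, UninitializedType] = UNINITIALIZED) -> None:
        # Leave the attribute unset when no value was supplied.
        if x is not UNINITIALIZED:
            self.x = x


print(hasattr(Foo(), "x"))  # False
print(Foo(x=3).x)  # 3
```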

@srittau
Collaborator Author

srittau commented Jan 23, 2025

The proposal for NotRequired fields makes sense to me, and implementing it as is would be a step forward.

It won't be possible to annotate methods with a decorator equivalent, since there is no way to declare a method without it being bound in the class.

Unfortunately, that means that a significant class of problems, especially those that motivated this issue, is not handled by this proposal. While some protocols could be described using the awkward Callable syntax, others cannot. I would really like to see a @not_required decorator included in the proposal, even if it only worked for protocols.

@yangdanny97
Contributor

yangdanny97 commented Jan 27, 2025

@srittau

With the semantics as currently proposed, allowing @not_required on protocols but not classes would make it impossible for a non-final class that doesn't have the method to match the protocol, and it could also have some issues when a concrete class directly extends a protocol.

I don't think working around this is possible without runtime changes, but if we do make runtime changes it should be possible to support it for both protocols and classes.

For example:

class Proto(Protocol):
    @not_required
    def foo(self) -> None: ...

# does not implement Proto
class Class1:
    pass

# OK
@final
class Class2:
    pass

# this doesn't work, because foo is still bound to the class
class Class3:
    @not_required
    def foo(self) -> None: ...

# OK
class Class4:
    def foo(self) -> None: ...

# this also doesn't work, because the inherited foo is bound to the class now that it's concrete
class Class5(Proto):
    pass

To make Class3 and Class5 work, we would need to change the runtime to not bind any method that's decorated with @not_required.
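The runtime problem shows up even with a no-op stand-in for the proposed decorator. `not_required` does not exist today; the definition below is hypothetical.

```python
from typing import Callable, Protocol, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def not_required(func: F) -> F:
    """Hypothetical decorator: a no-op at runtime, like most typing decorators."""
    return func


class Proto(Protocol):
    @not_required
    def foo(self) -> None: ...


class Class3:
    @not_required
    def foo(self) -> None: ...


class Class5(Proto):
    pass


# The decorated method is an ordinary class attribute either way, so a
# hasattr() check cannot tell "declared but absent" from "present".
print(hasattr(Class3(), "foo"))  # True
print(hasattr(Class5(), "foo"))  # True
```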

cc @JelleZijlstra @rchen152 for a sanity check on whether this would be a reasonable change to propose or if it would be too controversial

In a type stub file, I think we could work around it by decorating the absent method with both @type_check_only and @not_required, which would restrict invalid overrides in user code, but that won't work in a regular file since the @type_check_only decorator is not available at runtime.

class Class3:
    @not_required
    @type_check_only
    def foo(self) -> None: ...

So I guess from what I can see, the options are:

  1. don't allow the decorator
  2. allow the decorator, but there's no way for concrete non-final classes in regular files to omit the method
  3. make the necessary runtime changes to make the decorator work the same as the type qualifier on attributes

If anyone else has any ideas, please let me know :D

@JelleZijlstra
Member

we would need to change the runtime to not bind any method that's decorated with @not_required.

I don't think that's realistic, sorry. It wouldn't be possible without changes to the language core, and I don't see a good way to make this work.

It seems difficult to make this feature both sound and convenient to use.

@yangdanny97
Contributor

I cross-posted to the forum to get more feedback/ideas

https://discuss.python.org/t/discussion-optional-class-and-protocol-fields-and-methods/79254
