Demystifying Python’s Descriptor Protocol
A walkthrough of the descriptor protocol to understand the inner workings of the property, classmethod, and staticmethod builtins
- By Karan
A lot of modern frameworks and libraries use the "descriptor" protocol to make the process of creating APIs for end-users neat and simple. Let's discuss how the behavior of Python's builtins like property, staticmethod and classmethod can be imitated using the descriptor protocol.
Consider the following example class:
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def _full_name_getter(self):
        return f'{self.first_name} {self.last_name}'.title()

    def _full_name_setter(self, value):
        first_name, *_, last_name = value.split()
        self.first_name = first_name
        self.last_name = last_name

    full_name = property(fget=_full_name_getter, fset=_full_name_setter)


foo = Person('foo', 'bar')
Whenever we access one of foo's attributes, say foo.first_name, Python looks for first_name in the following places, in order, until it is found:
- foo.__dict__
- type(foo).__dict__
- the __dict__ of each base class of type(foo), following the method resolution order (MRO); metaclasses are not searched.
>>> foo.__dict__
{'first_name': 'foo', 'last_name': 'bar'}
>>> foo.first_name
'foo'
>>> foo.last_name
'bar'
>>> foo.full_name
'Foo Bar'
Notice that the attribute full_name isn't in foo.__dict__. So where did it come from?
Well, the attribute access mechanism described above was incomplete. Before completing it, let's take a detour and look at the descriptor protocol, which is what property, classmethod, and staticmethod are built on.
What's the descriptor protocol?
Any object whose type defines at least one of the methods __get__, __set__, or __delete__ is called a descriptor. The signatures of these methods are:
__get__(self, obj, type=None) -> value
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
There are two types of descriptors: data descriptors and non-data descriptors. A descriptor that defines __set__ or __delete__ is a data descriptor. A non-data descriptor defines only __get__ out of these three methods. The two kinds have different precedence in the attribute lookup chain (more on that later).
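To make the three methods concrete, here is a minimal sketch of a data descriptor that records which protocol method Python invokes. The names Logged and Account are hypothetical and exist only for this illustration.

```python
class Logged:
    """Hypothetical data descriptor that records each protocol call."""

    def __init__(self, storage_name):
        self.storage_name = storage_name  # where the real value is kept
        self.calls = []

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        self.calls.append("get")
        return getattr(obj, self.storage_name)

    def __set__(self, obj, value):
        self.calls.append("set")
        setattr(obj, self.storage_name, value)

    def __delete__(self, obj):
        self.calls.append("delete")
        delattr(obj, self.storage_name)


class Account:
    balance = Logged("_balance")


acct = Account()
acct.balance = 10      # calls Logged.__set__(descriptor, acct, 10)
value = acct.balance   # calls Logged.__get__(descriptor, acct, Account)
del acct.balance       # calls Logged.__delete__(descriptor, acct)

print(value, Account.__dict__["balance"].calls)
# 10 ['set', 'get', 'delete']
```

Because Logged defines __set__ and __delete__, it is a data descriptor. Dropping those two methods would turn it into a non-data descriptor.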
In the Person class, the class attribute full_name is a descriptor. When foo.full_name is accessed, Person.__dict__['full_name'].__get__(foo, Person) gets called, which in turn calls the function that we passed to property as the fget keyword argument.
So the complete attribute access mechanism for foo.first_name is:
- Look up first_name in type(foo).__dict__ and in the __dict__ of its base classes in the MRO (metaclasses are not searched). If it is found and is a data descriptor, the result of its __get__(foo, type(foo)) is returned.
- If not, and first_name is a key in foo.__dict__, then foo.__dict__['first_name'] is returned.
- Lastly, if the attribute found on the class in the first step is a non-data descriptor, the result of its __get__(foo, type(foo)) is returned; if it is a plain class attribute, it is returned as is. If nothing was found at all, AttributeError is raised.
Note that the first and third steps are very similar. The difference is precedence: a data descriptor takes priority over the instance's __dict__, whereas the instance's __dict__ takes priority over a non-data descriptor. We'll see how the cached property exploits this later in the post.
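The precedence rule is easy to check with a small experiment. This is a sketch with hypothetical class names (NonData, Data, Demo): we write directly into the instance __dict__ and see which value wins.

```python
class NonData:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Data:
    """Data descriptor: defines __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Demo:
    a = NonData()
    b = Data()


d = Demo()
# Write straight into the instance dictionary, bypassing __set__.
d.__dict__["a"] = "from instance __dict__"
d.__dict__["b"] = "from instance __dict__"

print(d.a)  # from instance __dict__  (instance beats non-data descriptor)
print(d.b)  # from data descriptor    (data descriptor beats instance)
```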
You might be wondering what orchestrates this lookup mechanism. It's __getattribute__ (not to be confused with __getattr__, which is only called as a fallback when the normal lookup fails with AttributeError). When we look up foo.full_name, type(foo).__getattribute__(foo, 'full_name') is called, which handles it according to the attribute access mechanism we just described.
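The lookup steps can be approximated in pure Python. The sketch below is a simplified emulation for illustration only, not CPython's actual implementation: the helper names find_in_mro and lookup are made up, and the __getattr__ fallback is left out.

```python
_MISSING = object()


def find_in_mro(cls, name):
    """Return the first match for name in the class and its bases."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def lookup(obj, name):
    """Simplified emulation of instance attribute lookup."""
    cls = type(obj)
    cls_attr = find_in_mro(cls, name)
    attr_type = type(cls_attr)
    has_get = cls_attr is not _MISSING and hasattr(attr_type, "__get__")
    is_data = hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")

    if has_get and is_data:
        return attr_type.__get__(cls_attr, obj, cls)  # 1. data descriptor
    if name in getattr(obj, "__dict__", {}):
        return obj.__dict__[name]                     # 2. instance __dict__
    if has_get:
        return attr_type.__get__(cls_attr, obj, cls)  # 3. non-data descriptor
    if cls_attr is not _MISSING:
        return cls_attr                               # plain class attribute
    raise AttributeError(name)


class Person:
    species = "human"

    def __init__(self, first_name):
        self.first_name = first_name

    @property
    def shout(self):
        return self.first_name.upper()


foo = Person("foo")
print(lookup(foo, "shout"), lookup(foo, "first_name"), lookup(foo, "species"))
# FOO foo human
```

Plain functions are non-data descriptors too, which is why lookup(foo, '__init__') returns a bound method through the third step.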
It is also important to understand the attribute setting mechanism. Consider the statement foo.age = 32:
- If age is found on type(foo) (or one of its base classes) and is a descriptor that defines __set__, then type(foo).__dict__['age'].__set__(foo, 32) is called. A data descriptor may refuse the assignment by raising AttributeError, as a property without a setter does.
- Otherwise, including when age is a non-data descriptor, an entry is created in foo's __dict__, i.e. foo.__dict__['age'] = 32.
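Here is a small sketch of the setting rules in action. The descriptor classes Positive and NonData are hypothetical and only serve to show the three outcomes: interception by __set__, a plain write to the instance __dict__, and a refused assignment.

```python
class Positive:
    """Hypothetical data descriptor that validates on assignment."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__["age"]

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("age must be non-negative")
        obj.__dict__["age"] = value


class NonData:
    """Non-data descriptor: it has no say in assignment."""

    def __get__(self, obj, objtype=None):
        return "computed"


class Person:
    age = Positive()
    nickname = NonData()

    @property
    def read_only(self):
        return 42


foo = Person()
foo.age = 32          # routed to Positive.__set__(descriptor, foo, 32)
foo.nickname = "neo"  # no __set__ defined: stored in foo.__dict__

error = None
try:
    foo.read_only = 1  # property defines __set__, but no fset was given
except AttributeError as exc:
    error = exc

print(vars(foo))     # {'age': 32, 'nickname': 'neo'}
print(foo.nickname)  # neo
```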
How does the property builtin work?
Let's first look at the signature of property:
property(fget=None, fset=None, fdel=None, doc=None)
Although it looks like a function, it's actually a class whose instances are descriptors, because it defines __get__, __set__, and __delete__.
We know that when an attribute that is a descriptor is accessed on an object, say foo, its __get__ method is called with the object and the object's class as arguments, i.e. type(foo).__dict__['attr_name'].__get__(foo, type(foo)). Similarly, when the attribute is set, its __set__ method is called with the object and the value to be set, i.e. type(foo).__dict__['attr_name'].__set__(foo, value).
Continuing with the opening example:
>>> foo.full_name
'Foo Bar'
>>> # Person.__dict__['full_name'].__get__(foo, Person)
>>> foo.full_name = 'keanu reeves'
>>> # Person.__dict__['full_name'].__set__(foo, 'keanu reeves')
>>> foo.first_name
'keanu'
>>> foo.last_name
'reeves'
>>> foo.full_name
'Keanu Reeves'
Note that when we set foo.full_name = 'keanu reeves', the full_name property's __set__ is called, which in turn calls the _full_name_setter that we passed to property as the fset argument.
We can mimic the behavior of property with the following implementation:
class Property:
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        # Fall back to the getter's docstring, as the builtin does.
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: return the descriptor itself.
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(instance)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    # Each of the following returns a new Property with one function replaced.
    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
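The getter, setter, and deleter methods are what make the decorator syntax possible. As a sketch using the builtin property, the opening Person example can be written as below. The getter_only alias is hypothetical and is only there to show that setter returns a new property object instead of modifying the existing one.

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):
        return f"{self.first_name} {self.last_name}".title()

    # Keep a reference to the property as it exists before a setter is added.
    getter_only = full_name

    @full_name.setter
    def full_name(self, value):
        first_name, *_, last_name = value.split()
        self.first_name = first_name
        self.last_name = last_name


foo = Person("foo", "bar")
foo.full_name = "keanu reeves"
print(foo.full_name)  # Keanu Reeves

# setter() built a second property; the first one still has no fset.
print(Person.__dict__["getter_only"] is Person.__dict__["full_name"])  # False
print(Person.__dict__["getter_only"].fset)  # None
```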
How does a cached property work?
The expected behavior of a cached property is that its value is computed on first access and then stored ('cached'), so that later accesses return it quickly without recomputing.
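class CachedProperty:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: return the descriptor itself.
            return self
        result = self.function(instance)
        # Store the result under the same name in the instance __dict__,
        # which shadows this non-data descriptor on later lookups.
        instance.__dict__[self.function.__name__] = result
        return result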
Let's now use it:
>>> class Foo:
...     def score(self):
...         print('doing some time-consuming calculations')
...         return 19.5
...
...     score = CachedProperty(score)
...     # you can also use CachedProperty as a decorator
...
>>> foo = Foo()
>>> vars(foo)  # i.e. foo.__dict__
{}
>>> foo.score
doing some time-consuming calculations
19.5
>>> vars(foo)
{'score': 19.5}
>>> foo.score
19.5
Observe that when we first accessed the score attribute on foo, it printed "doing some time-consuming calculations". After foo.score was accessed once, foo.__dict__ was populated with a new entry under the key score. When we access foo.score a second time, nothing is printed; vars(foo)['score'] is returned instead.
Why did that happen?
To answer this, it's time to recall the attribute access machinery. When score was accessed for the first time:
- It was checked whether score was a data descriptor. It was not.
- Next, foo.__dict__ was checked. The score key wasn't there.
- Next, it was checked whether score was a non-data descriptor. It was, so type(foo).__dict__['score'].__get__(foo, type(foo)) was called, which stored and returned the result.
When score is accessed from the second time onward:
- It is checked whether score is a data descriptor. It's not.
- The 'score' key is then looked up in foo.__dict__, where it was inserted when score was accessed for the first time, and foo.__dict__['score'] is returned.
One example where a cached property is particularly useful is a Django model class with a property that runs a time-consuming query. True to Django's "batteries included" philosophy, it provides django.utils.functional.cached_property for this use case.
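Outside Django, the standard library has offered a similar tool since Python 3.8: functools.cached_property. The sketch below uses a hypothetical Report class with a counter to show that the function body runs only once per instance, and that deleting the attribute clears the cached value.

```python
from functools import cached_property


class Report:
    def __init__(self):
        self.computations = 0

    @cached_property
    def score(self):
        # Stand-in for a time-consuming calculation.
        self.computations += 1
        return 19.5


report = Report()
first, second = report.score, report.score
print(first, second, report.computations)  # 19.5 19.5 1

# The cached value lives in the instance __dict__, so deleting it
# forces a recomputation on the next access.
del report.score
third = report.score
print(report.computations)  # 2
```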
How do staticmethod and classmethod work?
A method decorated with staticmethod does not receive an implicit first argument: the decorator converts the function into a static method. Let's implement it using the descriptor protocol:
class StaticMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner):
        return self.function
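As a quick check, here is a self-contained sketch that repeats the StaticMethod class and uses it as a decorator. The Temperature class is hypothetical. The function is returned untouched, so nothing is bound to it whether it is called on the class or on an instance.

```python
class StaticMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        # Return the plain function: no instance or class is bound to it.
        return self.function


class Temperature:
    @StaticMethod
    def to_fahrenheit(celsius):
        return celsius * 9 / 5 + 32


print(Temperature.to_fahrenheit(100))  # 212.0
print(Temperature().to_fahrenheit(0))  # 32.0
```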
Similarly, the descriptor protocol can be used to implement classmethod, whose decorated methods receive the class object as the first argument, as follows:
class ClassMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)

        def wrapper(*args, **kwargs):
            # Pass the class, not the instance, as the first argument.
            return self.function(owner, *args, **kwargs)
        return wrapper
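An alternative sketch binds the function to the class with types.MethodType instead of a hand-written wrapper, so the result is a real bound method. The from_full_name constructor and the Employee subclass are hypothetical; they show that a subclass receives itself as cls.

```python
from types import MethodType


class ClassMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        # Bind the function to the class rather than to the instance.
        return MethodType(self.function, owner)


class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @ClassMethod
    def from_full_name(cls, full_name):
        first_name, *_, last_name = full_name.split()
        return cls(first_name, last_name)


class Employee(Person):
    pass


employee = Employee.from_full_name("keanu reeves")
print(type(employee).__name__, employee.first_name, employee.last_name)
# Employee keanu reeves
```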
We've used descriptors to understand how builtins like staticmethod, classmethod, and property work, and how we can implement something like CachedProperty ourselves. Note that the CachedProperty we implemented is not a hack: Python 3 provides these APIs so that developers can customize attribute access when needed.
Helpful links:
- http://dabeaz.com/py3meta
- https://docs.python.org/3/howto/descriptor.html
- https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.functional.cached_property