What’s __new__ with you? Python’s underappreciated dunder

EDIT: Based on valuable reader feedback, I’ve put my rambling musings on my reasons for writing this tutorial at the END instead of the BEGINNING. tl;dr, I found little info out there on the __new__() dunder method in class definitions, so I decided to write up my most typical (but not only) use case: validating class attributes of subclasses using metaclasses.

Quick background about metaclasses and __new__

  • By default, the metaclass of any class is type. You can change that by specifying your own metaclass when you define the class; it’s as easy as class MyClass(metaclass=MyMetaclass) (assuming you’ve already defined MyMetaclass by subclassing type).
  • The __new__ dunder automatically grabs four arguments when it’s invoked every time a class with that metaclass is defined (not instantiated!). In order they are:
    1. metacls, a reference to the object representing the metaclass. Its repr is something like <class '__main__.MyMetaclass'>, although of course your module might not be called __main__ if it’s imported. Its type is type, of course, because it’s a metaclass.
    2. clsname, a string which is exactly what the calling class’s __name__ dunder contains
    3. bases, a tuple containing a reference to each direct parent class listed in the class statement. That is not the whole Method Resolution Order; Python computes the MRO from these bases. The MRO is a can of worms I will not open here because (a) it’s off topic, and (b) Modern OOP Python programmers like me have figured out that it’s so much less cognitive strain to use the compositional pattern (i.e. mixins) or Abstract Base Classes than to use multiple inheritance. If you want to know more about the MRO, Geeksforgeeks has a nice tutorial about it which actually shows a diagram of the Diamond Inheritance Pattern (or as they hilariously call it, the ‘Diamond of Death’), which is VERY BAD and which the MRO exists to untangle. Avoiding multiple inheritance does the same thing way easier, imo.
    4. namespace. This is actually created by the __prepare__ dunder, which Python calls before the class body runs (and so before __new__), but for chrissakes, this is complicated enough. Basically it is a dict holding what will become MyClass.__dict__ (which is not actually a dict but a mappingproxy, if you want to be technical): the attributes and methods from the class body plus a few extra keys and values, some of which will be relevant to this tutorial.
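To make the order of events concrete, here is a minimal sketch (the names TracingMeta and calls are made up for illustration, they are not part of this tutorial's code). It assumes nothing beyond standard Python 3 and just records when __prepare__, the class body, and __new__ each run:

```python
calls = []  # records the order in which the hooks fire


class TracingMeta(type):  # hypothetical metaclass, for illustration only
    @classmethod
    def __prepare__(metacls, name, bases, **kwargs):
        calls.append('__prepare__')
        return {}  # this dict becomes the namespace the class body fills in

    def __new__(metacls, name, bases, namespace):
        calls.append('__new__')
        return super().__new__(metacls, name, bases, namespace)


class MyClass(metaclass=TracingMeta):
    calls.append('class body')  # runs while the class body is executed
    answer = 42


# All three steps happened at definition time, in this order.
assert calls == ['__prepare__', 'class body', '__new__']
assert type(MyClass) is TracingMeta
assert type(TracingMeta) is type

# Instantiating the class does not call the metaclass hooks again.
MyClass()
assert calls == ['__prepare__', 'class body', '__new__']
```

So the namespace dict exists before the class body runs, and __new__ only sees it once the body has finished filling it in.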

Use case: validating class attribute of a subclass

Okay, I can’t find where I first learned this trick, but I’m pretty sure it used the class Polygon and then subclasses to Triangle, Rectangle, Trapezoid, etc. The other usual toy class/subclass example in Python tutorials is Pet/Dog, Pet/Cat, etc; I’ll do something similar with Animal/Dog, etc.

This is mostly pure Python, but we will import two libraries from the Standard Library:

import pprint
import string

Now, let’s see what happens when we don’t use metaclasses. We’ll define a class Animal with a class attribute taxonomy, e.g. the Linnaean genus and species, like Mus musculus for the house mouse. Obviously for something as abstract as a generic Animal, we can’t specify the taxonomy, so we use None as a placeholder.

I’ve inserted an assertion in the __init__ method, showing that we can validate instance attributes easily, but the class attribute is another matter.

class Animal:  # since we didn't specify it, its metaclass is type
    taxonomy = None  # class attribute; this is what we will validate later
    def __init__(self, legs):
        assert isinstance(legs, int)
        self.legs = legs

All animals have a number of legs (which can be zero, Mr. Earthworm, or Lumbricus terrestris, or maybe up to 344 for centipedes of the order Geophilomorpha, despite the name “centipede”; yes, both earthworms and centipedes are Animals, it’s a big Kingdom).

Now we can subclass Animal to a specific genus and species:

class BlackWidowSpider(Animal):
    taxonomy = 'Creepy crawly poisonous get it away nope nope nope'
    def __init__(self, legs, eyes, venomous):
        super().__init__(legs)
        assert self.legs <= 8
        self.eyes = eyes
        assert self.eyes <= 8
        self.venomous = venomous
some_spider = BlackWidowSpider(legs=8, eyes=8, venomous=True)

So obviously that’s not the correct taxonomy for a black widow spider (it’s Latrodectus mactans). We added instance attributes that are relevant to this species, as well as assertions (in case you’re wondering why I used <= instead of ==, any particular instance of a Black Widow Spider could be injured and have, say, 7 legs).

So how do we validate the class attribute taxonomy? Well, we’ll use a Validating Metaclass.

Just to show you the arguments to __new__, first we’ll define a metaclass that does so:

class PrintNewArguments(type):  # since it inherits from type, it's a metaclass
    """print __new__ arguments"""
    def __new__(metacls, name, bases, namespace):
        print('metacls:', type(metacls), metacls)
        print('name:', type(name), name)
        print('bases:', type(bases), bases)
        pprint.pprint(('namespace:', type(namespace), namespace))
        return type.__new__(metacls, name, bases, namespace)
        # that's the normal return value for __new__

Now we’ll use the exact same definition of Animal, just specifying this metaclass and see what happens:

class Animal(metaclass=PrintNewArguments):
    taxonomy = None 
    def __init__(self, legs):
        assert isinstance(legs, int)
        self.legs = legs
metacls: <class 'type'> <class '__main__.PrintNewArguments'>
name: <class 'str'> Animal
bases: <class 'tuple'> ()
('namespace:',
 <class 'dict'>,
 {'__init__': <function Animal.__init__ at 0x7f59cc213ea0>,
  '__module__': '__main__',
  '__qualname__': 'Animal',
  'taxonomy': None})

__new__ was run when we defined the class. It’s also run when we subclass with the exact same code we did before, but the arguments are slightly different: bases now has one entry, the parent class Animal, and there’s a __classcell__ key in the namespace (it shows up because __init__ uses the zero-argument super()), which we won’t get into here.

class BlackWidowSpider(Animal):
    taxonomy = 'Creepy crawly poisonous get it away nope nope nope'
    def __init__(self, legs, eyes, venomous):
        super().__init__(legs)
        assert self.legs <= 8
        self.eyes = eyes
        assert self.eyes <= 8
        self.venomous = venomous
metacls: <class 'type'> <class '__main__.PrintNewArguments'>
name: <class 'str'> BlackWidowSpider
bases: <class 'tuple'> (<class '__main__.Animal'>,)
('namespace:',
 <class 'dict'>,
 {'__classcell__': <cell at 0x7f59cc242678: empty>,
  '__init__': <function BlackWidowSpider.__init__ at 0x7f59cc213bf8>,
  '__module__': '__main__',
  '__qualname__': 'BlackWidowSpider',
  'taxonomy': 'Creepy crawly poisonous get it away nope nope nope'})
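Since bases and the MRO are easy to mix up, here is a small sketch of the difference. The intermediate class Arachnid is hypothetical, added only so that the two tuples differ:

```python
class Animal:
    pass


class Arachnid(Animal):  # hypothetical intermediate class, for illustration
    pass


class BlackWidowSpider(Arachnid):
    pass


# __bases__ holds only the direct parents named in the class statement...
assert BlackWidowSpider.__bases__ == (Arachnid,)

# ...while __mro__ is the full lookup chain Python computes from them.
assert BlackWidowSpider.__mro__ == (BlackWidowSpider, Arachnid, Animal, object)

# A class written with no parents gets object filled in after creation,
# even though the metaclass's __new__ received an empty tuple for bases.
assert Animal.__bases__ == (object,)
```

That last assertion is why the printout for Animal showed bases as () while Animal.__bases__ is not empty afterwards.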

Now if we instantiate it, __init__ is run but the metaclass’s __new__ is not; that’s only run on class/subclass definition:

some_spider = BlackWidowSpider(legs=8, eyes=8, venomous=True)
print(some_spider.taxonomy)
Creepy crawly poisonous get it away nope nope nope
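One possible source of confusion: an ordinary class can define its own __new__ too, and that one does run on every instantiation. The sketch below uses made-up names (Meta, Widget, events) to show both kinds side by side; it is an illustration, not code from this tutorial:

```python
events = []  # log of which hook ran, in order


class Meta(type):  # hypothetical metaclass
    def __new__(metacls, name, bases, namespace):
        events.append(f'Meta.__new__ for {name}')
        return super().__new__(metacls, name, bases, namespace)


class Widget(metaclass=Meta):  # hypothetical ordinary class
    def __new__(cls, *args, **kwargs):
        events.append('Widget.__new__')
        return super().__new__(cls)  # object.__new__ builds the instance

    def __init__(self, size):
        events.append('Widget.__init__')
        self.size = size


# Only the metaclass hook has fired so far (at class definition).
assert events == ['Meta.__new__ for Widget']

w = Widget(3)

# Instantiation calls the class's own __new__ and then __init__,
# but not the metaclass's __new__.
assert events == ['Meta.__new__ for Widget', 'Widget.__new__', 'Widget.__init__']
assert w.size == 3
```

The metaclass's __new__ creates classes; a class's own __new__ creates instances. This tutorial is only about the first kind.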

Okay! Obviously that taxonomy is wrong, and now we’re going to define a function and a metaclass to validate. We could define the function inside the metaclass, but I’m going to define it outside so we can see what’s happening. Note that it just checks if the taxonomy ‘looks right’, it doesn’t actually thoroughly validate the taxonomy, e.g. check it against a list, but it’s good enough for this tutorial. (Really, Polygons would have been easier to validate, but I didn’t want to copy what I vaguely remember reading before.)

def check_taxonomy_appears_valid(taxonomy):
    """Runs several tests on taxonomy to ensure it 'looks right'

    Note that these are surface tests, and do not validate that the taxonomy 
    actually exists

    Args:
        taxonomy (str)

    Returns:
        None

    Raises:
        ValueError if any of the following are not true:
        1. taxonomy consists of two space-separated words, genus and species
        2. both words consist of ascii characters only
        3. first letter of genus is uppercase, all other letters lowercase
        4. each of genus and species is at least two letters long
        5. each of genus and species contains at least one vowel, including y
        6. each of genus and species contains at least one consonant
    """
    if not isinstance(taxonomy, str):
        raise ValueError(f'taxonomy must be str; it is class {taxonomy.__class__}')
    taxonomy_split = taxonomy.split(' ')
    # taxonomy is two words long
    if len(taxonomy_split) != 2:
        raise ValueError(f'taxonomy "{taxonomy}" must consist of two space-separated words, '
                         'genus and species')
    genus, species = taxonomy_split
    # genus must begin with capital ascii
    if not genus[0] in string.ascii_uppercase:
        raise ValueError(f'genus "{genus}" must start with ascii uppercase letter')
    # all other letters must be lowercase ascii
    # (genus[1:] skips the capital; genus.lower() would let 'HOMO' slip through)
    if not all([x in string.ascii_lowercase for x in genus[1:] + species]):
        raise ValueError(f'taxonomy "{taxonomy}" must be all lowercase ascii except the first character')
    # define vowels and consonants... note vowels includes y, e.g. Latin word pyx
    vowels = ['a', 'e', 'i', 'o', 'u', 'y']
    consonants = [x for x in string.ascii_lowercase if x not in vowels] + ['y']
    # check each word
    for name, word in zip(['genus', 'species'], [genus, species]):
        # word must be two letters long at least
        if len(word) < 2:
            raise ValueError(f'{name} "{word}" must be at least two characters long')
        # word must have at least one vowel and at least one consonant
        if not any([x in vowels for x in word.lower()]):
            raise ValueError(f'{name} "{word}" must contain at least one vowel including y')
        if not any([x in consonants for x in word.lower()]):
            raise ValueError(f'{name} "{word}" must contain at least one consonant')

Whew! That’s actually a long function for something so imprecise. Well, anyways. Let’s run some tests to make sure it catches all the errors:

check_taxonomy_appears_valid('Homo sapiens')

for taxonomy, error_message_fragment in [[None, 'must be str'],
                                         ['homosapiens', 'two space-separated'],
                                         ['homo sapiens', 'uppercase'],
                                         ['HOMO sapiens', 'lowercase ascii except'],
                                         ['Homo sapi3ns', 'lowercase ascii except'],
                                         ['H sapiens', 'two characters'],
                                         ['Hm sapiens', 'one vowel'],
                                         ['Homo spns', 'one vowel'],
                                         ['Oo sypyns', 'one consonant'],
                                         ['Homo aie', 'one consonant']
                                        ]:
    try:
        check_taxonomy_appears_valid(taxonomy)
    except ValueError as e:
        assert error_message_fragment in e.args[0], (taxonomy, e.args[0],
                                                     error_message_fragment)
    else:
        # without this branch, a bad taxonomy that raises nothing passes silently
        raise AssertionError(f'{taxonomy!r} passed validation unexpectedly')
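If the length of that function bothers you, a regular expression can carry most of the weight. This is only a sketch of an alternative, not the function used in the rest of this tutorial: the name looks_like_binomial and the constants are made up, and it returns True/False instead of raising a ValueError with a specific message, so you lose the helpful error text.

```python
import re
import string

# Capitalised genus of 2+ letters, one space, lowercase species of 2+ letters.
BINOMIAL_SHAPE = re.compile(r'[A-Z][a-z]+ [a-z]{2,}')
VOWELS = set('aeiouy')  # y counts as a vowel...
CONSONANTS = set(string.ascii_lowercase) - set('aeiou')  # ...and as a consonant


def looks_like_binomial(taxonomy):
    """Return True if taxonomy superficially looks like 'Genus species'."""
    if not isinstance(taxonomy, str):
        return False
    if not BINOMIAL_SHAPE.fullmatch(taxonomy):
        return False
    for word in taxonomy.lower().split(' '):
        letters = set(word)
        # each word needs at least one vowel and at least one consonant
        if not (letters & VOWELS and letters & CONSONANTS):
            return False
    return True


assert looks_like_binomial('Homo sapiens')
assert not looks_like_binomial('HOMO sapiens')
assert not looks_like_binomial('Hm sapiens')
assert not looks_like_binomial(None)
```

Whether the shorter version is worth losing the per-rule error messages is a judgment call; for a metaclass that is supposed to tell you what you got wrong, the long version arguably wins.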

Okay, looks good, all the bad-looking taxonomies were caught and the good-looking one got through. Now we’ll just define a metaclass to apply this function, but only to subclasses because of course the generic ‘Animal’ has no taxonomy.

class ValidateTaxonomy(type):  # this is a validating metaclass
    """Validates that taxonomy exists and 'looks valid'"""
    def __new__(metacls, name, bases, namespace):
        if bases:  # skip a class with no explicit bases (Animal itself); check only subclasses
            taxonomy = namespace.get('taxonomy')
            if taxonomy:
                # if it's specified, make sure taxonomy is valid-looking
                check_taxonomy_appears_valid(namespace['taxonomy'])
            else:
                # make sure taxonomy is specified at all
                # when subclass is defined.
                raise KeyError('taxonomy was not specified')
        return type.__new__(metacls, name, bases, namespace)

So we’ll define Animal EXACTLY as before, just with the new metaclass.

class Animal(metaclass=ValidateTaxonomy):
    taxonomy = None
    def __init__(self, legs):
        assert isinstance(legs, int)
        self.legs = legs

And now we’re ready to define a subclass, i.e. a specific genus & species of animal. Let’s start with half of humanity’s favorite, the common dog. Obviously the breed is important for this particular species, so we’ll add that to the __init__ instead of eyes and venomous like we did with BlackWidowSpider above.

Also, we’ve added a ‘domesticated’ class attribute, because the genus and species for dogs and wolves are the same, dogs are actually a subspecies, which is not reflected in the Linnaean genus-species taxonomy we’ve defined and validated.

class Dog(Animal):
    taxonomy = 'Canis lupus'
    domesticated = True
    def __init__(self, legs, breed):
        super().__init__(legs)
        if legs > 4:
            raise ValueError('must have 4 or fewer legs')
        self.breed = breed

Now we can instantiate this to represent a particular dog. We’ll specify this poor doggo, which was up for adoption on Petfinder while I was writing this. I hope he got a home!!!

nawab = Dog(legs=3, breed='Pariah Dog')  # yes, that's the actual name of his breed!

Now let’s see what happens when we try to define a Cat, the favorite animal of the other half of humanity (myself included!). Note that wild cats and domestic cats are virtually genetically identical, so there’s an argument to be made that cats domesticated humans rather than the other way around!

We will ‘forget’ to specify a taxonomy.

class Cat(Animal):
    domesticated = 'sort of'
    def __init__(self, legs, breed):
        super().__init__(legs)
        if legs > 4:
            raise ValueError('must have 4 or fewer legs')
        self.breed = breed
---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-15-230b4ed0aa4d> in <module>
----> 1 class Cat(Animal):
      2     domesticated = 'sort of'
      3     def __init__(self, legs, breed):
      4         super().__init__(legs)
      5         if legs > 4:

<ipython-input-11-0f5854fd57ef> in __new__(metacls, name, bases, namespace)
      9             else:
     10                 # make sure taxonomy is specified at all when subclass is defined.
---> 11                 raise KeyError('taxonomy was not specified')
     12         return type.__new__(metacls, name, bases, namespace)

KeyError: 'taxonomy was not specified'

Good, that failed validation! Now let’s “forget” to capitalize the genus:

class Cat(Animal):
    taxonomy = 'felix catus'
    domesticated = 'not really'
    def __init__(self, legs, breed):
        super().__init__(legs)
        self.breed = breed
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-16-984e6ebff745> in <module>
----> 1 class Cat(Animal):
      2     taxonomy = 'felix catus'
      3     domesticated = 'not really'
      4     def __init__(self, legs, breed):
      5         super().__init__(legs)

<ipython-input-11-0f5854fd57ef> in __new__(metacls, name, bases, namespace)
      6             if taxonomy:
      7                 # if it's specified, make sure taxonomy is valid-looking
----> 8                 check_taxonomy_appears_valid(namespace['taxonomy'])
      9             else:
     10                 # make sure taxonomy is specified at all when subclass is defined.

<ipython-input-9-3ca73e5d6d4d> in check_taxonomy_appears_valid(taxonomy)
     29     # genus must begin with capital ascii
     30     if not genus[0] in string.ascii_uppercase:
---> 31         raise ValueError(f'genus "{genus}" must start with ascii uppercase letter')
     32     # all other letters must be lowercase ascii
     33     if not all([x in string.ascii_lowercase for x in genus.lower() + species]):

ValueError: genus "felix" must start with ascii uppercase letter
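If you’d rather check these failures in a test than eyeball tracebacks, you can create the class by calling the metaclass directly with name, bases and namespace, which is what a class statement boils down to. The sketch below uses a cut-down stand-in metaclass (RequireTaxonomy, with a much cruder check than check_taxonomy_appears_valid) and a hypothetical helper definition_error, so that it runs on its own:

```python
class RequireTaxonomy(type):
    """Simplified stand-in for the validating metaclass (illustration only)."""
    def __new__(metacls, name, bases, namespace):
        if bases:  # only check subclasses
            taxonomy = namespace.get('taxonomy')
            if not taxonomy:
                raise KeyError('taxonomy was not specified')
            if not taxonomy[0].isupper():
                raise ValueError(f'genus must start with uppercase: {taxonomy!r}')
        return super().__new__(metacls, name, bases, namespace)


class Animal(metaclass=RequireTaxonomy):
    taxonomy = None


def definition_error(namespace):
    """Try to define a Cat subclass; return the exception type, or None."""
    try:
        # equivalent to: class Cat(Animal): <contents of namespace>
        RequireTaxonomy('Cat', (Animal,), namespace)
    except (KeyError, ValueError) as exc:
        return type(exc)
    return None


assert definition_error({}) is KeyError
assert definition_error({'taxonomy': 'felix catus'}) is ValueError
assert definition_error({'taxonomy': 'Felis catus'}) is None
```

Each call either builds a throwaway class or raises, so no half-defined Cat is left lying around between test cases.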

Okay, that worked too! Now let’s specify a valid cat like we did for Dog:

class Cat(Animal):
    taxonomy = 'Felis catus'
    domesticated = 'kinda?'
    def __init__(self, legs, breed):
        super().__init__(legs)
        self.breed = breed

And instantiate it. I’m not using the name of my first cat, BTW, I don’t want people cracking my secret questions on secure websites!

fluffy = Cat(legs=4, breed='Persian')
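One caveat worth knowing about: a metaclass that looks only at namespace sees just what is written in the class body, so a sub-subclass that merely inherits its taxonomy gets rejected. The sketch below shows that behaviour and one possible variant that accepts inherited values. Both metaclass names are made up, and the validity check is reduced to "is it set at all" to keep the example short:

```python
class OwnNamespaceOnly(type):
    """Looks only at the class body, like namespace.get('taxonomy')."""
    def __new__(metacls, name, bases, namespace):
        if bases and not namespace.get('taxonomy'):
            raise KeyError('taxonomy was not specified')
        return super().__new__(metacls, name, bases, namespace)


class InheritedAllowed(type):
    """Variant: build the class first, then look the attribute up normally."""
    def __new__(metacls, name, bases, namespace):
        cls = super().__new__(metacls, name, bases, namespace)
        if bases and not getattr(cls, 'taxonomy', None):
            raise KeyError('taxonomy was not specified')
        return cls


class Animal(metaclass=OwnNamespaceOnly):
    taxonomy = None


class Dog(Animal):
    taxonomy = 'Canis lupus'


try:
    class Puppy(Dog):  # inherits taxonomy, but does not restate it
        pass
except KeyError:
    strict_rejected = True
else:
    strict_rejected = False

assert strict_rejected  # the namespace-only check refuses Puppy


class Animal2(metaclass=InheritedAllowed):
    taxonomy = None


class Dog2(Animal2):
    taxonomy = 'Canis lupus'


class Puppy2(Dog2):  # accepted: taxonomy is found through inheritance
    pass


assert Puppy2.taxonomy == 'Canis lupus'
```

Which behaviour you want depends on your hierarchy; if every concrete class really is its own species, the strict version is arguably a feature.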

And there you have it, we’ve used the __new__ dunder for something useful and it wasn’t scary at all! If you made it all the way to the end of this post, yay, and I hope you learned something __new__.

# My reasons for writing this post #

Safely skippable if you’re just here for the tutorial.

So I’ve mostly done object-oriented programming in Python the past several years, and I won’t get into the reasons other than to say I’ve also been writing pure, idempotent, testable functions since before they were cool (even when idempotence isn’t strictly necessary, I’m weird that way, but idempotent functions are so much easier to keep track of). Recently, I hadn’t done any decently complex OOP in a while because I’ve been working on AWS Lambda and Zappa and GCP functions a bit, and I had what you might call a bit of a brain fart (hardly a unique experience).

I could not for the life of me remember which of __getattribute__ and __getattr__ was the one that got called on every attribute access, and which was the one that only got called when the attribute is missing. (New mnemonic: the lazily named one is lazy!)

Anyways, I was in one of those “follow every hyperlink” moods and I wondered if anyone had made a decent list of all the dunders (so-called because of the double underscores around them, like __init__(). They’re also known in some circles as “magic methods”, which has an unfortunate similarity to IPython/Jupyter’s “magic commands”). And of course, there were several; many of them rather out of date (e.g. Python<=2.6) unfortunately. After I mused for a bit how I could use __iadd__ to implement Reverse Polish Notation or systems of mathematics really beyond my ken that do not follow the commutative law, it occurred to me that a dunder I use occasionally seemed to be getting short shrift: __new__. I mean, it’s not the most exciting dunder in the world, it’s no __init__, but it’s great for a few cases that come up regularly enough and that metaclasses can easily solve.

For example, this GitHub repo called “magicmethods”, which appears pretty authoritative, was last updated in 2015, which is okay, __new__ hasn’t changed since then (what’s less okay is that a hyperlink on the README that purports to lead to the author’s website has expired and instead leads to a page about cannabis). It explains the bare bones (__new__ gets called before __init__ when you instantiate a class), but says: “I don’t want to go in to too much detail on __new__ because it’s not too useful, but it is covered in great detail in the Python docs.” True, it’s covered fine in the Python docs, but they’re docs, they explain what it is but not what to do with it. To be fair, that repo later uses __new__ in two examples, one subclassing the immutable str and one indirectly when implementing the pickling protocol on your own classes, which is a good use case but just uses __new__, it doesn’t do anything special with it.

Pro tip: you can get a list of the dunders by typing $ pydoc SPECIALMETHODS in a terminal (wait, that’s another name for them!), but those are again straight from the Python docs.

Then there’s this totally out of date page which gave me a laugh because the only thing it says about the __new__ “underscore method” (yay, a fourth name, and it’s not even really accurate, since there are two underscores on each side) is to put it in a category called “Really Complicated!”. Guido van Rossum himself wrote a python.org draft paper called “unifying types and classes” (i.e. metaclasses), applicable to version 2.2.3 which was released in 2003 (in a weird coincidence, the year I first started learning Python). It refers a lot to “new-style classes”, which is not a thing anymore really because all classes are what were new-style 16 years ago, and in fact they’re new new-style classes with somewhat different syntax (like passing metaclass= in the class statement instead of setting the __metaclass__ attribute).

So, maybe I missed a really good dunder list or __new__ explanation out there, but most of the ones I see either gave the dunder short shrift or were really technical (although this one in howto.lintel.in isn’t bad, and explains the other use case I’ve used it for, implementing the singleton design pattern, which btw you can extend to have a maximum number of instances greater than or equal to one, so maybe it should be called the n-ton design pattern), as well as changing/validating subclasses (which is the use case I cover in this post) and something I’ve never dared to do, totally change the object returned and passed to __init__. Again they discourage its use, saying “Try to make life easier use this method only if it is necessary to use.” DEFINITELY true in the third case, obviously for the first, but the second?

Let’s be clear, there’s a lot of ways to validate stuff in Python (like Decorators!), but what I’ve described in this post has a particular use case where you’re creating classes (which are generic and may even be Abstract Base Classes) and subclassing them, and you want to validate a class attribute that is only defined in subclasses, not in the base class. Sticking to the DRY principle, you don’t want to write the same validation code every time you define a subclass, so the way to do this is to add functionality to the __new__ dunder in the class and subclass’s metaclass.
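For completeness, one of those other ways deserves a mention: since Python 3.6 the __init_subclass__ hook can do this kind of subclass check without any metaclass. To be clear, this is a different technique from the __new__ approach in this tutorial, and the sketch below is just that, a sketch. The helper looks_valid is a made-up, much cruder stand-in for check_taxonomy_appears_valid, and I’ve picked TypeError for the missing-attribute case as an arbitrary choice.

```python
def looks_valid(taxonomy):
    """Crude stand-in check: 'Genus species' with a capitalised genus."""
    parts = taxonomy.split(' ')
    return len(parts) == 2 and parts[0].istitle() and parts[1].islower()


class Animal:
    taxonomy = None

    def __init_subclass__(cls, **kwargs):
        # called automatically whenever a subclass of Animal is defined
        super().__init_subclass__(**kwargs)
        taxonomy = cls.__dict__.get('taxonomy')  # only the subclass's own body
        if not taxonomy:
            raise TypeError(f'{cls.__name__} must define a taxonomy')
        if not looks_valid(taxonomy):
            raise ValueError(f'{cls.__name__}.taxonomy looks wrong: {taxonomy!r}')


class Dog(Animal):
    taxonomy = 'Canis lupus'


try:
    class Cat(Animal):  # 'forgot' the taxonomy again
        domesticated = 'sort of'
except TypeError as exc:
    cat_error = str(exc)
else:
    cat_error = None

assert Dog.taxonomy == 'Canis lupus'
assert cat_error == 'Cat must define a taxonomy'
```

The metaclass version still earns its keep when you need to change how the class itself is built, or when you can’t touch the base class, but for plain validation this hook is arguably less machinery.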
