The Pythonic Emptiness

Why the Pythonic way of checking sequences for emptiness is, in most cases, not ambiguous

Nov 09, 2024

When in Rome, do as the Romans do

How do you check a list for emptiness in Python? This simple question often sparks considerable debate, as I discovered firsthand.

Many Python programmers use the len() built-in function for this task, but many others rely on the truth value of sequence objects to do it more succinctly. PEP-8 recommends the latter style, and it is considered more Pythonic.

The recommendation from PEP-8 for handling empty sequences using their truth value
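
For reference, here are the two styles side by side. This is a minimal sketch: describe() and describe_with_len() are hypothetical helper names, and the len() variant uses one of the forms that PEP-8 lists as the one to avoid (if not len(seq):).

```python
def describe(seq):
    # PEP-8 style: an empty sequence is false in a Boolean context
    if not seq:
        return "empty"
    return "non-empty"


def describe_with_len(seq):
    # The len()-based form that PEP-8 advises against for sequences
    if not len(seq):
        return "empty"
    return "non-empty"


# Both styles agree on built-in sequences
for value in ([], [1], "", "abc", (), (None,), range(0)):
    assert describe(value) == describe_with_len(value)
```

For built-in sequences both functions return the same result; the difference lies in style and cost, not in behavior.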

Apart from being Pythonic, this style is also considerably faster, which matters for an operation performed this often:

A microbenchmark comparing the performance of the Pythonic style of doing emptiness check and None check versus the alternative styles. The microbenchmark was done by using the timeit module, with the number parameter set to 100_000_000
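
A microbenchmark along these lines can be reproduced with the timeit module. This is a sketch under assumptions: the exact statements used for the figure are not shown, so the set below is my own choice, and the default number is far smaller than the 100_000_000 used above so that it finishes quickly. Absolute timings will vary with the machine and the Python version.

```python
import timeit

# Statements to time; each runs with `items` bound to an empty list.
# This particular set is an assumption, not the original benchmark's code.
STATEMENTS = {
    "if not items": "if not items: pass",
    "if len(items) == 0": "if len(items) == 0: pass",
    "if not len(items)": "if not len(items): pass",
    "if items is None": "if items is None: pass",
    "if items == None": "if items == None: pass",
}


def run_benchmark(number=1_000_000):
    """Return the total seconds taken by each statement over `number` runs."""
    return {
        label: timeit.timeit(stmt, setup="items = []", number=number)
        for label, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    for label, seconds in run_benchmark().items():
        print(f"{label:<20} {seconds:.3f} s")
```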

Despite its merits, some programmers find this style ambiguous and less readable. I argue that this perceived ambiguity often stems from poor coding style and inadequate engineering practices, rather than the idiom itself. These underlying issues need to be addressed instead of simply covering them with inappropriate solutions.

In this article, I am going to give a few arguments in support of the Pythonic way of doing emptiness checks, and show that the apparent ambiguity usually reflects broader code quality issues rather than the check itself.

Note: Using len() itself isn't wrong. My aim in this article is to demystify the perception that PEP-8's style is unclear; usually, ambiguity arises from poor coding practices rather than the idiom itself.

Type checks are not a solution for poor coding style

Why Is an Emptiness Check Using Truthiness More Pythonic?

One likely reason the Python language designers adopted this style is that it is more efficient. They presumably understood the performance difference and considered it important for an operation as frequent as an emptiness check.

Of course, performance is not the only criterion. PEP-8 recommends a few other idioms that are slower than their alternatives; simplicity and readability take precedence over performance. The designers of Python clearly believe that this style is simpler than using len(), and just as readable.

Let’s talk about why this style is simpler and also equally readable.

Simple is Better than Complex

One of the principles of Pythonic style, taken from the Zen of Python, is: simple is better than complex. If you can do something with a simpler tool, use it. When you use a more complicated tool to solve a simple problem, you can mislead the reader into making wrong assumptions.

As a concrete example, consider the following code, which processes a user based on their status. In this system, an active user is automatically subscribed. However, this code is overly cautious and performs an unnecessary is_subscribed check for active users.

As a result, anyone reading it may wrongly assume that there is a case where an active user is not subscribed, and then write their own code based on that same assumption.

def process_user_status(user):
    if user.status == 'active':
        if user.is_subscribed:
            # Process active, subscribed user
            print(f"Processing active user: {user.name}")
        else:
            # Process active, unsubscribed user
            print(f"Processing active but unsubscribed user: {user.name}")
    elif user.status == 'suspended':
        # Process suspended user
        print(f"Processing suspended user: {user.name}")
    elif user.status == 'deleted':
        # Process deleted user
        print(f"Processing deleted user: {user.name}")
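
If active users are always subscribed, as assumed here, the simpler version drops the inner check altogether. The sketch below keeps the original function name and messages; the User dataclass is a hypothetical stand-in for whatever user object the real code receives.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user type with only the fields the example reads
    name: str
    status: str


def process_user_status(user):
    # Active users are assumed to always be subscribed, so there is no
    # separate is_subscribed branch to mislead the reader
    if user.status == 'active':
        print(f"Processing active user: {user.name}")
    elif user.status == 'suspended':
        print(f"Processing suspended user: {user.name}")
    elif user.status == 'deleted':
        print(f"Processing deleted user: {user.name}")


process_user_status(User(name="alice", status="active"))
```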

Similarly, when you use len() to check a sequence for emptiness, you are reaching for a more powerful tool than necessary. This may make the reader question the intent behind it: for example, is this code expected to handle objects other than built-in sequences?

Readability Counts

I’d also argue that the PEP-8 style of emptiness check is no less readable, as long as you know Python well. Admittedly, to a newcomer who is still finding their way around the language’s syntax and semantics, it may look odd.

Python has been designed so that every object has a truth value, which is used in a Boolean context (for example, an if or while condition, or an operand of and, or, and not). The truth values of built-in objects are well defined and documented. Most good introductory books also cover this aspect of the language.

Here are the rules for the built-in types:

Zero values of all numeric types evaluate to False: 0, 0.0, 0j, Decimal(0), Fraction(0, 1), etc.
Empty sequences and collections evaluate to False: '', (), [], {}, set(), range(0), etc.
None and False evaluate to False.
Everything else evaluates to True, unless its class defines a __bool__() method that returns False or a __len__() method that returns 0.
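
These rules are easy to check interactively. The following sketch asserts them for a handful of built-in values and shows the __len__() fallback with Inbox, a hypothetical container class that defines __len__() but not __bool__():

```python
from decimal import Decimal
from fractions import Fraction

# Built-in values that are false in a Boolean context
falsy_values = [None, False, 0, 0.0, 0j, Decimal(0), Fraction(0, 1),
                "", (), [], {}, set(), range(0)]
assert not any(falsy_values)

# Non-empty containers and non-zero numbers are true
assert all([[0], (None,), {"": 0}, " ", 1, -0.5])


class Inbox:
    """Hypothetical container that defines __len__() but not __bool__()."""

    def __init__(self, messages):
        self.messages = list(messages)

    def __len__(self):
        return len(self.messages)


# Without __bool__(), Python falls back to __len__() for the truth value
assert not Inbox([])
assert Inbox(["hello"])
```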

This means that when you read a line such as:

if not items:
    ...  # some code

You should be able to infer that items must be a sequence (the name indicates that), and it is being checked for emptiness (it’s inside a Boolean context).

See the official documentation about truth value testing for more details.

Upcoming Live Session

Live Session: Live Coding a Bytecode Interpreter for Python

Abhinav Upadhyay

October 30, 2024

Join me in our next live session where we will live code a compiler and bytecode interpreter for a subset of Python syntax

Ambiguity Concerns

I understand that many people prefer to make up their own minds about what is readable, instead of treating PEP-8 as gospel. So let me address the concerns about the ambiguity of the Pythonic way of doing emptiness checks.

People generally raise two concerns about it:

Using if not mylist is ambiguous while if len(mylist) == 0 is clearer, because in the former case it may not be clear whether we are testing a Boolean flag, a sequence, or something else.
An emptiness check written as if not mylist silently passes for objects of other types, such as 0 or None, which can lead to bugs.
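
The second concern is easy to demonstrate. In this sketch, is_empty_truthy() and is_empty_len() are hypothetical helpers that wrap the two styles: the truthiness version quietly reports 0 and None as empty, while the len() version raises a TypeError for them.

```python
def is_empty_truthy(items):
    # Truthiness check: accepts any object at all
    return not items


def is_empty_len(items):
    # len() check: only works for objects that implement __len__()
    return len(items) == 0


for value in ([], 0, None):
    try:
        print(f"{value!r}: truthy={is_empty_truthy(value)}, len={is_empty_len(value)}")
    except TypeError as exc:
        print(f"{value!r}: truthy={is_empty_truthy(value)}, len() raised TypeError: {exc}")
```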

I believe that whenever you find this style of code ambiguous, the cause is more likely the code smells around it. Let me describe the smells I have in mind.

Poorly Named Variables

One of the primary reasons for finding the PEP-8 style of emptiness check ambiguous is poorly named variables, not the check itself. Variables should have meaningful names that also hint at their type, regardless of whether the language is statically or dynamically typed. Sequence and Boolean variables in particular should stand out by their names alone.

Here are a few conventions that I’ve seen in most high-quality codebases:

Collection-type objects should be easy to recognize from their names. For example: cars (a collection of cars), customer_list (a list of customer objects), user_item_map (a map between user and item objects), nodes (a collection of node objects), etc.
Integer-typed values are harder to name well, but many projects follow a few conventions. For example: user_count (count of users), nprocesses (number of processes), room_size, file_size_bytes, etc.
Boolean flags are always easy to recognize if named well: is_shipped, has_failed, do_retry, etc.

String-typed variables are harder to distinguish from other types by name alone, and that indeed becomes a source of nasty bugs in dynamically typed languages.

So, if you name your variables well, you can almost always tell when you are doing an empty sequence check on a list versus on another type. For instance:

if not items:
    ...  # handle empty items
else:
    ...  # do something with items

Contrast this with the case when checking a Boolean flag:

if not is_subscribed:
    ...  # handle users without a subscription

If you are concerned about None values, you should always check for None explicitly using the is operator. I will come back to this later.

Poor Function Names, Missing Docstrings and Lack of Type Hints

If you are still not convinced, let me build on the previous point. You never read an emptiness check in isolation; the surrounding context is always there to help you understand the code.

For instance, if the sequence is passed in as a function argument, then the function name, docstring, and type hints should clearly signal that the argument is a list. See the following example:

def process_users(users: list[dict]) -> list[dict]:
    """
    Processes a list of user records.

    Args:
        users (list): A list of user dicts with the keys:
            - 'name' (str): The user's name.
            - 'email' (str): The user's email address.
            - 'age' (int): The user's age.
            - 'active' (bool): Whether the user is active or not.

    Returns:
        list: A list of processed user dicts.
    """
    if not users:
        return []

Here the argument name users indicates that it is some sort of a sequence of user objects, and the docstring clearly mentions that it is a list.

Often, the code that follows an emptiness check processes the sequence: it might slice it, index into it, or iterate over it. For example:

    if not users:
        return []
    processed_users = []
    for user in users:
        processed_user = {
            'name': user['name'].strip(),
            'email': user['email'].lower(),
            'age': max(0, user['age']),
            'active': user['active']
        }
        processed_users.append(processed_user)
    return processed_users
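
Putting the two snippets together, a complete and runnable version might look like the following. This is a sketch: the User TypedDict is my own addition that spells out the dictionary keys listed in the docstring, and list[User] requires Python 3.9 or later.

```python
from typing import TypedDict


class User(TypedDict):
    # Keys taken from the docstring above; the TypedDict itself is an addition
    name: str
    email: str
    age: int
    active: bool


def process_users(users: list[User]) -> list[User]:
    """Processes a list of user dicts and returns normalized copies."""
    if not users:
        return []
    processed_users: list[User] = []
    for user in users:
        processed_users.append({
            'name': user['name'].strip(),
            'email': user['email'].lower(),
            'age': max(0, user['age']),
            'active': user['active'],
        })
    return processed_users


print(process_users([]))
print(process_users([{'name': ' Ada ', 'email': 'ADA@EXAMPLE.COM',
                      'age': 36, 'active': True}]))
```

The early return is not strictly required, since the loop simply would not run for an empty list, but it makes the empty case explicit to the reader.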

So, if you are dealing with functions that have unclear names and missing docstrings, address those issues first, and the empty sequence check will become unambiguous.

When Dealing with Return Value of Other Functions

The above example considers the case when you receive a sequence type object as a parameter. But what if you are calling another function and you need to check its return value for emptiness?

If you are reading such code and cannot tell whether the returned object is a sequence, that again hints at other problems in the code.

The called function should be clearly named to indicate what it does, for example whether it produces a sequence or performs a Boolean check.
It should also have a docstring that tells you its return type, whether it can return None, and which exceptions it may raise.
Based on the docstring, you should store the result of the call in a properly named variable. This matters especially when the function returns a sequence or a Boolean.
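
As an illustration of these three points, here is a hypothetical function (find_overdue_invoices() is not from any real codebase) whose name, docstring, and type hints all say that it returns a list, followed by a call site that stores the result in a plural variable name before checking it for emptiness:

```python
from datetime import date


def find_overdue_invoices(invoices: list[dict], today: date) -> list[dict]:
    """Return the invoices whose due date is before today.

    Returns an empty list, never None, when nothing is overdue.
    """
    return [invoice for invoice in invoices if invoice["due"] < today]


overdue_invoices = find_overdue_invoices(
    [{"id": 1, "due": date(2024, 10, 1)}, {"id": 2, "due": date(2024, 12, 1)}],
    today=date(2024, 11, 9),
)
if not overdue_invoices:
    print("Nothing is overdue")
else:
    print(f"{len(overdue_invoices)} overdue invoice(s)")
```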

If you fix these issues in your code, then the emptiness check should not look ambiguous. The following is an example of this from CPython’s mimetypes module.

An example of an unambiguous Pythonic emptiness check from the mimetypes module in CPython. The function name, the variable name and the surrounding context, all clearly indicate that the extensions variable is a sequence being checked for emptiness

The self.guess_all_extensions() method name clearly indicates that it produces a collection of extensions.
The variable name extensions also indicates that it is a collection of extensions.
Indexing the 0th item in the return statement is a clear-cut signal that extensions is indeed a sequence.
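
The same pattern can be reproduced with the module-level mimetypes.guess_all_extensions() function. This is a sketch rather than the CPython method itself; first_extension() is a hypothetical name, and which extension comes first depends on the MIME type tables available on your system.

```python
import mimetypes
from typing import Optional


def first_extension(mime_type: str, strict: bool = True) -> Optional[str]:
    """Return the first known extension for mime_type, or None if there is none."""
    # guess_all_extensions() returns a list, which may be empty
    extensions = mimetypes.guess_all_extensions(mime_type, strict)
    if not extensions:
        return None
    return extensions[0]


print(first_extension("text/html"))
print(first_extension("application/x-made-up-type"))
```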

Yet another example from the difflib module in CPython:

Another example of a clear emptiness check done in Pythonic style. Taken from the difflib module code in CPython. The called function name clearly tells that it returns a sequence, the variable name `codes` is also commonly used for holding sequences and the surrounding code also uses it like a list.

get_opcodes() clearly indicates that it returns a sequence of opcodes
Within the if block, the codes variable is assigned a list, which clearly tells you that it is meant to be a list.
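
A similar check can be written against the public difflib.SequenceMatcher API. In this sketch, describe_changes() is a hypothetical helper; get_opcodes() returns a list of (tag, i1, i2, j1, j2) tuples, and that list is empty when both inputs are empty, which is exactly the case the emptiness check handles.

```python
from difflib import SequenceMatcher


def describe_changes(a: str, b: str) -> list[str]:
    codes = SequenceMatcher(None, a, b).get_opcodes()
    if not codes:
        # Two empty inputs produce no opcodes at all
        return ["nothing to compare"]
    return [f"{tag} a[{i1}:{i2}] b[{j1}:{j2}]" for tag, i1, i2, j1, j2 in codes]


print(describe_changes("", ""))
print(describe_changes("abc", "abd"))
```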

Handling None

Now, let’s talk about None values. When a value may be None, the plain truthiness check becomes ambiguous, because not x is True for both an empty sequence and None.

But you should always check for None using the is operator, which is also a PEP-8 recommended style. For instance, if I am calling a function which can return None, or a list, then I would do something like this:

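def guess_extension(self, type, strict=True):
    extensions = self.guess_all_extensions(type, strict)
    if extensions is None:
        # this is an error (InvalidConfigError is a hypothetical exception)
        raise InvalidConfigError()

    if not extensions:
        # no extensions configured, so we create a default one
        # (DefaultExtension is a hypothetical class)
        return DefaultExtension()

    return extensions[0]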

This is also better practice because, in most situations, you will want to handle a None value differently from an empty sequence, just as in the example above.

Possibility of Silent Bugs due to Type Mixups

Finally, there is the possibility of silent bugs creeping into the system, because a truthiness-based emptiness check lets objects of other types pass, whereas len() fails with a TypeError.

This concern is valid. But again, it hints at a poorly designed system with poor coding style overall. You might say that, despite well-named variables and proper docstrings, human error is always possible: for example, someone somewhere writes 0 instead of [0], and the check silently passes.

I’d argue that if wrong types of objects can be passed to your code, you have much bigger problems.

If you used len() to do an emptiness check and it failed with TypeError at runtime, are you always going to fix these issues after a crash in production? That sounds terrible.

You need to design your system with better engineering practices to minimize the possibility of such problems, instead of depending on runtime crashes to learn about them. Here are a few suggestions:

Adopt Type Hints and Static Type Checking

Type hints have been part of Python since version 3.5 (PEP 484), and they address exactly these kinds of problems. They let you state explicitly which types a parameter, variable, or return value may hold.

And if you combine it with static type checking using mypy, you can catch many type errors ahead of time, instead of running into them in production.
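
As a minimal sketch (first_item() is a hypothetical function), the type hints below state that items may be a list of strings or None. The explicit is None check and the truthiness check then each have a single, unambiguous meaning, and a static checker such as mypy would reject a call like first_item(0) before the code ever runs.

```python
from typing import Optional


def first_item(items: Optional[list[str]]) -> str:
    """Return the first item, or "<empty>" for an empty list.

    Raises ValueError if items is None.
    """
    if items is None:
        raise ValueError("items must not be None")
    if not items:
        return "<empty>"
    return items[0]


print(first_item(["a", "b"]))
print(first_item([]))
# mypy reports an incompatible argument type for the next call,
# because 0 is an int rather than Optional[list[str]]:
# first_item(0)
```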

Validate User Input

Real-world applications need to handle user-provided data, for instance JSON payloads received by a REST API endpoint.

You should have a well-defined schema that clients must adhere to. In addition, you can validate inputs to detect invalid ones and fail early. If you let invalid inputs flow through the system, you cannot blame anyone else.
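
Here is a small sketch of failing early using only the standard library. The payload shape {"items": [...]} and the parse_order_payload() name are hypothetical; in a real service, a schema validation library would usually do this job. Once validation has passed, the later if not order_items check can only ever see a list.

```python
import json


def parse_order_payload(raw: str) -> list[str]:
    """Parse a payload of the (hypothetical) form {"items": ["sku-1", ...]}.

    Raises ValueError as soon as the input does not match that shape.
    """
    payload = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    items = payload.get("items")
    if not isinstance(items, list) or not all(isinstance(item, str) for item in items):
        raise ValueError("'items' must be a list of strings")
    return items


order_items = parse_order_payload('{"items": []}')
if not order_items:  # safe: validation guarantees a list here
    print("Empty order, nothing to ship")
```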

Unit Tests

You can add unit tests to check that your code fails with a TypeError when passed an object of an unsupported type. The following is an example of this kind of test from within CPython:

An example of testing for unsupported types from the unit test for the select module in CPython

This not only serves as living documentation of the types of values your code is not expected to receive, but also guards against someone accidentally changing the code to support these types.
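
The same idea fits in a few lines of unittest. In this sketch, first_or_none() is a hypothetical function that uses len() for its emptiness check, and the test pins down that unsupported types raise TypeError instead of being silently treated as empty.

```python
import unittest


def first_or_none(items):
    """Return the first element of a sequence, or None if it is empty."""
    if len(items) == 0:  # raises TypeError for objects without __len__()
        return None
    return items[0]


class FirstOrNoneTests(unittest.TestCase):
    def test_empty_and_non_empty(self):
        self.assertIsNone(first_or_none([]))
        self.assertEqual(first_or_none([3, 4]), 3)

    def test_rejects_unsupported_types(self):
        # 0, None and floats are not sequences and must not count as empty
        for bad in (0, None, 1.5):
            with self.subTest(bad=bad):
                with self.assertRaises(TypeError):
                    first_or_none(bad)
```

Save it in a test module and run it with python -m unittest.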

Using len() is not Wrong

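All said, I am not against using len(). Sometimes the situation demands it, for example when an object’s truth value does not reflect its length: a user-defined class may implement __bool__() with a different meaning, and NumPy arrays raise a ValueError when you test the truth value of an array with more than one element.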
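
To make the first case concrete, here is a sketch with a hypothetical Batch class whose __bool__() means "sealed and ready to send" rather than "non-empty". Because __bool__() takes precedence over __len__() in a Boolean context, not batch and len(batch) == 0 answer different questions for this class, so an emptiness check has to use len().

```python
class Batch:
    """Hypothetical batch whose truth value means "sealed", not "non-empty"."""

    def __init__(self, records, sealed=False):
        self.records = list(records)
        self.sealed = sealed

    def __len__(self):
        return len(self.records)

    def __bool__(self):
        # __bool__() takes precedence over __len__() in a Boolean context
        return self.sealed


batch = Batch(["r1", "r2"], sealed=False)
print(not batch)        # True: the batch is not sealed yet
print(len(batch) == 0)  # False: the batch does contain records
```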

Also, readability is very subjective, and different teams and individuals have different tastes for it. So if you use this style, it’s totally fine.

However, my primary point is that using the truthiness value of sequence types for emptiness checks is not inherently ambiguous. What often appears as ambiguity is usually rooted in poor coding style, which should be addressed directly.

Final Thoughts

Let’s wrap things up. Checking sequences for emptiness is a fundamental and frequent operation. In Python, a language known for its simplicity and readable syntax, the idiomatic approach is to rely on the truth value of sequences: if not mylist: succinctly checks for emptiness, making a more elaborate len() call unnecessary unless the context demands it.

For seasoned Python programmers, this style is well-recognized. However, some see it as less readable or even ambiguous. The truth is, the Pythonic style is clear and efficient. Ambiguity likely stems from other code issues, such as unclear variable names, absence of docstrings, missing type hints, lack of unit tests, and insufficient input validation.

That said, my intention here is not to claim that using len() is wrong. If you prefer it, by all means, use it. The goal of the article was to dispel the myth that the PEP-8 recommended style is unclear.

In the next article, I will explain exactly why the Pythonic style of emptiness check is many times faster than using len(). Stay tuned!

Support Confessions of a Code Addict

If you find my work interesting and valuable, you can support me by opting for a paid subscription (it’s $6 monthly/$60 annual). As a bonus you get access to monthly live sessions, and all the past recordings.

Many people report failed payments or don’t want a recurring subscription. For them, I also have a buymeacoffee page, where you can buy me coffees or become a member. I will upgrade you to a paid subscription here for the equivalent duration.

I also have a GitHub Sponsor page. You will get a sponsorship badge, and also a complimentary paid subscription here.
