The Pythonic Emptiness
Why the Pythonic way of checking sequences for emptiness is, in most cases, not ambiguous
When in Rome, do as the Romans do
How do you check a list for emptiness in Python? This simple question often sparks considerable debate, as I discovered firsthand.
Many Python programmers use the built-in len() function for this task, but many others rely on the truthiness of sequence objects to do it more succinctly. The latter style is also recommended by PEP-8 and considered more Pythonic.
Apart from being Pythonic, this style also offers much better performance, an important factor for such a frequent operation.
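You can measure this on your own machine with a timeit comparison like the sketch below. It claims no specific numbers, because absolute timings depend on the Python version and hardware. The intuition is that the len() variant has to look up and call a built-in function and then compare its result with 0, while `not items` is a single truth test.

```python
import timeit


def compare_emptiness_checks(number: int = 1_000_000) -> dict:
    """Time `not items` against `len(items) == 0` on the same small list."""
    setup = "items = [1, 2, 3]"
    return {
        "truthiness": timeit.timeit("not items", setup=setup, number=number),
        "len": timeit.timeit("len(items) == 0", setup=setup, number=number),
    }


if __name__ == "__main__":
    for name, seconds in compare_emptiness_checks().items():
        print(f"{name:>10}: {seconds:.3f} s")
```

Running both statements with the same setup string keeps the comparison fair: the only difference between the two timings is the emptiness check itself.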
Despite its merits, some programmers find this style ambiguous and less readable. I argue that this perceived ambiguity often stems from poor coding style and inadequate engineering practices, rather than the idiom itself. These underlying issues need to be addressed instead of simply covering them with inappropriate solutions.
In this article, I am going to give a few arguments in support of the Pythonic way of doing emptiness checks, and show that the apparent ambiguity usually reflects broader code quality issues rather than the check itself.
Note: Using len() itself isn't wrong. My aim in this article is to dispel the perception that PEP-8's style is unclear; usually, the ambiguity arises from poor coding practices rather than from the idiom itself.
Why Is an Emptiness Check Using Truthiness More Pythonic?
One likely reason the Python language designers adopted this style is that it is much more efficient. They would have understood this performance difference and considered it an important factor for an operation as frequent as an emptiness check.
Of course, performance is not the only criterion. PEP-8 recommends a few other idioms that are slower than their alternatives. Simplicity and readability are preferred over performance. The designers of Python clearly believe that this style is simpler than using len(), and just as readable.
Let’s talk about why this style is simpler and also equally readable.
Simple is Better than Complex
One of the principles of the Pythonic style is: simple is better than complex. It means that if you can do something with a simpler tool, you should use it. When you use a more complicated tool to solve a simple problem, you mislead the reader into making wrong assumptions.
As a concrete example, consider the following code, which processes a user based on their status. If a user is active, that automatically means they are subscribed. However, this code is overly cautious and performs an unnecessary is_subscribed check for active users.
As a result, anyone reading it may wrongly assume that an active user can be unsubscribed, and then write their own code based on that same assumption.
```python
def process_user_status(user):
    if user.status == 'active':
        if user.is_subscribed:
            # Process active, subscribed user
            print(f"Processing active user: {user.name}")
        else:
            # Process active, unsubscribed user
            print(f"Processing active but unsubscribed user: {user.name}")
    elif user.status == 'suspended':
        # Process suspended user
        print(f"Processing suspended user: {user.name}")
    elif user.status == 'deleted':
        # Process deleted user
        print(f"Processing deleted user: {user.name}")
```
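If active really does imply subscribed, the simpler version just drops the inner branch. The sketch below assumes a hypothetical User dataclass with only name and status fields, and returns messages instead of printing them so the behavior is easy to check; the fallback for unknown statuses is an addition for this sketch.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user model; there is no is_subscribed field because,
    # by assumption, every active user is subscribed.
    name: str
    status: str


def process_user_status(user: User) -> str:
    """Return a message describing how the user would be processed."""
    if user.status == "active":
        # Active implies subscribed, so no extra is_subscribed branch.
        return f"Processing active user: {user.name}"
    if user.status == "suspended":
        return f"Processing suspended user: {user.name}"
    if user.status == "deleted":
        return f"Processing deleted user: {user.name}"
    return f"Unknown status for user: {user.name}"


print(process_user_status(User(name="Ada", status="active")))
# Processing active user: Ada
```

A reader of this version cannot infer a nonexistent "active but unsubscribed" state, which is exactly the point of choosing the simpler tool.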
Similarly, when you use len() to check a sequence for emptiness, you are reaching for a more powerful tool than necessary. As a result, the reader may question the intent behind it: for example, might this code be handling objects other than built-in sequence types?
Readability Counts
I’d also argue that the PEP-8 style of emptiness check is not less readable, if you know Python well. Yes, for a newcomer to Python who is still finding their way around the language’s syntax and semantics, it may look weird.
Python is designed so that every object has a truth value in a Boolean context (for example, in an if or while condition). The built-in objects have well-defined truth values, which are also documented. Most good introductory books also cover this aspect of the language.
Here are the truth values for the built-in types:
- The zero value of every numeric type evaluates to False: 0, 0.0, 0j, etc.
- Empty sequences and collections evaluate to False: things like empty lists, tuples, sets, dicts, and strings.
- None evaluates to False.
- Everything else evaluates to True, unless its class overrides this behavior via the __bool__() method (or __len__(), which Python falls back to when __bool__() is not defined).
This means that when you read a line such as:

```python
if not items:
    ...  # some code
```

You should be able to infer that items must be a sequence (the name indicates that), and that it is being checked for emptiness (it's inside a Boolean context).
See the official documentation about truth value testing for more details.
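To see the rules listed above in action, the sketch below prints bool() for a few built-in values and for a hypothetical Inventory class that defines its own truthiness via __bool__(). Note that "0" and [0] are truthy: a non-empty string and a non-empty list, regardless of what they contain.

```python
class Inventory:
    """Hypothetical class that defines its own truthiness via __bool__."""

    def __init__(self, stock: int) -> None:
        self.stock = stock

    def __bool__(self) -> bool:
        # Truthiness here means "in stock", not "non-empty container".
        return self.stock > 0


falsy_values = [0, 0.0, 0j, "", [], (), {}, set(), range(0), None]
truthy_values = [1, -0.5, "0", [0], (None,), {"key": None}, object()]

for value in falsy_values + truthy_values:
    print(f"{value!r:>22} -> {bool(value)}")

print(bool(Inventory(0)), bool(Inventory(5)))  # False True
```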
Ambiguity Concerns
I understand that many people like to make up their own minds about what is readable, instead of taking PEP-8 as gospel. So let me address the concerns about the ambiguity of the Pythonic way of doing emptiness checks.
There are two concerns here that people generally raise about it:
1. Using if not mylist is ambiguous, while if len(mylist) == 0 is clearer, because in the former case it may not be clear whether we are testing a Boolean flag, a sequence, or something else.
2. When you do an emptiness check using if not mylist, it silently passes for objects of other types, such as 0, None, etc., leading to bugs.
I believe that whenever you find this style of code ambiguous, it is more likely due to the surrounding code smells. Let me describe the smells I have in mind.
Poorly Named Variables
One of the primary reasons for finding the PEP-8 style of emptiness check ambiguous is poorly named variables, not the check itself. Variables should have meaningful names that also indicate their underlying type, whether the language is statically or dynamically typed. Sequence and Boolean types in particular should stand out from their names alone.
Here are a few conventions that I’ve seen in most high quality code bases:
- Collection-type objects should be easy to recognize from their names. For example: cars (a collection of cars), customer_list (a list of customer objects), user_item_map (a map of user and item objects), nodes (a collection of node objects), etc.
- Integer-typed values are harder to name well, but many projects follow a few conventions. For example: user_count (count of users), nprocesses (number of processes), room_size, file_size_bytes, etc.
- Boolean flags are always easy to recognize when named well: is_shipped, has_failed, do_retry, etc.
String-typed variables are harder to distinguish from other types based on their names alone, and that is indeed a source of nasty bugs in dynamically typed languages.
So, if you name your variables well, you can almost always tell when you are doing an empty sequence check on a list versus on another type. For instance:
```python
if not items:
    # handle empty items
    ...
# else do something with items
```
Contrast this with the case when checking a Boolean flag:
```python
if not is_subscribed:
    # handle users without subscription
    ...
```
If you are concerned about None values, you should always check for None using the is operator. I will come back to this later.
Poor Function Names, Missing Docstrings and Lack of Type Hints
If you are still not convinced, let me add to the previous point. You will never read an emptiness check in isolation; there is always surrounding context to help you understand the code.
For instance, if that sequence is being passed as a function argument, then the function name, docstring and type hint should clearly signal that this argument is a list. See the following example:
```python
def process_users(users):
    """
    Processes a list of user objects.

    Args:
        users (list): A list of user objects with fields:
            - 'name' (str): The user's name.
            - 'email' (str): The user's email address.
            - 'age' (int): The user's age.
            - 'active' (bool): Whether the user is active or not.

    Returns:
        list: A list of processed user objects.
    """
    if not users:
        return []
```
Here the argument name users indicates that it is some sort of sequence of user objects, and the docstring clearly states that it is a list.
Often, the code following an emptiness check processes that sequence: it might slice or index it, or iterate over it. For example:
```python
def process_users(users):
    # (docstring omitted for brevity)
    if not users:
        return []

    processed_users = []
    for user in users:
        processed_user = {
            'name': user['name'].strip(),
            'email': user['email'].lower(),
            'age': max(0, user['age']),
            'active': user['active'],
        }
        processed_users.append(processed_user)
    return processed_users
```
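The docstring documents the argument's type, but the example has no type hints. One possible annotated version is sketched below, assuming Python 3.8+ (for typing.TypedDict) and treating each user as a dict, as the processing loop does. UserRecord is a hypothetical name; its fields mirror the docstring.

```python
from typing import List, TypedDict


class UserRecord(TypedDict):
    # Hypothetical type; field names and types mirror the docstring.
    name: str
    email: str
    age: int
    active: bool


def process_users(users: List[UserRecord]) -> List[UserRecord]:
    """Normalize a list of user records; an empty list yields an empty list."""
    if not users:
        return []
    return [
        {
            "name": user["name"].strip(),
            "email": user["email"].lower(),
            "age": max(0, user["age"]),
            "active": user["active"],
        }
        for user in users
    ]


print(process_users([{"name": "  Ada ", "email": "ADA@EXAMPLE.COM", "age": -1, "active": True}]))
# [{'name': 'Ada', 'email': 'ada@example.com', 'age': 0, 'active': True}]
```

With the annotation in the signature, `if not users` can only be read as an emptiness check on a list; no docstring lookup is needed.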
So, if you are dealing with functions that have unclear names and missing docstrings, address those issues, and the emptiness check will automatically become unambiguous.
When Dealing with Return Value of Other Functions
The above example considers the case when you receive a sequence type object as a parameter. But what if you are calling another function and you need to check its return value for emptiness?
If you are reading such code and cannot tell whether the returned object is a sequence, that again hints at other problems in the code.
The called function should be clearly named to indicate what it does, for example whether it produces some sort of sequence or performs a Boolean check.
It should also have a docstring to tell you what its return type is, whether it returns None, and if it raises any exceptions.
Based on the docstring, you should store the result of that function call in a properly named variable, especially if the function returns a sequence or a Boolean.
If you fix these issues in your code, then the emptiness check should not look ambiguous. The following is an example of this from CPython’s mimetypes module.
- The self.guess_all_extensions() method name clearly indicates that it produces some sort of collection of extensions.
- The variable name extensions also indicates that it is a collection of extensions.
- Indexing the 0th item in the return statement is a clear signal that extensions is indeed a sequence.
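Here is a small runnable sketch that follows the same pattern, using the module-level mimetypes.guess_all_extensions() function. The first_extension() helper is hypothetical and is not the CPython source. The extension returned for a known type depends on the platform's MIME database, so only the unknown-type result is annotated.

```python
import mimetypes
from typing import Optional


def first_extension(mime_type: str) -> Optional[str]:
    """Return one known file extension for mime_type, or None if there is none."""
    # guess_all_extensions() returns a list, which may be empty.
    extensions = mimetypes.guess_all_extensions(mime_type)
    if not extensions:
        return None
    # Indexing confirms that `extensions` is a sequence.
    return extensions[0]


print(first_extension("text/plain"))
print(first_extension("application/x-made-up-type-for-this-example"))  # None
```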
Yet another example from the difflib module in CPython:
- The name get_opcodes() clearly indicates that it returns a sequence of opcodes.
- Within the if block, the codes variable is set to a list, which clearly signals that it is supposed to be a list.
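A comparable sketch for difflib, again a hypothetical helper rather than the CPython code: SequenceMatcher.get_opcodes() returns a list of (tag, i1, i2, j1, j2) tuples, and for two empty inputs that list is empty, so the emptiness check supplies a fallback. The zero-length "equal" tuple is an arbitrary choice for this sketch.

```python
from difflib import SequenceMatcher


def change_tags(old: str, new: str) -> list:
    """List the opcode tags needed to turn old into new."""
    codes = SequenceMatcher(None, old, new).get_opcodes()
    if not codes:
        # Two empty inputs produce no opcodes; substitute a single
        # zero-length "equal" step (a choice made for this sketch).
        codes = [("equal", 0, 0, 0, 0)]
    return [tag for tag, i1, i2, j1, j2 in codes]


print(change_tags("abc", "abd"))  # ['equal', 'replace']
print(change_tags("", ""))        # ['equal']
```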
Handling None
Now, let's talk about None values. When you might receive a None value, the simple truthiness check becomes ambiguous, because it passes for both an empty sequence and None.
But you should always check for None using the is operator, which is also a PEP-8 recommended style. For instance, if I am calling a function that can return either None or a list, I would do something like this:
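```python
def pick_extension(self, type, strict=True):
    # Method name, InvalidConfigError and DefaultExtension are hypothetical.
    extensions = self.guess_all_extensions(type, strict)
    if extensions is None:
        # this is an error
        raise InvalidConfigError()
    if not extensions:
        # no extensions configured, so we create a default one
        return DefaultExtension()
    return extensions[0]
```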
This is also better practice because, in most situations, you are likely to handle a None value differently from an empty value, just as in the above example.
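A fully self-contained version of the same three-way handling might look like the sketch below. Everything in it is hypothetical: load_extensions() stands in for any function that returns None when a setting is absent, and ".bin" is an arbitrary default.

```python
from typing import List, Optional


class InvalidConfigError(Exception):
    """Hypothetical error for a missing 'extensions' setting."""


DEFAULT_EXTENSION = ".bin"  # hypothetical fallback value


def load_extensions(config: dict) -> Optional[List[str]]:
    """Hypothetical loader: None when the setting is absent, else its list."""
    return config.get("extensions")


def pick_extension(config: dict) -> str:
    extensions = load_extensions(config)
    if extensions is None:
        # Absent setting: a configuration error, distinct from "empty".
        raise InvalidConfigError("'extensions' is not configured")
    if not extensions:
        # Present but empty: fall back to a default.
        return DEFAULT_EXTENSION
    return extensions[0]


print(pick_extension({"extensions": [".txt", ".log"]}))  # .txt
print(pick_extension({"extensions": []}))                 # .bin
```

Because the `is None` test runs first, the truthiness check that follows can only ever see a list, so its meaning is unambiguous.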
Possibility of Silent Bugs due to Type Mixups
Finally, there is the possibility of silent bugs creeping into the system, because a truthiness-based emptiness check lets objects of other types pass, whereas len() fails with a TypeError.
This is a valid concern. But again, it hints at a poorly designed system with poor overall coding style. You might say that, despite well-named variables and proper docstrings, human error is always possible. For example, someone somewhere writes 0 instead of [0], and the code still runs.
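The sketch below makes this concrete with two hypothetical helpers that share the same contract. With the truthiness check, passing 0 instead of [0] is silently treated as an empty sequence; with len(), the same mistake raises a TypeError at runtime.

```python
def first_or_none(items):
    """Return the first element, or None for an empty sequence."""
    if not items:
        return None
    return items[0]


def first_or_none_strict(items):
    """Same contract, but len() raises TypeError for objects without a length."""
    if len(items) == 0:
        return None
    return items[0]


print(first_or_none([0]))  # 0: the intended call
print(first_or_none(0))    # None: the typo slips through silently
try:
    first_or_none_strict(0)
except TypeError as exc:
    print(f"TypeError: {exc}")  # len() rejects the int at runtime
```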
I'd argue that when wrong types of objects can be passed to your code, you have much bigger problems.
If you use len() for an emptiness check and rely on it failing with a TypeError at runtime, are you always going to fix these issues after a crash in production? That sounds terrible.
You need to design your system with better engineering practices to minimize the possibility of such problems, instead of depending on runtime crashes to learn about them. Here are a few suggestions:
Adopt Type Hints and Static Type Checking
Type hints have been part of Python since version 3.5 (PEP 484). They address exactly this kind of problem: you can be as explicit as you need about the types of values a variable or parameter can hold.
And if you combine them with static type checking using mypy, you can catch many type errors ahead of time, instead of running into them in production.
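A minimal sketch, assuming Python 3.9+ and a hypothetical mean_latency() function: the annotation documents that a sequence is expected, and a checker such as mypy should reject the commented-out call at check time, because an int is not a Sequence[float].

```python
from collections.abc import Sequence
from typing import Optional


def mean_latency(samples_ms: Sequence[float]) -> Optional[float]:
    """Return the mean of samples_ms, or None when there are no samples."""
    if not samples_ms:
        return None
    return sum(samples_ms) / len(samples_ms)


print(mean_latency([0]))       # 0.0: a list with one sample
print(mean_latency([10, 20]))  # 15.0
# mean_latency(0)  # a static type checker flags this: int is not a Sequence[float]
```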
Validate User Input
Real-world applications need to deal with user-provided data, for instance JSON payloads sent to a REST API endpoint.
You should have a well-defined schema that clients must adhere to. Beyond that, you can validate inputs to detect invalid ones and fail early. If you let invalid inputs flow through the system, you cannot blame anyone else.
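A minimal sketch of failing early, using only the standard json module; the {"items": [...]} payload shape, parse_order() and PayloadError are hypothetical. A real service would more likely use a schema-validation library, but the principle is the same: once this function returns, `if not items` can only mean "the client sent an empty array".

```python
import json


class PayloadError(ValueError):
    """Raised when a request body does not match the expected schema."""


def parse_order(body: str) -> list:
    """Parse a hypothetical order payload of the form {"items": [...]}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise PayloadError(f"invalid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise PayloadError("payload must be a JSON object")
    items = payload.get("items")
    if not isinstance(items, list):
        # Rejects a missing key, 0, null, strings, objects, etc.
        raise PayloadError("'items' must be a JSON array")
    return items


print(parse_order('{"items": []}'))      # []
print(parse_order('{"items": [1, 2]}'))  # [1, 2]
```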
Unit Tests
You can add unit tests checking that your code fails with a TypeError when passed an object of an unsupported type. CPython's own test suite contains tests of this kind.
Such tests not only serve as living documentation of the types of values your code does not expect to receive, but also ensure that no one accidentally changes the code to start supporting those types.
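A minimal unittest sketch of this idea, with a hypothetical normalize_names() function. The explicit isinstance check matters here: without it, `if not names` would quietly accept 0 or None and return an empty list, and the first test would catch that regression.

```python
import unittest


def normalize_names(names):
    """Strip whitespace from each name; reject anything that is not a list."""
    # Explicit type check: `if not names` alone would accept 0 or None.
    if not isinstance(names, list):
        raise TypeError(f"names must be a list, got {type(names).__name__}")
    if not names:
        return []
    return [name.strip() for name in names]


class NormalizeNamesTypeTests(unittest.TestCase):
    def test_rejects_unsupported_types(self):
        for bad_value in (0, None, 3.5, "alice"):
            with self.subTest(bad_value=bad_value):
                with self.assertRaises(TypeError):
                    normalize_names(bad_value)

    def test_empty_list_is_still_accepted(self):
        self.assertEqual(normalize_names([]), [])
```

Save it in a test module and run it with `python -m unittest`.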
Using len() is not Wrong
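All said, I am not against using len(). Sometimes the situation demands it, such as when you are dealing with user-defined classes whose __bool__() means something other than "non-empty". (A class that implements __len__() but not __bool__() is not such a case: Python falls back to __len__() for truthiness.)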
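The sketch below contrasts the two situations using hypothetical classes. Playlist defines only __len__(), so truthiness falls back to its length. BatchResult defines a __bool__() meaning "the batch succeeded", so `not result` and `len(result) == 0` answer different questions, and only len() tells you whether it is empty.

```python
class Playlist:
    """Hypothetical class that defines only __len__; truthiness falls back to it."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        return len(self.songs)


class BatchResult:
    """Hypothetical class whose __bool__ means 'succeeded', not 'non-empty'."""

    def __init__(self, rows, ok):
        self.rows = list(rows)
        self.ok = ok

    def __len__(self):
        return len(self.rows)

    def __bool__(self):
        return self.ok


print(bool(Playlist([])), bool(Playlist(["a"])))  # False True
result = BatchResult(rows=[], ok=True)
print(not result)        # False: the batch succeeded, so it is truthy
print(len(result) == 0)  # True: but it returned no rows
```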
Also, readability is very subjective, and different teams and individuals have different tastes for it. So if you use this style, it’s totally fine.
However, my primary point is that using the truthiness value of sequence types for emptiness checks is not inherently ambiguous. What often appears as ambiguity is usually rooted in poor coding style, which should be addressed directly.
Final Thoughts
Let's round things up. Checking sequences for emptiness is a fundamental and frequent operation. In Python, a language known for its simplicity and readable syntax, the idiomatic approach is to leverage the truthiness of sequences, where if not mylist: succinctly checks for emptiness, making a more elaborate len() call unnecessary unless the context demands it.
For seasoned Python programmers, this style is well-recognized. However, some see it as less readable or even ambiguous. The truth is, the Pythonic style is clear and efficient. Ambiguity likely stems from other code issues, such as unclear variable names, absence of docstrings, missing type hints, lack of unit tests, and insufficient input validation.
That said, my intention here is not to claim that using len() is wrong. If you prefer it, by all means, use it. The goal of this article was to dispel the myth that the PEP-8 recommended style is unclear.
In the next article, I will explain exactly why the Pythonic style of emptiness check is many times faster than using len(). Stay tuned!
Support Confessions of a Code Addict
If you find my work interesting and valuable, you can support me by opting for a paid subscription (it’s $6 monthly/$60 annual). As a bonus you get access to monthly live sessions, and all the past recordings.
Many people report failed payments, or don't want a recurring subscription. For them, I also have a buymeacoffee page, where you can buy me coffees or become a member. I will upgrade you to a paid subscription here for the equivalent duration.
I also have a GitHub Sponsor page. You will get a sponsorship badge, and also a complimentary paid subscription here.