The F.I.R.S.T. Principles of Testing

Unit testing is one of the most useful things one can do when engineering software. Done well, it acts as a safety net, catching bugs early and making refactoring easier. Unit tests verify that individual parts of our software work as expected, but how can one create useful unit tests?

The F.I.R.S.T. principles of testing serve as a checklist for keeping one's tests reliable and maintainable as time goes on. In this article I will explore these principles one by one.

1. Fast

Speed is not just a luxury here; it is essential. When tests are slow, they become a chore. And when something is a chore, we tend to avoid it. The result? We stop testing as often as we should. Fast tests keep the feedback loop tight, which is especially crucial when debugging a tricky issue.

See the example below:

def add(a, b):
    return a + b  # the function under test

def test_addition_is_fast():
    assert add(2, 3) == 5  # Pure in-memory logic: this should finish in milliseconds

A good rule of thumb: good tests exercise in-memory logic, while bad tests perform actual network calls or database operations. Fast tests usually cover small, isolated pieces of logic, which means they can be run after every code change.
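To make that concrete, here is a minimal sketch (the function names and the two-second delay are invented for illustration): a slow dependency is replaced with an in-memory stub so the test never leaves the process.

import time

def fetch_exchange_rate():
    # Stand-in for a slow external call, e.g. an HTTP request to a rate API
    time.sleep(2)
    return 1.1

def convert(amount, get_rate=fetch_exchange_rate):
    return amount * get_rate()

def test_convert_is_fast():
    # Injecting an in-memory stub avoids the two-second "network" delay
    assert convert(10, get_rate=lambda: 2.0) == 20.0

Passing the dependency in as a parameter is one simple design choice; patching with unittest.mock, shown later in this article, achieves the same isolation.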

2. Independent

Tests that depend on shared resources or on the outcome of previous tests are fragile. They can pass or fail unpredictably depending on the order in which they run, and that unpredictability makes debugging difficult. This principle advises that unit tests stand alone and follow a clear structure: Given, When, Then.

Given (or Arrange) means that the test sets up all the data it needs itself, so it runs the same way regardless of the environment it is executed in.

When (or Act) means that the test invokes the specific behavior under test.

Finally, Then (or Assert) means that the test describes the changes one expects from the behavior that was invoked. Multiple assertions on the same object are fine as long as they verify the outcome of the same Act step.

See the example below:

# Arrange
account = BankAccount()
account.deposit(100)

# Act
account.withdraw(40)

# Assert
assert account.balance == 60

Each test should create its own state and clean up afterward; it should not depend on another test having run successfully before it. If a test relies on a database or API, consider mocking those interactions or using fixtures that isolate test data, as in the sketch below.
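Here is a minimal sketch of that idea with a pytest fixture; FakeUserStore is a hypothetical in-memory stand-in for a real database, invented for this example:

import pytest

class FakeUserStore:
    # In-memory stand-in for a real database table
    def __init__(self):
        self.users = {}

    def add(self, user_id, name):
        self.users[user_id] = name

@pytest.fixture
def store():
    # Every test receives its own fresh store, so no state leaks between tests
    return FakeUserStore()

def test_add_user(store):
    store.add(1, "Ada")
    assert store.users == {1: "Ada"}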

3. Repeatable

A test that only passes on one’s laptop but fails on the CI server is worse than no test at all. One’s tests should produce the same result every time, regardless of the environment or time of day.

Randomness, time-based functions, and environment-specific configuration are common sources of flaky tests. Thankfully, we can use mocking to control these variables and stabilize our tests.

Consider the following function and test:

from datetime import datetime

def greet_user():
    current_hour = datetime.now().hour
    if current_hour < 12:
        return "Good morning"
    elif current_hour < 18:
        return "Good afternoon"
    else:
        return "Good evening"

def test_greet_user():
    assert greet_user() == "Good morning"  # This may fail if run at night

This function behaves differently depending on when it is called, which makes it hard to test consistently. We can use unittest.mock to patch the time and make the test repeatable, as in the example below:

from datetime import datetime
from unittest.mock import patch

from your_module import greet_user

@patch("your_module.datetime")
def test_greet_user_morning(mock_datetime):
    # Freeze "now" at 9 AM so the test no longer depends on the real clock
    mock_datetime.now.return_value = datetime(2023, 1, 1, 9, 0, 0)
    assert greet_user() == "Good morning"
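Randomness deserves the same treatment. Below is a small sketch, assuming the code under test accepts a random generator: seeding random.Random makes the "random" behavior deterministic on every run.

import random

def shuffle_deck(rng=random):
    deck = list(range(10))
    rng.shuffle(deck)
    return deck

def test_shuffle_is_repeatable():
    # Two generators seeded identically produce the same shuffle every time
    assert shuffle_deck(random.Random(42)) == shuffle_deck(random.Random(42))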

4. Self-Validating

Imagine running a test and needing to scroll through logs to figure out whether it passed. That is wasted time. A test should report an unambiguous pass or fail, and when it fails, it should tell one exactly what went wrong. See the example below:

def test_uppercase_conversion():
    assert to_uppercase("hello") == "HELLO"

The assertion above should either pass or raise an error. In automated pipelines, self-validating tests are non-negotiable. They either pass and let the build continue, or fail and stop it cold.
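For contrast, here is a sketch of a test that is not self-validating; to_uppercase is given a minimal assumed implementation, since the article does not define it. This test always "passes", and a human has to read the output to spot a regression:

def to_uppercase(text):
    # Minimal assumed implementation of the function under test
    return text.upper()

def test_uppercase_not_self_validating():
    # Bad: no assertion, so this test can never fail on its own
    print(to_uppercase("hello"))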

5. Thorough

Software often fails not during normal use but at the edge cases. Users do strange things. Systems break in unexpected ways. That’s why one’s test suite should stress the limits, not just the center. If one can imagine it breaking, write a test for it.

Normal input:

assert divide(10, 2) == 5

Edge cases:

assert divide(10, 0) == "Cannot divide by zero"

Large inputs:

assert process_large_list([1] * 10**6) == expected_output

Invalid inputs:

with pytest.raises(TypeError):
    divide("ten", 2)

A good rule of thumb: code coverage is just a number. What really matters is covering meaningful variations in behavior, not just hitting every line once. Bugs are sneaky. They often slip through in rarely executed code, complex branching logic, optional or default arguments, poorly understood features, and legacy systems. These are the areas that deserve extra testing love.

Recap

Fast: Tests should complete in seconds, so developers actually run them.
Independent: Each test stands alone, without relying on external systems or other tests.
Repeatable: Produces consistent results in any environment.
Self-Validating: Automatically tells one whether it passed or failed.
Thorough: Tests all types of inputs, not just the obvious ones.


By grounding one’s test suite in these principles, one creates a development environment that’s safer, faster, and more resilient. The better one’s tests, the more fearlessly one can build.
