Diff-based testing

AI Disclaimer

I’ve used an LLM to generate the code examples in this article; otherwise, it would never have seen the light of day (because that’s how lazy I am). The goal of the examples is to give the reader an idea of what the tests I’m talking about look like (since our industry has completely broken test taxonomy for the foreseeable future); they are by no means meant to be accurate or production-ready.

I’ve also used an LLM to generate the article structure from my notes. However, after doing so, I’ve reviewed it carefully and made the necessary changes so it reflects my thoughts, opinion, and style. Any resemblance to any other articles published anywhere, if any, is not intentional.

It’s the first time I’m experimenting with LLM-assisted writing and my goal was to see what the experience and end result would be.

Feedback is always welcome.

Introduction

Effective and efficient testing is crucial to ensure quality, reliability, and smooth functionality in software applications, as well as a healthy software delivery process. In this article, I outline best practices and methodologies for different testing types (or categories) based on my experience testing backend applications. I call this approach diff-based testing.

Diff-based testing advocates that, in order to avoid repetition (efficiency) and ensure high-quality test coverage (effectiveness), each category (or layer) of tests must focus on what can be tested only through that category. I call it diff-based testing because the main idea behind it is to have each layer of tests validate only the behavior that can’t be validated in the previous layer. It’s heavily influenced by the test pyramid paradigm; however, it’s more opinionated. Diff-based testing also attempts to minimize the time spent running all test suites by achieving as much of the test coverage as possible with the fastest test types.

For example, diff-based testing states that unit tests must focus on validating the core business domain logic whereas integration testing must focus on validating only what can’t be covered by unit tests like, for example, HTTP requests and responses, database operations, and messaging, to name a few.

Following this approach will naturally lead to a test pyramid where most of the tests are going to be implemented as unit tests which are the fastest types of tests to execute since they are executed in-memory without exercising any IO operations.

Imagine we have a Calculator API. If we can validate that the sum operation behaves correctly using unit tests, why should we validate the same behavior again using integration or any other type of test? That would be a waste of time and energy, due to having to implement and maintain the same test logic in multiple places. It also means that if you’re building a library with no IO operations whatsoever, all you need is unit tests and their extensions, like mutation and property testing (and maybe performance testing, depending on your library’s use case).

But since this article is focused on testing backend applications like REST APIs, I’ll be covering the other categories of tests I believe are necessary to cover all of the functionality usually implemented by this type of application.

Without further ado, here we go.

Unit Testing

Unit testing focuses on validating the correctness of individual components of business logic without requiring us to run the entire application. These tests aim to ensure that each part of the code works as expected in isolation. In order to maintain precise and efficient coverage, which are the goals of diff-based testing, we must ensure core business domain logic is validated only through unit tests, as mentioned before. 

Unit tests should be limited to public methods and functions, as this approach provides clear insight into how well each component performs independently. Focusing on public methods helps ensure that tests remain aligned with how the code is used in practice, promoting better maintainability. Private methods or functions will be tested indirectly when testing the public ones.

By concentrating on the external behavior rather than internal implementation details, developers can refactor code with confidence, knowing the tests will continue to validate core functionality without needing constant updates.

To ensure clarity and maintainability, each class or module being tested should have its corresponding test class or module. Comprehensive coverage, including both typical (happy paths) and edge cases, ensures that a wide range of potential issues are captured early. Testing edge cases is crucial to identify behavior that might break under less common scenarios, thereby strengthening the reliability of the component.

Dependencies should not be tested directly within unit tests; instead, they should be mocked or stubbed to maintain the focus on the specific logic of the component being tested. Incorporating unit tests as part of the same code repository ensures cohesion and enables seamless code management.
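To illustrate stubbing a dependency, here is a minimal hand-rolled stub. All names here (`ExchangeRateProvider`, `CurrencyConverter`) are hypothetical and not part of the Calculator example; a mocking library such as Mockito achieves the same result with less boilerplate, but a hand-rolled stub keeps the sketch dependency-free:

```java
// Hypothetical example: CurrencyConverter depends on an ExchangeRateProvider.
// The stub replaces the real provider so the unit test exercises only the
// conversion logic, with no IO.

interface ExchangeRateProvider {
    double rateFor(String from, String to);
}

class CurrencyConverter {
    private final ExchangeRateProvider rates;

    CurrencyConverter(ExchangeRateProvider rates) {
        this.rates = rates;
    }

    double convert(double amount, String from, String to) {
        return amount * rates.rateFor(from, to);
    }
}

public class CurrencyConverterTest {
    public static void main(String[] args) {
        // Stub always returns a fixed rate; the real provider would call an external API
        ExchangeRateProvider stub = (from, to) -> 2.0;
        CurrencyConverter converter = new CurrencyConverter(stub);

        double result = converter.convert(10.0, "USD", "EUR");
        if (result != 20.0) {
            throw new AssertionError("Expected 20.0 but got " + result);
        }
        System.out.println("Stubbed conversion test passed");
    }
}
```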

Finally, unit tests can lead to better design as we make an effort to ensure our code is testable and modularized.

Example:

// Unit test for addition operation in a Calculator REST API

public class CalculatorServiceTest {

    @Test
    public void testAddition() {
        CalculatorService calculatorService = new CalculatorService();

        int result = calculatorService.add(3, 7);

        assertEquals(10, result, "Addition should return the correct sum.");
    }
}

Integration Testing

Integration testing ensures that the application can communicate and interact effectively with its external dependencies like databases, messaging platforms, external APIs, etc. Again, it is crucial to note that the core domain business logic should not be validated through integration tests since those should already have been validated through unit tests.

This ensures that integration tests remain focused on interactions and data flow rather than duplicating the work of unit tests. This type of testing is essential for verifying that the integration between architectural components works as intended, providing confidence that the system under test behaves as expected when integrated with its dependencies.

These tests should confirm that valid requests produce appropriate responses, while ensuring that anticipated errors, such as a database going offline or receiving invalid inputs, are handled gracefully. 

Additionally, they should verify that expected error messages are returned for various issues, including invalid routes, parameters, or contracts. To simulate real-world scenarios without using actual production data, stubbing external services and databases is recommended. The test data should resemble production conditions as closely as possible to ensure realistic results.

Each functionality being tested should have its designated test class or module to keep the tests organized and maintainable. Integration tests should be able to create an application context or its equivalent. Integration tests must be fully automated to ensure they can be executed as part of the CI/CD pipeline, supporting continuous integration and delivery.

Another benefit of integration tests is that they allow a system’s integration with its dependencies to be validated before it’s deployed to the infrastructure where it’s going to be executed. This gives the delivery team quick feedback on the behavior of the application and surfaces potential defects early in the delivery process.

Example:

// Integration test using embedded Kafka and WireMock for the Calculator API.
// This example assumes we're testing an API which depends on an
// external notification API for notifying the user of completed operations.
// This API also publishes an event to Kafka for every operation performed.

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@EmbeddedKafka(partitions = 1, topics = {"operation-performed"})
public class CalculatorIntegrationTest {

    @Autowired
    private TestRestTemplate restTemplate;

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    private WireMockServer wireMockServer;

    @BeforeEach
    public void setupMocks() {
        // WireMock stub for the external notification service
        wireMockServer = new WireMockServer(8081);
        wireMockServer.start();
        wireMockServer.stubFor(post(urlEqualTo("/notify"))
            .willReturn(aResponse().withStatus(200)));
    }

    @AfterEach
    public void tearDownMocks() {
        wireMockServer.stop();
    }

    @Test
    public void testAdditionEndpoint() {
        ResponseEntity<String> response = this.restTemplate.postForEntity("/calculate/add", new OperationRequest(3, 7), String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(response.getBody()).contains("operation", "result");

        // Verify that an OperationPerformed event was published to Kafka
        Consumer<String, String> consumer = createKafkaConsumer();
        consumer.subscribe(Collections.singletonList("operation-performed"));

        ConsumerRecord<String, String> record = KafkaTestUtils.getSingleRecord(consumer, "operation-performed");
        assertThat(record.value()).contains("operation", "addition", "result", "10");

        consumer.close();
    }

    @Test
    public void testDivideByZero() {

        ResponseEntity<String> response = this.restTemplate.postForEntity("/calculate/divide", new OperationRequest(10, 0), String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);

        assertThat(response.getBody()).contains("Cannot divide by zero");
    }
}

End-to-End (E2E) Testing

End-to-end testing aims to validate the entire application flow and ensure that all components in the system interact seamlessly. However, traditional E2E tests are often complex, flaky, and time-consuming, and carry significant maintenance overhead.

Beyond being time-consuming, those tests are prone to failure at the slightest change in the system’s infrastructure, test data, or dependencies, making them difficult to rely on for continuous integration.

To address these limitations, contract testing can be a more efficient alternative. Contract tests offer focused validation of interactions between services without the extensive infrastructure or fragile nature of full E2E testing. For these reasons, it is often better to replace them with more focused contract tests that, together with unit and integration tests, provide the same assurances with less overhead.

I won’t be presenting an example since I consider such tests a bad practice or anti-pattern.

Contract Testing

Contract testing ensures that different components of an architecture, such as services and clients, interact correctly based on predefined agreements or “contracts.” These tests validate that both the producer (data provider) and the consumer (data user) adhere to these contracts. This includes both synchronous and asynchronous communication between components.

The consumer defines the data structure it needs, while the producer guarantees it can deliver this data format. By versioning contracts alongside the codebase and storing them in a shared repository, both sides can stay in sync. The most well-known contract test framework is Pact.

Contract tests should be executed at every stage of the CI/CD pipeline, validating published contracts in each environment (e.g., Dev, QA, Pre-Prod, and Prod). This ensures that changes in one component do not unexpectedly impact another, keeping producers and consumers aligned.

None of the other test categories covered in this article can provide the guarantees of contract tests. Contract tests are essential when implementing distributed systems.

Example:

// Consumer-side contract test using Pact (JUnit 5)

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "CalculatorProvider")
public class CalculatorConsumerContractTest {

    @Pact(consumer = "CalculatorConsumer", provider = "CalculatorProvider")
    public RequestResponsePact createPact(PactDslWithProvider builder) {
        return builder
            .given("Calculator provides addition operation")
            .uponReceiving("A request for addition")
                .path("/calculate/add")
                .method("POST")
                .body("{\"num1\": 3, \"num2\": 7}")
            .willRespondWith()
                .status(200)
                .body("{\"operation\": \"addition\", \"result\": 10}")
            .toPact();
    }

    @Test
    @PactTestFor(pactMethod = "createPact")
    public void testConsumerPact(MockServer mockServer) {
        RestTemplate restTemplate = new RestTemplate();

        String response = restTemplate.postForObject(mockServer.getUrl() + "/calculate/add", new OperationRequest(3, 7), String.class);

        assertThat(response).contains("operation", "addition", "result");
    }
}

// Producer-side contract test using Pact (JUnit 5)

@Provider("CalculatorProvider")
@PactFolder("pacts")
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
public class CalculatorProviderContractTest {

    @BeforeEach
    void before(PactVerificationContext context) {
        context.setTarget(new HttpTestTarget("localhost", 8080));
    }

    @TestTemplate
    @ExtendWith(PactVerificationInvocationContextProvider.class)
    void verifyPact(PactVerificationContext context) {
        context.verifyInteraction();
    }
}

Exploratory Testing

Exploratory testing is performed to examine aspects of the application that are challenging to automate, such as user interface behavior and user experience. This testing type relies on the skills and intuition of QA professionals to identify unexpected behaviors and potential usability issues.

Conducted in a controlled QA environment, exploratory testing leverages the creativity and expertise of testers to investigate various scenarios. This approach helps uncover issues that structured test scripts might miss, ensuring a more holistic evaluation of the software.

Smoke Testing

Smoke testing serves as a quick validation method to verify that a recent deployment was successful. It is a lightweight test that checks basic application functionality without diving into deeper, more detailed testing.

This testing type focuses on ensuring that the application is accessible, responding as expected, and available at the correct routes. Typically performed after deployments in UAT and production, smoke tests provide immediate feedback on deployment success.

At this level we want to validate what can’t be validated by the integration and unit tests, i.e., that our application is capable of running in the provisioned infrastructure and can talk to the real dependencies deployed to that environment.

Example:

// Smoke test to verify basic functionality of the Calculator API after deployment

public class CalculatorSmokeTest {

    // Base URL of the environment the application was just deployed to
    private static final String BASE_URL = System.getenv("APP_BASE_URL");

    private final TestRestTemplate restTemplate = new TestRestTemplate();

    @Test
    public void testServiceIsUp() {
        ResponseEntity<String> response = restTemplate.getForEntity(BASE_URL + "/health", String.class);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(response.getBody()).contains("status", "UP");
    }
}

Synthetic Monitoring

Synthetic monitoring involves running a subset of automated tests in a live production environment to ensure the system continues to work as expected. This proactive measure helps detect issues before users encounter them.

These tests use innocuous data, such as fake client profiles, dummy accounts, or synthetic transactions, that do not interfere with real transactions or analytics. By integrating synthetic tests with monitoring tools, organizations can receive alerts if these tests detect problems, allowing for quick intervention.

Example:

// Synthetic monitoring test example to run a health check that performs a synthetic operation in production

public class CalculatorSyntheticMonitoringTest {

    // Plain RestTemplate pointed at the live environment; no application context needed
    private final RestTemplate restTemplate = new RestTemplate();

    @Test
    public void testProductionHealthCheckWithSyntheticOperation() {
        // URL for a health endpoint that performs a synthetic operation
        String syntheticTestUrl = "https://production-url.com/health";
        ResponseEntity<String> response = restTemplate.getForEntity(syntheticTestUrl, String.class);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(response.getBody()).contains("status", "UP");
    }
}

// Controller code for the health endpoint with synthetic operation flag

@RestController
public class HealthController {

    @Autowired
    private CalculatorService calculatorService;

    @GetMapping("/health")
    public ResponseEntity<Map<String, Object>> performSyntheticOperation() {
        Map<String, Object> response = new HashMap<>();
        boolean isSynthetic = true;
        int result = calculatorService.add(5, 10, isSynthetic); //won't publish an event
        response.put("operation", "addition");
        response.put("result", result);
        response.put("status", "UP");
        return ResponseEntity.ok(response);
    }
}

Performance Testing

Performance testing aims to assess how the system performs under expected and peak load conditions. Shifting performance testing to the left—incorporating it early during the development phase—helps identify and resolve potential bottlenecks sooner.

Incorporating performance tests as part of the continuous delivery pipeline ensures that each new version of the software meets performance benchmarks, preventing performance degradation over time.

Performance is usually considered a non-functional requirement or, as I prefer, a cross-functional requirement. In the book Building Evolutionary Architectures, the authors present the concept of fitness functions, which are a way to ensure such requirements are met throughout the lifecycle of the system’s architecture.

When implementing fitness functions, I believe it’s totally fine to consider this category of tests as fitness functions or cross-functional tests, where performance tests are just a subset of this larger category. Logging and security tests, to name a few, are other potential subsets that would belong in this category.

Example:

// Performance test using JUnit and a load testing approach for the REST API endpoint

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class CalculatorPerformanceTest {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    public void testAdditionEndpointPerformance() {
        long startTime = System.nanoTime();

        for (int i = 0; i < 1000; i++) {
            ResponseEntity<String> response = restTemplate.postForEntity("/calculate/add", new OperationRequest(i, i + 1), String.class);
            assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        }

        long endTime = System.nanoTime();
        Duration duration = Duration.ofNanos(endTime - startTime);
        assertThat(duration.getSeconds()).withFailMessage("Performance test failed: Took too long to complete.").isLessThan(10L);
    }
}

Mutation Testing

Mutation testing is a method of testing the test suite itself. By introducing small changes (mutations) to the code, this practice ensures that the existing tests can detect and fail appropriately when the code is altered.

Mutation testing helps assess the effectiveness and coverage of the test suite, revealing areas where additional tests may be necessary to improve robustness.

Those tests are usually performed by a library or framework that mutates the application code and then runs an existing suite of tests, with the goal of validating that the test suite fails when it should.

I don’t consider mutation testing as a category of its own. I think of it as an extension to unit testing.

Example:

// Mutation testing example using PIT (Pitest) library

public class CalculatorMutationTest {

    @Test
    public void testAddition() {
        CalculatorService calculatorService = new CalculatorService();

        int result = calculatorService.add(2, 3);

        assertEquals(5, result, "Mutation test: ensure the addition logic is intact.");
    }

    // Note: The actual mutation testing is conducted using PIT by running
    // the PIT Maven plugin or configuring it in your build tool.
    // This code example represents a standard unit test that PIT will mutate
    // to check if the test fails when the code is altered.
}

// To run mutation testing with PIT, add the following to your Maven POM file:

// <plugin>
//     <groupId>org.pitest</groupId>
//     <artifactId>pitest-maven</artifactId>
//     <version>1.6.8</version>
//     <configuration>
//         <targetClasses>your.package.name.*</targetClasses>
//         <targetTests>your.package.name.*Test</targetTests>
//     </configuration>
// </plugin>

Property Testing

Property testing focuses on verifying that the system holds true to specified properties over a range of inputs. This type of testing is designed to explore edge cases and input variations that a developer may not have initially considered.

In property testing, instead of specifying exact input and output pairs, we define the properties or invariants that the function should uphold. The test framework then generates random input data and checks that the properties always hold. This method ensures that the software can handle a broader range of conditions and helps reveal hidden bugs that traditional example-based testing might miss.

Property testing complements unit and integration tests by pushing beyond predetermined cases and validating the system’s behavior in unexpected scenarios. Integrating property testing into the existing testing framework can be done by selecting tools that support property-based testing, such as QuickCheck or Hypothesis, and incorporating them into the test suite.

Developers should start by identifying key properties that functions or modules should satisfy and implement these as tests. This approach helps ensure that, across a variety of inputs, the software consistently meets the defined invariants, bolstering the overall reliability of the codebase.

By incorporating property testing, developers can gain greater confidence in the robustness of their code and discover vulnerabilities early in the development cycle.

Similar to mutation testing, I don’t consider property testing as a category of its own. I also think of it as an extension to unit testing.

Example:

// Property testing example using jqwik, a property-based testing library

public class CalculatorPropertyTest {
    @Property
    public void testAdditionProperties(@ForAll int a, @ForAll int b) {
        CalculatorService calculatorService = new CalculatorService();
        int result = calculatorService.add(a, b);

        assertThat(result).isEqualTo(a + b);

        // Addition must be commutative: add(a, b) == add(b, a)
        assertThat(calculatorService.add(b, a)).isEqualTo(result);
    }
}

Testing Multi-Threaded and Asynchronous Code

I don’t consider multi-threaded and asynchronous tests as a separate category of testing, but since I’ve seen many teams struggle with it, I believe it deserves its own section.

Testing multi-threaded and asynchronous code presents unique challenges due to issues like non-determinism, where the order of execution can vary between runs. This variability can make tests flaky and difficult to trust.

To mitigate these challenges, it is essential to design tests that focus on the individual behavior performed by each thread or asynchronous task. A rule of thumb I use is to ensure the scope of a given test scenario ends at the boundary of a thread or async call. A telltale sign that something is wrong when testing multi-threaded or async behavior is needing to add a wait or sleep call for the test to pass.

Non-determinism can also be avoided by using synchronization mechanisms or testing frameworks that simulate controlled environments, ensuring that the tests remain predictable. Additionally, tests should isolate and validate smaller, independent units of work to avoid race conditions.

By adopting these practices, developers can build confidence that tests that validate multi-threaded and asynchronous code won’t result in flaky and untrustworthy tests.
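As a minimal sketch of the rule of thumb above (all names here are illustrative): test the unit of work a task performs directly, and when the async plumbing itself must be exercised, block on the future's result so the test's scope ends exactly at the async boundary, with no sleeps:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class AsyncScopeTest {

    // The unit of work the background task executes
    static int square(int n) {
        return n * n;
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // 1. Test the logic synchronously: no threads, no sleeps
        if (square(4) != 16) throw new AssertionError("square logic broken");

        // 2. When the async wrapper itself is under test, join on the future
        //    instead of sleeping and polling shared state
        CompletableFuture<Integer> future = CompletableFuture.supplyAsync(() -> square(4));
        int result = future.get(); // deterministic wait on completion, not a sleep

        if (result != 16) throw new AssertionError("async result broken");
        System.out.println("Async scope test passed");
    }
}
```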

Example:

@Testcontainers
public class KafkaPublisherIntegrationTest {

    @Container
    private static KafkaContainer kafkaContainer = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:latest"));

    private static KafkaProducer<String, String> producer;
    private static KafkaConsumer<String, String> consumer;

    @BeforeAll
    public static void setUp() {
        kafkaContainer.start();

        // Producer properties
        Properties producerProps = new Properties();
        ...
        producer = new KafkaProducer<>(producerProps);

        // Consumer properties
        Properties consumerProps = new Properties();
        ...
        consumer = new KafkaConsumer<>(consumerProps);
        consumer.subscribe(Collections.singletonList("test-topic"));
    }

    @AfterAll
    public static void tearDown() {
        producer.close();
        consumer.close();
        kafkaContainer.stop();
    }

    @Test
    public void testEventPublication() throws ExecutionException, InterruptedException {
        String topic = "test-topic";
        String key = "test-key";
        String value = "test-value";

        // Publish the event to Kafka
        Future<RecordMetadata> future = producer.send(new ProducerRecord<>(topic, key, value));
        RecordMetadata metadata = future.get();

        assertNotNull(metadata);
        assertEquals(topic, metadata.topic());
    }

    @Test
    public void testEventConsumption() {
        String topic = "test-topic";
        String key = "test-key";
        String value = "test-value";

        // Publish an event to set up the test
        producer.send(new ProducerRecord<>(topic, key, value));
        producer.flush(); // Ensure the event is sent before consuming

        // Poll the consumer to validate the message was published. Other tests
        // may have published to the same topic, so look for our record instead
        // of assuming it is the only one.
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        assertFalse(records.isEmpty());

        boolean found = false;
        for (ConsumerRecord<String, String> record : records) {
            if (key.equals(record.key()) && value.equals(record.value())) {
                found = true;
            }
        }
        assertTrue(found);
    }
}

General Best Practices for Automated Testing

To maintain the reliability of automated testing, flaky tests should be fixed, quarantined, or removed immediately: tests that fail inconsistently erode trust in the test suite and the CI/CD pipeline. Failing tests should stop the pipeline until they are resolved, ensuring that issues are not overlooked.

Running a subset of tests locally before committing code helps developers identify potential issues early and prevents surprises during CI/CD runs. Lastly, tests should never be commented out, ignored, or removed to pass a failing pipeline, as this quick-fix approach undermines the integrity of the testing process and can mask underlying issues.

By adhering to these best practices, development teams can create robust, maintainable, and high-quality software products while minimizing risks and ensuring a seamless user experience.

Conclusion

Diff-based testing is an approach to testing that is heavily based on the test pyramid paradigm but goes one step further and states that:

  1. We should always test a functionality using the fastest type of test possible, e.g., unit tests over integration tests, integration tests over smoke tests, and so on.
  2. We shouldn’t duplicate test logic in different layers of tests. Each layer should add coverage to the behavior that couldn’t be tested in the previous layer.

By doing so, we ensure we end up with a healthy suite of tests that is effective, efficient, and easy to execute and maintain.

Optimizing Software Development

Introduction

The goal of this article is to demonstrate how we can apply the principles of Mathematical Optimization to improve the software development process, but first, let’s step back a little bit to take a look at some optimization problems.

A Cautionary Tale

A school wanted to increase literacy among young students. They decided to create a program that would offer one dollar for each book read. After a couple of days of running the program, the teachers were impressed with the success of the initiative. Each child had read ten books per day on average.

However, after performing a more in-depth analysis, they noticed a problem: the total number of pages read by each student was well below what was expected. The children, with their amazing brains, had quickly found a way to optimize their gains: all books read were fewer than ten pages long.

“Tell me how you measure me, and I will tell you how I will behave.”

E.M. Goldratt (1990). The Haystack Syndrome: Sifting Information Out of the Data Ocean. North River Press, Croton-on-Hudson, NY.

The anecdote above is a cautionary tale of how systems tend to optimize themselves (an increase of entropy) towards a state of equilibrium. In physical systems, this optimization has limits dictated by one or more dimensions like space, time, mass, velocity, temperature, energy, and so on.

In the example above, the system optimized itself against a single dimension: book count. The fewer pages a book has, the more books I’ll read. A better dimension (or metric) might have been page count, though one could argue that choosing page count as the single metric could lead students to read a few large books, decreasing the diversity of authors, topics, and styles they would be exposed to.

One solution to this problem would be to pick both book and page counts as metrics (or dimensions) to be evaluated. A formula to compute how much a student would get paid by the end of the initiative could have the shape:

Reward=b\log\left(\frac{p}{b}\right)

where b and p are the number of books and pages read respectively.

With the formula above we tie both variables together so the students would be incentivized not just to read as many books as possible but to also keep a high average of pages per book (the log function is only used to put a cap on how much money a student can make).
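As an illustrative check of the formula's behavior (the numbers below are made up): one-page books earn nothing because the pages-per-book average is 1 and log(1) = 0, while at a fixed book count, more total pages always increases the reward.

```java
public class RewardFormula {

    // Reward = b * log(p / b): b books read, p total pages read
    static double reward(int books, int pages) {
        return books * Math.log((double) pages / books);
    }

    public static void main(String[] args) {
        // Fifty one-page books: pages-per-book average is 1, log(1) = 0, no reward
        System.out.println(reward(50, 50)); // prints 0.0

        // Same book count, more total pages => higher reward
        System.out.println(reward(10, 100) > reward(10, 50)); // prints true
    }
}
```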

A More Classical Problem

A famous optimization problem is to compute the dimensions of an aluminum can in order to minimize material cost and maximize its volume. If you tried to minimize material cost only without taking volume into consideration you’d end up with the tiniest can your machines could fabricate. Conversely, if you tried to maximize the can volume without taking any other dimensions into consideration you’d end up with the largest can your machines could manufacture.

If you recall your Calculus classes, you probably remember that the solution is to combine the formulas for the area and volume of the can, creating a function where the area is the dependent variable. Then you find the function’s minimum using the first and second derivatives, but I’m sure you already knew that.

A=2\pi{r}{h}+2\pi{r}^2
(area of the cylinder)

V=\pi{r}^2{h}
(volume of the cylinder)

A=2\pi{r}\frac{V}{\pi{r^2}}+2\pi{r}^2
(area in terms of the volume)

I’ll leave the rest of the solution as an exercise to the reader. The main takeaway from this example is the idea of combining two dimensions (area and volume) in order to optimize both of them at the same time.

The point I’m trying to make here is that if we’re not careful with the dimensions we choose to evaluate a system and/or its components (software, people, project, business, etc) we might end up obtaining unexpected and/or undesired results. Let’s take a look at other optimization examples, but this time applied to the software development practice.

Organizational Structure and Architecture

Conway’s Law states that organizations design systems which mirror their own communication structure.

Imagine the scenario where development, infrastructure and security teams are siloed from each other. Let’s also assume the development teams are measured by the number of features delivered, the infrastructure team is measured by the number of incidents in production and total cost of ownership, and the security team is measured by the total number of incidents. What’s likely to happen in this scenario?

Well, chaos, for sure! Since the development teams are only concerned with the number of features being delivered, they’re more than likely to ignore quality, cost, and security concerns, to name a few. This in turn will result in more defects in production, which will burden the infrastructure team.

The infrastructure team on the other hand, in an effort to minimize incidents as well as costs, would probably come up with a very rigid process for provisioning new infrastructure, since more infrastructure means higher costs as well as a higher likelihood of something going wrong. This would have a direct impact on the development teams since getting that new server for that new functionality could take a long time (if ever approved).

In a similar fashion, the security team would tend to “lock everything down” in an effort to minimize the chance of incidents impacting both the development and infrastructure teams (after all security needs to sign off on that new server for that new functionality).

The end result is an organization where IT is perceived as incapable of delivering (which is indeed the truth) while the business becomes frustrated as its ideas don’t come to fruition. The organization struggles to innovate and is eventually surpassed by its competitors. Everyone loses their jobs. It’s really sad. I liked working with Dave.

However, paradoxically, if you asked the individual IT teams (dev, infra, security) how they perceived their own work, they would probably say they were crushing it. After all, they were able to meet the goals the organization set for each of them.

But how can we solve this problem in order to save everyone’s jobs? Well, by following the same approach we used when solving the literacy initiative and aluminum can problems earlier in this article, i.e., by combining the different metrics into one.

For that we don’t need to come up with a fancy formula like before. Instead, we “simply” need to combine all the different teams into one (actually, one per domain vertical) and evaluate the new team(s) with the same metrics of the individual ones, thus, optimizing the system (development team) for multiple dimensions (success metrics).

The benefit of structuring development teams this way is twofold: first, you minimize communication overhead and other friction points between teams; second, you make sure “everyone is in the same boat” and shares common goals. This strategy is an example of the Inverse Conway Maneuver.

Tech Stack Standardization

Another use of optimization in software development involves the notorious tech stack standardization. Standardization sits on a spectrum that can be either too coarse or too granular.

Standardization that is too coarse happens, for example, when the CTO/IT manager/Architect decides all databases used in all projects should be the same no matter the use case. It leads to projects using suboptimal technologies that don’t meet their requirements, resulting in incidental complexity and/or the inability to deliver a given set of functionalities.

On the other end of the spectrum, one can decide each and every project has the freedom to pick whichever database technology they see fit. This leads to issues like the proliferation of technologies that are not well known across the organization and thus not maintained properly (if at all).

How can we solve this problem? The solution is to find the sweet spot in the spectrum (optimize) so we don’t end up at either end of it. At this point, we (should) have already learned that optimizing against a single dimension is usually a bad idea. Therefore, we need to identify the dimensions that make sense to standardize against.

Two possible (and common) dimensions used for tech stack standardization are use case and load. In our example of databases, we could decide that for transactional systems (use case) that are not write-heavy (load) we want to standardize on PostgreSQL and for write-heavy (load) transactional systems (use case) we want to go with Cassandra.

Similarly we could decide that for analytical systems (use case) with lots of data (load) we want to adopt Apache Impala. If our friend Dave wanted to adopt, let’s say, MongoDB, he would have to prove his use case and load combination is not covered by the ones identified above. Sorry Dave.
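A standardization matrix like this can even be made executable, e.g. as a lookup table in code. The key names, load buckets, and error-handling policy below are made up for illustration; they are not a real decision framework:

```python
# Hypothetical standardization matrix keyed on (use_case, load).
# The technology picks mirror the examples in the text.
STACK_MATRIX = {
    ("transactional", "regular"): "PostgreSQL",
    ("transactional", "write-heavy"): "Cassandra",
    ("analytical", "big-data"): "Apache Impala",
}

def pick_database(use_case: str, load: str) -> str:
    """Return the standard database for a (use_case, load) pair."""
    try:
        return STACK_MATRIX[(use_case, load)]
    except KeyError:
        # Dave's MongoDB proposal lands here: an uncovered combination
        # requires an explicit review, not a silent default.
        raise ValueError(
            f"No standard for ({use_case}, {load}); "
            "take it to an architecture review"
        )
```

The point of the explicit `ValueError` is that exceptions to the standard become visible decisions rather than quiet one-offs.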

Project Management

On average, the smaller the scope of a project, the simpler its implementation, thus increasing its chances of success. However, if we try to optimize for a minimum scope only, we would end up with a single-feature microproject™ that doesn’t deliver much value. Not very helpful.

On the other hand, optimizing for maximum business value only doesn’t make sense either. We would end up with a multi-year project that tries to deliver every possible feature and wouldn’t be completed in a timely fashion. Does it sound familiar? So the question is: what dimensions can we choose when optimizing a project?

As you probably have already guessed, one option is to optimize for both minimum scope and maximum business value. In other words, we want a minimum viable product, or MVP. Another dimension that’s commonly used when optimizing a project is cost. If we add cost to an MVP we end up with an MVAP™, or minimum viable affordable product, which in some (most) cases might not be feasible.

You can keep adding dimensions to your project optimization matrix, but be mindful that the more dimensions you add, the more difficult it is to find a sweet spot (or local minimum). Tradeoff sliders are a tool that helps with prioritization across multiple dimensions, and I encourage you to check them out.

Machine Learning

Usually, when we think about Machine Learning and optimization we think about the optimization of the cost function. The next example is not a cost function optimization problem and I’m probably stretching the concept of optimization a little bit here. My goal is to demonstrate how pervasive and useful it is to think in terms of optimization across dimensions. If you’re a data scientist feel free to jump to the conclusion section. There’s nothing for you to see here. Move on. Go.

One common machine learning use case is anomaly detection. Imagine a financial institution wants to monitor credit card transactions for fraudulent operations. In this (overly) simplified example imagine they decide to build a ML model that analyzes the following transaction properties (dimensions): date and time, amount, and merchant.

Let’s say our friend Dave is a client of this financial institution. Dave usually shops at lunch time, spends no more than U$ 50.00 and buys everything on stuffidontneed.com. Suddenly the system sees a U$ 500.00 transaction at 1am on iwasrobbed.com. Our friend Dave immediately receives a text message asking him to confirm the transaction. Crisis averted.

The next day, a hacker on the other side of the globe is able to obtain Dave’s credit card info as well as his purchase behavior and makes a series of purchases under U$ 50.00, during Dave’s lunch time, on stuffidontneed.com. The only difference between the hacker’s transactions and Dave’s is the delivery address. However, since the fraud detection model doesn’t take the delivery address into account, the transactions are processed successfully and Dave will have to spend delightful hours with his bank’s customer service in order to prove he wasn’t responsible for the fraudulent transactions. Poor Dave.

One could argue the anomaly detection model wasn’t optimized for the task at hand (or that it was optimizing for the dimensions it was aware of). By including another dimension (or feature) the model would be able to improve its optimization algorithm and yield better results.
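Here is a deliberately naive sketch of the idea: treat a transaction as anomalous when any monitored dimension takes a value never seen in the customer’s history. The feature names and the “unseen value” rule are mine, and this is nowhere near a real fraud model:

```python
def is_anomalous(txn: dict, history: list, dims: list) -> bool:
    """Flag txn if, on any monitored dimension, its value was never
    seen in the customer's transaction history."""
    for dim in dims:
        seen = {past[dim] for past in history}
        if txn[dim] not in seen:
            return True
    return False

daves_history = [
    {"hour": 12, "amount_bucket": "<50",
     "merchant": "stuffidontneed.com", "delivery_address": "dave's house"},
]

fraud = {"hour": 12, "amount_bucket": "<50",
         "merchant": "stuffidontneed.com",
         "delivery_address": "hacker's drop point"}

# With the original dimensions the fraud slips through:
is_anomalous(fraud, daves_history,
             ["hour", "amount_bucket", "merchant"])               # False
# Adding delivery_address as a dimension catches it:
is_anomalous(fraud, daves_history,
             ["hour", "amount_bucket", "merchant",
              "delivery_address"])                                # True
```

The toy model makes the same mistake as the bank in the story until the extra dimension is added, which is exactly the point: the detector can only optimize over the dimensions it is given.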

Conclusion

Despite not being a mathematician, statistician, and the like, I was always passionate about Calculus and applied mathematics, more specifically, optimization problems. I think many of the problems in life can be treated as optimization problems, e.g., work-life balance, vacation time (too little is not enough, too much and you get tired), buying a house, your investment portfolio, amount of sugar in your coffee, and so on.

Software development is no different. However, in order to optimize a problem, you need to be able to identify its dimensions so you can be sure you’re optimizing for the things that matter. This skill, like any other, requires practice and the more you practice it, the sooner you’ll master it.

Thanks for reading so far. Can you think of other applications of optimization in software development? Feel free to leave your comments. All constructive feedback is welcome.

PS: In case you are really curious and too lazy to solve (i.e. google) the aluminum can optimization problem, the answer is a square-shaped can (height equals the top/bottom diameter).