Diff-based testing

AI Disclaimer

I’ve used an LLM to generate the code examples in this article; otherwise, it would have never seen the light of day (because that’s how lazy I am). The goal of the examples is to give the reader an idea of what the tests I’m talking about look like (since our industry has completely broken test taxonomy for the foreseeable future), and they are by no means meant to be accurate or production-ready.

I’ve also used an LLM to generate the article structure from my notes. However, after doing so, I’ve reviewed it carefully and made the necessary changes so it reflects my thoughts, opinion, and style. Any resemblance to any other articles published anywhere, if any, is not intentional.

It’s the first time I’m experimenting with LLM-assisted writing and my goal was to see what the experience and end result would be.

Feedback is always welcome.

Introduction

Effective and efficient testing is crucial to ensure quality, reliability, and smooth functionality in software applications as well as a healthy software delivery process. In this article, I outline best practices and methodologies for different testing types (or categories) based on my experience testing backend applications. I call this approach diff-based testing.

Diff-based testing advocates that in order to avoid repetition (efficiency) and ensure high-quality test coverage (effectiveness), each category (or layer) of tests must focus on what can be tested only through that category. I call it diff-based testing because the main idea behind it is to have each layer of tests validate only the behavior that can’t be validated in the previous layer. It’s heavily influenced by the test pyramid paradigm; however, it’s more opinionated. Diff-based testing also attempts to minimize the time spent running all test suites by having as much of the test coverage as possible achieved by the fastest test types.

For example, diff-based testing states that unit tests must focus on validating the core business domain logic, whereas integration testing must focus on validating only what can’t be covered by unit tests, such as HTTP requests and responses, database operations, and messaging, to name a few.

Following this approach will naturally lead to a test pyramid where most of the tests are implemented as unit tests, which are the fastest type of test to execute since they run in memory without exercising any IO operations.

Imagine we have a Calculator API. If we can validate that the sum operation behaves correctly using unit tests, why should we validate the same behavior again using integration or any other type of test? That would be a waste of time and energy due to having to implement and maintain the same test logic in multiple places. It also means that if you’re building a library with no IO operations whatsoever, all you need is unit tests and their extensions, like mutation and property testing (and maybe performance testing, depending on your library’s use case).

But since this article is focused on testing backend applications like REST APIs, I’ll also be covering the other categories of tests I believe are necessary to cover all of the functionality usually implemented by this type of application.

Without further ado, here we go.

Unit Testing

Unit testing focuses on validating the correctness of individual components of business logic without requiring us to run the entire application. These tests aim to ensure that each part of the code works as expected in isolation. In order to maintain precise and efficient coverage, which are the goals of diff-based testing, we must ensure core business domain logic is validated only through unit tests, as mentioned before. 

Unit tests should be limited to public methods and functions, as this approach provides clear insight into how well each component performs independently. Focusing on public methods helps ensure that tests remain aligned with how the code is used in practice, promoting better maintainability. Private methods or functions will be tested indirectly when testing the public ones.

By concentrating on the external behavior rather than internal implementation details, developers can refactor code with confidence, knowing the tests will continue to validate core functionality without needing constant updates.

To ensure clarity and maintainability, each class or module being tested should have its corresponding test class or module. Comprehensive coverage, including both typical (happy paths) and edge cases, ensures that a wide range of potential issues are captured early. Testing edge cases is crucial to identify behavior that might break under less common scenarios, thereby strengthening the reliability of the component.

Dependencies should not be tested directly within unit tests; instead, they should be mocked or stubbed to maintain the focus on the specific logic of the component being tested. Incorporating unit tests as part of the same code repository ensures cohesion and enables seamless code management.

Finally, unit tests can lead to better design as we make an effort to ensure our code is testable and modularized.

Example:

// Unit test for addition operation in a Calculator REST API

public class CalculatorServiceTest {

    @Test
    public void testAddition() {
        CalculatorService calculatorService = new CalculatorService();

        int result = calculatorService.add(3, 7);

        assertEquals(10, result, "Addition should return the correct sum.");
    }
}
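
As mentioned above, dependencies should be mocked or stubbed rather than exercised directly. Below is a minimal sketch of what that could look like with Mockito, assuming a hypothetical OperationHistoryRepository dependency (and a matching CalculatorService constructor) that does not exist in the example above:

// Unit test with a mocked dependency (the OperationHistoryRepository is hypothetical)

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.junit.jupiter.api.Test;

public class CalculatorServiceWithDependencyTest {

    @Test
    public void testAdditionRecordsHistory() {
        // The collaborator is mocked so the test stays in-memory and focused on the service logic
        OperationHistoryRepository historyRepository = mock(OperationHistoryRepository.class);
        CalculatorService calculatorService = new CalculatorService(historyRepository);

        int result = calculatorService.add(3, 7);

        assertEquals(10, result, "Addition should return the correct sum.");
        // Verify the interaction with the dependency without touching a real database
        verify(historyRepository).save("addition", 10);
    }
}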

Integration Testing

Integration testing ensures that the application can communicate and interact effectively with its external dependencies like databases, messaging platforms, external APIs, etc. Again, it is crucial to note that the core domain business logic should not be validated through integration tests since those should already have been validated through unit tests.

This ensures that integration tests remain focused on interactions and data flow rather than duplicating the work of unit tests. This type of testing is essential for verifying that the integration between architectural components works as intended, providing confidence that the system under test behaves as expected when integrated with its dependencies.

These tests should confirm that valid requests produce appropriate responses, while ensuring that anticipated errors, such as a database going offline or receiving invalid inputs, are handled gracefully. 

Additionally, they should verify that expected error messages are returned for various issues, including invalid routes, parameters, or contracts. To simulate real-world scenarios without using actual production data, stubbing external services and databases is recommended. The test data should resemble production conditions as closely as possible to ensure realistic results.

Each functionality being tested should have its designated test class or module to keep the tests organized and maintainable. Integration tests should be able to create an application context or its equivalent. Integration tests must be fully automated to ensure they can be executed as part of the CI/CD pipeline, supporting continuous integration and delivery.

Another benefit of integration tests is to allow the validation of the integration of a system with its dependencies before it’s deployed to the infrastructure where it’s going to be executed. This allows the delivery team to obtain quick feedback on the behavior of the application and potential defects early in the delivery process.

Example:

// Integration test using Spring's embedded Kafka broker and WireMock for the Calculator API
// This example assumes we're testing an API which depends on an
// external notification API for notifying the user of completed operations.
// This API also publishes an event to Kafka for every operation performed.

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@EmbeddedKafka(partitions = 1, topics = {"operation-performed"})
public class CalculatorIntegrationTest {

    @Autowired
    private TestRestTemplate restTemplate;

    private WireMockServer wireMockServer;

    @BeforeEach
    public void setupMocks() {
        // WireMock stub for the external notification service
        wireMockServer = new WireMockServer(8081);
        wireMockServer.start();
        wireMockServer.stubFor(post(urlEqualTo("/notify"))
            .willReturn(aResponse().withStatus(200)));
    }

    @AfterEach
    public void stopMocks() {
        wireMockServer.stop();
    }

    @Test
    public void testAdditionEndpoint() {
        ResponseEntity<String> response = this.restTemplate.postForEntity("/calculate/add", new OperationRequest(3, 7), String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(response.getBody()).contains("operation", "result");

        // Verify that an OperationPerformed event was published to Kafka
        Consumer<String, String> consumer = createKafkaConsumer(); // helper (not shown) that connects to the embedded broker
        consumer.subscribe(Collections.singletonList("operation-performed"));
        ConsumerRecord<String, String> record = KafkaTestUtils.getSingleRecord(consumer, "operation-performed");
        assertThat(record.value()).contains("operation", "addition", "result", "10");
        consumer.close();
    }

    @Test
    public void testDivideByZero() {
        ResponseEntity<String> response = this.restTemplate.postForEntity("/calculate/divide", new OperationRequest(10, 0), String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
        assertThat(response.getBody()).contains("Cannot divide by zero");
    }
}

End-to-End (E2E) Testing

End-to-end testing aims to validate the entire application flow and ensure that all components in the system interact seamlessly. However, traditional E2E tests tend to be complex, flaky, and time-consuming, and they carry significant maintenance overhead.

Beyond being time-consuming, these tests are prone to failing at the slightest change to the system’s infrastructure, test data, or dependencies, making them difficult to rely on for continuous integration.

To address these limitations, contract testing is a more efficient alternative. Contract tests validate the interactions between services without the extensive infrastructure or fragility of full E2E testing and, together with unit and integration tests, provide the same assurances with less overhead.

I won’t be presenting an example since I consider such tests a bad practice or anti-pattern.

Contract Testing

Contract testing ensures that different components of an architecture, such as services and clients, interact correctly based on predefined agreements or “contracts.” These tests validate that both the producer (data provider) and the consumer (data user) adhere to these contracts. This includes both synchronous and asynchronous communication between components.

The consumer defines the data structure it needs, while the producer guarantees it can deliver this data format. By versioning contracts alongside the codebase and storing them in a shared repository, both sides can stay in sync. The most well-known contract test framework is Pact.

Contract tests should be executed at every stage of the CI/CD pipeline, validating published contracts in each environment (e.g., Dev, QA, Pre-Prod, and Prod), ensuring that changes in one component do not unexpectedly impact another and keeping producers and consumers aligned.

None of the other test categories covered in this article can provide the guarantees of contract tests. Contract tests are essential when implementing distributed systems.

Example:

// Consumer-side contract test using Pact (JUnit 5 consumer DSL)

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "CalculatorProvider")
public class CalculatorConsumerContractTest {

    @Pact(consumer = "CalculatorConsumer", provider = "CalculatorProvider")
    public RequestResponsePact createPact(PactDslWithProvider builder) {
        return builder
            .given("Calculator provides addition operation")
            .uponReceiving("A request for addition")
                .path("/calculate/add")
                .method("POST")
                .body("{\"num1\": 3, \"num2\": 7}")
            .willRespondWith()
                .status(200)
                .body("{\"operation\": \"addition\", \"result\": 10}")
            .toPact();
    }

    @Test
    @PactTestFor(pactMethod = "createPact")
    public void testConsumerPact(MockServer mockServer) {
        // The request is sent to the Pact mock server, which verifies it against the contract
        RestTemplate restTemplate = new RestTemplate();
        String response = restTemplate.postForObject(mockServer.getUrl() + "/calculate/add", new OperationRequest(3, 7), String.class);

        assertThat(response).contains("operation", "addition", "result");
    }
}

// Producer-side contract test using Pact

@Provider("CalculatorProvider")
@PactFolder("pacts") // or @PactBroker(...) when contracts are published to a broker
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
public class CalculatorProviderContractTest {

    @BeforeEach
    void before(PactVerificationContext context) {
        context.setTarget(new HttpTestTarget("localhost", 8080));
    }

    @TestTemplate
    @ExtendWith(PactVerificationInvocationContextProvider.class)
    void verifyPact(PactVerificationContext context) {
        context.verifyInteraction();
    }
}

Exploratory Testing

Exploratory testing is performed to examine aspects of the application that are challenging to automate, such as user interface behavior and user experience. This testing type relies on the skills and intuition of QA professionals to identify unexpected behaviors and potential usability issues.

Conducted in a controlled QA environment, exploratory testing leverages the creativity and expertise of testers to investigate various scenarios. This approach helps uncover issues that structured test scripts might miss, ensuring a more holistic evaluation of the software.

Smoke Testing

Smoke testing serves as a quick validation method to verify that a recent deployment was successful. It is a lightweight test that checks basic application functionality without diving into deeper, more detailed testing.

This testing type focuses on ensuring that the application is accessible, responding as expected, and available at the correct routes. Typically performed after deployments in UAT and production, smoke tests provide immediate feedback on deployment success.

At this level we want to validate what can’t be validated by the integration and unit tests, i.e., that our application is capable of running in the provisioned infrastructure and can talk to the real dependencies deployed to that environment.

Example:

// Smoke test to verify basic functionality of the Calculator API after deployment

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class CalculatorSmokeTest {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    public void testServiceIsUp() {
        ResponseEntity<String> response = restTemplate.getForEntity("/health", String.class);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(response.getBody()).contains("status", "UP");
    }
}

Synthetic Monitoring

Synthetic monitoring involves running a subset of automated tests in a live production environment to ensure the system continues to work as expected. This proactive measure helps detect issues before users encounter them.

These tests use innocuous data, such as fake client profiles, dummy accounts, or synthetic transactions, that do not interfere with real transactions or analytics. By integrating synthetic tests with monitoring tools, organizations can receive alerts if these tests detect problems, allowing for quick intervention.

Example:

// Synthetic monitoring test example to run a health check that performs a synthetic operation in production

public class CalculatorSyntheticMonitoringTest {

    // No Spring context needed: the test calls the deployed production endpoint directly
    private final RestTemplate restTemplate = new RestTemplate();

    @Test
    public void testProductionHealthCheckWithSyntheticOperation() {
        // URL for a health endpoint that performs a synthetic operation
        String syntheticTestUrl = "https://production-url.com/health";
        ResponseEntity<String> response = restTemplate.getForEntity(syntheticTestUrl, String.class);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(response.getBody()).contains("status", "UP");
    }
}

// Controller code for the health endpoint with synthetic operation flag

@RestController
public class HealthController {

    @Autowired
    private CalculatorService calculatorService;

    @GetMapping("/health")
    public ResponseEntity<Map<String, Object>> performSyntheticOperation() {
        Map<String, Object> response = new HashMap<>();
        boolean isSynthetic = true;
        int result = calculatorService.add(5, 10, isSynthetic); //won't publish an event
        response.put("operation", "addition");
        response.put("result", result);
        response.put("status", "UP");
        return ResponseEntity.ok(response);
    }
}

Performance Testing

Performance testing aims to assess how the system performs under expected and peak load conditions. Shifting performance testing to the left—incorporating it early during the development phase—helps identify and resolve potential bottlenecks sooner.

Incorporating performance tests as part of the continuous delivery pipeline ensures that each new version of the software meets performance benchmarks, preventing performance degradation over time.

Performance is usually considered a non-functional requirement or, as I prefer, a cross-functional requirement. In the book Building Evolutionary Architectures, the authors present the concept of Fitness Functions, which are a way to ensure such requirements are met throughout the lifecycle of the system’s architecture.

When implementing fitness functions, I believe it’s totally fine to consider this category of tests as Fitness Functions or cross-functional tests, where Performance Tests are just a subset of this larger category. Logging and security, to name a few, are other potential subsets of tests that would belong in this category.

Example:

// Performance test using JUnit and a load testing approach for the REST API endpoint

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class CalculatorPerformanceTest {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    public void testAdditionEndpointPerformance() {
        long startTime = System.nanoTime();

        for (int i = 0; i < 1000; i++) {
            ResponseEntity<String> response = restTemplate.postForEntity("/calculate/add", new OperationRequest(i, i + 1), String.class);
            assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        }

        long endTime = System.nanoTime();
        Duration duration = Duration.ofNanos(endTime - startTime);
        assertThat(duration.getSeconds())
            .withFailMessage("Performance test failed: Took too long to complete.")
            .isLessThan(10);
    }
}
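
As a further illustration of the fitness function idea mentioned above, here is a minimal sketch of an architectural fitness function using ArchUnit. ArchUnit is my choice for this example rather than something mentioned earlier, and the com.example.calculator package name is hypothetical. The rule fails the build if service classes start depending on controller classes:

// Hypothetical architectural fitness function using ArchUnit

import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

@AnalyzeClasses(packages = "com.example.calculator") // hypothetical root package
public class ArchitectureFitnessTest {

    // The rule is evaluated against all analyzed classes and fails the test if violated
    @ArchTest
    static final ArchRule servicesShouldNotDependOnControllers =
        noClasses().that().resideInAPackage("..service..")
            .should().dependOnClassesThat().resideInAPackage("..controller..");
}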

Mutation Testing

Mutation testing is a method of testing the test suite itself. By introducing small changes (mutations) to the code, this practice ensures that the existing tests can detect and fail appropriately when the code is altered.

Mutation testing helps assess the effectiveness and coverage of the test suite, revealing areas where additional tests may be necessary to improve robustness.

Those tests are usually performed by a library or framework that mutates the application code and then runs an existing suite of tests, with the goal of validating that the test suite fails when it should.

I don’t consider mutation testing as a category of its own. I think of it as an extension to unit testing.

Example:

// Mutation testing example using PIT (Pitest) library

public class CalculatorMutationTest {

    @Test
    public void testAddition() {
        CalculatorService calculatorService = new CalculatorService();

        int result = calculatorService.add(2, 3);

        assertEquals(5, result, "Mutation test: ensure the addition logic is intact.");
    }

    // Note: The actual mutation testing is conducted using PIT by running
    // the PIT Maven plugin or configuring it in your build tool.
    // This code example represents a standard unit test that PIT will mutate
    // to check if the test fails when the code is altered.
}

// To run mutation testing with PIT, add the following to your Maven POM file:
// <plugin>
//     <groupId>org.pitest</groupId>
//     <artifactId>pitest-maven</artifactId>
//     <version>1.6.8</version>
//     <configuration>
//         <targetClasses>your.package.name.*</targetClasses>
//         <targetTests>your.package.name.*Test</targetTests>
//     </configuration>
// </plugin>

Property Testing

Property testing focuses on verifying that the system holds true to specified properties over a range of inputs. This type of testing is designed to explore edge cases and input variations that a developer may not have initially considered.

In property testing, instead of specifying exact input and output pairs, the properties or invariants that the function should uphold are defined. The test framework then generates random input data and checks that the properties are always met. This method ensures that the software can handle a broader range of conditions and helps reveal hidden bugs that traditional example-based testing might miss.

Property testing complements unit and integration tests by pushing beyond predetermined cases and validating the system’s behavior in unexpected scenarios. Integrating property testing into the existing testing framework can be done by selecting tools that support property-based testing, such as QuickCheck or Hypothesis, and incorporating them into the test suite.

Developers should start by identifying key properties that functions or modules should satisfy and implement these as tests. This approach helps ensure that, across a variety of inputs, the software consistently meets the defined invariants, bolstering the overall reliability of the codebase.

By incorporating property testing, developers can gain greater confidence in the robustness of their code and discover vulnerabilities early in the development cycle.

Similar to mutation testing, I don’t consider property testing as a category of its own. I also think of it as an extension to unit testing.

Example:

// Property testing example using a property-based testing library
// (the @Property and @ForAll annotations below come from jqwik)

public class CalculatorPropertyTest {
    @Property
    public void testAdditionProperties(@ForAll int a, @ForAll int b) {
        CalculatorService calculatorService = new CalculatorService();
        int result = calculatorService.add(a, b);

        // The result should match the mathematical sum and addition should be commutative
        assertThat(result).isEqualTo(a + b);
        assertThat(result).isEqualTo(calculatorService.add(b, a));
    }
}

Testing Multi-Threaded and Asynchronous Code

I don’t consider multi-threaded and asynchronous tests as a separate category of testing, but since I’ve seen many teams struggle with it, I believe it deserves its own section.

Testing multi-threaded and asynchronous code presents unique challenges due to issues like non-determinism, where the order of execution can vary between runs. This variability can make tests flaky and difficult to trust.

To mitigate these challenges, it is essential to design tests that focus on the individual behavior performed by each thread or asynchronous task. A rule of thumb I use is to ensure the scope of a given test scenario ends at the boundary of a thread or async call. A telltale sign that something is wrong when testing multi-threaded or async behavior is the need to add a wait or sleep call in order for the test to pass.

Non-determinism can also be avoided by using synchronization mechanisms or testing frameworks that simulate controlled environments, ensuring that the tests remain predictable. Additionally, tests should isolate and validate smaller, independent units of work to avoid race conditions.

By adopting these practices, developers can build confidence that the tests validating multi-threaded and asynchronous code won’t become flaky and untrustworthy.

Example:

@Testcontainers
public class KafkaPublisherIntegrationTest {

    @Container
    private static KafkaContainer kafkaContainer = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:latest"));

    private static KafkaProducer<String, String> producer;
    private static KafkaConsumer<String, String> consumer;

    @BeforeAll
    public static void setUp() {
        kafkaContainer.start();

        // Producer properties
        Properties producerProps = new Properties();
        ...
        producer = new KafkaProducer<>(producerProps);

        // Consumer properties
        Properties consumerProps = new Properties();
        ...
        consumer = new KafkaConsumer<>(consumerProps);
        consumer.subscribe(Collections.singletonList("test-topic"));
    }

    @AfterAll
    public static void tearDown() {
        producer.close();
        consumer.close();
        kafkaContainer.stop();
    }

    @Test
    public void testEventPublication() throws ExecutionException, InterruptedException {
        String topic = "test-topic";
        String key = "test-key";
        String value = "test-value";

        // Publish the event to Kafka
        Future<RecordMetadata> future = producer.send(new ProducerRecord<>(topic, key, value));
        RecordMetadata metadata = future.get();

        assertNotNull(metadata);
        assertEquals(topic, metadata.topic());
    }

    @Test
    public void testEventConsumption() {
        String topic = "test-topic";
        String key = "test-key";
        String value = "test-value";

        // Publish an event to set up the test
        producer.send(new ProducerRecord<>(topic, key, value));
        producer.flush(); // Ensure the event is sent before consuming

        // Poll the consumer to validate the message was published
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        assertFalse(records.isEmpty()); // other tests may publish to the same topic, so don't assume an exact count

        ConsumerRecord<String, String> record = records.iterator().next();
        assertEquals(key, record.key());
        assertEquals(value, record.value());
    }
}
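
To illustrate the rule of thumb above about ending a test's scope at the boundary of an async call, here is a minimal sketch that awaits the returned future (with a timeout) instead of sleeping. The OperationService class and its addAsync method are hypothetical and exist only for this illustration:

// Testing asynchronous code by awaiting the returned future instead of sleeping

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.junit.jupiter.api.Test;

public class AsyncAdditionTest {

    // Hypothetical service whose work happens on another thread
    static class OperationService {
        CompletableFuture<Integer> addAsync(int a, int b) {
            return CompletableFuture.supplyAsync(() -> a + b);
        }
    }

    @Test
    public void testAsyncAddition() throws Exception {
        OperationService service = new OperationService();

        // The test scope ends at the async boundary: block on the future with a timeout, no sleeps
        int result = service.addAsync(3, 7).get(5, TimeUnit.SECONDS);

        assertEquals(10, result);
    }
}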

General Best Practices for Automated Testing

To maintain the reliability of automated testing, flaky tests should be fixed, quarantined, or removed immediately; tests that fail inconsistently erode trust in the test suite and the CI/CD pipeline. Failing tests should stop the pipeline until they are resolved, ensuring that issues are not overlooked.

Running a subset of tests locally before committing code helps developers identify potential issues early and prevents surprises during CI/CD runs. Lastly, tests should never be commented out, ignored, or removed to pass a failing pipeline, as this quick-fix approach undermines the integrity of the testing process and can mask underlying issues.

By adhering to these best practices, development teams can create robust, maintainable, and high-quality software products while minimizing risks and ensuring a seamless user experience.

Conclusion

Diff-based testing is an approach to testing that is heavily based on the test pyramid paradigm but goes one step further and states that:

  1. We should always test a functionality using the fastest type of test possible, e.g., unit tests over integration tests, integration tests over smoke tests, and so on.
  2. We shouldn’t duplicate test logic in different layers of tests. Each layer should add coverage to the behavior that couldn’t be tested in the previous layer.

By doing so, we ensure we end up with a healthy suite of tests that is effective, efficient, and easy to execute and maintain.

The eight deadly sins of (failed) software delivery initiatives

Featured

In my 20-plus years developing and delivering software, I was part of many software delivery initiatives that, despite all efforts, ended up in failure. Each failure was due to many different reasons, but some of those reasons were present in most of those failed initiatives.

My goal with this article is to list a few of the problems that, in my opinion and experience, will doom any software delivery initiative to failure. I’ll also offer solutions to those problems so the next time you are faced with such issues you’ll (hopefully) be able to identify and avoid them.

I must say that by no means is this meant to be an exhaustive list of reasons that can lead a project to failure. I’ve simply picked the eight most recurring and impactful scenarios that, in my experience, will throw any software delivery initiative off the rails.

Without further ado, here’s the list of problems (in no specific order) I like to call “The eight deadly sins of (failed) software delivery initiatives”.

Sin #1 – Don’t know what to build and/or how it should work

Have you ever been part of a software delivery initiative where things were a bit fuzzy? Everyone involved seems to have a high-level idea of what needs to be done, but when details are needed and people start asking each other questions, it quickly becomes clear that there are a lot of unknowns that still need clarification. Those unknowns can be about both the what and the how.

Not knowing what the software you’re spending millions to build is supposed to do may seem foolish, but it happens more often than you might think. Ideally, the need to build a piece of software should come from the users of that software or from people that know the users really, really well.

Unfortunately, that’s not always the case. It’s not uncommon to have people higher up on the organization chart, both on the business and IT sides, deciding what initiatives get funded and what they’re supposed to achieve. The problem in this case is that the higher someone’s position in the organization is, the more likely they are to be detached from the reality of the business and the people that actually run it.

Add to that the fact that the larger the organization is, the more communication issues it’s likely to have. This can lead to people making decisions without having a complete understanding of the problems that (they think) must be solved.

Finally, the people that the software is being built for, be it employees or customers, are rarely involved in the software delivery process. In fact, I’ve seen organizations that intentionally isolate end users from delivery teams, only God knows why. The result is software delivery teams building solutions to problems that don’t exist or they don’t fully understand. This problem is also known as “building the wrong thing”.

Now let’s say that somehow the delivery teams got the what right and they’re on track to build the right thing (or solve the right problems). Unfortunately, it’s still possible they’ll build the thing wrong. In this case, what usually happens is that there’s a lack of subject matter experts to tell the teams how things should work. This is very common in complex domains and/or old organizations where the knowledge of the how is embedded in the code of legacy systems built decades ago and whose developers are now long gone (hopefully retired and not dead).

The new developers that are now maintaining those systems usually only know small parts of it that they were forced to learn by painstakingly reverse engineering the code over many years of sweat, blood, and tears. If only they knew how valuable they are they’d be making more money than the organization’s CEO.

So, if the delivery team gets the what wrong, even if they get the how right, they’re doomed to failure by building the wrong thing. On the other hand, even if they get the what right but the how wrong, they’re doomed to failure by building the thing wrong. It’s also possible they eventually get the how right, but only after a long and arduous process of reverse engineering legacy code as well as trial and error, releasing a solution that customers will complain doesn’t work as expected. Of course, it’s also possible that the perfect storm hits and the delivery teams end up getting both the what and the how wrong, leading to a spectacular failure.

Solution

So, what to do then? How to avoid such a catastrophe?

In order to get the what and the how right, a delivery team must collaborate with all the areas of the organization impacted by the software to be built so they can ask questions and understand what problems they’re trying to solve and how it can be achieved. This will usually require a workshop where representatives from the business (including SMEs, product owners, and end users) and from IT (including infrastructure, operations, security and any other impacted systems) must be present. Those workshops can take from a couple of days to a couple of weeks, depending on the complexity of the business, system architecture and the problems being solved. The organizations that already run those workshops usually only do so at the beginning of the initiative, which is a big mistake. Those workshops must happen multiple times during the entire life of the initiative in order to validate assumptions and course-correct if necessary.

It’s also important that the end users (or their representatives, in the case of users external to the organization) are part of the delivery process and have access to (frequent) production releases of the solution being built, in order to give the delivery team feedback on the usefulness of what’s being built.

Finally, communication across the delivery teams and the organization silos must be streamlined as much as possible. The best way of achieving this is by having no silos at all. Ideally the delivery teams should contain not only developers, QAs, BAs and PMs but also infrastructure, security, operations, architects, end users, product owners and whatever other roles are required in order to deliver the right thing right.

Sin #2 – Trying to do too much at once

Human beings are amazing, but they lack the ability to grasp lots of things in their heads at once. Someone once said that whatever problem you’re working on, “it must fit in your head”. This also applies to software initiatives.

Big organizations seem to be big fans of huge, long-lived software delivery initiatives, or the infamous “programs”. They usually spend months analyzing all existing problems and trying to fund a software delivery program to rule them all. Such programs are meant to solve all the organization’s problems from both a business and a technical perspective. They can take from 2 to sometimes 10 years, with the average length being around 5 years (in my experience).

However, despite all the planning, most of the time those programs fail. I’d love to have some statistics to share with you here but unfortunately I don’t.

Failure has different modes. A software delivery initiative can fail because it built the wrong thing right, built the right thing wrong, or because it ran over budget and/or over time. Or, in some (not rare) cases, all of the above combined.

Startups might not always be successful but they usually succeed in delivering software. Whether the software is successful or not will depend on a viable business model, right timing, among other things. The startups that are successful in delivering software usually have something in common: focus. They’re trying to solve a very specific problem and need to prove they are capable of delivering working software as soon as possible in order to guarantee funding.

They’re also very flat organizations, hierarchically speaking, with very streamlined communication channels, simply because everyone sits next to each other no matter their role in the organization. Have a question? The person that has the answer is a few feet away.

However, the most important thing is that startups are constrained. Time constrained. Money constrained. They know they can’t come up with the perfect product with all the possible bells and whistles in a timely fashion. They need to iterate. They need to focus on the minimum viable product that’s good enough to showcase their ideas to the world. Then they can add the next feature. And the next. And so on. That’s exactly the opposite of what big organizations do.

Solution

Big enterprises need to break away from their current approach to software delivery, including how they plan, fund, and deliver software. They need to act like startups and think iteratively, focusing on specific problems that can be solved in a timely fashion. No more big software delivery programs. No more boiling the ocean. Whatever you’re trying to solve needs to fit in the heads of the people solving the problem.

Billions have been spent and different methodologies have tried (and failed) to make software delivery predictable. The uncomfortable truth people don’t want to face is that software will take whatever time it takes to be built given the organization’s current structure (both organizationally and technically speaking) as well as the resources available to the delivery teams. There’s no magic here. No amount of pressure will magically get a piece of software built in six months when it would actually take one year (given the same constraints). Something needs to give, and it’s usually quality and people’s morale. The longer a software initiative is, the more likely it is to get off track.

Organizations must stop pretending estimates are not commitments and burning people to the ground trying to deliver the impossible. Instead, focus on micro-initiatives. Whatever problems you’re trying to solve, break them down in such a way that each piece can be delivered in three months or less. Fail or succeed fast. Collect feedback, then iterate, course correcting as needed. If resources allow it, nothing prevents you from running multiple micro-initiatives in parallel as long as they are not intrinsically dependent on each other (otherwise it’s just a huge software program in disguise).

Sin #3 – Organizational silos

The truth is, silos create bottlenecks and friction in communication. For a long time organizations tried to approach software delivery by applying a factory (or production line) mindset where each step of the process is performed by a specialized individual then handed over to the next step until you finally get the final product at the end of the line.

That just doesn’t work, period. Software cannot be built through a repeatable process where parts are assembled into a final working product. We as an industry tried it and failed. With the advent of Agile and Lean practices much has improved, but we still see self-proclaimed Agile organizations following a similar model where business analysts or similar write requirements that are then analyzed by architects that then come up with the technical design so developers can code so QAs (or QEs or testers) can test so operations can deploy, each one with their own skills and responsibilities. If something goes wrong? It’s not my problem, I did my part, it’s someone else’s fault.

Let’s imagine for a second that such an approach works. There’s still an impedance mismatch problem here. Developers might be able to code faster than BAs can write requirements and architects can come up with the technical design. Then QAs might not be able to keep up with the number of stories developers are delivering. Releasing to production requires a lot of work and coordination, so operations folks bundle as many changes as possible into a single release. Releasing every two weeks? Are you crazy?

A user then finds a bug in production. They must call through all support levels until someone can create a bug in the backlog that’s going to be prioritized by the business so developers can work on it. What if developers need a new piece of infrastructure provisioned or modified? They submit a ticket to the infrastructure team that’s already overwhelmed with tickets from many other teams, not to mention any work related to improving the infrastructure itself, which always ends up deprioritized.

I haven’t even mentioned all the meetings and long email threads where folks across silos try without success to communicate their problems and align on possible solutions. An example of that happens when developers and/or BAs have questions on what needs to be done and/or how to do it but can only schedule a meeting with the business and/or architects for the following week in order to have their questions answered. If someone is on vacation or really busy, the meeting gets pushed to the next month. Meanwhile, time flies.

[insert your preferred version of the “it’s fine” meme here].

Solution

Organizations must embrace cross-functional, autonomous and self-sufficient software delivery teams. Even if a resource is scarce or not needed 100% of the time, they must have a percentage of their time assigned to the delivery team. They must be part of the team. That means participating in team ceremonies, discussing problems and solutions and, above all, feeling accountable (and being rewarded) for the success of the team as a whole.

Business people, SMEs, and end users should be available and easily reachable to answer any questions the delivery team might have. The same is true for infrastructure and QA resources needed by the delivery teams as we’ll see in a moment.

Sin #4 – No one can make a decision

If I never make a decision then I can’t be blamed for anything bad that might happen, right?

That’s the culture in most of the organizations that struggle to deliver software. People live in fear. Fear of being fired, fear of not being promoted, fear of missing their bonuses at the end of the year, i.e., fear of failure. In Westrum’s model of organizational culture, those organizations would be classified either as pathological or bureaucratic, or somewhere in between.

One of the many bad outcomes of living in fear is that it can paralyze you. It can prevent you from making any decisions or mislead you into making bad ones. When it comes to software delivery initiatives, it usually leads to analysis paralysis. People get stuck in a never-ending series of meetings waiting for a decision to be made. When it doesn’t happen, the responsibility bubbles up the organization’s hierarchy until it finds someone empowered enough to make the call. There are two obvious problems with this approach.

One is that it can take a long time until a decision is made, and this can happen for every decision that must be made during the life of the software delivery initiative, which can be a lot. The second problem is the telephone game issue. Usually, the higher in the hierarchy, the less context someone has about the problem being solved, which increases the chances of them making a bad decision. Add to that the noise introduced by the information having to pass through many layers of hierarchy, which can lead to people making decisions based on wrong or incomplete information. Assuming they’re able to make a decision at all.

Like many of the other problems mentioned here, it can lead to the wrong thing being built, or the right thing being built wrong, or, in the worst case, nothing worthwhile being built at all.

Solution

Once again, cross-functional, autonomous and self-sufficient teams are our best chance at solving this problem. Also, the organizational culture must be one that empowers people to make decisions at the level they need to be made, without fear of punishment in case something goes wrong. Delivery teams must have easy access to the information they need, at the time it’s needed, so they can make well-informed decisions. Important decisions must be documented and easily accessible, so in the future people can have context on why certain decisions were made.

Sin #5 – Ineffective test strategy

When an organization performs tests too late in the delivery process, it increases developer cognitive load, since by the time defects are found, prioritized, and ready for developers to work on, developers will have long moved on to another story and lost the context necessary to quickly figure out the cause of a given defect. The cognitive load can get even worse if a developer gets assigned to work on a defect in a section of the code they’re not familiar with.

Reliance on manual tests, which are usually error-prone, slow to perform, and dependent on an individual’s knowledge to be executed correctly, will result in tests that take a long time to execute and very likely miss important edge cases, which in turn will lead to a large number of errors in production. Manual tests are also hard to scale and can lead to a starvation of resources, a scenario where organizations find themselves needing more and more QAs and never keeping up with the growing backlog of stories in need of testing.

Expensive and brittle end-to-end tests that require lots of setup and coordination and take too long to execute are also a source of delays and potential defects in production. Since those tests touch multiple systems usually maintained by different teams, the ownership of those tests and the environments where they are executed is hard to establish, leading to complex coordination across those teams. Maintaining the right state (data) across all systems is a very cumbersome and frustrating process, leading to the common “who touched my data?” blaming game between teams. The necessity of having the right data and the right versions of different systems in order to execute end-to-end tests makes them brittle, usually leading to a lot of false negatives. This, combined with the long time those tests take to execute, usually leads to teams abandoning them since they’re slow and not reliable.

QAs (or QEs or testers) who are not involved in the other phases of the software delivery process lack context on what the solution being built is supposed to do and how it should behave. This usually leads to costly knowledge transfers between BAs, developers, and QAs, delaying testing and even leading to QAs testing the wrong thing (or testing things wrong), which in turn can lead to false negatives and wasted developer time when they have to investigate defects that are invalid.

The issues above are among the main mistakes I’ve seen delivery teams make when it comes to test strategy.

Solution

Testing should be moved earlier in the development process (shift-left), fully leverage test automation and follow the test pyramid approach. Manual testing should be left to (a few) ad-hoc scenarios and sanity checks. Costly end-to-end tests should be replaced by unit, integration and contract tests (all automated) which are more efficient and reliable. QAs should participate in story writing as well as any other team ceremonies. A story can only be considered done when it’s fully tested and ready to be deployed to production. Developers (and QAs) should feel comfortable with writing (automated) tests of all kinds (including performance and UI tests) as well as pairing with each other, even during the coding phase of a story.

Sin #6 – Cumbersome path to production

Many organizations are required to follow strict regulations and/or operate in risky industries where failures can be catastrophic, which leads them to introduce lots of checks and balances (aka red tape) before releasing any software changes to production.

The problem is that it’s also common for those governance processes to become unnecessarily complex and not really help prevent the issues they’re supposed to. Human beings hate doing things that are repetitive, take too long, and/or whose value they cannot perceive.

This leads to people doing things in an automated manner (the bad kind of automation) without really thinking about what they’re doing, rushing things so they can get the boring stuff done quickly, and/or finding workarounds to a process they don’t agree with or cannot understand the value of. In cases where people are somehow forced to stick to the process, what usually happens is that the release of a chunk of work to production can be delayed by weeks, if not months, until all checks and balances are met.

A natural response to doing something that’s hard and/or complex is to do it less frequently. The less frequently software is released to production, the more changes get accumulated and the longer it takes for the delivery team to get feedback from the users on the work that has been done. This can lead to a large number of defects and/or negative user feedback for a given release, which in turn can overwhelm delivery teams and throw their planning for the next release off.

Most of the time those processes are not there to prevent errors from getting to production but to cover people’s backs in case something goes wrong. After all, if the process has been followed, no one is to blame, right?

Solution

Reevaluating why a given step in the path to production is required, and how that requirement can be effectively and efficiently met, needs to be a constant in the life of software delivery teams. We need to avoid the monkeys-and-the-ladder behavior of doing things just because that’s the way they have always been done.

Another crucial solution to this problem is to automate everything. No more filling out soul-eating-life-crushing forms by copying and pasting data all over the place and trying to answer questions that won’t get reviewed by any intelligent life form in the next century or so. If some piece of data doesn’t add any value to the release process but is required due to some regulation, find a way to automate the collection of that data. If there’s no regulation requiring such data, don’t even bother collecting it.

For all valuable data that’s used by people during sign-offs, automate how it’s collected. All sign-offs must also be automated and registered for future auditing.

On top of that, the whole path to production should be automated, and practices like blue/green deployments and smoke tests in production can help prevent defects from reaching end users.

Releasing software to production should be considered a non-event and people shouldn’t be kept awake and/or working over the weekends (and nights) to guarantee a successful release.

Sin #7 – Obscure integration points

Whenever initiating a new software delivery initiative, one of the first things I try to learn is how many existing integration points the new solution will have to integrate with, as well as how well documented those existing integration points are.

Integrations are risky and, most of the time, complex. Things get even worse if we’re integrating with existing (and possibly legacy) systems. What data must I send for this specific API call? Where can I get this data from? What does the data mean? In which order must I call the different endpoints? Is there even an API to integrate with, or must we implement some sort of CDC to extract the data we need? Should we go with SOAP, REST, gRPC, CORBA, or file transfer over SFTP? Is there any documentation available on how those existing systems behave, or is it all in the heads of now long-gone SMEs, architects, and developers?

I’ve seen developers (almost?) going crazy trying to reverse engineer decades-old codebases to try and answer the questions above, then scheduling meetings with that one person who knows everything, only to be pointed to another person who’s supposed to know everything, to finally realize no one really knows anything about anything.

This is not only frustrating and soul-crushing but is also (in my experience) one of the leading causes of developer attrition and can risk throwing a software delivery initiative off the rails completely by delaying delivery by months if not years.

Solution

I’m going to be honest with you, this is a hard one. The right thing to do can be really costly and take a long time to implement, and not all organizations have the appetite to take on such challenges. It’s not uncommon for C-level executives to kick the can down the road and leave the mess for their successors to deal with.

Sometimes the solution includes hiring people to work exclusively on reverse engineering the codebases and documenting how things work. Sometimes the best solution is to throw everything away and start anew. This might require hiring SMEs that know how things should work, which can be difficult and expensive. That’s one of the reasons we still see huge (and rich) organizations relying on decades-old mainframe systems they’d rather get rid of.

In some cases it’s worth applying a domain-driven approach and creating clean business interfaces in front of those existing systems so that, at the very least, the new software being built has an easy time integrating with other components. This approach also helps decouple the new components being built from the legacy ones, making it easier (or even possible) to replace the old systems at some point.

However, this latter approach doesn’t solve the fact that those new well-defined interfaces still must be integrated with the existing (potentially old, legacy) systems. Again, there’s no easy way here, I’m sorry.

If your organization finds itself in such a position, it might be worth hiring a consultancy specialized in solving this kind of issue to help you find the best way out of the mess. (My apologies for the shameless plug, but one’s got to live.)

Sin #8 – Lack of infrastructure ownership

This one could be considered a subsection of organizational silos since it has to do with the software delivery team not being empowered to provision new infrastructure (or make changes to existing infrastructure), which can happen when there are no infrastructure and/or operations resources readily available to the team. This can also include lack of access to debugging capabilities like some types of logs, monitoring, and observability tools.

For example, a very common scenario is to have the infrastructure team provide access to application logs via a log aggregation tool. This might sound like enough, but if the application is running in containers and for some reason a container fails to start up, there won’t be any logs to collect since the application hasn’t even started.

When something like that happens, infrastructure teams become a bottleneck that, similarly to the QA bottleneck, slows down delivery and is difficult to scale, since the more delivery teams you have, the more infrastructure folks are needed.

A more recent gatekeeping strategy I’ve seen some organizations implement is to hide their cloud capabilities behind a custom portal so they can have better control over their cloud resources. In my opinion, besides being expensive and potentially delaying delivery (after all, it can take a long time for those portals to be implemented), this completely defeats the purpose and one of the main advantages of the cloud, which is to enable teams to self-serve and experiment with new technologies quickly.

Solution

The delivery teams must be able to provision and modify their own infrastructure. A well-designed and well-implemented platform is of great help here. The focus of the infrastructure team should be on improving developer experience through automation and a self-service platform.

Every developer should be empowered (with knowledge and access) to perform any infrastructure and/or operations tasks within the scope of the software they’re delivering.

Infrastructure and operations teams should focus on building and maintaining the aforementioned self-service platforms that automate all the tasks usually performed manually by the infra and ops teams. By doing so, they will be empowering developers to practice DevOps (yes, DevOps is not a role but a way of working) and to become autonomous, self-sufficient, and, above all, productive.

When it comes to cloud governance, one can always use the cloud provider’s tools to restrict and/or control access to any resources that need this kind of control, be it because they’re not approved for use within the organization or just to keep costs under control. Having said that, delivery teams should be trusted with (and held accountable for) deciding which technologies and tools work best for them; otherwise, one would be curbing innovation.

Conclusion

There are many reasons why a software delivery initiative might fail. The ones I've mentioned in this article are the ones that hit closest to home based on my professional experience. The solutions I present for each of those problems are also very opinionated and based on what I've seen work best in the initiatives I saw succeed.

It's not a coincidence that those solutions come from the Agile and Lean communities: for the past 9 years I've worked as a consultant at Thoughtworks, where I not only learned the why and how of those practices but also had the opportunity to apply them and see the real impact they have on the success of software delivery initiatives.

I hope this article has resonated with you. I’ll consider myself successful if at any point while reading it you caught yourself thinking “yeah, that already happened (or is happening) to me and my team and it sucked (or still sucks)!”.

If that's the case, please share in the comments what resonates (or not) with you and share this article with others, especially the people in your organization you think are empowered to make the required transformations happen, so our industry can become a more productive and less soul-eating, life-crushing one.

Acknowledgements

I’d like to thank my Thoughtworks colleagues Brandon Byars and Premanand Chandrasekaran for reviewing the draft of this article. I appreciate all their comments and suggestions, even the ones that were not incorporated in the final article.

Silos are dead. Long live silos.

If you’re a software developer working for a big enterprise, chances are high that whenever you need to provision a database, or a build pipeline, or an SSL certificate, or pretty much any piece of infrastructure, you have to submit a ticket to another specialized team so they can do it for you. Then you wait. And wait. In the meantime, the story you were supposed to be working on is blocked until the required piece of infrastructure is provisioned for you. In such a scenario, the infrastructure teams are a bottleneck to the delivery teams’ software delivery life cycle (SDLC).

If you’re a software developer working for a big enterprise, chances are high that whenever you are done implementing a (big) chunk of functionality and are ready to deploy it to production, you have to wait for a separate team of QAs (testers) to check your code for bugs. Then you wait. And wait. After a few days (or weeks), you get a report with all the bugs that need fixing. Then you fix them. Then you wait again for another round of tests to be completed. This cycle repeats over and over until the number and/or types of bugs meet a desired threshold. In such a scenario, the QA teams are a bottleneck to the delivery teams’ SDLC.

If you’re a software developer working for a big enterprise, chances are high that whenever you want to deploy your application(s) to production, you have to fill a bunch of forms and/or spreadsheets with details about your application, have meetings with governance and operations teams to explain what your application does and/or which changes it introduces. Finally, you submit a ticket to the operations team so they can deploy your application for you. This cycle repeats over and over until operations and governance teams are satisfied with the information you’ve provided. In such a scenario, the operations and governance teams are a bottleneck to the delivery teams’ SDLC.

Then there are performance tests, security assessments, architecture, user requirements, all performed by separate groups in the organization, each one very specialized in its specific tasks, each one becoming a bottleneck to the organization's SDLC.

All those siloed functions add to the delivery teams' lead time and can in fact comprise most of it. Two weeks' worth of software development can take weeks if not months to see the light of day. Delivery teams will learn to optimize for that and try to pack as much functionality as possible into a single release. Then, instead of shipping small chunks of functionality and getting quick feedback from users, the organization ends up resorting to big, infrequent, and risky deployments, delaying the feedback cycle and killing innovation.

One solution to this problem is to break all silos and have those specialized folks as members of the delivery teams. One issue with this approach is that it’s hard to scale. Those specialized roles are hard to find and therefore, expensive. If your organization has multiple delivery teams (which is often the case in big enterprises) it will be really difficult to staff those specialized roles for all teams.

Another (and better) approach is to optimize the way those specialized teams work. Work smarter, not harder, as they say. We don't want to get rid of those silos, we just want better silos. In order to achieve that, organizations must focus on shifting left, automation, and platform thinking.

Take the example of the dependency on the operations teams for provisioning new infrastructure. In order to reduce or even eliminate this dependency, a couple of things need to happen. First, the focus of the operations teams needs to shift away from fulfilling tickets to providing a platform where delivery teams can self-serve. That's how an organization can fully utilize the knowledge and capacity of its operations folks. Automation plays a central role in building such platforms. An organization should try to adopt tools and practices that support this kind of automation. If Infrastructure as Code (IaC) came to your mind, you're on the right track. I won't get into the details of IaC in this article, but suffice to say it comes with its own challenges. However, organizations that successfully adopt it will reap benefits like increased productivity, security, and stability, among others. I recommend checking out Kief Morris's book if you want to learn more on this topic.

In the examples of the dependencies on the QA team for testing and the operations team for deploying an application to production, automation and shift-left are your best friends. In both cases you'll face the issue of having a single team responsible for serving multiple delivery teams, likely creating a scarcity problem that translates into waiting queues. Here's how we can address those two scenarios.

When it comes to delivering software, quality is not something you can leave for last. It needs to be embedded into your SDLC from the beginning (shift-left). In order to achieve that, developers need to feel comfortable writing all kinds of tests (unit, integration, functional, performance, and so on). It's also crucial to embed the QAs into the delivery teams. No more long develop -> test -> fix loops. To be considered done, each story/functionality must be fully tested. It's also important that those tests are fully automated so they can be executed by developers on their local machines as well as part of a CI/CD pipeline.

When it's time to go to production, throwing an application over the wall to the operations team introduces a lot of friction and risk. The operations team will have to learn how the application works and how to monitor and troubleshoot it. Even if the required knowledge transfer is done successfully (which is rarely the case), a huge amount of time and energy is required that could have been better spent elsewhere. Organizations must shift from a project mindset to a product one. Product teams are (among other things) long-lived teams responsible for the software they build. In such a world, operations teams are responsible for providing the means for the delivery teams to safely deploy, monitor, and troubleshoot their applications. Again, platforms and automation are key to successfully implementing this mindset change.

Similarly, governance, security and architecture should also work closely with delivery teams. The most effective way to achieve that is to focus on establishing guidelines and metrics as opposed to prescribing what delivery teams should do. Their mantra should be “Tell delivery teams what you need as opposed to how to do it”. Governance and security requirements can be verified through automated reports executed by the CI/CD pipeline. The same is true for architecture if you adopt fitness functions as a way to measure architectural alignment for your applications.
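To make that idea more concrete, here is a minimal, hedged sketch of an architectural fitness function written with the ArchUnit library (the package names com.example.myapp, ..domain.., and ..infrastructure.. are made-up assumptions). A check like this can run as an ordinary test in the CI/CD pipeline and break the build when the guideline is violated.

import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import com.tngtech.archunit.lang.ArchRule;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

public class ArchitectureFitnessTest {
  public void domainMustNotDependOnInfrastructure() {
    // Import the compiled classes of the (hypothetical) application.
    JavaClasses classes = new ClassFileImporter().importPackages("com.example.myapp");

    // The guideline expressed as an executable rule instead of a document.
    ArchRule rule = noClasses().that().resideInAPackage("..domain..")
        .should().dependOnClassesThat().resideInAPackage("..infrastructure..");

    // Fails the test (and the build) if any domain class depends on infrastructure code.
    rule.check(classes);
  }
}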

There's one area I haven't mentioned yet that also benefits from working closely with the delivery teams. In some organizations they're called product or simply business. Those are the people that (supposedly) know what customers want. They define and prioritize which features to deliver next. A communication gap between business and delivery teams is the cause of many failed projects (and products). If the delivery team doesn't understand what they're building and why, it can lead to bad technical decisions. Similarly, if the product team doesn't understand the technical constraints or needs (e.g., tech debt) of the delivery team, it can lead to bad product decisions. It's crucial to have a product person embedded in the delivery team to make sure such communication gaps don't happen and the delivery team understands the purpose and goals of the software they're building.

In all those scenarios the pattern is the same: relieve very specialized and scarce people from ordinary work that can be automated and shift the responsibility for this work to the delivery teams. Embed crucial roles into the delivery team to eliminate communication gaps and enhance the team with new capabilities (e.g., testing). By doing this, you reduce or even eliminate the dependency of the delivery teams on those silos, which in turn eliminates bottlenecks and reduces lead time. Your organization delivers better software faster and your very specialized people can focus on what really matters to them and the organization (better governance, better security, better architecture, better product, better infrastructure, and better quality).

So, what do you think? Do you agree or disagree with the importance of breaking silos in order to improve the SDLC in big enterprises? Let me know your thoughts in the comments. Until next time.

Unit Testing: Focus on the What, not the How

TL;DR;

When writing unit tests, test the state of your program, not the implementation. That means, most of the time, avoiding mocks. It can sometimes be hard to achieve, but there are a few strategies we can use to avoid the use of mocks.

Unit Tests Refresher

Much has been written on this topic already, but since I continue to see developers making this mistake, I felt like a refresher on why this is a bad practice could be helpful to many.

Let’s start with why this is a problem in the first place.

The goal of unit tests is to make sure that a given component of a program (a unit) behaves as expected. In order to test this component, and only this component, we must isolate it from the rest of our program so we can be sure its behavior (and state) is not influenced by other components it integrates with.

In order to perform our tests, we must treat a component as an opaque box and only interact with it via its interface. It's pretty much like test-driving a car, where we only need to use the pedals and steering wheel and see if it moves accordingly; we don't need to check the engine or other mechanical parts. For the purposes of our test, if the car moves (and brakes), it works.

Going back to computer programs, imagine we want to test a function that sums an array of integers and returns the sum as the result:

public Integer sum(numbers: Array[Integer]) {
//implementation goes here
}

My question to you is, can we write a unit test for this function without knowing how it’s implemented? The answer is, of course we can:

public void shouldReturn10() {
  //given
  var input = new Integer[] {1, 2, 3, 4};

  //when
  var result = sum(input);

  //then
  assertEquals(result, 10);
}

There are many ways we could implement the function sum but from the unit tests perspective it doesn’t matter how we do it as long as we return the correct result. Doing so allows us to change the implementation without impacting our tests.

Imagine for example we decide to delegate the actual computation of the sum to an external web API:

public Integer sum(numbers: Array[Integer]) {
  //web client setup goes here
  return webClient.getRequest(numbers);
}

Ignore for a moment how useless the sum function became. If you want, you can pretend it performs a lot of complicated business logic instead of just returning the result of the API call right away. What I'd like you to focus on instead is the fact that it now has an external dependency that will make our test more complicated.

So, how can we test the sum function now? If we use a valid url and the web api is up and running, our test might pass. Otherwise it would fail since it wouldn’t be able to get a response from the api.

We might be tempted to use a valid url and run our test against the “real thing” but then we have a problem:

OUR TEST IS NO LONGER A UNIT TEST

It became an integration test instead. There’s a place and moment for integration tests but since we’re talking about unit tests here, what can we do to keep our test “unitary” without breaking it?

Mocks

If you're a seasoned developer, you're probably familiar with the concept of mocks. If not, you can check Martin Fowler's article for a detailed discussion of the difference between mocks and stubs. Before moving forward though, I wanted to clarify what I mean by mock.

To me, mocks are fake implementations of external dependencies that are validated against during tests, with the “validated against” part being important here.

According to the definition I use, if we have a fake implementation of an external dependency that we don't validate against, then we're talking about stubs. Stubs only exist to "fill in the blanks", or, in other words, to allow our tests to run successfully by knowing how to respond to our method calls.

Mocks instead, can have their behavior verified after an interaction with the component under test.

With that knowledge in hand, let’s rewrite our test to make use of mocks:

public void shouldReturn10() {
  //given
  var input = new Integer[] {1, 2, 3, 4};
  
  //mock setup
  when(webClient.getRequest(input)).return(10);

  //when
  var result = sum(input);

  //then
  assertEquals(result, 10);
}

Ok, we have a passing test again, but at what cost?

What if we want to change the request type from a GET to a POST? What if we want to use the asynchronous version of the getRequest method? What if we want to replace the webClient class entirely?

Now our test knows too much and you know what happens to folks who know too much, right? They end up at the bottom of a river or seven feet under the ground.

You might think this is not a big deal because it’s just one test and the change required wouldn’t be that complicated. However, in a real-life project it’s very likely that we would end up with dozens of tests for a single function and hundreds of functions requiring multiple mocks each, where replacing a dependency like our webClient could prove to be a challenge.

Side Effects

Before we discuss solutions to our mocking problem, let’s analyze another common scenario. Imagine now we need to test a function that contains some business logic but doesn’t return a result, that is, it only performs side effects:

public void alert(errorMessage: String) {
  String formattedMessage = "ERROR: " + errorMessage;
  println(formattedMessage);
}

Ok, we can't. There's nothing we can (easily) do here since the function writes directly to standard output. What about this one:

public void alert(errorMessage: String) {
  String formattedMessage = "ERROR: " + errorMessage;
  logger.error(formattedMessage);
}

We could try using a mock. Let’s see how that goes:

public void shouldGenerateAlert() {
  //given
  String inputErrorMessage = "Oh crap!";
  String outputErrorMessage = "ERROR: Oh crap!";

  //mock setup
  when(loggerMock.error(outputErrorMessage)).doNothing();

  //when
  alert(inputErrorMessage);

  //then
  verify(loggerMock.error(outputErrorMessage)).wasCalled();
}

So, that works, I guess. However, what are we testing here?

Well, we’re definitely testing the business logic of the alert function which consists of prepending the string “ERROR: ” to the error message before passing it to the logger component.

We're also verifying that the logger component's error method is being called with the expected argument, but then we are back to the previous problem of having a test that knows too much about the things it's testing.

The issue is that by testing the behavior of our function (the string concatenation bit) we’re forced to test how it’s implemented (the call to the logger component).

What if we did the following:

public class LoggerUtils {
  public static String formatErrorMessage(errorMessage: String) {
    return "ERROR: " + errorMessage;
  }
}
public void alert(errorMessage: String) {
  String formattedMessage = LoggerUtils.formatErrorMessage(errorMessage);
  logger.error(formattedMessage);
}

We’ve moved the business logic of prepending a string to the error message to an external class/method that we can now test without issues. However, we’re still not able to test the fact that the alert function itself must log a formatted message. The only thing we can do here is to test the function’s behavior which means testing its implementation through the use of mocks like we did before.

Our problem stems from the fact there’s no state for us to assert on. Or is there? Let’s ask ourselves, what changes in the world when we generate an alert? Well, somewhere in a server a string gets appended to a file, but we can’t test that (at least not from a unit test).

Spies

One solution to our lack-of-state problem is to create an object that can represent that state so we can assert on it. For that, we can use a special kind of stub called a spy. A spy not only knows how to behave like the dependency we want to stub but also keeps track of changes to its state:

public interface LoggerInterface {
  public void error(errorMessage: String);
}

public class LoggerSpy implements LoggerInterface {
  private List<String> messages = new ArrayList<String>();

  public List<String> getMessages() { return this.messages; }

  public void error(errorMessage: String) {
    messages.add(errorMessage);
  }
}

Here we're assuming the Logger class implements the LoggerInterface above. It would work the same way if it extended an abstract or non-sealed (non-final) class and overrode the relevant method. If neither of those is possible, we can always wrap our dependency in a class that we control:

public class LoggerWrapper implements LoggerInterface {
  private Logger logger = new Logger();

  public void error(errorMessage: String) {
    logger.error(errorMessage);
  }
}
public void alert(errorMessage: String) {
  String formattedMessage = LoggerUtils.formatErrorMessage(errorMessage);
  logger.error(formattedMessage);
}

Our function under test looks the same but now it’s calling the error method on an instance of the LoggerInterface (via LoggerWrapper) instead of the Logger class.

Another benefit of the wrapper approach is that you isolate the dependency from the code calling it. In our example, if we decide later to switch to a different logger class (or method) we could do so without impacting our tests.

Anyway, no matter how we end up implementing the spy, the consequence is that we can now implement our test that will assert on the state of the Spy:

public void shouldLogErrorMessage() {
  //given
  String inputErrorMessage = "Oh crap!";
  String outputErrorMessage = "ERROR: Oh crap!";

  //spy setup
  var loggerSpy = new LoggerSpy();

  //when
  alert(inputErrorMessage);

  //then
  assertEquals(1, loggerSpy.getMessages().size());
  assertEquals(outputErrorMessage, loggerSpy.getMessages().get(0));
}

To keep things simple, we can assume in the code above there's some form of dependency injection going on that makes the loggerSpy available to the alert function.
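For illustration only, here's one hedged sketch of what that wiring might look like: a hypothetical Alerter class (not part of the original example) that receives its logger through the constructor, reusing the LoggerInterface, LoggerUtils, and LoggerSpy types from the snippets above.

// Hypothetical class owning the alert function; the logger is injected via the constructor,
// so tests can pass in a LoggerSpy while production code passes in the LoggerWrapper.
public class Alerter {
  private final LoggerInterface logger;

  public Alerter(LoggerInterface logger) {
    this.logger = logger;
  }

  public void alert(String errorMessage) {
    String formattedMessage = LoggerUtils.formatErrorMessage(errorMessage);
    logger.error(formattedMessage);
  }
}

// In the test:
// var loggerSpy = new LoggerSpy();
// var alerter = new Alerter(loggerSpy);
// alerter.alert("Oh crap!");
// assertEquals(1, loggerSpy.getMessages().size());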

With this technique, we are able to move away from a behavior-based to a state-based testing approach. Even if we decide to change the component we use for logging, we’ll only need to change the Spy implementation and not the tests.

Complex Stubs

In some cases, creating stubs/spies manually can involve a lot of work. A few examples are databases, message queues, caches, and the like.

Thankfully, the software community has come up with fake implementations for most of the technologies we currently use in our projects, even cloud-native ones.

Localstack, for example, is a great solution for running the AWS stack locally. MySQL has an official mock implementation. I bet a search for "[technology] fake mock" will return at least one result for the most widely-adopted technologies.

The good thing about using these kinds of fakes is that you don't need to change a single line of code since they behave just like the real thing (mostly). The only thing we must do is configure our unit tests to point to the local instances of those dependencies.

These fakes come in two flavors: libraries or packages that we can embed in our tests, and more elaborate services we must spin up separately (think containers). The good thing about the embedded libraries is that they don't require any extra setup when running in a build pipeline, as opposed to the container-based approach, which usually requires deployment to a cluster. When running tests locally we shouldn't see many differences between the two approaches.
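As a concrete example of the embedded flavor, here's a minimal sketch (an assumption, not taken from any real project) of a test that points a hypothetical MessageRepository at an in-memory H2 database instead of a real database server. It assumes the H2 driver is on the test classpath; in a real project the JDBC URL would come from test configuration so the production code doesn't change.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical repository under test; it only knows about a JDBC URL.
class MessageRepository {
  private final String jdbcUrl;

  MessageRepository(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }

  void save(String body) throws SQLException {
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         Statement ddl = conn.createStatement()) {
      ddl.execute("CREATE TABLE IF NOT EXISTS messages (body VARCHAR(255))");
      try (PreparedStatement insert = conn.prepareStatement("INSERT INTO messages (body) VALUES (?)")) {
        insert.setString(1, body);
        insert.executeUpdate();
      }
    }
  }

  int count() throws SQLException {
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         Statement query = conn.createStatement();
         ResultSet rs = query.executeQuery("SELECT COUNT(*) FROM messages")) {
      rs.next();
      return rs.getInt(1);
    }
  }
}

public class MessageRepositoryTest {
  public void shouldPersistAndCountMessages() throws Exception {
    // The test points at an embedded, in-memory H2 instance
    // (DB_CLOSE_DELAY=-1 keeps the in-memory database alive between connections).
    var repository = new MessageRepository("jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1");

    repository.save("hello");

    assert repository.count() == 1;
  }
}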

The Real Thing

Finally, depending on the technology (and as a last resort), we can run unit tests against real instances of our dependencies. Similar to the fakes mentioned in the previous section, they don't require any code changes but might not be (and most likely are not) suited for tests running in a build pipeline. To be honest, I don't think they are really an option and I'm already regretting mentioning them, but they might come in handy as a temporary solution while you and your team figure out a better, definitive approach.

Conclusion

In the end, the choice between state-based and behavior-based unit testing is up to you. I really prefer the state-based approach over the behavior-based one due to my bad experience with the latter over the years. I hope the tips in this article help you next time you’re faced with issues related to the use of mocks.

Thanks for reading so far! Feel free to leave a comment or feedback. Until next time!

Note

The code blocks in this post are pseudo-code inspired by Java and are not meant to be correct and/or compile successfully.

Getting CUDA to work on WSL2

Disclaimer

In this post I'm basically putting together the instructions from Microsoft and NVidia on how to get CUDA to work on WSL2. There's nothing special I had to do other than following their instructions. The only issue I faced was that WSL2 and Docker stopped working when I upgraded to the Windows Insider build on Windows Home. I had to purchase Windows Pro in order to get WSL2 and Docker working again. This post assumes you already have WSL2 properly installed. If not, follow the instructions here before moving on. Make sure to check the requirements for installing WSL2.

Pre-requisites

Under the risk of stating the obvious, you'll need a machine with a CUDA-enabled graphics card from NVidia. If, like me, you have a laptop, you can check whether your NVidia graphics card supports CUDA on this page.

You’ll need Windows 10 installed. In theory Windows Home should do, but I had issues with WSL2 and Docker after installing the Windows Insider build for the Home edition and only got it to work again after switching to (purchasing) Windows 10 Pro.

Installation Steps

You'll then need to enroll in the Windows Insider Program. For that, you'll need to sign up for a Microsoft account in case you don't already have one.

After you enroll, you'll need to enable the Windows Insider updates on your machine. To do that, open the Windows Insider Program settings by searching for "Insider" in the Windows search bar. Next, log in with the Microsoft account you used to register for the Windows Insider Program and, when asked which channel you want to join, make sure to choose the Dev channel. Once you're done, restart your computer.

After the computer restarts, you'll need to configure your data settings to share all your data. I know, it sucks, but what can we nerds do? Wait? No way! You can click here to open the data settings directly. It looks like the most important setting that needs to be turned on is Optional Diagnostic Data.

Now you can finally go to Windows Updates and click on Check for Updates to download the latest Windows Insider build. As expected, your machine should restart multiple times while updating to the latest version.

The next step now is to install the NVidia drivers on your WSL2 Linux distro. If you’re using Ubuntu like me, you can follow the steps below:

$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub

$ sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

$ sudo apt-get update

Now you can move on to install CUDA. Do not choose the cuda, cuda-11-0, or cuda-drivers meta-packages under WSL 2 since these packages will result in an attempt to install the Linux NVIDIA driver under WSL 2. Instead, just run:

sudo apt-get install -y cuda-toolkit-11-0

Now you can build the CUDA samples available under /usr/local/cuda/samples. For example:

$ /usr/local/cuda/samples/0_Simple/matrixMul$ ls
Makefile  NsightEclipse.xml  matrixMul.cu  readme.txt
$ /usr/local/cuda/samples/0_Simple/matrixMul$ sudo make

PS: I had to use sudo since my user doesn't have write permissions under /usr/local.

Now you can finally run the sample:

$ /usr/local/cuda/samples/0_Simple/matrixMul$ ls
Makefile  NsightEclipse.xml  matrixMul  matrixMul.cu  matrixMul.o  readme.txt
$ /usr/local/cuda/samples/0_Simple/matrixMul$ ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Maxwell" with compute capability 5.0

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 114.26 GFlop/s, Time= 1.147 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

That’s it! I hope you manage to get CUDA up and running on your WSL2 distro. The only caveat is that you can’t run the samples that include OpenGL visuals since WSL2 doesn’t support it yet.

If you still have some energy left, keep reading for the optional steps.

Optional Steps

Writing your own programs

The CUDA samples also come with a template to help you start writing your own GPU programs.

$ /usr/local/cuda/samples/0_Simple/template$ ls
Makefile  NsightEclipse.xml  doc  readme.txt  template.cu  template_cpu.cpp

Using an IDE

One option is to use NSight Visual Studio Edition on Windows and open the code files stored in your WSL2 file system, but I find it boring.

I've decided to go with NVidia's NSight Eclipse Edition running on WSL2. However, since WSL2 doesn't officially support GUI applications, I had to hack my way around it.

I’ve followed the instructions on this post to get Linux GUI apps to run on WSL2. Then, I’ve installed NSight with apt-get:

sudo apt-get install -y nvidia-nsight

Then you can run NSight (make sure you have the X-Server up and running):

$ nsight

That’s all I had for today. Hope this post helps you get started with CUDA on WSL2. Let me know whether the instructions here work for you or not. Feedback is always welcome and feel free to report any mistakes or inaccuracies on this post.

Optimizing Software Development

Introduction

The goal of this article is to demonstrate how we can apply the principles of Mathematical Optimization to improve the software development process, but first, let’s step back a little bit to take a look at some optimization problems.

A Cautionary Tale

A school wanted to increase literacy among young students. They decided to create a program that would offer one dollar for each book read. After a couple of days of running the program, the teachers were impressed with the success of the initiative. Each child had read ten books per day on average.

However, after performing a more in-depth analysis they noticed a problem. The total number of pages read by each student was way below what was expected. The children, with their amazing brains, had quickly found a way to optimize their gains: all the books they read were less than ten pages long.

“Tell me how you measure me, and I will tell you how I will behave.”

E.M. Goldratt (1990) The Haystack Syndrome: Sifting Information Out of the Data Ocean. North River Press, Croton-on-Hudson, NY, 1990

The anecdote above is a cautionary tale of how systems tend to optimize themselves (increase of entropy) towards a state of equilibrium. In physical systems this optimization has limits dictated by one or more dimensions like space, time, mass, velocity, temperature, energy and so on.

In the example above, the system optimized itself against a single dimension: book count. The fewer pages a book has, the more books I'll read. A better dimension (or metric) might have been page count. That said, one could argue that choosing page count as the single metric to be evaluated could lead to students reading a few large books, which could decrease the diversity of authors, topics, and styles the students would be exposed to.

One solution to this problem would be to pick both book and page counts as metrics (or dimensions) to be evaluated. A formula to compute how much a student would get paid by the end of the initiative could have the shape:

Reward=b\log\left(\frac{p}{b}\right)

where b and p are the number of books and pages read respectively.

With the formula above we tie both variables together so the students would be incentivized not just to read as many books as possible but to also keep a high average of pages per book (the log function is only used to put a cap on how much money a student can make).
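To see how the combined metric behaves, here's a quick worked example (using a base-10 log, purely for illustration). A student who games the system by reading ten one-page books earns nothing, while a student who reads five books totaling 200 pages does much better:

Reward_{A}=10\log\left(\frac{10}{10}\right)=10\log(1)=0

Reward_{B}=5\log\left(\frac{200}{5}\right)=5\log(40)\approx 8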

A More Classical Problem

A famous optimization problem is to compute the dimensions of an aluminum can in order to minimize material cost and maximize its volume. If you tried to minimize material cost only without taking volume into consideration you’d end up with the tiniest can your machines could fabricate. Conversely, if you tried to maximize the can volume without taking any other dimensions into consideration you’d end up with the largest can your machines could manufacture.

If you recall from your Calculus classes you probably remember the solution is to combine the formulas for the area and volume of the can, creating a function where the area is the dependent variable. Then you find the function’s minimum by using the first and second derivatives, but I’m sure you already knew that.

A=2\pi{r}{h}+2\pi{r}^2
(area of the cylinder)

V=\pi{r}^2{h}
(volume of the cylinder)

A=2\pi{r}\frac{V}{\pi{r^2}}+2\pi{r}^2
(area in terms of the volume)

I’ll leave the rest of the solution as an exercise to the reader. The main takeaway from this example is the idea of combining two dimensions (area and volume) in order to optimize both of them at the same time.

The point I’m trying to make here is that if we’re not careful with the dimensions we choose to evaluate a system and/or its components (software, people, project, business, etc) we might end up obtaining unexpected and/or undesired results. Let’s take a look at other optimization examples, but this time applied to the software development practice.

Organizational Structure and Architecture

Conway’s Law states that organizations design systems which mirror their own communication structure.

Imagine the scenario where development, infrastructure and security teams are siloed from each other. Let’s also assume the development teams are measured by the number of features delivered, the infrastructure team is measured by the number of incidents in production and total cost of ownership, and the security team is measured by the total number of incidents. What’s likely to happen in this scenario?

Well, chaos, for sure! Since the development teams are only concerned with the number of features being delivered, they’re more than likely to ignore quality, cost and security concerns, to name a few. This in turn will result in more defects in production which will burden the infrastructure team.

The infrastructure team on the other hand, in an effort to minimize incidents as well as costs, would probably come up with a very rigid process for provisioning new infrastructure, since more infrastructure means higher costs as well as a higher likelihood of something going wrong. This would have a direct impact on the development teams since getting that new server for that new functionality could take a long time (if ever approved).

In a similar fashion, the security team would tend to “lock everything down” in an effort to minimize the chance of incidents impacting both the development and infrastructure teams (after all security needs to sign off on that new server for that new functionality).

The end result is an organization where IT is perceived as incapable of delivering (which is indeed the truth) while the business becomes frustrated as its ideas don’t come to fruition. The organization struggles to innovate and is eventually surpassed by its competitors. Everyone loses their jobs. It’s really sad. I liked working with Dave.

However, paradoxically, if you asked the individual IT teams (dev, infra, security) how they perceived their own work, they would probably say they were crushing it. After all, they were able to meet the goals the organization set for each one of them.

But how can we solve this problem in order to save everyone’s jobs? Well, by following the same approach we used when solving the literacy initiative and aluminum can problems earlier in this article, i.e., by combining the different metrics into one.

For that we don’t need to come up with a fancy formula like before. Instead, we “simply” need to combine all the different teams into one (actually, one per domain vertical) and evaluate the new team(s) with the same metrics of the individual ones, thus, optimizing the system (development team) for multiple dimensions (success metrics).

The benefits of structuring development teams this way are twofold: first, you minimize communication overhead and other friction points between teams; second, you make sure "everyone is in the same boat" and has common, shared goals. This strategy is an example of the Inverse Conway Maneuver.

Tech Stack Standardization

Another use of optimization in software development involves the notorious tech stack standardization. The tech stack standardization spectrum can be either too coarse or too granular.

A too-coarse standardization happens, for example, when the CTO/IT manager/Architect decides all databases used in all projects should be the same no matter the use case. It leads to projects using suboptimal technologies that don't meet their requirements, resulting in incidental complexity and/or the inability to deliver a given set of functionalities.

On the other end of the spectrum, one can decide each and every project has the freedom to pick whichever database technology they see fit. This leads to issues like the proliferation of technologies that are not well known across the organization and thus not maintained properly (if at all).

How can we solve this problem? The solution is to find the sweet spot in the spectrum (optimize) so we don’t end up in any of the ends. At this point, we (should) have already learned that optimizing against a single dimension is usually a bad idea. Therefore, we need to identify the dimensions that make sense to standardize against.

Two possible (and common) dimensions used for tech stack standardization are use case and load. In our example of databases, we could decide that for transactional systems (use case) that are not write-heavy (load) we want to standardize on PostgreSQL and for write-heavy (load) transactional systems (use case) we want to go with Cassandra.

Similarly we could decide that for analytical systems (use case) with lots of data (load) we want to adopt Apache Impala. If our friend Dave wanted to adopt, let’s say, MongoDB, he would have to prove his use case and load combination is not covered by the ones identified above. Sorry Dave.

Project Management

On average, the smaller the scope of a project, the simpler its implementation, thus increasing its chances of success. However, if we try to optimize for a minimum scope only, we would end up with a single-feature microproject™ that doesn’t deliver much value. Not very helpful.

On the other hand, optimizing for maximum business value only doesn't make sense either. We would end up with a multi-year project that tries to deliver every possible feature and wouldn't be completed in a timely fashion. Does it sound familiar? So the question is, which dimensions can we choose when optimizing a project?

As you probably have already guessed, one option is to optimize for both minimum scope and maximum business value. In other words, we want a minimum viable product, or MVP. Another dimension that’s commonly used when optimizing a project is cost. If we add cost to an MVP we end up with a MVAP™ or minimum viable affordable product which in some (most) cases might not be feasible.

You can keep adding dimensions to your project optimization matrix, but be mindful that the more dimensions you add, the more difficult it is to find a sweet spot (or local minimum). Tradeoff sliders are a tool that helps with prioritization across multiple dimensions and I encourage you to check them out.

Machine Learning

Usually, when we think about Machine Learning and optimization we think about the optimization of the cost function. The next example is not a cost function optimization problem and I'm probably stretching the concept of optimization a little bit here. My goal is to demonstrate how pervasive and useful it is to think in terms of optimization across dimensions. If you're a data scientist feel free to jump to the conclusion section. There's nothing for you to see here. Move on. Go.

One common machine learning use case is anomaly detection. Imagine a financial institution wants to monitor credit card transactions for fraudulent operations. In this (overly) simplified example imagine they decide to build a ML model that analyzes the following transaction properties (dimensions): date and time, amount, and merchant.

Let's say our friend Dave is a client of this financial institution. Dave usually shops at lunch time, spends no more than U$ 50.00, and buys everything on stuffidontneed.com. Suddenly the system sees a U$ 500.00 transaction at 1am on iwasrobbed.com. Our friend Dave immediately receives a text message asking him to confirm the transaction. Crisis averted.

The next day, a hacker on the other side of the globe is able to obtain Dave's credit card info as well as his purchase behavior and makes a series of purchases under U$ 50.00, during Dave's lunch time, on stuffidontneed.com. The only difference between the hacker's transactions and Dave's is the delivery address. However, since the fraud detection model doesn't take the delivery address into account, the transactions are processed successfully and Dave will have to spend delightful hours with his bank's customer service in order to prove he wasn't responsible for the fraudulent transactions. Poor Dave.

One could argue the anomaly detection model wasn’t optimized for the task at hand (or that it was optimizing for the dimensions it was aware of). By including another dimension (or feature) the model would be able to improve its optimization algorithm and yield better results.

Conclusion

Despite not being a mathematician, statistician, or the like, I have always been passionate about Calculus and applied mathematics, more specifically, optimization problems. I think many of the problems in life can be treated as optimization problems, e.g., work-life balance, vacation time (too little is not enough, too much and you get tired), buying a house, your investment portfolio, the amount of sugar in your coffee, and so on.

Software development is no different. However, in order to optimize a problem, you need to be able to identify its dimensions so you can be sure you’re optimizing for the things that matter. This skill, like any other, requires practice and the more you practice it, the sooner you’ll master it.

Thanks for reading so far. Can you think of other applications of optimization in software development? Feel free to leave your comments. All constructive feedback is welcome.

PS: In case you are really curious and too lazy to solve (i.e. google) the aluminum can optimization problem, the answer is a can whose cross-section is a square, i.e., the height equals the top/bottom diameter.
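For completeness, here's a hedged sketch of that derivation, starting from the area-in-terms-of-volume formula above:

A(r)=\frac{2V}{r}+2\pi{r}^2

A'(r)=-\frac{2V}{r^2}+4\pi{r}=0\;\Rightarrow\;V=2\pi{r}^3

h=\frac{V}{\pi{r}^2}=\frac{2\pi{r}^3}{\pi{r}^2}=2r

And since A''(r)=\frac{4V}{r^3}+4\pi>0, this critical point is indeed a minimum: the height equals twice the radius, i.e., the diameter.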

When I finally figured out Clojure.core.async

I've been playing with Clojure for a long time now but I've always had a hard time figuring out Clojure.core.async. Until it finally clicked!

In case you're not familiar with Clojure.core.async, it's a library for async programming in Clojure. (You: Really? I never would've guessed it.)

Its main abstraction is the channel. We put messages into a channel and take messages out of it. A channel may have a buffer. If the channel doesn’t have a buffer or if the buffer is full, it won’t accept new messages. If we try to put a message into the channel, the calling thread will block until we take a message out of the channel making room in the buffer for the new message.

Conversely, if we try to take a message out of the channel and there’s none available, the calling thread will also block.

Another main component of Clojure.core.async is the go block. A go block executes code on a thread pool. A go block takes a block of code as its input and executes it while watching it for any channel blocking operations (put/take).

The code being executed by a go block at a given point in time is called a process. Therefore, a go block allows a single thread (or thread pool) to execute multiple processes concurrently by interleaving them.

Whenever a go block encounters a blocking operation it will park it and release the actual thread of execution so it’s free to take on some other work. When the blocking operation completes, it will resume the execution of the code block on the thread pool.

Ok, enough theory, let me tell you what my question was and how I answered it.

So, what got me confused at first was the fact that Clojure.core.async has both blocking and non-blocking put and take calls (>!!, <!! and >!, <!) but at the same time a channel will always block any put operation if the channel’s buffer is full (or if no buffer is available) as well as block any take operation if there are no messages available.

Also, if the channel’s buffer is not full, it won’t block a put operation even if it’s a blocking one (>!!). Conversely, if the channel’s buffer is not empty, it won’t block a take operation even if it’s a blocking one (<!!).

So, what’s the point of having blocking and non-blocking operations if in the end the blocking behavior is defined by the channel itself?

In order to answer my question, let’s play with some Clojure.core.async code.

For the examples below I’m assuming we have Leiningen installed.

In order to start a repl with Clojure.core.async available and without having to create a project we can install the lein-try plugin.

Once we have everything installed you can run the following command to start a repl:

lein try org.clojure/core.async

Then run the following command to import the Clojure.core.async functions into your namespace:

(require '[clojure.core.async :refer :all])

We’ll get some warnings about functions being replaced but for the sake of this example it’s ok to ignore them.

Now we can finally start writing some Clojure.core.async code.

The following code snippet creates a channel:

(def a-channel (chan))

The channel above is unbuffered which means if we put something into it, it will block until “someone” reads from it.

The following code snippet puts a value into the channel and will block if the channel’s buffer is full (or doesn’t exist, which is the case here):

(>!! a-channel 1)

If you’re entering those commands in a repl it probably just got frozen. Sorry about that.

After restarting the repl, we can try the following (don’t forget to re-import the core.async functions):

(def b-channel (chan 1))

This time we created a channel with a buffer with size one. We can try putting a value into it and see what happens this time:

(>!! b-channel 1)

Yeah! No frozen repl this time! Let’s try it again:

(>!! b-channel 2)

Oh s@#$! But ok, it makes sense. We've created a channel with buffer size one and tried to put two values into it. Since we didn't take the first value out, it blocked because the buffer was already full.

Let’s see something cool now:

(def c-channel (chan))
(go (>!! c-channel 1) (do (println "\nPut 1")))

We’ve created another unbuffered channel and put a value into it like we did before. Then we used the do special form to print the message “Put 1”. Since c-channel is unbuffered, the call to >!! should block and we shouldn’t see the message being printed. Finally, we’ve wrapped both statements (>!! and do) within a go block so we don’t block our main thread.

As we saw before, a go block will execute the code block we pass to it asynchronously which will result in the calling thread not blocking (which means the repl won’t block, yeah!).

Now we can take a value out of the channel and see what happens:

(<!! c-channel)
1
Put 1

We can see in the output that the value that was put into the channel (integer one) was taken. This unblocks the call to >!! allowing the do statement (that was waiting for the >!! call to complete) to proceed and, finally, print the message.

So far we’ve covered the blocking >!! and <!! functions. What about the non-blocking versions >! and <! ? Let’s repeat the same example but with the non-blocking version of the put operation (>!) this time:

(def d-channel (chan))
(go (>! d-channel 1) (do (println "\nPut 1")))
(<!! d-channel)
1
Put 1

The code is pretty much the same and so is the result. That means that even using the non-blocking put operation the code block we pass to the go block is still blocking, waiting for us to take the value out of the channel so it could proceed to the do statement and print the message.

So, why is the non-blocking put operation actually blocking?

It turns out that non-blocking is a misleading term for the behavior of >! and <!.

Instead of blocking, >! and <! park the code block being executed. The end result might look the same but it's actually really different. That's because >!! and <!! will block the execution thread while >! and <! will park the code block (process) being executed and free the thread to do some other work.

The difference becomes clearer if you compare the following code snippets below:

(def e-channel (chan))
(doseq [i (range 1000)] (go (>! e-channel i)))
(doseq [i (range 1000)] (println (<!! e-channel)))

(def f-channel (chan))
(doseq [i (range 1000)] (go (>!! f-channel i)))
(doseq [i (range 1000)] (println (<!! f-channel)))

The first code snippet will launch a thousand processes that will attempt to put a message into the channel and then will be parked until we start taking messages from it. The thread pool will remain free to take on some other work.

The end result is that we’ll see all integers from 0 up to 999 being printed to the console. The numbers are printed out of order since the processes run in parallel (eight at a time).

In the second code snippet, we're starving the thread pool since we're telling Clojure.core.async to block a thousand threads in the thread pool until we start taking messages from the channel. A given Clojure process has a single thread pool for all go blocks. By default, the thread pool contains a low number of threads (usually eight, but it can be configured). The first 8 put operations will succeed since we have 8 threads available in the thread pool, but the remaining 992 will be stuck since there are no threads left to run them.

The end result is that we'll only see the integers from 0 up to 7 being printed to the console. The numbers are also printed out of order (because the 8 threads run in parallel), but we only get the first 8 as the result since the put calls are blocking ones. Also, since we're trying to take 1000 messages from the channel with a blocking operation and only 8 were put, the main thread will block (and so will the repl, sorry).

So, the answer to the question “what’s the point of having blocking and non-blocking operations if in the end the blocking behavior is defined by the channel itself?” is the following:

The channel will always block when the buffer is full or non-existent when putting a new message into it and it will always block when there are no messages left in the channel when taking a message from it.

When we choose between blocking and non-blocking put and/or take operations what we’re really telling Clojure.core.async is how we want it to behave when a channel operation blocks.

Do we want it to block the entire thread or do we want it to park our code block and release the thread so it can work on something else?

There are cases where we want the former and other cases where we want the latter. It’s beyond the scope of this post to get into the details of each scenario.

For more on Clojure.core.async I recommend the following chapter of Clojure for the Brave and True and the Clojure.core.async‘s official documentation.

Let me know your thoughts on this post and feel free to point any mistakes I might have made.

Don’t ask what monads can do for you, ask what your programming language can do with monads!

Ok, I know what you’re thinking, but no, this is not another tutorial about monads. During the entirety of this article, you’re not going to find a single definition of monads whatsoever. If you’re looking for definitions of monads I guarantee you can find plenty of them by using your search engine of choice.

Instead, my goal here is to show how some languages can leverage monads in order to allow for more composable, concise and cleaner code.

So, without further ado, let’s get started!

Every time I try to understand what a given technology does or how it works, I like to begin by understanding what problems it’s trying to solve. So, let’s start by looking at a problem that can be solved by using monads.

First, take a look at the following code which does a lot of null checks:

case class RegularPerson(name: String, reportsTo: RegularPerson) {
}

object RegularPerson {
  def getSupervisorsSupervisorName(employee: RegularPerson): String = {
    if (employee != null) {
      val supervisor = employee.reportsTo

      if (supervisor != null) {
        val supervisorsSupervisor = supervisor.reportsTo

        if (supervisorsSupervisor != null) {
          supervisorsSupervisor.name
        } else {
          null
        }
      } else {
        null
      }
    } else {
      null
    }
  }
}

For those not familiar with Scala, the RegularPerson case class defines a person with two attributes: a name, and a supervisor that person reports to (which in turn is a RegularPerson itself). That means one could create a chain of nested persons going on forever.

The getSupervisorsSupervisorName method receives a person as a parameter and tries to retrieve the person’s supervisor’s supervisor’s name. I could have stopped at the person’s supervisor but I think going down one more level helps to make the problem more obvious.

Before you say something, I know there are better ways of implementing such a function but the point I’m trying to make here is that this scenario (null checking) is a very common one and it can be solved by applying monads if your language is “monad-aware”.

So, let’s take a look at one possible solution:

case class Person(name: String, reportsTo: MyOption[Person]) {
}

object Person {
  def getSupervisorsSupervisorName(maybeEmployee: MyOption[Person]): MyOption[String] = {
    for {
      employee <- maybeEmployee
      supervisor <- employee.reportsTo
      supervisorSupervisor <- supervisor.reportsTo
    } yield supervisorSupervisor.name
  }
}

Much cleaner, right?

The first thing we should notice in the code above is that we’ve encapsulated the Person object inside a container called MyOption. We could’ve used Scala’s native Option class here but I wanted to show how we can create our own monads if we wanted/needed to.

Now, let’s focus on the for comprehension piece of the code.

for {
      employee <- maybeEmployee
      supervisor <- employee.reportsTo
      supervisorSupervisor <- supervisor.reportsTo
    } yield supervisorSupervisor.name

What this code does is extract the value (Person) out of the MyOption container (three times) and then return a result encapsulated in the same container (MyOption[String]). If you are not familiar with Scala's internals, the following code snippet shows what the compiler would translate that code to:

  maybeEmployee
    .flatMap(employee => employee.reportsTo)
      .flatMap(supervisor => supervisor.reportsTo)
        .map(supervisorSupervisor => supervisorSupervisor.name)

That means in order for this code to work our container class (MyOption) must implement those two methods (map and flatMap). Let’s take a look at those methods’ signatures:

sealed trait MyOption[+A] {
  def map[B](f: A => B): MyOption[B]

  def flatMap[B](f: A => MyOption[B]): MyOption[B]
}

We can see that the map function receives a function as a parameter and returns an instance of our container (MyOption). If you’re not familiar with Scala’s traits, you can think of it as an interface like the ones in C# and Java.

Now let’s take a look at one of the concrete implementations of the map and flatMap functions:

case class MySome[A](value: A) extends MyOption[A] {
  def map[B](f: A => B): MyOption[B] = MySome[B](f(value))

  def flatMap[B](f: A => MyOption[B]): MyOption[B] = f(value)
}

The first thing we must notice is that the MySome class extends the MyOption trait. Again, if you’re not familiar with Scala, that would be like a class implementing an interface or an abstract class.

By taking a closer look at the map function signature, we can notice that the function we pass to it receives an instance of the same type that the MyOption container encapsulates (in this case Person) and returns any type we want (in our case that would be a String). Finally, the map function encapsulates the result of the function we’re passing back into the MyOption container.

For our example, that means we need to pass to the map function a function which receives a Person instance and returns a String. In our example above (the one with the map and flatMap calls) we pass the following function (lambda):

supervisorSupervisor => supervisorSupervisor.name

Whose signature is:

Person => String

Which matches the signature the map function expects.

Let's pause for a while to think about what that means… the map function takes a function that applies a transformation to a value of a given type and returns a value of a (possibly) different type. After applying the transformation, the map function encapsulates the result of the transformation into the container type the map function is defined for.

Since the result type of the map function is guaranteed to be of the same type as the container we've defined map and flatMap for, we can be sure we can call map or flatMap on the result value as well. This is a similar pattern to the Builder design pattern, where one can chain method calls in order to construct an object.

There’s one problem though. If we chain multiple calls to map passing a function that returns our container (which is exactly what we’re doing in the for comprehension) we’ll end up with nested instances of our container as the result (think of a Russian doll). Here’s an example:

val value = MySome("a").map(x => MySome(x)).map(x => MySome(x))
> value: MyOption[MySome[MySome[String]]] = MySome(MySome(MySome(a)))

The reason this behavior becomes a problem is that it doesn't allow us to chain (or compose) function calls. At least not without adding a lot of extra work in order to access the contents of the nested containers. Moreover, each statement of the for comprehension block expects a function that returns the same container type (not necessarily with the same content type though). That means we need a function with a different signature to be able to compose our function calls nicely.

Ok, so far, so good. Now let’s take a look at the implementation of the flatMap function:

def flatMap[B](f: A => MyOption[B]): MyOption[B] = f(value)

By looking at its signature we can see that it expects as a parameter a function that receives an instance of the same type our container encapsulates and returns another (any) type encapsulated in our container (MyOption).

For our example, that means we need to pass to the flatMap function a function which receives a Person instance and returns a MyOption[Person]. In our example above (the one with the map and flatMap calls) we pass the following functions (lambdas):

employee => employee.reportsTo

and

supervisor => supervisor.reportsTo

Which both have the following signature:

Person => MyOption[Person]

Which matches the signature the flatMap function expects.
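
As a reminder of the shapes involved, a domain model along the lines of the sketch below is assumed (this is my reconstruction for illustration; the article’s own Person definition may differ in details, but what matters is that reportsTo returns the container, which is why the lambdas above have the Person => MyOption[Person] shape):

// A sketch of the assumed model: reportsTo returns the container type.
case class Person(name: String, reportsTo: MyOption[Person])

val boss = Person("Alice", MyNone) // MyNone is the hypothetical empty case sketched earlier
val lead = Person("Bob", MySome(boss))
val maybeEmployee: MyOption[Person] = MySome(Person("Carol", MySome(lead)))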

Ok, so let’s pause again and think about the flatMap function’s behavior for a while… Let f be the function we pass to flatMap as a parameter. Unlike the map function, flatMap doesn’t require us to wrap the result of the call to f into the container type, since f must already return the container type as its result. Let’s see an example of a series of chained calls to flatMap where f returns the container type, and notice how it differs from the same example using map:

val value = MySome("a")
  .flatMap(x => MySome(x))
  .flatMap(x => MySome(x))

> value: MyOption[String] = MySome(a)

Pretty cool, huh? This looks a lot more composable. Actually, it’s just what the for comprehension expects. This shouldn’t be a surprise though since we’ve already seen that the for comprehension is just syntax sugar for a series of calls to flatMap with yield being translated to a call to map.

Ok, so you might now be asking: what does all this mean and how does it relate to monads?

Let’s start by recapping what we’ve seen so far:

  1. Scala has a for comprehension operator that, under the hood, translates to a series of flatMap and map calls.
  2. The for comprehension operator (as well as the flatMap function) allows us to compose function calls that otherwise would require extra boilerplate code in order to process the results of those function calls and pass them along to other functions.
  3. We saw that by encapsulating our values into containers and using the flatMap function defined for those containers it’s possible to decouple our functions and make them composable.

Well, it turns out that our MyOption class is a monad. And so are Lists, Options, Futures (arguably), and many other types implemented by the Scala standard library. One can think of a monad as an interface with two methods (bind and unit) that must be implemented by its concrete types. By implementing the flatMap function (bind) and the apply function (unit) we’re implementing our own monadic type. For those not familiar with Scala, the apply function is like a constructor we get for free when defining a case class.
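
To make the “interface” framing concrete, here’s a rough sketch of what such a contract could look like in Scala. The trait and the names below (MyMonad, unit, bind) are my own illustration, not something from the standard library:

import scala.language.higherKinds // needed on older Scala versions for the F[_] type parameter

// A sketch of the monad "interface": a type constructor F[_] plus
// unit (wrap a plain value) and bind (better known as flatMap).
trait MyMonad[F[_]] {
  def unit[A](a: A): F[A]

  def bind[A, B](fa: F[A])(f: A => F[B]): F[B]
}

// A possible instance for our MyOption container.
object MyOptionMonad extends MyMonad[MyOption] {
  def unit[A](a: A): MyOption[A] = MySome(a)

  def bind[A, B](fa: MyOption[A])(f: A => MyOption[B]): MyOption[B] = fa.flatMap(f)
}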

You might be wondering: what about the map function, though? Why did we have to implement it if it’s not part of the monad “interface”? The answer is that it’s required by the for comprehension statement. Remember that the for comprehension statement is translated to a series of flatMap calls, with the yield statement being translated to a map call. Without the map function we wouldn’t be able to use the yield statement. There are other functions that are also not part of the monad “interface” but could be implemented in order to get more functionality out of the for comprehension statement, for example filtering (withFilter, which enables if guards).
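
As a side note, map isn’t extra power on top of the monad interface: it can be derived from flatMap and unit. Here’s a sketch of how MySome could define it that way (an alternative to the direct implementation shown earlier, not what this article’s code actually does):

// Inside MySome[A]: transform the value with f, then immediately re-wrap it
// using the constructor (our "unit"). Equivalent to the direct implementation above.
def map[B](f: A => B): MyOption[B] = flatMap(a => MySome(f(a)))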

C# is another language that also leverages monads. You can achieve behavior similar to Scala’s for comprehension by using LINQ’s query syntax (the from clauses, which the compiler translates to SelectMany calls). If you’re curious, you can take a look at the source code for this article on this GitHub repo. It contains examples in both Scala and C#.

That’s it for now! What I wanted you to take away from this article is that some languages let you take advantage of the behavior provided by monads even if you don’t know what monads are. Whenever you use a for comprehension to iterate over a List or access the content of an Option, you’re leveraging the power of monads and making your code cleaner, simpler, and less error-prone, as in the quick example below.
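
A quick illustration using only the standard library (nothing here depends on the custom MyOption code above):

// Iterating over Lists with a for comprehension (it desugars to flatMap/map calls).
val pairs: List[(Int, String)] =
  for {
    n <- List(1, 2, 3)
    s <- List("a", "b")
  } yield (n, s)
// pairs == List((1,a), (1,b), (2,a), (2,b), (3,a), (3,b))

// Combining Options: the whole expression is None if any step is None, no null checks needed.
val maybeTotal: Option[Int] =
  for {
    x <- Option(40)
    y <- Option(2)
  } yield x + y
// maybeTotal == Some(42)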

However, I’d be lying if I said that’s all there is. There’s a lot more to monads. Beyond implementing those two functions, a monad needs to comply with a set of laws or rules. Monads also allow us, among other things, to make our functions pure by isolating side effects, but I’ll leave that to articles that focus on those aspects, since I promised you this was not going to be a monads tutorial. If you’re curious about what the monad laws are, you can take a look at this article or Google for one of the many monad tutorials available.

Let me know your thoughts on this article! You can reach me on Twitter or Linkedin. Thanks for reading and till next time!

Asp.Net MVC 3, HTML.BeginForm, and a lot of headache

Hello!!! I’m back to share another lesson learned with the world. It’s been a while since I last wrote, because I think it’s only worth writing something when I can’t find anything similar on Google (which, let’s be honest, is rare).

This time the trouble I ran into is related to submitting forms using asp.net mvc 3, but from what I understood the problem can happen with any MVC language/framework.

The problem was the kind that doesn’t happen on your machine during development, only when you deploy to the hosting server (in my case, Locaweb). Its root cause remains a mystery to me, and if anyone ever finds out what it is, please let me know. The scenario of my nightmare was the following:

  1. A form generated via scaffolding (automatically), ready to be filled in in the browser.
  2. The form is filled in correctly and then submitted via POST.
  3. The form is reloaded with blank fields, without the code of the method responsible for the action ever being called.
  4. Other forms in the same application work normally.
  5. The error only happens when the site is published (in my case, on Locaweb).

Following (my) logic, the first thing to investigate is what’s different between this form and the other forms in the application that work fine. The differences I found were the following:

  1. The problematic form has a different enctype attribute from the others, because it uploads files.
  2. For that reason, the HTML.BeginForm method used to build the form was a different version (overload) from the one used in the other forms.

Other than that, everything was similar. Some things I did investigate but that had nothing to do with the problem were:

  1. A problem with the form’s client-side (javascript) validation.
  2. The site’s configuration (web.config), something related to file uploads.
  3. Other things I don’t even remember anymore.

Following the next logical step, I removed the code that was different in this form from the cshtml file and from its controller, comparing the generated html. The forms that worked used the BeginForm method without parameters, while the problematic form used an overload where its enctype was passed in.

As a result of the comparison, I verified that when the parameterless BeginForm method is used, the form’s action attribute ends with a trailing slash “/”:

<form action="/crazy/Product/Edit/2/" method="post">

In the overload version, that trailing slash is not added:

<form action="/crazy/Product/Edit/2" method="post">

After reading half of Google looking for something that could explain the importance of this trailing slash, I found out that when it is included in the url, the server looks for the directory that appears before the slash. When it is not included, the server tries to figure out whether the last segment of the url is a file or a directory and returns the url it finds (if any) to the browser. The browser then makes a second request to that url.

The problem, my friends, is that since the damned slash was not included in the form’s action, the form would POST to the server, which (I believe) identified the url target as a file and sent the url back to the browser, which on the second request no longer used POST but GET. Since my controller (and probably yours too) has two methods with the same name, one for GET, which returns the form to be filled in, and one for POST, responsible for handling the submitted data, the browser ended up requesting the blank form (GET) instead of submitting its contents (POST).

Hacky workaround (a.k.a. workaround-oriented programming): find a way to always include the trailing slash at the end of the url in the form’s action attribute.

In my case, I abandoned BeginForm and just used a hard-coded form tag instead (only for this controller’s views, since the other forms didn’t need the enctype attribute and the parameterless BeginForm generates the action url with the blessed trailing slash at the end).

For the insert (create) cshtml, instead of:

@using (Html.BeginForm("Create", "Product", FormMethod.Post, new { enctype = "multipart/form-data" })) {

I used:

<form action="/Product/Create/" enctype="multipart/form-data" method="post">

For the edit cshtml, instead of:

@using (Html.BeginForm("Edit", "Product", FormMethod.Post, new { enctype = "multipart/form-data" })) {

I used:

<form action="/Product/Edit/@Model.id/" enctype="multipart/form-data" method="post">

Remembering, of course, to close the form tag in the right place.

And that was it. Everything worked like a charm. However, as I said earlier, I still don’t know why the missing trailing slash at the end of the url makes no difference on my local server while it does on Locaweb’s server. It’s probably some url routing configuration, or something along those lines. Anyway, I’d be very grateful to anyone who knows and can tell me.

That’s it, that’s all for today, and see you next time!

Profiling in MySql – Part II

Hello,

some time ago I posted here a tip on how to enable tracing or logging of the queries executed in MySql. However, today, when trying to apply the same configuration to a different version of MySql, I had no success.

That’s when I found out that the log configuration varies depending on the MySql version. There is now an option to send the log to a table instead of a file.

The table is called general_log and lives in the mysql database. Long story short, for version 5.1.32 I added the following lines to the my.ini file (in the mysqld section):

log-output=TABLE
general_log=1

Once that’s done and the service has been restarted, just run the following query to get the log contents:

SELECT * FROM mysql.general_log g;

That’s it, cheers and see you next time!