Optimizing Software Development

Introduction

The goal of this article is to demonstrate how the principles of Mathematical Optimization can be applied to improve the software development process. But first, let’s step back a bit and take a look at some optimization problems.

A Cautionary Tale

A school wanted to increase literacy among young students. They decided to create a program that would offer one dollar for each book read. After a couple of days of running the program, the teachers were impressed with the success of the initiative. Each child had read ten books per day on average.

However, after performing a more in-depth analysis they noticed a problem. The total number of pages read by each student was well below expectations. The children, with their amazing brains, had quickly found a way to optimize their gains: every book they read was less than ten pages long.

“Tell me how you measure me, and I will tell you how I will behave.”

E.M. Goldratt (1990), The Haystack Syndrome: Sifting Information Out of the Data Ocean. North River Press, Croton-on-Hudson, NY.

The anecdote above is a cautionary tale of how systems tend to optimize themselves toward a state of equilibrium, much as entropy increases in physical systems. In physical systems this optimization is bounded by one or more dimensions, such as space, time, mass, velocity, temperature, and energy.

In the example above, the system optimized itself against a single dimension: book count. The fewer pages a book has, the more books I’ll read. A better dimension (or metric) might have been page count. One could argue, though, that choosing page count as the single metric could lead students to read a few large books, decreasing the diversity of authors, topics, and styles they would be exposed to.

One solution to this problem would be to evaluate both book and page counts as metrics (or dimensions). A formula to compute how much a student gets paid by the end of the initiative could take the shape:

Reward=b\log\left(\frac{p}{b}\right)

where b and p are the number of books and pages read respectively.

With the formula above we tie both variables together, so students are incentivized not just to read as many books as possible but also to keep a high average of pages per book (the log function is there to damp the effect of pages per book, putting diminishing returns on very long books).
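To see the formula in action, here is a minimal sketch of the reward function (the function name and the sample book/page counts are my own for illustration):

```python
import math

def reward(books, pages):
    """Hypothetical reward (in dollars): books * log(pages per book)."""
    return books * math.log(pages / books)

# Ten short books (5 pages each) pay less than five long books (100 pages each):
print(round(reward(10, 50), 2))   # 10 * ln(5)  -> 16.09
print(round(reward(5, 500), 2))   # 5 * ln(100) -> 23.03
```

Note how gaming the system with many tiny books no longer pays: the pages-per-book ratio drags the reward down.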

A More Classical Problem

A famous optimization problem is to compute the dimensions of an aluminum can in order to minimize material cost and maximize its volume. If you tried to minimize material cost only without taking volume into consideration you’d end up with the tiniest can your machines could fabricate. Conversely, if you tried to maximize the can volume without taking any other dimensions into consideration you’d end up with the largest can your machines could manufacture.

If you think back to your Calculus classes, you probably remember that the solution is to combine the formulas for the area and volume of the can, creating a function in which area is the dependent variable. You then find that function’s minimum using the first and second derivatives. But I’m sure you already knew that.

A=2\pi{r}{h}+2\pi{r}^2
(area of the cylinder)

V=\pi{r}^2{h}
(volume of the cylinder)

A=2\pi{r}\frac{V}{\pi{r^2}}+2\pi{r}^2
(area in terms of the volume)

I’ll leave the rest of the solution as an exercise to the reader. The main takeaway from this example is the idea of combining two dimensions (area and volume) in order to optimize both of them at the same time.
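If you would rather check your work numerically, here is a quick sketch (the closed-form radius comes from setting the derivative of A(r) = 2V/r + 2πr² to zero; the 355 ml volume is just a familiar soda-can example):

```python
import math

def optimal_can(volume):
    # A(r) = 2V/r + 2*pi*r^2; setting dA/dr = -2V/r^2 + 4*pi*r = 0
    # gives r = (V / (2*pi))^(1/3). The height follows from V = pi*r^2*h.
    r = (volume / (2 * math.pi)) ** (1 / 3)
    h = volume / (math.pi * r ** 2)
    return r, h

r, h = optimal_can(355.0)          # a standard 355 ml can
print(round(h / (2 * r), 3))       # height-to-diameter ratio -> 1.0
```

The ratio of 1.0 is the punchline: the optimal can is exactly as tall as it is wide.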

The point I’m trying to make here is that if we’re not careful with the dimensions we choose to evaluate a system and/or its components (software, people, project, business, etc) we might end up obtaining unexpected and/or undesired results. Let’s take a look at other optimization examples, but this time applied to the software development practice.

Organizational Structure and Architecture

Conway’s Law states that organizations design systems which mirror their own communication structure.

Imagine the scenario where development, infrastructure and security teams are siloed from each other. Let’s also assume the development teams are measured by the number of features delivered, the infrastructure team is measured by the number of incidents in production and total cost of ownership, and the security team is measured by the total number of incidents. What’s likely to happen in this scenario?

Well, chaos, for sure! Since the development teams are only concerned with the number of features being delivered, they’re more than likely to ignore quality, cost, and security concerns, to name a few. This in turn will result in more defects in production, which will burden the infrastructure team.

The infrastructure team on the other hand, in an effort to minimize incidents as well as costs, would probably come up with a very rigid process for provisioning new infrastructure, since more infrastructure means higher costs as well as a higher likelihood of something going wrong. This would have a direct impact on the development teams since getting that new server for that new functionality could take a long time (if ever approved).

In a similar fashion, the security team would tend to “lock everything down” in an effort to minimize the chance of incidents impacting both the development and infrastructure teams (after all security needs to sign off on that new server for that new functionality).

The end result is an organization where IT is perceived as incapable of delivering (which is indeed the truth) while the business becomes frustrated as its ideas don’t come to fruition. The organization struggles to innovate and is eventually surpassed by its competitors. Everyone loses their jobs. It’s really sad. I liked working with Dave.

However, paradoxically, if you asked the individual IT teams (dev, infra, security) how they perceived their own work, they would probably say they were crushing it. After all, they were able to meet the goals the organization set for each of them.

But how can we solve this problem in order to save everyone’s jobs? Well, by following the same approach we used when solving the literacy initiative and aluminum can problems earlier in this article, i.e., by combining the different metrics into one.

For that we don’t need to come up with a fancy formula like before. Instead, we “simply” need to combine all the different teams into one (actually, one per domain vertical) and evaluate the new team(s) against the combined metrics of the individual ones, thus optimizing the system (the development team) across multiple dimensions (success metrics).

The benefits of structuring development teams this way are twofold: first, you minimize communication overhead and other friction points between teams; second, you make sure “everyone is in the same boat” and has common, shared goals. This strategy is an example of the Inverse Conway Maneuver.

Tech Stack Standardization

Another use of optimization in software development involves the notorious practice of tech stack standardization. Standardization sits on a spectrum, and it can end up either too coarse or too granular.

Overly coarse standardization happens, for example, when the CTO/IT manager/Architect decides that all databases used in all projects should be the same, no matter the use case. This leads projects to use suboptimal technologies that don’t meet their requirements, resulting in incidental complexity and/or the inability to deliver a given set of functionalities.

On the other end of the spectrum, one can decide that each and every project is free to pick whichever database technology it sees fit. This leads to issues like the proliferation of technologies that are not well known across the organization and are thus not maintained properly (if at all).

How can we solve this problem? The solution is to find the sweet spot in the spectrum (optimize) so we don’t end up in any of the ends. At this point, we (should) have already learned that optimizing against a single dimension is usually a bad idea. Therefore, we need to identify the dimensions that make sense to standardize against.

Two possible (and common) dimensions used for tech stack standardization are use case and load. In our example of databases, we could decide that for transactional systems (use case) that are not write-heavy (load) we want to standardize on PostgreSQL and for write-heavy (load) transactional systems (use case) we want to go with Cassandra.

Similarly we could decide that for analytical systems (use case) with lots of data (load) we want to adopt Apache Impala. If our friend Dave wanted to adopt, let’s say, MongoDB, he would have to prove his use case and load combination is not covered by the ones identified above. Sorry Dave.
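The decision rule above amounts to a small lookup table. Here is a toy sketch (the matrix keys and the exception-request wording are my own assumptions, not an official taxonomy):

```python
# Hypothetical standardization matrix keyed on (use case, load).
STANDARD_STACK = {
    ("transactional", "regular"): "PostgreSQL",
    ("transactional", "write-heavy"): "Cassandra",
    ("analytical", "large"): "Apache Impala",
}

def pick_database(use_case, load):
    try:
        return STANDARD_STACK[(use_case, load)]
    except KeyError:
        # Sorry Dave: unknown combinations require an explicit exception process.
        raise ValueError(f"No standard for ({use_case}, {load}); file an exception request")

print(pick_database("transactional", "write-heavy"))  # Cassandra
```

In practice this “table” usually lives in an architecture decision record or a tech radar rather than in code, but the principle is the same: defaults per (use case, load), with a visible escape hatch.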

Project Management

On average, the smaller the scope of a project, the simpler its implementation, thus increasing its chances of success. However, if we try to optimize for a minimum scope only, we would end up with a single-feature microproject™ that doesn’t deliver much value. Not very helpful.

On the other hand, optimizing for maximum business value only doesn’t make sense either. We would end up with a multi-year project that tries to deliver every possible feature and is never completed in a timely fashion. Sound familiar? So the question is: what dimensions can we choose when optimizing a project?

As you probably have already guessed, one option is to optimize for both minimum scope and maximum business value. In other words, we want a minimum viable product, or MVP. Another dimension that’s commonly used when optimizing a project is cost. If we add cost to an MVP we end up with an MVAP™, or minimum viable affordable product, which in some (most) cases might not be feasible.

You can keep adding dimensions to your project optimization matrix, but be mindful that the more dimensions you add, the more difficult it is to find a sweet spot (or local minimum). Tradeoff sliders are a tool that helps with prioritization across multiple dimensions, and I encourage you to check them out.

Machine Learning

Usually, when we think about Machine Learning and optimization, we think about the optimization of the cost function. The next example is not a cost function optimization problem, and I’m probably stretching the concept of optimization a little bit here. My goal is to demonstrate how pervasive and useful it is to think in terms of optimization across dimensions. If you’re a data scientist, feel free to jump to the conclusion section. There’s nothing for you to see here. Move on. Go.

One common machine learning use case is anomaly detection. Imagine a financial institution wants to monitor credit card transactions for fraudulent operations. In this (overly) simplified example imagine they decide to build a ML model that analyzes the following transaction properties (dimensions): date and time, amount, and merchant.

Let’s say our friend Dave is a client of this financial institution. Dave usually shops at lunchtime, spends no more than US$50.00, and buys everything on stuffidontneed.com. Suddenly the system sees a US$500.00 transaction at 1am on iwasrobbed.com. Our friend Dave immediately receives a text message asking him to confirm the transaction. Crisis averted.

The next day, a hacker on the other side of the globe obtains Dave’s credit card info as well as his purchase behavior and makes a series of purchases under US$50.00, during Dave’s lunchtime, on stuffidontneed.com. The only difference between the hacker’s transactions and Dave’s is the delivery address. However, since the fraud detection model doesn’t take the delivery address into account, the transactions are processed successfully, and Dave will have to spend delightful hours with his bank’s customer service proving he wasn’t responsible for the fraudulent transactions. Poor Dave.

One could argue the anomaly detection model wasn’t optimized for the task at hand (or rather, that it was optimizing only for the dimensions it was aware of). By including another dimension (or feature), the model would have more information to work with and yield better results.
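To make the missing-dimension point concrete, here is a deliberately toy, rule-based stand-in for the model (a real system would learn these thresholds; the profile fields and addresses are invented for illustration):

```python
def anomaly_score(txn, profile):
    """Toy score: one point per dimension that deviates from the customer profile."""
    score = 0
    if txn["amount"] > profile["max_amount"]:
        score += 1
    if txn["hour"] not in profile["usual_hours"]:
        score += 1
    if txn["merchant"] != profile["usual_merchant"]:
        score += 1
    # The dimension the original model was missing:
    if txn["delivery_address"] != profile["home_address"]:
        score += 1
    return score

dave = {"max_amount": 50.0, "usual_hours": range(11, 14),
        "usual_merchant": "stuffidontneed.com", "home_address": "123 Main St"}
hacker_txn = {"amount": 49.99, "hour": 12, "merchant": "stuffidontneed.com",
              "delivery_address": "456 Elsewhere Rd"}
print(anomaly_score(hacker_txn, dave))  # 1 -> flaggable once address is a feature
```

Without the delivery-address check, the hacker’s transaction scores a perfect zero; with it, there is at least one signal for the system to act on.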

Conclusion

Despite not being a mathematician, statistician, or the like, I have always been passionate about Calculus and applied mathematics, more specifically optimization problems. I think many problems in life can be treated as optimization problems, e.g., work-life balance, vacation time (too little and it’s not enough, too much and you get tired), buying a house, your investment portfolio, the amount of sugar in your coffee, and so on.

Software development is no different. However, in order to optimize a problem, you need to be able to identify its dimensions so you can be sure you’re optimizing for the things that matter. This skill, like any other, requires practice and the more you practice it, the sooner you’ll master it.

Thanks for reading so far. Can you think of other applications of optimization in software development? Feel free to leave your comments. All constructive feedback is welcome.

PS: In case you are really curious but too lazy to solve (i.e., google) the aluminum can optimization problem, the answer is a can with a square profile (height equal to the top/bottom diameter).
