Operational Impact · These are questions for wise men with skinny arms

I used to hate doing ops. It reminded me of front-line IT work, when I had no idea what was going on and was constantly at the whim of developers I could never contact or understand. After I became a developer, I didn't give ops enough credit until I started doing independent work. As a developer first, I favored build over buy because I'd rather make my own mess than spend time learning someone else's. The level of control and understanding you gain by digging in and making something work from nearly the ground up is rewarding. Every time I made something new, I learned how to make the next one better in a different way. The downside is that most development is glorified plumbing, where the interface dominates the logic even in environments that demonize boilerplate.

Ops work is the extreme end of development plumbing, where the work is cleaning and assembly. I don't find ops work enjoyable the way I find development rewarding. Tool configuration isn't a general process; there are few common abstractions to learn. Once you understand the basics of system administration, the rest of your time is spent reading the documentation for specific applications. Then, when doing the work, you only learn about the pitfalls of that specific system. You might be building something, but it's like having to learn a new language for each component's configuration. Working with just a few applications limits the scope of what you can build, even if you're really good at configuring them. It's a tough slog to work with a large application for a long time without feeling like you're progressing, because each new task is just different enough that your prior experience barely helps. Configuring long-running startup services on Windows is very different from configuring the same thing on most flavors of Linux. It's infuriating to nearly start over when it feels like some experience should transfer.

The truth for many modern organizations is that ops has a better effort-versus-impact tradeoff for the business than development does. With today's tools, more and more development work resembles operations. Most organizations aren't writing custom platforms from whole cloth anymore; if you are, either you have unique constraints preventing you from leveraging existing tools or you just enjoy reinventing the wheel. The toughest question in most business application projects isn't "can I do this?" but "which tools do I want to use?" That choice changes the complexity of the final product: instead of a single integrated solution, there are multiple steps and separate components, each of which means configuring another application. Modern languages also have development dependencies that need management. Some take this to mean that developers are taking over operations roles, but really it's just the continuing trend of building systems out of bigger components. There are more high-level configuration options than ever before, as each dependency can bring dependencies of its own. On some level, this was always the case. A compiler or runtime has a number of options that need to be configured to integrate components, but the knowledge and steps needed to get one program running are worlds apart from those needed to robustly deploy and host a stack of separate applications. Even with the additional configuration required at the lowest level, it's often faster to integrate a partially understood library than to develop a custom solution. How many people really understand the runtime they're using any better than the libraries they've included? While splitting every level out sounds like a micro headache, even monolithic architectures aren't totally self-contained. Their dependencies are just easier to manage because they shouldn't change as often. When they do change, the whole thing can fail.
Managing upgrades to these components is more important than ever. We were already living in a world of microservices, but most of them were preconfigured by the OS and required minimal maintenance.

The focus on component-level integration should change how systems are designed and built. To compare it to electronics, custom circuit design now means reading the datasheets for the ICs instead of working out the fundamental building blocks from physics. It should be just as crazy to start a software project without a support system as it would be to start a consumer electronics project with an RTL diagram. I've seen so many projects start in a vacuum because the developers would rather code from scratch than learn some annoying external tools. Such a project pushes the complexity from the application's development interface to its operational interface. If the project grows up, it then needs very powerful operational tools to integrate it into a larger system. Because complexity breeds problems, system integration becomes the most time-consuming part of the work instead of the business logic. So instead of starting with the hardest part and working down to the easy parts, the system is made more complicated because the simpler parts are solved first. So many existing development solutions require operations-style learning, and that turns off coders who'd rather do it themselves. Looking at the system architecture first will help with later integration. New development can then focus on the gaps between the existing systems and the business needs, and operations are streamlined to support fewer, and hopefully less complex, application interfaces. To do this effectively, any developer needs to treat operations management as a first-class concern. Writing code all day without knowing where it is going or how it will get there only works in a large organization that can fully manage those concerns. To be effective in smaller teams, developers need to be comfortable managing their own environment and understanding the operational environment.
If developers shy away from learning new libraries or frameworks, they also limit their view of other operations-oriented tasks.

The key gain from this approach, which isn't obvious, is that by leveraging simpler general-purpose tools and shrinking the role of custom-built software, the network effects of each tool's support community absorb part of the operational overhead. This increases development velocity for business needs. This isn't a novel idea, but I think it's often ignored because the pressing concerns of legacy maintenance make the transition very difficult. "DevOps transformation" was a buzzword rallying cry for so long that I think most technologists got lost in tools for supporting complex application migrations, instead of the system-design implications of putting popular, widely available components and operational concerns first.