F# Success with a Single Record · These are questions for wise men with skinny arms

This is the next step in my journey with F# in writing a small application.

I decided to undo most of the reactive extensions work and get back to using simple functions wherever possible. I had let the project sit long enough to come back to it with fresh eyes. I saw where I could have made the FRP code more manageable by using bigger blocks in the data stream, but I was already committed to starting nearly from scratch and applying everything I had learned since I began, in the hope of building a complete solution from a single philosophy.

Instead of designing and coding at the module level and working down, I started with the core business logic that had been problematic in streams and worked my way up to put the whole program together. At first I fell into my familiar OO patterns. I had big record types with lots of data members and awkward splits between member methods and module functions. I focused on clarifying the business logic in simple functions with descriptive types instead of thinking about the design of the system in the large. This data-and-logic-first approach naturally started to separate those big types into more distinct structures and functions as I refactored to make coding easier along the way. Instead of deciding on the structure first and then coding for it, I was letting the shape of the code drive the larger parts of the structure. It was far quicker to refactor after I saw patterns than to attempt to plan for all eventualities at the start. Instead of making objects first and then filling them in, I was making functions first and only creating custom types or records when I needed them to simplify the structure of the code.

I wasn’t specifically trying to be agile in how I coded, but working in that mindset spared me the growing pains I’d previously experienced when I put off big up-front design. Creating fewer abstractions from the start meant I needed fewer changes to evolve my models, and the ability to create very powerful patterns easily meant I could save myself time with more tools going forward. When I did get to a point where I needed to make a design decision, I had concrete examples to compare, which was huge when I otherwise couldn’t yet ‘think’ in the language. I still had to translate ideas in my head from how I’d do it with OO patterns to how I’d do it with just functions. I normally erred on the side of making the code very procedural and single purpose instead of abstract and generic. In Python, this would have just made the code extremely brittle, as I didn’t yet have the conventions that would keep data and control flow consistent. In C++, it would have meant doing menial refactoring all the time, since the easiest path there is generally the lowest-level implementation and the least extensible one. With F#, the code was ‘generic enough’: robust yet flexible. I focused on simplicity and clarity, both in syntax and design, and found that this helped me understand the code when working with unfamiliar functional patterns.

What I noticed in my refactoring of F# was that I’d change larger and larger portions of code until I found a good design. Normally I’d complain that this was a sign of the code being inflexible, but instead it made me think that not doing this generational refactoring could cause consistency problems in style and intent. Instead of growing the code a bit at a time and leaving each piece untouched after I finished it, I could quickly iterate over the whole design as a unit. This led to a concise final result because the starting point and ending point were created with the same level of information about how they would be used. When I did make design-level changes with an impact across multiple modules, I didn’t feel the pain of spending large amounts of time reworking how control or data flowed, even if the change touched half the module. If my modules had been as massive as my average OO code, I would never have wanted to rewrite them for each of the small conceptual changes I made. But every time I did re-flow the code, it came out clearer as I settled on design choices. I realized that I was finalizing my design and implementation in parallel. There was never a mismatch between how I thought about a data model and how it was actually implemented. Conceptually, this was also a big difference compared to how I handled refactoring in OO systems. There, a small change might be isolated in code behind an interface and require no changes elsewhere in the system, but it often had a wider hidden impact. The lack of data transparency for the sake of encapsulation was a much bigger roadblock than I had ever considered. When the domain is data driven, small changes in data format should have a large impact on design.

The most common task in this refactoring process was plumbing new data through to a lower-level function. I had heard horror stories about how bad this could get in Haskell, so one of the few up-front design choices I made was to support this scenario. I made most of the top layers’ arguments records, so that it was always straightforward to add the new data with a quick change to a record definition. I could then insert the values at any higher layer and pull them out at the lower layer where they were needed. To accomplish this with objects, I would have saved the necessary data as state within a class instance for later use. I liked making as much data as possible explicit because it made the flow and the data dependencies easy to understand. When I needed to return additional data to the caller, I kept it consistent by nesting records all the way back up. The simplicity and consistency of this approach was fantastic. With the terse syntax for function composition, it wasn’t tedious or confusing to follow the multiple layers, since I neither needed nor used any dynamic dispatch. I could keep a similar level of separation of concerns as with classes, but with much smaller single-purpose functions. The entire data flow for some paths could now fit on a single screen across a few files, whereas before it might have touched half a dozen or more and had twice as much code at each step. Using records as the primary transport between layers naturally kept related values together as they moved around. Because each function started with the assumption that it would have access to certain data, it wasn’t much work to pull in everything from different components at the end, as long as there was a clear path of execution. Continually pushing responsibility to the caller allowed constraints to be handled where the most information was available, instead of building restrictions at the lowest level without context. Once all of the information was available, it was much easier to work with just values and collections within a single level.
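As a rough illustration of the record plumbing described above, here is a minimal sketch. The order and pricing domain, and every type, field, and function name in it, are hypothetical and not taken from the actual application. The point is only that adding a field such as `DiscountRate` touches the record definition, the layer that fills it in, and the layer that reads it, while the intermediate function is left alone.

```fsharp
// Hypothetical domain, invented for illustration only.

// Input record for the lowest layer. DiscountRate stands in for a field
// that was added later: only this definition, the caller that supplies it,
// and the function that reads it need to change.
type PricingInput =
    { Quantity: int
      UnitPrice: decimal
      DiscountRate: decimal }

// Input record for the layer above, nesting the lower layer's input.
type OrderInput =
    { Customer: string
      Pricing: PricingInput }

// Results are nested in the same way on the way back up.
type PricingResult = { Total: decimal }

type OrderResult =
    { CustomerName: string
      Priced: PricingResult }

// Lowest layer: pulls out exactly the values it needs.
let price (input: PricingInput) : PricingResult =
    let gross = decimal input.Quantity * input.UnitPrice
    { Total = gross * (1m - input.DiscountRate) }

// Middle layer: hands the nested record down untouched and wraps the result.
let processOrder (input: OrderInput) : OrderResult =
    { CustomerName = input.Customer
      Priced = price input.Pricing }

// Top layer: the caller supplies everything explicitly.
let result =
    processOrder
        { Customer = "example"
          Pricing = { Quantity = 3; UnitPrice = 10m; DiscountRate = 0.1m } }
```

Note that `processOrder` never mentions `DiscountRate`, so in this sketch the middle layer would not have needed editing when the field was introduced.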

My first instinct was to keep data as separated as possible to reflect their separate concerns, but pieces started to glob together within records faster than I could break them apart. Without careful separation and design I thought it was going to be a mess of data when I finished, but the result needed much less routing code to build the abstractions that were useful for working with subsets of values. Most of the functions that needed to read and break up a struct were the ones that eventually updated it. All of my instincts for keeping code ‘clean’ and organized were nearly working against me. The way to keep pieces organized wasn’t to introduce more layers of abstraction for separation but to allow the collections and values to bubble up to where they were used. Instead of binding the data to a single object and attaching functions around the data for access, I was writing transformations of values and collections that happened to come out of a record. The transformations were obviously associated with certain records based on their type signatures, but they weren’t restricted to working on members of that record. This made it much easier to work with data across multiple sections of the application, since I didn’t have to decide how the records would relate to one another. That whole thought process of modeling was sidestepped by wrapping the layers as they went up, attaching useful information along the way. I probably could have split out my unwrapping and logic handling more, but most of it wasn’t generic enough to warrant reuse anyway. I was solving problems in a different way because I had much easier access to the whole program’s data. Instead of having to attach everything I wanted to do with the data to the data declaration, I put the work next to the pieces that needed it. This did introduce some interesting coupling across the application, but it was also easy to avoid duplication by moving shared logic to a struct member or a higher-level helper if the use case was common enough. The split between member functions and static members was more nuanced than what I knew from working with OO patterns; the availability and accessibility of the data changed how I thought about ownership. The data was as self-descriptive as possible, and that made separating it or combining it less worrisome.
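To make the idea of transformations that "happen to come out of a record" more concrete, here is a small sketch. The `Account` and `Report` records and the function names are assumptions made up for this example. It shows a plain function over a collection, the same logic exposed as a member once it is common enough, and a function that combines pieces of two records without modelling any relationship between them.

```fsharp
// Hypothetical records and functions, invented for illustration only.

// A plain transformation over a collection; it neither knows nor cares
// which record the amounts came from.
let totalOf (amounts: decimal list) : decimal = List.sum amounts

type Account =
    { Id: int
      Balances: decimal list }
    // Once a use case is common enough, it can be promoted to a member
    // that simply reuses the free-standing function.
    member this.Total = totalOf this.Balances

type Report =
    { Title: string
      Lines: string list }

// Works across two records without deciding how they relate to each other.
let summarize (account: Account) (report: Report) : Report =
    let line = sprintf "Account %d: %M" account.Id (totalOf account.Balances)
    { report with Lines = report.Lines @ [ line ] }

let account = { Id = 1; Balances = [ 10m; 5.5m ] }
let report = summarize account { Title = "Daily"; Lines = [] }
```

Here `totalOf` is tied to `Account` only through the values passed to it, which is roughly the looser kind of association between functions and data that the paragraph above describes.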

The common step of looking at a class’s interface to determine what I could do was replaced with looking at the data structure I had and then directly performing the transformations I needed. The ability to access data based not on a single role or a complete mental model but on the implementation of the problem at hand was extremely liberating. When I first started trying this, it seemed almost primitive. It could have been a dangerous and painful practice to follow. The ability to break a function elsewhere in the application by changing the format of some data could have made the code annoyingly brittle and fragmented. It could have been the worst kind of spaghetti code, where small changes cause unexpected breakages all over the code base. I didn’t want dependencies all over the code relying on an implicit property of a collection that might change at any time, which is the sort of thing that OO boundaries and the Law of Demeter are meant to prevent. But I found that this didn’t naturally occur. Changing an intermediate level wouldn’t normally affect the use of data in the layers above or below the function. The format of the data alone was descriptive enough that it was easy to quickly transform the data into the types expected throughout the application.

The intrinsic properties of the data could be made explicit as part of the type system instead of enforced by object contract and tests. The default immutability prevented some sharing madness from causing havoc. I could easily tell which branch of my program was responsible for modifying each subset of the record that it took, just based on the type signature. If it made any modifications that needed to persist, it had to return a new record to the caller. I never needed to worry about which functions had access to my data, because the subset of functions that could modify it was clear from what was assigned back to it at the top level. The ubiquity of hidden updates, where objects hide how they are affected by inputs, was completely flipped on its head. This made the data dependency graph the same as the data flow graph and the control flow graph, which eliminated the problem of coordinating multiple moving parts to produce a consistent state for the application. I wasn’t applying any concurrency along the code paths, but if I had needed to, this organizational flow would have made it trivial to implement without having to track down what was being shared.
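A short sketch of how a type signature can show which part of a record a function is able to change. The `Settings`, `Counters`, and `AppState` records and all of the functions are hypothetical stand-ins, not the real program's types.

```fsharp
// Hypothetical state records, invented for illustration only.
type Settings = { Verbose: bool }
type Counters = { Processed: int; Failed: int }
type AppState = { Settings: Settings; Counters: Counters }

// Counters -> Counters: the signature says this can only produce new counters.
let recordSuccess (counters: Counters) : Counters =
    { counters with Processed = counters.Processed + 1 }

// Settings -> string: no record is returned, so nothing can persist a change.
let describe (settings: Settings) : string =
    if settings.Verbose then "verbose" else "quiet"

// The top level is the only place where a new state is assembled, so it is
// clear from this one function which parts of the state were replaced.
let step (state: AppState) : AppState =
    { state with Counters = recordSuccess state.Counters }

let before =
    { Settings = { Verbose = false }
      Counters = { Processed = 0; Failed = 0 } }

// Records are immutable by default, so `before` is left unchanged.
let after = step before
```

Because `step` returns a new record rather than mutating its argument, both `before` and `after` remain available, which is the property that would make sharing across threads straightforward if concurrency were introduced.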

When data did change formats and break downstream uses, it was normally because the meaning of the data had changed as well and the downstream functions needed to be updated to match. For changes where only the presentation changed, wrapping another function around the data could emulate the previous interface, and doing so at least exposed where the data was being consumed. I strove to have as few steps as possible between the data access and the results of the functions. I can’t count the number of times I’ve run across a chain of methods in a large OO project where data is transformed across multiple formats before ending up almost exactly as it started. This happens because the true source of the data is often inaccessible in large OO systems, and the most accessible source that has the information holds it in a slightly different format than is needed. I tried to limit the amount of data I stored to reduce this connectivity problem, and this also reduced the amount of plumbing and state management required to keep it consistent. Saving intermediate results at multiple levels was more problematic than just performing the transformation from the source, although I did make some concessions when caching had a big performance impact.

Lessons in aggregate

All of these small changes in organization and consistency had a massive impact on how the shape of the application grew. Repeating this pattern of starting small and building only the required logic at each layer eventually caused all of the pieces to come together into a single gigantic record that held the entirety of the program’s state. All of the layers were then responsible for breaking off a part of that record, doing work, breaking off a smaller part, doing more work, breaking off an even smaller part, doing the core logic, and then building all those pieces back up to replace the part that was originally chosen. Following the program flow in this way was trivial because everything the program needed was exposed in the top-level function call. I could go up or down the chain from any point just based on the arguments, without having to hunt down multiple objects that might have had a tangential effect on results. I knew the data had to come from higher up the chain and that only pieces that returned the data would be responsible for modifying it. This fit beautifully into the F# mailbox model, where the function signature allows a single piece of data and a function to operate on it. I finally felt like I had code that was both idiomatic to F# and understandable from the perspective of the business model.
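The following is a minimal sketch of how a single state record can sit inside a mailbox loop, assuming that "mailbox model" refers to F#'s built-in `MailboxProcessor`. The inventory domain, the message type, and all of the names are hypothetical. Each layer breaks off a smaller part of the state, the innermost function works on a plain `Map`, and the pieces are rebuilt on the way back up.

```fsharp
// Hypothetical state and messages, invented for illustration only.
type Inventory = { Stock: Map<string, int> }

type ProgramState =
    { Inventory: Inventory
      Log: string list }

type Message =
    | Restock of item: string * amount: int
    | GetState of AsyncReplyChannel<ProgramState>

// Core logic: works on a plain collection, the smallest piece of the state.
let restock (item: string) (amount: int) (stock: Map<string, int>) =
    let current = stock |> Map.tryFind item |> Option.defaultValue 0
    Map.add item (current + amount) stock

// Breaks off the inventory, delegates to the core logic, and rebuilds
// the whole state around the new value.
let update (state: ProgramState) (item: string) (amount: int) : ProgramState =
    let inventory =
        { state.Inventory with Stock = restock item amount state.Inventory.Stock }
    { state with
        Inventory = inventory
        Log = sprintf "restocked %s" item :: state.Log }

// The agent holds the single state record and threads it through the loop.
let agent =
    MailboxProcessor.Start(fun inbox ->
        let rec loop (state: ProgramState) =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Restock (item, amount) ->
                    return! loop (update state item amount)
                | GetState reply ->
                    reply.Reply state
                    return! loop state
            }
        loop { Inventory = { Stock = Map.empty }; Log = [] })

agent.Post(Restock("widget", 5))
agent.Post(Restock("widget", 2))

// Messages are handled in order, so this sees both restocks applied.
let finalState = agent.PostAndReply(fun channel -> GetState channel)
```

The only place the state changes hands is the recursive call to `loop`, so every modification is visible as a value returned from `update`.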

I continued to enhance this program and eventually grew it into a working prototype platform for testing features and verifying logic in other programs. I wasn’t completely sold on F#, but with this I finally felt like I had enough perspective and experience to say that I was becoming confident in my proficiency. I still have more thoughts on this design strategy and F# in general, but this is the most progress I’ve made in terms of learning the language in practice.