Sharing my thoughts on software development

Monday, October 12, 2009

The problem of too many layers of indirection (abstraction)

So everyone has probably heard this by now:
"All problems in computer science can be solved by another level of indirection"
and the corollary:
"...except for the problem of too many layers of indirection"
While it sounds smart and witty, I have never quite figured out what the corollary is referring to. After all, our society is built on layers and layers of indirection and abstraction, and that's how we advanced into the modern world. It is a proven concept.

Well, let me step back a little and talk about an interesting issue I ran into recently. It was decided that our enterprise contract management software was not responsive enough, and a few engineers were tasked with taking a look at the problem.

Thanks to advances in software development tooling, we now have awesome profilers like dotTrace, among others, that spare us the daunting task of inserting timestamping instructions into every single method in the application. Looking at the profiling results, one method quickly grabbed my attention -- a heavy lifter that made up 40% of the overall page load time. Upon closer investigation, I realized that it was our new navigation tree, which loads a gigantic metadata file containing all the information there is to know (data dependencies, context relations, data validation rules, permission rules, etc.). To be more specific:
  1. It was reloading the (static) data on every page load.
  2. Only a small amount of the data (the part related to the current page) was actually required.
  3. It was executed twice on every page load.
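
For illustration, the first two findings are the kind of thing a load-once cache plus a per-page lookup would address. Here is a minimal Python sketch; every name in it (load_metadata, rules_for_page, the sample dictionary) is made up for the example and is not taken from our application.

```python
from functools import lru_cache

# Hypothetical stand-in for the gigantic metadata file.
_RAW_METADATA = {
    "contracts": {"validation": ["amount > 0"], "permissions": ["viewer"]},
    "vendors": {"validation": ["name required"], "permissions": ["admin"]},
}

load_count = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=1)
def load_metadata():
    """Pretend to parse the big file; cached so it runs once per process."""
    global load_count
    load_count += 1
    return _RAW_METADATA


def rules_for_page(page):
    """Return only the slice of metadata the current page needs."""
    return load_metadata().get(page, {})


# Three simulated page loads share a single expensive load.
for page in ["contracts", "vendors", "contracts"]:
    rules_for_page(page)
```

Whether a process-wide cache is appropriate depends on how static the data really is, so treat this as a sketch rather than a recommendation.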
Let's forget about the first two, and focus on the third problem. How did this happen? I mean, this looks like a simple problem that even junior programmers would know to avoid.
Well, the truth is, the original developer implemented the navigation module nicely, with caching and stuff, so the heavy-lifting method would never be called twice. Then a few months later, someone had to fix a caching bug -- the cached navigation tree became corrupted for some unknown reason. He looked around and found a little method that was nicely packaged and seemed harmless, and that would solve his problem by rebuilding the corrupted data. It was a perfectly logical choice on his side, although little did he know that about 3 abstraction layers down the road, it reads a gigantic XML file and creates a few hundred objects on the fly.

Maybe the original developer should have documented this code better; maybe the other developer should have been more cautious when using other people's code. But the real issue is that abstraction hides so much detail that it gives you a false sense of confidence. It makes you believe you know everything. After all, the method name and comment should be sufficient to describe what it does, right? (Hint: no.) If they were, they would have to explain what all its function calls do, and the functions called by those functions it calls, and the functions called by those functions called by those functions it calls...

Every abstraction layer adds a little overhead, not only for the CPU but also for the poor human who has to read the code. Be careful: those little overheads may come back and bite you one day.
