Thursday, June 05, 2008

Problems in Configuration and Logging

" ...........
...........
Lock-Nah: The Book Of The Dead gives life.
Meela: And The Book Of The Living takes life away.
...........
...........
"


The dialogue between Lock-Nah and Meela, from the movie The Mummy Returns, sums up the state of the configuration files used in a software system. I have often observed that even the most well-intentioned configuration files end up as either a "Book of the Dead" or a "Book of the Living".

Configuration & Logging have always been a bone of contention & at the heart of any discussion on software architecture. The same questions always come up & hang in the air :

- How will I configure that ?
- Where is the metadata for this ?
- How do I log this ?
- Can we use a universal format for all the logging ?
- Where do I log the Exception Stack Traces ?
- .....
- .....
- .....

I just chanced upon this blog entry by Dan Pritchett. The article talks about Metadata & the various decisions that need to be made while using Metadata. It is nicely organized & well written.

Dan talks about two aspects of Metadata - Configuration & Telemetry ( Logging, actually ).

Configuration

Dan has clearly articulated the problem associated with distributed configurations, synchronizing between the configurations, etc. I have faced these issues many times in the past, especially with systems that contain lots of configuration files.

The key to getting the system operational is to "turn the knobs" in the correct combination. This kind of configuration mechanism often falls into the realm of black magic : only a seasoned veteran architect can cast the correct arcane spell to tame the beast. The multitude of configuration files lays even the most precise documentation to waste. The problem becomes even more complex when the system is replicated across multiple servers : the argument now moves into a new space, namely how to maintain a "common" configuration & extend it with the specific "granular" configuration required by a single server.
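To make the "common" versus "granular" idea concrete, here is a minimal sketch of one possible approach: keep a single shared configuration & apply a small per-server override on top of it. This is not taken from Dan's article. The setting names, the server names & the helpers merge_config and config_for are all hypothetical, chosen only for illustration.

```python
from copy import deepcopy


def merge_config(common, override):
    """Return a new dict: `common` with `override` applied recursively."""
    merged = deepcopy(common)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalar, list, or a brand-new section: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical settings shared by every server.
COMMON = {
    "logging": {"level": "INFO", "directory": "/var/log/app"},
    "database": {"host": "db.internal", "pool_size": 10},
}

# Hypothetical per-server overrides: only the knobs that differ.
SERVER_OVERRIDES = {
    "web-01": {"logging": {"level": "DEBUG"}},
    "web-02": {"database": {"pool_size": 25}},
}


def config_for(server_name):
    """Build the effective configuration for one server."""
    return merge_config(COMMON, SERVER_OVERRIDES.get(server_name, {}))
```

With this layout, config_for("web-01") keeps the shared log directory & database settings but switches the log level to DEBUG. Each server's file stays short, & the difference between two servers can be read straight off the overrides.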

Telemetry

Dan also touches upon the topic of Telemetry ( a term I haven't heard since my college days ! ). How do you measure / monitor the health of a system remotely ? Well, the only source of this information is the log files. The problem is simple - consider a System X that has parts A, B and C, & these parts emit log information in specific formats. How do you aggregate the information from these three different formats into a single format that can show up in a remote dashboard ?
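A rough sketch of how the aggregation for System X might look, assuming each part's format is known up front. The three log formats below (plain text for A, key=value for B, JSON for C), the field names & the helper functions are my own assumptions for illustration, not formats described in Dan's article.

```python
import json
import re
from datetime import datetime

# Assumed format for part A: "2008-06-05 10:15:00 ERROR disk full"
A_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")
# Assumed format for part B: "level=WARN ts=2008-06-05T10:16:00 msg=slow response"
B_PATTERN = re.compile(r"^level=(\w+) ts=(\S+) msg=(.*)$")
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def record(source, timestamp, level, message):
    """Build one entry in the common format used by the dashboard."""
    return {
        "source": source,
        "timestamp": timestamp.isoformat(),
        "level": level.upper(),
        "message": message,
    }


def parse_a(line):
    match = A_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized part A line: %r" % line)
    ts, level, message = match.groups()
    return record("A", datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), level, message)


def parse_b(line):
    match = B_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized part B line: %r" % line)
    level, ts, message = match.groups()
    return record("B", datetime.strptime(ts, ISO_FORMAT), level, message)


def parse_c(line):
    # Assumed format for part C: one JSON object per line.
    data = json.loads(line)
    ts = datetime.strptime(data["time"], ISO_FORMAT)
    return record("C", ts, data["severity"], data["text"])


PARSERS = {"A": parse_a, "B": parse_b, "C": parse_c}


def aggregate(lines):
    """Normalize (source, raw_line) pairs and sort them by timestamp."""
    records = [PARSERS[source](raw) for source, raw in lines]
    return sorted(records, key=lambda r: r["timestamp"])
```

The point of the sketch is that every part needs its own parser before anything can show up on a single dashboard. If the parts had agreed on one format early in the project, most of this code would not be needed.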

I think these questions must be addressed very early in a project. The questions raised by this article are extremely important, & answering them up front could save countless hours of effort during maintenance & debugging.

Overall, a very good article from Dan Pritchett. It's a must-read for people who earn their pay by designing software systems.
