Chris Hoff, a former colleague now at Juniper Networks and a great blogger in his own right, penned a piece last week about the weak underbelly of automation: our decreased opportunity to react manually to negative situations before they become a crisis. Hoff put the problem extremely well in the opening of the post:
I'm a huge proponent of automation. Taking rote processes from the hands of humans & leveraging machines of all types to enable higher agility, lower cost and increased efficacy is a wonderful thing.
However, there's a trade off; as automation matures and feedback loops become more closed with higher and higher clock rates yielding less time between execution, our ability to both detect and recover -- let alone prevent -- within a cascading failure domain is diminished.
I've stated very similar things in the past, but Hoff went on to give a few brilliant examples of the kinds of things that can go wrong with automation. I recommend reading his post and following some of the links, as they will open your eyes to the challenges we face in an automated IT future.
Whenever I ponder the subject of cloud automation, however, I think about how we will handle one of the most important--and difficult--things we have to control in this globally distributed model: legality and compliance.
If we are changing the very configuration of our applications--including location, vendors supplying service, even security technologies applied to our requirements--how the heck are we going to ensure that we don't start breaking laws or running afoul of our compliance agreements?
It wouldn't be such a big deal if we could just build the law and compliance regulations into our automated environment, but I want you to stop and think about that for a second. Not only does the body of law and regulation change almost daily (even though any single law or regulation changes only occasionally), but there are so many rules that it is difficult to know which ones apply to which systems for any given action.
In fact, I long ago figured out that we will never codify into automation the laws required to keep IT systems legal and compliant. Not all of them, anyway. This is precisely because humanity has built a huge (and highly paid) professional class to test and stretch the boundaries of those same rules every day: the legal profession.
How is the law a challenge to cloud automation? Imagine a situation in which an application is distributed between two cloud vendor services. A change is applied to key compliance rules by an authorized regulatory body.
That change is implemented by updating the application's operations automation within the first vendor's service. The update triggers behavior in the distributed application that the second vendor sees as an anomalous operational event.
The second vendor's automation responds with changes of its own, which the first vendor now sees as a violation of the newly applied rules, so the first vendor initiates action to get back into compliance. The second vendor sees those new actions as another anomaly, and the cycle repeats itself.
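To make that loop concrete, here is a toy sketch in Python. Everything in it is hypothetical: each "vendor" is reduced to a function watching a single setting (the region where the application's data lives), one enforcing a newly changed residency rule and the other rolling back anything that deviates from its learned baseline. It is not how any real cloud provider's automation works; it only shows how two individually reasonable control loops can chase each other indefinitely.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: the only state tracked is the region where the
# distributed application's data lives. Each vendor's automation is a
# function that returns a corrective region, or None if it sees no problem.
Controller = Callable[[str], Optional[str]]


def compliance_controller(required_region: str) -> Controller:
    """Vendor A (hypothetical): enforce a newly changed data-residency rule."""
    def act(region: str) -> Optional[str]:
        return required_region if region != required_region else None
    return act


def anomaly_controller(baseline_region: str) -> Controller:
    """Vendor B (hypothetical): roll back any deviation from its learned baseline."""
    def act(region: str) -> Optional[str]:
        return baseline_region if region != baseline_region else None
    return act


def run(region: str, controllers: List[Controller],
        max_rounds: int = 10) -> Tuple[str, List[str]]:
    """Let the controllers take turns reacting until nothing changes ("stable")
    or a round ends in a state already seen ("oscillating")."""
    history = [region]
    seen = {region}
    for _ in range(max_rounds):
        changed = False
        for act in controllers:
            new_region = act(region)
            if new_region is not None:
                region = new_region
                history.append(region)
                changed = True
        if not changed:
            return "stable", history
        if region in seen:
            # The controllers are deterministic, so a repeated round-start
            # state means the same sequence of actions will repeat forever.
            return "oscillating", history
        seen.add(region)
    return "undecided", history
```

With vendor A enforcing "eu-west" and vendor B still holding "us-east" as its baseline, `run("us-east", [compliance_controller("eu-west"), anomaly_controller("us-east")])` returns `("oscillating", ["us-east", "eu-west", "us-east"])`. Update vendor B's baseline to "eu-west" and the same run settles as `("stable", ["us-east", "eu-west"])`. Neither controller is buggy on its own; the failure lives in the relationship between them.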
Even changes not related to compliance run the risk of triggering a cascading series of actions that result in either failure of the application or an unintentional lapse in compliance. In the cloud, regulatory behavior depends on technology, and technology behavior depends on the rules it is asked to adhere to.
Are "black swan" regulatory events likely to occur? For any given application, not really. In fact, one of the things I love about the complex-systems nature of the cloud is the ability of individual "agents" to adapt. (In this case, the "agents" are defined by application developers and operators.) Developers can be aware of what the cloud system does to their apps, or what their next deployment might need to do to stay compliant, and take action.
However, it is the nature of complex systems that such events will occur somewhere in the system as a whole, sometimes to great detriment. It's just that the positive effect of the system will outweigh the cost of those negative events...or the system will die.
I stumbled recently on a concept called "systems thinking" which I think holds promise as a framework for addressing these problems. From Wikipedia:
Systems Thinking has been defined as an approach to problem solving, by viewing "problems" as parts of an overall system, rather than reacting to specific part, outcomes or events and potentially contributing to further development of unintended consequences. Systems thinking is not one thing but a set of habits or practices within a framework that is based on the belief that the component parts of a system can best be understood in the context of relationships with each other and with other systems, rather than in isolation. Systems thinking focuses on cyclical rather than linear cause and effect.
Dealing with IT regulation and compliance in an automated environment will take systems thinking--understanding the relationship between components in the cloud, as well as the instructed behavior of each component with regard to those relationships. I think that's a way of thinking about applications that is highly foreign to most software architects, and will be one of the great challenges of the next five to 10 years.
Of course, to what extent the cloud should face regulation is another nightmare entirely.