I was sitting on twitter today and a stream of tweets rolled past that made me… a little annoyed at first, then sort of intrigued because of the amount of discussion and brain wrangling they triggered.
To be fair to Chris, I agree with a lot of what he was saying in his Real DevOps stream. But I fear that among the good stuff were a few tweets that were either worded badly or wildly off the mark. I’ll admit I was initially tempted to dissect individual tweets but I think it’s more interesting – and less churlish – to get the Yet Another What Is DevOps post out of the way.
So, let’s get one thing clear, and then I’ll get started. A lot of people don’t know what “Devops” actually means. Here’s what it is, and here’s an opportunity for you to get your “no it’s not” ready.
Devops is the application of lean manufacturing principles to the IT Lifecycle
That sounds simple. Deceptively so, because it’s what that little sentence implies that makes this whole thing so interesting and contentious.
Lean is, at its heart, about reducing waste – or Mudas. Wasted time, wasted materials, wasted money, wasted inventory, wasted effort. These are all things we try to reduce in a good DevOps shop. Reduce waste, move faster, repeat.
Automation is one of the parts that IT folks focus on the most, to the extent that I’ve heard DevOps described as “implementing a Continuous Delivery Pipeline then moving on to the next company”. I nearly threw my beer at the stage. The only thing that stopped me was that the bar was closed during talks.
However, I digress.
The sentiment expressed by that particular speaker at that particular meetup – that you put in CD and then you’re done – could not be more wrong. Yes, Continuous Delivery is integral to Devops, but it is not the be all and end all. Not even slightly.
CI and CD reduce wasted effort, and reduce waste generated through human error. They’re absolutely essential to a good Devops shop but they’re not the whole shebang.
Wasted effort appears in a number of other ways, such as in Chris’s assertion that you should be reading your logs every day. I disagreed with this, though it may well be a wording thing.
I certainly don’t mean “never read the logs”, that would be insane. If you’re never going to read them, just generating them is waste. But I do mean that you should employ automation to reduce and correlate your logs, then extract the interesting things that might actually need human intervention. Then you read that. Maybe every day. Maybe constantly. Maybe pump the interesting things into Slack (we do). Maybe alert your on-call guys to really bad things by integrating NewRelic or Raygun with OpsGenie (we do). Push them into LogStash and examine trends (we do that too). But reading your logs every day is very TradOps. Also, working at scale means you may have gigabytes or terabytes of logs across many different sources. You ain’t reading that shit. Make a robot do it.
If your log-digesting tools are really smart, they could even intervene on your behalf when they find an entry they know how to deal with. My team is currently looking into this possibility with a tool we’ve nicknamed RobotMedic, though I’ll admit that particular skunkworks project is not quite production-ready and is less Medic and more Butcher right now.
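To make "make a robot do it" a bit more concrete, here is a minimal sketch of the reduce-and-extract step in Python. It is not our tooling: the line format, the severity names and the `triage` and `normalise` helpers are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical severity levels worth a human's attention.
INTERESTING = {"ERROR", "CRITICAL"}

# Assumed line shape: "<timestamp> <LEVEL> <message>"
LINE = re.compile(r"^(\S+)\s+([A-Z]+)\s+(.*)$")


def normalise(message):
    """Replace runs of digits with '#' so similar messages group together."""
    return re.sub(r"\d+", "#", message)


def triage(lines):
    """Return a Counter of normalised messages for interesting entries only."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # unparseable lines are skipped in this sketch
        _, level, message = match.groups()
        if level in INTERESTING:
            counts[normalise(message)] += 1
    return counts


sample = [
    "2015-09-02T10:00:01 INFO request 101 served in 12ms",
    "2015-09-02T10:00:02 ERROR timeout talking to db-7",
    "2015-09-02T10:00:03 ERROR timeout talking to db-9",
    "2015-09-02T10:00:04 CRITICAL disk full on node 3",
]

summary = triage(sample)
for message, count in summary.most_common():
    print(f"{count}x {message}")
```

Four lines go in and two grouped items come out, which is the bit a human might actually want to read. Where that summary goes next (a chat channel, a pager, a trend graph) is a separate decision.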
We also seek to reduce inventory, so Devops implies Cloudops. You can try to do some Devops things with physical hardware, and you can kind-of do Devops-lite on virtualised infrastructure, but ultimately you’re wasting inventory in the form of idle processor time, unused hardware and overprovisioned clusters, and possibly wasting money in the form of empty, overpurchased rack space. Unless your load profile is completely uniform, this is an inevitable, unavoidable fact of life. With Cloud, you can rightsize your infrastructure for an agreed minimum load, and automatically scale out as load increases, efficiently using compute cycles and saving money. And you can turn off your pre-production environments outside office hours and incur no cost, and spend the money you saved on Lego.
This is why I often opine that you really can’t do Devops without Cloudops.
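As a rough back-of-envelope for the pre-production point, here is a small sketch of the arithmetic. The office hours are an assumption for illustration (ten hours a day, five days a week), and it simplifies by treating a stopped environment as costing nothing, so things like storage charges are ignored.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_saving_fraction(office_hours_per_day, office_days_per_week):
    """Fraction of the week an environment is switched off."""
    running = office_hours_per_day * office_days_per_week
    return 1 - running / HOURS_PER_WEEK


# Assumed schedule: 10 hours a day, Monday to Friday.
fraction = weekly_saving_fraction(10, 5)
print(f"{fraction:.0%}")  # prints "70%"
```

Under those assumptions the environment runs for 50 of the 168 hours in a week, so roughly 70% of its compute hours simply go away.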
Another of the Lean Wastes is wasted time. So Devops implies the removal or reduction of bottlenecks in processes, the extirpation of long lead times and a plethora of self-service tools, so that your manufacturing team – your developers – aren’t waiting around twiddling their thumbs while some bloke in a black t-shirt plods along installing servers for them.
Strong planning is another thing that can reduce wasted time, so if you’re doing Devops, you should probably be doing some kind of Agile process to plan out your work and reduce two other Mudas – overprocessing and overproduction.
You can also reduce the amount of unnecessary code you write (overproduction) by componentising and re-using code, so some kind of SOA, even microservices, is probably pivotal, depending on what kind of applications you’re writing. Duplication is evil and must be destroyed, especially if the duplication in question is re-implementation. Overprocessing can be dealt with by the startup mindset of Minimally-Viable Products, coupled with fault tolerance, feature flagging and A/B testing. Get the features out of the door, see how they perform, iterate.
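Feature flagging can be sketched in a few lines. This is one common approach (a deterministic percentage rollout based on hashing), not a description of any particular product; the flag name `new-checkout` and the function names are hypothetical.

```python
import hashlib


def bucket(user_id, flag_name):
    """Map a user and flag to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id, flag_name, rollout_percent):
    """True if this user falls inside the rollout percentage for the flag."""
    return bucket(user_id, flag_name) < rollout_percent


def checkout(user_id, rollout_percent=10):
    """Route a user to the new or old code path based on the flag."""
    if is_enabled(user_id, "new-checkout", rollout_percent):
        return "new"
    return "old"


print(checkout("alice", rollout_percent=0))    # always "old"
print(checkout("alice", rollout_percent=100))  # always "new"
```

Because the bucket is derived from a hash rather than a random number, a given user sees the same variant on every request, which is what you want if you plan to compare how the two paths perform. Dialling `rollout_percent` up is the "iterate" part.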
Documentation is another place where waste often occurs. There are companies that insist that developers (and by extension in a ‘devops’ shop, ops guys) maintain code comments AND keep a wiki or SharePoint site – or worse, a fileshare of word docs – full of duplicated and often outdated documentation. Duplication is waste. Out-of-date documentation causes defects and distractions. Document your code in your code, including your servers because if there’s one thing the cloud has taught us, it’s that servers are software objects now. Then, if you need to surface that documentation, get a tool to generate it. If you’re building APIs, and you probably ought to – see my SOA point above – use something like Swagger. If you’re using PowerShell, target Get-Help. Make robots do the work for you. You need documentation, but you need it close to the bone.
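The same idea in Python terms, as a minimal sketch: the documentation lives in the docstring and a small generator surfaces it. The `restart_service` function and the `render_docs` helper are hypothetical examples; only the `inspect` calls are real standard-library API.

```python
import inspect


def restart_service(name, wait_seconds=5):
    """Restart the named service and wait for it to report healthy.

    name: the service to restart.
    wait_seconds: how long to wait before giving up.
    """
    return f"restarted {name}"  # placeholder body for the sketch


def render_docs(functions):
    """Build a plain-text reference page from signatures and docstrings."""
    sections = []
    for func in functions:
        signature = inspect.signature(func)
        doc = inspect.getdoc(func) or "(undocumented)"
        sections.append(f"{func.__name__}{signature}\n{doc}")
    return "\n\n".join(sections)


page = render_docs([restart_service])
print(page)
```

The page is regenerated from the code every time, so there is no second copy to drift out of date.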
Defects are yet another waste, and a big one – so unit testing of code and test automation are a must, parallelised where possible so you can continue to be productive while your tests run. Our devs have been unit testing forever, and we have dedicated test teams, and even our cloudops team are starting to get into unit testing using Pester. We’ve even integrated Pester testing into Octopus Deploy, halting the roll-out of ops code where tests fail. Ops code coverage is still patchy, but we’re getting there.
Reducing defects also leads me to take issue with Chris’s tweet seen here. Yes, I said I wouldn’t do this, but this one is kinda critical:
#DevOps Real DevOps is: Knowing that a single error could force you to rewrite the entire system, so you build very very very carefully.
— Christopher Mahan (@chris_mahan) September 2, 2015
I get the sentiment, but this one is just not right. DevOps implies scale and velocity. You move fast, think big and roll forward, and this means errors will happen. The idea that a single error may wreck your entire system or force a rewrite is really, really bad.
Martin Fowler wrote a piece on tolerant readers as they apply to service-oriented – and particularly microservice – architecture back in 2011. These principles equally apply to Devops, which is deeply embedded in the service mindset. If your system can’t tolerate a few errors, it’s not a modern system. It’s a disaster.
A non-tolerant system amplifies small defects into large defects and does exactly the opposite of what a lean system should do. Tolerate errors, log them, fix them, roll forward. Be idempotent. Plan for failure. Build carefully, yes, but build tolerant too.
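For the avoidance of doubt about what "tolerant" means here, this is a minimal sketch of a tolerant reader in Python. The message shape (an order with `id` and `quantity`) and the `read_order` function are invented for illustration; the point is that the consumer takes only what it needs and shrugs off the rest.

```python
import json


def read_order(payload):
    """Extract only the fields this consumer needs from a JSON message.

    Unknown fields are ignored, a missing or malformed quantity falls
    back to 1, and unusable messages return None instead of raising.
    """
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None  # malformed message: a real system would log this
    if not isinstance(data, dict) or "id" not in data:
        return None  # nothing to work with, so skip it rather than crash
    try:
        quantity = int(data.get("quantity", 1))
    except (TypeError, ValueError):
        quantity = 1  # assumed default for this sketch
    return {"id": data["id"], "quantity": quantity}


# A producer adds a new field; this consumer carries on regardless.
print(read_order('{"id": 7, "quantity": 2, "gift_wrap": true}'))
print(read_order('{"id": 7}'))
print(read_order("not json"))
```

A strict reader would fall over the moment a producer added `gift_wrap`. This one keeps working, and the bad message becomes a log entry to fix rather than an outage.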
Move fast, and don’t be afraid to break shit.
Aggressive reduction of wastes is what allows my four-man team of Devops poster boys to manage more than 400 EC2 instances, 130 Octopus Deploy projects, up to 15 Elastic Search clusters, ~250,000 lines of PowerShell code and hundreds of terabytes of app code and content. Four guys. And we’ve got time to write pontificating, ideologically-driven quite-probably-wrong blog posts about the nature of Devops. We’re not perfect. We’re human. We still have lots of waste around the place, but Rome was not built in a day.
I could go on about this at great and tedious length, but let’s finish with a tweet from Chris with which I wholeheartedly agree:
Yes, but not only from companies. A lot of what you read about DevOps on twitter comes from people still doing TradOps but with a CI pipeline, or people with basic misconceptions about what DevOps is, or people who aren’t really paying attention. Couple that with a lot of nonsense from vendors trying to sell something or recruiters trying to make a commission, and it’s a mess out there. Don’t believe the hype. Devops as a principle is simple. But what emerges from Devops is both complex and kind of cool.