Every so often I get a request, from one or more of our developers, for Remote Desktop access to the servers running their code – be it for troubleshooting, configuration or some other arcane purpose.
My answer is almost uniformly “no”.
“But surely,” says the cat, “you’re a super-futuristic DevOps shop spoken of in breathless terms by national IT publications? Don’t you trust your developers?”
Of course we do. But…
- We also have a highly-automated server build chain, which joins up with our highly-automated CI & CD chain. Making ad-hoc changes on any one box breaks that chain. Config changes, unless they are tests, need to be made in the chain.
- We run on Windows Server Core Edition, so anyone interacting directly with a node had better have pretty good PowerShell chops, and despite all our best efforts, that’s still a slightly specialist skill.
- We rely heavily on auto-scaling in AWS, meaning that at any given moment, a node may be scaled down and retired, or manually shot by a DevOps guy, or have its instance size adjusted. Or it may just die for no readily apparent reason while you’re messing around with it. Our boxes are cattle, not pets.
- We’re Highly Available. So you might be logging on to one node in a cluster of ten or more. Or you may end up with ten RDP windows open, at which point keeping track of what you’re doing where becomes… tricky.
- We multi-tenant different services and sites together. So you might – unintentionally or otherwise – be interrupting someone else’s application while trying to troubleshoot your own.
- We use services other than just EC2, and there’s an industry trend towards “serverless” hosting systems. Elastic Beanstalk, for instance, is meant to appear serverless. Lambda, while actually running on servers, goes one step further in that there is no way at all to access the servers directly. It’s all abstracted away. New services like this emerge almost daily, and a number of our offline services, which previously ran on “servers”, are being re-engineered into “serverless” applications.
- And lastly, we want an accountability chain. We can’t just have nodes being manipulated by fallible humans all the time. How would we know what got changed? We want changes logged, and RDP doesn’t give us the degree of oversight we want.
The upshot of this is that we think about compute differently, and we want our Devs and Ops guys to stop thinking about servers the way we used to think about servers. So we say no to RDP access.
Luckily, there are other ways to get information from our nodes, because we’ve thought about that in some detail.
- Application and System logs get surfaced into CloudWatch Logs.
- IIS logs get surfaced into Logstash*.
- Access logs (from the load balancers) are pushed into an S3 bucket on a constant basis.
- We have services like Raygun and New Relic for exception monitoring and performance analysis.
- And most importantly, every Windows node has an Octopus Deploy Tentacle.
Let’s expand on that a little. Octopus Deploy, as regular readers of this blog know, is nominally a deployment engine which allows you to push code onto servers at scale in a reliable, controlled fashion.
This inherent capability means it’s also a great engine for pushing around ad-hoc code at scale, using the script console. By firing off commands in the console, we can simultaneously run PowerShell on any number of nodes, even the entire Windows fleet at once. We can pick up files off the disk and surface them using New-OctopusArtifact, or push them to S3 using the node’s instance profile for access. Or we can just crack them open and write them back to the console. We can make direct HTTP requests bypassing the load balancers. We can reset things, start things, stop things and generally do anything we want. We can even make configuration changes (though it’s generally not advisable for the reasons mentioned above). And it’s all logged.
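To make the "run it everywhere, and log it" idea concrete, here is a minimal Python sketch of the pattern. It is not how Octopus works internally and it calls none of its APIs: `run_everywhere`, `fake_runner` and the shape of the audit record are all made up for illustration, and the runner only pretends to execute anything.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def run_everywhere(nodes, script, run_on_node, user, audit_log):
    """Run one script on every node in parallel and record who ran what, where."""
    nodes = list(nodes)
    # Write the audit record before anything executes, so the attempt is
    # on record even if a node dies halfway through.
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "script": script,
        "targets": nodes,
    })
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map returns results in the same order as the input nodes.
        outputs = list(pool.map(lambda n: run_on_node(n, script), nodes))
    return dict(zip(nodes, outputs))


def fake_runner(node, script):
    """Hypothetical stand-in for remote execution; it only echoes the request."""
    return f"{node}: ran {script!r}"


if __name__ == "__main__":
    audit = []
    results = run_everywhere(
        ["web-01", "web-02"], "Get-Service W3SVC", fake_runner, "alice", audit
    )
    for node, output in results.items():
        print(output)
    print(f"{len(audit)} audit record(s) written")
```

The point of the sketch is the ordering: the record of who ran what against which nodes exists independently of any one box, which is exactly what an RDP session doesn't give you.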
And we don’t generally aim code at “servers”. We pick an Octopus Role and Environment, and fire it at every node that happens to match the criteria.
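As a rough illustration of what targeting by Role and Environment means, here is a small Python sketch. The `Node` record, the `select_targets` helper and the fleet below are hypothetical; Octopus does this matching for us, and this is only the idea, not its implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A hypothetical record for one machine registered with the deployment tool."""
    name: str
    environment: str
    roles: frozenset


def select_targets(nodes, role, environment):
    """Return every node that carries the role and sits in the environment."""
    return [n for n in nodes if n.environment == environment and role in n.roles]


# Made-up fleet, for illustration only.
FLEET = [
    Node("web-01", "Production", frozenset({"web"})),
    Node("web-02", "Production", frozenset({"web", "api"})),
    Node("web-03", "Staging", frozenset({"web"})),
    Node("worker-01", "Production", frozenset({"worker"})),
]

if __name__ == "__main__":
    for node in select_targets(FLEET, role="web", environment="Production"):
        print(node.name)
```

Nobody names a server in that call. If auto-scaling adds or retires a node between two runs, the same criteria simply match a different set of boxes.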
EC2 Run Command does something similar now, but so far doesn’t allow us to filter the way we want to filter. Octopus, for us, is still superior.
So that’s why we discourage thinking about specific servers, and encourage thinking about them in the abstract or, better still, at the application level.
Servers are just a commodity now. A substrate on which the more interesting stuff happens. You shouldn’t really think about them. Think of your compute environment, not your servers.