The Bigger Picture Blog

  • Engineers Care Less About The OS

    Last Week in AWS is another one of my favourite blogs, but this week's post, titled ‘Nobody Cares About the Operating System Anymore’, definitely got me thinking.

    If you’ve never read Last Week in AWS, the thing to note is that it scores really high on the snark factor, and like most blogs it tends towards hyperbolic titles to attract readers and discussion. After all, ‘In the cloud, Engineers don’t really care about the Operating System anymore’ is not as pithy, and is not going to attract as much attention. Which is probably why my blog is much less snarky, and much more boring.

    However, in this case I think the premise is wrong. It’s not that engineers no longer care which Operating System you choose; it’s that Cloud-based Virtual Machines (as a catch-all for EC2 instances, Azure VMs and whatever GCP calls compute) are falling out of use, and for me are actually an indicator of ‘infrastructure smell’, i.e. that something is old, unpatched, or not cloud native. For me, if I have to use a VM in hyperscale cloud, something has gone wrong somewhere.

    There are cases where you might use VMs (more commonly called VPSs) and dedicated boxes, which you find on more commodity providers such as DigitalOcean, Scaleway, or RedStation (to name just the tip of the iceberg). But when you have SaaS, FaaS, and Serverless Docker environments such as Fargate and Kubernetes, I really struggle to see the value in using dedicated, always-on compute. If anything, it’s technical debt that you need to monitor, maintain, patch and scale.

    As a side note, yes, you sometimes have to choose a base Linux distribution for Docker images, but this is essentially a game of pick-your-preferred-package-manager. Once that’s done, you never change that FROM value except to update the version number.

    I can see why Corey would interpret that as nobody caring about the Operating System, but the broader picture to me is not that the Operating System doesn’t matter. It’s that hyperscale cloud has reached a point where the technologies allow us to abstract away VMs, making the choice of Operating System less relevant. It’s the cloud providers’ abstraction layers that enable that, rather than engineers not caring.

  • Chaos Engineering Your Team

    Mountain Goat Software runs possibly one of my favourite blogs at the moment. Its recent post, “Should Your Team Adopt No-Meeting Weeks”, really resonated with me.

    A while back I read an article (which, of course, I now cannot find) about how you might want to consider applying Chaos Engineering not only to your technology, but also to your people. A random person gets selected, maybe once a week, to take the day off, simulating a sick-leave-like situation.

    I’m a huge proponent of the idea that no single person should be critical to your business. When someone is, they become a massive risk to your business that is only realised once they leave. It also creates an uncollaborative working environment, because that one person ends up under pressure from lots of different people. This is covered at length in The Phoenix Project.

    Using the principles behind Chaos Engineering, you can build an expectation into your team that it should be able to survive the loss of a person.

    What I liked about the experiment mentioned in the post, taking a random person out of meetings, is that it is similar to Chaos Engineering your team, but at a smaller scale. If the person who gets unceremoniously booted out of the meeting turns out to be critical to it, you have identified a process that doesn’t account for the loss of one person.

    The other issue is that if people learn they’re not necessary for a meeting, they stop coming, which I assume is the intention. However, you then start kicking out important people, such as the executive authoriser who was there specifically for that meeting.

    In the post, these issues were described as the experiment failing. I would look at it from the other direction: it’s a failing of the process to have a dependency on a single person.

    As CGP Grey often says, one is none.

    Now, don’t get me wrong: I’m not saying you need two (or more) executives in the meeting. I’m looking at the broader picture. Does everyone really need to be in the same room at the same time? In my experience, 90% of the time the answer is no.

    Instead, the major discussions should happen before the meeting, in your preferred collaboration tool. Any reading material can be distributed beforehand and consumed at your leisure. If someone has a vested interest, such as an executive authoriser, they can raise any concerns before anyone steps into the room.

    The meeting becomes a formality to say “a decision needs to be made by this date and time”. It’s more like that cliché moment at movie weddings where the vicar says “Speak now or forever hold your peace”. The meeting should be 5 minutes, tops.

    The only other reason to get people together at the same time is when something is being presented, such as a town hall or a sprint review. In those situations, make sure the meetings are recorded and published so that anyone who was interested but couldn’t attend can view them and ask questions later.

    This leaves everyone free to work on what’s important, when it’s important, rather than spending their entire day in meetings being presented material they could have accessed at any time, or using their time less efficiently because they’re winding up to a meeting.

    But, as the original post admits, this may only work in a Google-esque business process utopia.

  • Common Sense as Bad Practice

    It came to me in a conversation recently how toxic “Common Sense” is.

    Common Sense says “well, everyone uses X, so we should use X” or “Everyone should know this.”

    But those are fallacies and assumptions that aren’t tested or rooted in evidence.

    Usually, Common Sense comes from asking your mates or colleagues, or assuming they think the same as you. However, by doing that you inherit the biases in those assumptions. You make decisions based on things that don’t hold true: at best you are likely to make poor decisions, and at worst you create an exclusionary environment, locking out the very people you want to include.

    If you believe it is Common Sense, prove it is Common Sense. Test it out. Trial it, and trial it properly. Don’t be part of the problem.

    Also, challenge “Common Sense” when it is used.

    Why is it common sense?

    Who else thinks that is the answer?

    You should end up with a better solution that is more open and equitable than your assumptions.

  • Assessing Security Practices of 3rd Party Products

    In recent months I’ve been involved in discussions about whether remote working tools are “secure” or not. The answer to any blunt question like that is, as always, “it depends”, but this is as helpful as getting financial advice from YouTube adverts.

    It struck me that a lot of people interested in IT security judge tools based upon how many vulnerabilities there are in a product. But let’s be accurate here: they are judging it on how many security vulnerabilities are reported, or visible.

    Once the pandemic forced us all to work from home, Zoom seemed to be the target for every “white hat” hacker consultancy, generously giving their time to declare the 0-days they found to news websites, with nothing more in return than having their company name placed alongside the article, which was effectively an advert.

    These articles seemingly led to several “Enterprise” security teams declaring Zoom insecure and attempting to block its usage in their environments.

    But is Zoom insecure? And does blocking it improve things?

    If you block Zoom, what will people use instead? Google Hangouts? Skype? Chime? WebEx? Some random tool they found on the internet? Is blocking Zoom making things more or less secure?

    Zoom got the attention because its user base and visibility increased massively during the pandemic. Admittedly it did have some low-hanging fruit, but was it more or less of a threat to a business than your employees using something like Omegle for work?

    If we took the same metrics by which we judge Zoom and applied them to Windows or Linux, those security teams would block almost every operating system out there, except maybe OpenBSD and BeOS.

    IT security shouldn’t be reactionary

    IT security needs to be evidence-led. Would you now say that Zoom, after six months of global usage, is not secure, other than the occasional person posting their Zoom codes, which is the video-conferencing equivalent of calling your S3 bucket big-bank-data?

    So how can we be less reactionary in future?

    One thing is to recognise that Zoom fixed a lot, if not all, of the issues that the security researchers were making a fuss over, usually within days or weeks.

    Conversely, shouldn’t we trust GitHub less for having a long-standing issue it has been unable or unwilling to fix?

    What I’m proposing is that, instead of making decisions based upon what we read in the tech news this week, we measure the security practice of 3rd parties by how quickly and responsibly vulnerabilities in their products are announced and fixed, and that we are pragmatic about what our users need and expect from their IT.
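    As a rough illustration of that kind of measurement, here is a minimal Python sketch. The `Vulnerability` record, the `vendor_fix_stats` function and the sample vendors and dates are all hypothetical; in practice the dates would have to come from a vendor’s security advisories or a vulnerability feed. Unfixed issues are treated as open up to an `as_of` date, so a long-standing issue counts against a vendor rather than being ignored.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Dict, List, Optional, Tuple


@dataclass
class Vulnerability:
    """Hypothetical record of one reported issue in a third-party product."""
    vendor: str
    reported: date          # when the issue was reported or disclosed
    fixed: Optional[date]   # None if the vendor has not fixed it yet


def vendor_fix_stats(vulns: List[Vulnerability], as_of: date) -> Dict[str, Tuple[float, int]]:
    """Return {vendor: (median days to fix, number of issues still open)}.

    Open issues are measured up to `as_of`, so an unfixed issue keeps
    ageing and pulls the vendor's median up over time.
    """
    days_by_vendor: Dict[str, List[int]] = {}
    open_by_vendor: Dict[str, int] = {}
    for v in vulns:
        end = v.fixed if v.fixed is not None else as_of
        days_by_vendor.setdefault(v.vendor, []).append((end - v.reported).days)
        # Count the issue as open only if it has no fix date.
        open_by_vendor[v.vendor] = open_by_vendor.get(v.vendor, 0) + int(v.fixed is None)
    return {
        vendor: (median(days), open_by_vendor[vendor])
        for vendor, days in days_by_vendor.items()
    }


# Hypothetical sample data, not real disclosures.
sample = [
    Vulnerability("vendor-a", date(2020, 4, 1), date(2020, 4, 3)),
    Vulnerability("vendor-a", date(2020, 4, 5), date(2020, 4, 12)),
    Vulnerability("vendor-b", date(2019, 1, 1), None),
]
stats = vendor_fix_stats(sample, as_of=date(2020, 10, 1))
print(stats)
```

    Here the hypothetical vendor-a fixes issues within days, while vendor-b’s single unfixed issue dominates its score. A real assessment would also need to weigh severity and how responsibly each issue was disclosed, which this sketch ignores.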