Calling Yourself a Platform Engineer Because You Write Helm Charts Is Like Calling Yourself a Chef Because You Own a Microwave

The DevOps job market is drowning in microwave operators calling themselves chefs. The consequences are not limited to slow deployments.

By Catalin Lichi · Sugau Pty Ltd


I am going to say something that will make some people uncomfortable.

The DevOps job market is drowning in microwave operators calling themselves chefs. They can press the buttons. They know the presets. They will confidently tell you the lasagne is ready in four minutes. What they cannot do is tell you what happens when the kitchen catches fire — and more importantly, what they should have built before it did.

This is not a personal attack. It is a documented organisational risk. And the consequences are not limited to slow deployments and misconfigured ingress controllers. As recent history has shown, they include supply chain compromises that no Helm chart in existence could have prevented or contained.


What the Market Calls DevOps

Browse any job board in Australia today. Senior DevOps Engineer. Platform Engineer. Cloud Native Architect. The requirements are remarkably consistent — Kubernetes, Terraform, CI/CD pipelines, ArgoCD, Helm. Maybe some Prometheus. Maybe some Grafana. Tick the boxes, pass the interview, start Monday.

I have nothing against these tools. They are legitimate. Some of them are genuinely excellent. ArgoCD is a well-built piece of software that solves a real problem.

But ArgoCD is git diff plus kubectl apply with a UI on top. Deploying it is an afternoon’s work. Writing a Helm values file is configuration, not engineering. Setting up a GitHub Actions pipeline is plumbing — useful, necessary plumbing, but plumbing nonetheless.

None of it answers the question that actually matters: what does your architecture do when something goes catastrophically wrong at a layer you never thought to look at?

That question is DevOps. Everything else is preparation.


What DevOps Actually Is

DevOps is the discipline of owning the full lifecycle of software in production. Not the deployment lifecycle. The full lifecycle — including the failure modes nobody planned for, the dependencies nobody audited, the attack surfaces nobody mapped, and the blast radius nobody calculated.

It means asking who maintains the libraries your platform depends on. It means knowing the difference between a CNCF graduated project and a sandbox project and why that difference matters when you are choosing what runs in production. It means understanding that your CI/CD pipeline is not your security posture — it is one layer of a stack that requires defence in depth at every level independently.

Real DevOps is building the architecture that contains the failure nobody predicted. That requires understanding what is happening at the kernel level, at the network level, at the dependency level, and at the human level — because as we are about to see, humans are an attack surface too.


The Problem With Hiring Microwave Operators

When an organisation cannot distinguish a platform engineer from a deployment operator — and most cannot, because the credential inflation has made the signal indistinguishable from the noise — they make a category error with serious consequences.

They put a deployment operator in charge of platform security. They ask someone who knows how to sync a Helm release whether the organisation is protected against supply chain attacks. They get a confident answer because confident answers are what the market has trained people to give.

The gap between that confidence and the actual security posture is where the catastrophic failure lives.

This is not hypothetical. It happened. Publicly. Documented in detail. And it came from a direction that no standard DevOps curriculum covers.


February 2024. A Real DevOps Problem Arrives.

It did not arrive as a failing pipeline.

It did not arrive as a misconfigured ingress controller or a Helm chart with wrong replica counts or an ArgoCD sync that needed a manual refresh.

It arrived as xz-utils. A compression library. The kind of dependency that ships on every Linux distribution, compresses log files and package archives in the background, and that nobody thinks about because it has always just worked.

An attacker operating under the name Jia Tan spent two years contributing legitimate, high-quality patches to the project. They built trust with the sole maintainer — one person, no institutional support, no backup, no organisational cover. They applied social pressure through sock puppet accounts. They questioned the maintainer’s commitment. They pushed for faster releases. A patient, sophisticated psychological operation against a single exhausted volunteer.

Two years. Then they inserted a backdoor.

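The backdoor used glibc ifunc resolvers to hook RSA_public_decrypt inside sshd. It triggered only on distributions that patch OpenSSH to link against libsystemd, which in turn loads liblzma. An attacker holding the matching private key could run commands on the server before authentication completed. No password. No certificate. No log entry. Every reachable SSH server running an affected build would have been compromised, silently, simultaneously.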

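It had already landed in Debian testing and unstable, in Fedora Rawhide, and in the Fedora 40 beta. The next stable releases of the major distributions, and the millions of servers that track them, were weeks away from shipping it.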

It was caught by accident. Andres Freund, a Microsoft engineer, noticed SSH logins taking about 500 milliseconds longer than expected while benchmarking something unrelated. He was curious enough to investigate. That curiosity, one person on one afternoon chasing one anomaly, was the entire margin between a near-miss and what could have been one of the largest coordinated infrastructure compromises in history.

Now. The question.

What would your platform have done?

Not your pipeline. Not your deployment process. Not your ArgoCD configuration. Your platform. The architecture beneath the tools. The layers that exist specifically for the moment when something succeeds at getting through.

If the answer is uncertain — that uncertainty is the problem.


What the Answer Looks Like

A platform engineered for this problem looks nothing like a collection of deployment tools. It looks like this:

SSH is never internet-facing. Access to nodes is gated through WireGuard with cryptographic peer authentication. The backdoor required internet-accessible SSH authentication. That surface does not exist.
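That claim is easy to state and easy to drift away from, so it is worth checking mechanically. The following is a minimal Python sketch, not a hardening tool. The function name exposed_listeners and the 10.8.0.0/24 WireGuard subnet are assumptions made up for illustration, and the listen addresses would in practice come from your own sshd configuration or socket inventory.

```python
import ipaddress


def exposed_listeners(listen_addrs, vpn_cidr):
    """Return the sshd listen addresses that fall outside the VPN subnet.

    listen_addrs: iterable of IP address strings sshd is bound to.
    vpn_cidr: the WireGuard subnet, e.g. "10.8.0.0/24" (hypothetical).
    """
    vpn = ipaddress.ip_network(vpn_cidr)
    exposed = []
    for addr in listen_addrs:
        ip = ipaddress.ip_address(addr)
        # A wildcard bind (0.0.0.0 or ::) listens on every interface,
        # so it is always treated as exposed.
        if ip.is_unspecified or ip not in vpn:
            exposed.append(addr)
    return exposed


if __name__ == "__main__":
    # Only the first address is inside the assumed WireGuard subnet.
    print(exposed_listeners(["10.8.0.1", "0.0.0.0", "203.0.113.5"], "10.8.0.0/24"))
```

An empty result is the state you want. Anything else is an SSH surface that exists outside the tunnel.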

Internet-facing workloads run inside gVisor sandboxes. gVisor interposes a user-space kernel between the container and the host. The backdoor worked by loading a tampered shared library into the host’s sshd process and hooking RSA key decryption there. A copy of that library loaded inside a sandboxed container has no path into the host’s sshd address space. The structural boundary makes the attack vector incoherent.

Falco monitors system call activity on every node in real time. SSH authentication has a known, stable syscall and process profile, so an sshd that spawns an unexpected child process or drifts from that profile raises an alert within seconds. Pair that with a latency baseline and the 500 millisecond anomaly that Andres Freund noticed manually becomes an automated signal. You do not rely on one engineer’s curiosity on one afternoon. You operationalise the intuition.
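The latency half of that idea can be sketched without any particular tool. What follows is a minimal Python sketch and not a Falco rule: Falco evaluates rules over system call events, not timings. The function name is_latency_anomaly and its default thresholds are assumptions chosen for illustration, and the login timings would have to come from your own probes.

```python
from statistics import median


def is_latency_anomaly(baseline_ms, sample_ms, min_delta_ms=100.0, mad_factor=5.0):
    """Flag a login latency sample that sits far above the baseline.

    baseline_ms: recent known-good login latencies in milliseconds.
    sample_ms: the new measurement to judge.
    """
    if len(baseline_ms) < 5:
        raise ValueError("need at least 5 baseline samples")
    centre = median(baseline_ms)
    # Median absolute deviation: a spread estimate that a few
    # outliers in the baseline do not distort.
    mad = median([abs(x - centre) for x in baseline_ms])
    # Require both an absolute and a relative margin above the median.
    threshold = centre + max(min_delta_ms, mad_factor * mad)
    return sample_ms > threshold


if __name__ == "__main__":
    baseline = [102, 98, 101, 99, 100, 103]
    print(is_latency_anomaly(baseline, 600))  # a login roughly 500 ms slower
    print(is_latency_anomaly(baseline, 110))  # ordinary jitter
```

The median and the absolute floor are deliberate. A mean-based threshold gets dragged upward by the very outliers it is supposed to catch, and a purely relative one pages you over jitter on a fast network.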

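Dependencies are pinned at verified git commit hashes. Images are built from source, not from release tarballs that can be silently tampered with between source and publication. The xz-utils release tarball carried a modified build script, build-to-host.m4, that was not in the git repository, and that script is what injected the backdoor during the build. A build from the pinned commit never runs it, and a comparison of tarball against repository exposes it. The compromised image never reaches production.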
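The comparison between repository and tarball is not exotic. Here is a minimal Python sketch of it, operating on two in-memory mappings of path to file contents. The names sha256_hex and tarball_drift are hypothetical, and reading a real checkout and a real archive into those mappings is left out.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def tarball_drift(git_files, tarball_files):
    """Compare two {path: bytes} mappings.

    Returns (added, modified): paths present only in the tarball, and
    paths whose contents differ from the pinned git checkout.
    """
    added = sorted(p for p in tarball_files if p not in git_files)
    modified = sorted(
        p
        for p in tarball_files
        if p in git_files
        and sha256_hex(tarball_files[p]) != sha256_hex(git_files[p])
    )
    return added, modified


if __name__ == "__main__":
    git = {"configure.ac": b"AC_INIT", "src/main.c": b"int main(void){}"}
    tar = dict(git)
    tar["m4/build-to-host.m4"] = b"# placeholder contents"
    print(tarball_drift(git, tar))
```

One caveat, stated plainly: autotools release tarballs legitimately ship generated files that are not in git, so the added list needs a reviewed allow-list rather than a hard failure on every entry. That noise is exactly the cover the xz-utils attacker relied on, which is the argument for building from the pinned commit in the first place.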

Cilium enforces zero-trust east-west network policy at the eBPF layer. Even if a compromised workload runs — it can only reach what it was explicitly designed to reach. Exfiltration requires an egress path. That path does not exist without explicit policy.
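Default-deny policy has a useful side effect: the blast radius becomes something you can compute instead of guess. The following is a toy Python model of that calculation, not Cilium’s API or policy format. The function name blast_radius and the workload names are assumptions for illustration, and it takes the worst case that an attacker can pivot through every workload they reach.

```python
from collections import deque


def blast_radius(allowed_flows, compromised):
    """Return every workload reachable from `compromised`.

    allowed_flows maps a source workload to the set of destinations
    it is explicitly allowed to reach. Anything absent is denied.
    """
    seen = {compromised}
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        # Breadth-first walk over explicitly allowed flows only.
        for dest in allowed_flows.get(current, ()):
            if dest not in seen:
                seen.add(dest)
                queue.append(dest)
    seen.discard(compromised)
    return seen


if __name__ == "__main__":
    flows = {"web": {"api"}, "api": {"db"}, "batch": {"db"}}
    print(sorted(blast_radius(flows, "web")))
    print(sorted(blast_radius(flows, "db")))
```

If a workload has no entry in the allow-list, its reachable set is empty. Without default-deny, the same function would return the whole cluster.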

None of this is in a Helm chart. None of it is in an ArgoCD application manifest. All of it requires understanding what is happening below the abstraction layer — at the kernel, at the network, at the dependency chain, at the human attack surface that no tool automatically monitors.

That is platform engineering.


The Hiring Consequence

Every organisation that cannot make this distinction is running a risk they have not quantified.

They have a deployment operator who confidently describes themselves as a platform engineer because the market taught them that confidence is the product. They have no Falco rules watching syscall profiles. They have no gVisor boundary on internet-facing workloads. They have no dependency pinning policy. They have no reproducible build pipeline. They have no WireGuard-gated SSH.

They have a very clean ArgoCD dashboard.

When the catastrophic failure arrives — and it will, because xz-utils was not the first and will not be the last — the dashboard will be green. The pipelines will be passing. The Helm releases will be synced. And the blast radius will be unlimited because nobody built the containment architecture.

Hiring the wrong person for this role is the same category of mistake as running production on a single availability zone. It feels fine until it catastrophically is not. The difference is that AZ failure is visible and immediate. A supply chain compromise is invisible, patient, and surgical — and the people who should have been watching were busy making sure the deployment pipeline was clean.


The Question Worth Asking

If you are a CTO reading this — one question is worth taking back to your team.

Not “are our pipelines working.” Not “is ArgoCD healthy.” Not “are our Helm charts up to date.”

This one: what does our platform do in the thirty seconds after a compromised dependency reaches production?

If your team can answer that question with a specific architecture — specific tools, specific layers, specific blast radius at each failure point — you have platform engineers.

If the answer is a pause followed by a discussion about monitoring dashboards and alerting thresholds — you have deployment operators doing a platform engineer’s job. That gap is not their fault. It is a hiring problem, and it is solvable.

But it needs to be identified before the compression library arrives. Not after.


A Final Word to the Deployment Operators

This article is not an insult. It is an invitation.

The gap between operating a microwave and running a kitchen is not talent. It is exposure, depth, and the willingness to go below the abstraction layer and understand what is actually happening. Most people who stop at the tools stop there because nobody showed them there was more.

There is significantly more. The kernel is not frightening once you spend time there. The network layer is not mysterious once you have written eBPF programs and the policies they enforce. Supply chain security is not abstract once you have read the xz-utils post-mortem carefully enough to understand the ifunc resolver mechanism.

The catastrophic failure is coming for every platform that is not built to contain it. The people who build the containment architecture are the ones who went looking for the problem before it arrived.

That is the job. The Helm chart is just the beginning.


Next: the full architecture that would have contained the xz-utils backdoor at every layer — and why it requires full stack control to implement.

Catalin Lichi is the founder of Sugau — a bare-metal Kubernetes consultancy specialising in sovereign infrastructure for regulated and security-critical environments.