Cutting Kubernetes Costs with kube-downscaler
Kubernetes offers great flexibility and power—but with that comes the risk of overprovisioning and, consequently, overspending. Especially in dev and staging environments, workloads often sit idle during off-hours, silently burning compute resources and cloud budget.
After digging around for solutions to this exact problem, I came across kube-downscaler and gave it a try. Here's what I learned, how it fits into a real-world stack, and why I think more teams should be aware of it.
What Is kube-downscaler?
kube-downscaler is a lightweight, open-source tool that helps you schedule time-based scaling of Kubernetes workloads. Think of it as a cron job for your Deployment’s replica count.
It doesn’t replace HPA (Horizontal Pod Autoscaler). In fact, it complements it by providing predictable, proactive scaling behavior—especially useful when you know your traffic patterns (e.g., dev workloads idle overnight, traffic drops off after office hours, etc.).
Why I Gave It a Shot
We had a few environments that didn’t need to run at full capacity 24/7. HPA couldn’t help much in those cases: it only reacts to load, never scales below its configured minReplicas, and won’t scale to zero without extra configuration.
We wanted something simple, scriptable, and compatible with our existing Deployments—and kube-downscaler fit right in.
Key Features I Found Useful
Scheduled Downscaling
You can define downtime windows using annotations like this:
annotations:
  downscaler/downtime: "Mon-Fri 00:00-07:00 Europe/Berlin"
  downscaler/downtime-replicas: "1"
During those hours, kube-downscaler automatically reduces the replica count to the value in downscaler/downtime-replicas (without that annotation, the default target is zero).
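To make the placement concrete, here’s a sketch of a full Deployment using the inverse annotation, downscaler/uptime, which keeps the workload up only during the listed window. The names and image below are placeholders, not from a real setup:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: internal-dashboard              # placeholder name
  annotations:
    # Up during office hours; scaled to zero otherwise
    downscaler/uptime: "Mon-Fri 07:00-20:00 Europe/Berlin"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: internal-dashboard
  template:
    metadata:
      labels:
        app: internal-dashboard
    spec:
      containers:
        - name: app
          image: registry.example.com/dashboard:latest   # placeholder image
```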
Works with HPA
If you already have an HPA in place, kube-downscaler won’t fight with it. Instead, it can downscale preemptively and let the HPA take over when actual load increases.
I especially liked this behavior—because it means I can define floor and ceiling behavior separately.
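One detail worth knowing, as I understood the project’s docs: when the annotation sits on the HPA itself, kube-downscaler lowers the HPA’s minReplicas during downtime rather than editing the Deployment, and the downtime replica count must be at least 1 for HPAs. A sketch, with placeholder names and values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                         # placeholder name
  annotations:
    # During this window the HPA's minReplicas is lowered;
    # for HPAs the downtime replica count must be >= 1
    downscaler/downtime: "Mon-Fri 20:00-24:00 Europe/Berlin"
    downscaler/downtime-replicas: "1"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                           # placeholder Deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```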
Zero Vendor Lock-In
No CRDs, no custom APIs—just standard Kubernetes annotations and a simple controller. It’s transparent and easy to remove or override.
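Opting a workload out is equally transparent; per the upstream docs, a single annotation excludes it from any schedule:

```yaml
metadata:
  annotations:
    downscaler/exclude: "true"   # this workload is never downscaled
```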
Installation Was Straightforward
Deploying kube-downscaler to a cluster was simple.
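Here’s a minimal sketch of what that looked like, assuming the stock manifests in the repo’s deploy/ directory (repo location and paths per the upstream README at the time):

```sh
# Clone the project and apply the bundled manifests (RBAC + Deployment)
git clone https://codeberg.org/hjacobs/kube-downscaler.git
cd kube-downscaler
kubectl apply -f deploy/
```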
From there, it watches your Deployments and scales them according to the annotations you set. No webhook configurations, no operator complexity.
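A few controller flags proved worth setting early on; the flag names below are from the upstream README, and the values are just examples:

```yaml
# Excerpt: container args on the downscaler's own Deployment
args:
  - --interval=60                     # reconcile every 60 seconds
  - --exclude-namespaces=kube-system  # never touch system workloads
  - --dry-run                         # log planned changes without applying them
```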
Where It Fits Best
Here’s where kube-downscaler shines, based on my experience:
- Non-production clusters (dev, staging, QA)
- Teams working with known traffic windows (e.g., internal dashboards, data ETL jobs)
- Workloads with stable replica targets (e.g., 3 during the day, 1 at night)
It’s especially helpful when you’re running Kubernetes in environments without aggressive autoscaler support, or when you simply want to reduce unnecessary HPA flapping.
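For whole non-production clusters, you don’t even have to annotate each Deployment: if I read the docs right, the same annotations can be set on a Namespace and apply to every supported workload inside it. A sketch, with a placeholder namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging                       # placeholder namespace
  annotations:
    # Everything in this namespace runs only during office hours
    downscaler/uptime: "Mon-Fri 07:00-20:00 Europe/Berlin"
```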
Tradeoffs and Limitations
To be fair, kube-downscaler isn’t perfect.
- It doesn’t support metrics-based dynamic scaling the way KEDA does.
- You’ll need to manually define your schedules.
- If your team operates across multiple time zones, managing time windows gets more complex.
That said, it does exactly what it promises—and does it well.
How It Compares to KEDA
If you're looking for event-driven or metric-based autoscaling, KEDA is a better fit. It integrates with external systems (Kafka, Prometheus, queue depths) and can autoscale both Deployments and Jobs.
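For a rough sense of the difference, here’s what a comparable schedule looks like with KEDA’s cron scaler (field names per KEDA’s cron scaler docs; the target and replica values are placeholders). Note it requires installing KEDA and its CRDs first:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-office-hours              # placeholder name
spec:
  scaleTargetRef:
    name: web                         # placeholder Deployment
  triggers:
    - type: cron
      metadata:
        timezone: Europe/Berlin
        start: 0 7 * * 1-5            # scale up weekdays at 07:00
        end: 0 20 * * 1-5             # scale back down at 20:00
        desiredReplicas: "3"
```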
But if all you need is “scale down after hours, bring it back up in the morning”—kube-downscaler is simpler, more transparent, and gets the job done with minimal moving parts.
Final Thoughts
We often think of autoscaling as something fancy—driven by metrics, algorithms, and reactive logic. But sometimes, a good old-fashioned schedule is the most effective and predictable way to control costs.
If you haven’t already, give kube-downscaler a try. It might not be the most glamorous tool in your stack—but it’s a practical one that can save you real money with almost no complexity.
Let me know if you're using kube-downscaler in production—or if you've found another lightweight way to reduce idle workloads. Always curious to learn from how others manage cluster efficiency.