InPlacePodVerticalScaling: Scale Pods Without Restarting
Pods that grow with your workload? Discover how Kubernetes v1.33 lets you scale CPU and memory without a restart, and when it still might not be enough.
In many Kubernetes setups, adjusting the CPU or memory assigned to a pod has historically meant one thing: restarting it. That’s because updating a pod’s resource requests or limits typically requires the pod to be recreated by its controller. While this works fine for replicated workloads behind a service, it's far less ideal when you're dealing with singleton pods, batch jobs, or long-lived processes where restarts are expensive or disruptive. In some cases, restarting isn’t even an option: think data pipelines in the middle of a transformation, or analytical queries that take hours to complete.
Kubernetes v1.27 introduced InPlacePodVerticalScaling as an alpha feature, which was later promoted to beta in v1.33. This guide walks through how to enable and use this feature, what changes under the hood, and where it makes a real difference. You’ll see how to update CPU allocations on the fly without disruption and how to increase memory when needed, even if it requires a container restart. For developers and platform engineers managing resource-sensitive workloads, this feature opens up a new dimension of flexibility in Kubernetes.
Traditionally, changing a pod’s resource requests or limits meant terminating the pod and allowing a controller, like a Deployment, to bring up a new one with updated specifications. This works fine when your workload runs multiple replicas behind a Service, as traffic simply shifts to other instances during the update. But when you're dealing with singleton pods, batch jobs, or stateful workloads where scale-out isn't an option or restart delays break processing windows, this approach falls short.
InPlacePodVerticalScaling changes that. Introduced as an alpha feature in Kubernetes v1.27 and promoted to beta in v1.33, it enables you to change the CPU and memory configuration of a running pod without terminating it. Kubernetes does this by applying changes directly to the container's cgroup allocation.
It also supports resizing of restartable init containers (commonly used for sidecars), further expanding its usefulness in service mesh and observability tooling scenarios.
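As a rough illustration (assuming a release where restartable init containers, i.e. sidecars, are available), a sidecar is declared as an init container with restartPolicy: Always and can carry its own resizePolicy. The names and values below are placeholders:

```yaml
# Illustrative fragment only: a restartable init container ("sidecar")
# with its own resize policy.
spec:
  initContainers:
    - name: log-forwarder        # hypothetical sidecar container
      image: fluent/fluent-bit   # placeholder image
      restartPolicy: Always      # marks this init container as restartable (sidecar)
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "100m"
          memory: "128Mi"
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired   # CPU can be resized without restarting the sidecar
```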
This has huge implications for operational flexibility. You can dynamically adjust workloads to respond to changing resource demands, whether it's a batch job that unexpectedly spikes in memory usage or a Java application that needs more CPU only during initialization.
To use this feature, the InPlacePodVerticalScaling feature gate must be explicitly enabled in your cluster.
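As a minimal sketch, assuming you are testing locally with kind (and a version where the gate is not already on by default), the gate can be turned on for the whole cluster through the kind config:

```yaml
# kind-config.yaml -- enables the feature gate on the API server and kubelets
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  InPlacePodVerticalScaling: true
nodes:
  - role: control-plane
  - role: worker
```

The examples that follow use a hypothetical pod named resize-demo whose container declares a resizePolicy: CPU changes apply in place, while memory changes restart the container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo              # hypothetical pod used in the examples below
spec:
  containers:
    - name: app
      image: nginx               # placeholder image for illustration
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired       # resize CPU without restarting
        - resourceName: memory
          restartPolicy: RestartContainer  # memory changes restart the container
```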
Let’s say your workload is processing a large dataset and hits a memory ceiling. You can give it more memory by patching the pod as follows:
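A sketch of such a patch, assuming the resize-demo pod above and a v1.33 cluster with a kubectl new enough to route resource changes through the pod's resize subresource (on earlier releases you may be able to patch spec.containers[].resources directly):

```bash
# Raise the memory request and limit of the "app" container in place
kubectl patch pod resize-demo --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"memory":"1Gi"},"limits":{"memory":"1Gi"}}}]}}'
```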
Because our resize policy for memory is set to RestartContainer, this change will restart the container. You can confirm the restart by checking the restart count:
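For example, still assuming the resize-demo pod:

```bash
# restartCount should increase by one after the memory resize
kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'
```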
This restart is expected and necessary if your application doesn't support dynamic memory allocation. If your workload can handle memory increases at runtime (e.g., JVM with dynamic heap sizing), you can set the memory resize policy to NotRequired.
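That is a small change in the container's resize policy, for example:

```yaml
resizePolicy:
  - resourceName: memory
    restartPolicy: NotRequired   # only safe if the app can grow its memory use at runtime
```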
Once a resize is requested, the Kubelet coordinates with the container runtime to apply the new settings. The pod’s .status section gets updated to reflect both what’s been requested and what’s actually enforced.
Key fields to be aware of:
spec.containers[].resources: the desired configuration
status.containerStatuses[].resources: the current configuration on the running container
status.containerStatuses[].allocatedResources: the resources the node has acknowledged and set aside for the container
This separation allows Kubernetes to reflect real-world drift between what was asked and what was applied, especially helpful in cases where resizing is deferred due to lack of node resources.
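One way to compare the three, again assuming the hypothetical resize-demo pod:

```bash
# Desired vs. actual vs. node-acknowledged resources for the first container
kubectl get pod resize-demo -o jsonpath='{.spec.containers[0].resources}{"\n"}'
kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].resources}{"\n"}'
kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].allocatedResources}{"\n"}'
```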
If you attempt to resize a pod with resource values beyond what the node can provide, Kubernetes won’t apply the change. Instead, it marks the request as Deferred if it might be satisfiable later (for example, once other pods free up capacity), or Infeasible if the node can never accommodate it.
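You can check the current state of a resize with a quick query; as a sketch, using the pod's status.resize field described in the FAQ below:

```bash
# Empty output means no resize is pending; otherwise expect e.g. InProgress, Deferred, or Infeasible
kubectl get pod resize-demo -o jsonpath='{.status.resize}{"\n"}'
```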
InPlacePodVerticalScaling offers fine-grained control over pod resource allocation. It fills a critical gap for workloads that require elasticity but cannot afford downtime. If you're running Kubernetes v1.33+, it’s stable enough to experiment with in development or staging environments, and potentially in production, depending on your tolerance for restart behavior.
With each release, Kubernetes is expanding the scope of this feature, from regular containers to sidecars, making it increasingly production-ready for complex, multi-container workloads.
To try it yourself, spin up a local test cluster (for example, with kind or minikube) with the feature gate enabled, or use a real Kubernetes cluster (v1.27+), ensuring the feature gate is enabled on all nodes.
FAQs
What is InPlacePodVerticalScaling in Kubernetes?
InPlacePodVerticalScaling is a Kubernetes feature (beta as of v1.33) that allows you to change the CPU or memory resources of a running pod without recreating it. CPU changes can happen without a restart, while memory updates may require one, depending on the configured resize policy.
How do I enable InPlacePodVerticalScaling?
Enable the InPlacePodVerticalScaling feature gate on both the API server and kubelets by passing --feature-gates=InPlacePodVerticalScaling=true. This requires Kubernetes v1.27 or later and a supported container runtime (e.g., containerd ≥ 1.6.9).
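If you manage kubelet settings through a config file rather than command-line flags, the same gate can be set in the KubeletConfiguration; a sketch:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  InPlacePodVerticalScaling: true
```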
What types of workloads benefit most from this feature?
InPlacePodVerticalScaling is ideal for:
Singleton or stateful pods that can't be restarted easily
Data pipelines and analytics jobs in progress
CI jobs needing temporary resource boosts
JVM-based apps with dynamic CPU needs
Sidecar-heavy workloads where restarts are disruptive
What are the limitations of InPlacePodVerticalScaling in v1.33?
Only CPU and memory are supported
QoS class can't change after pod creation
Memory reduction always requires a restart
Does not support resizing non-restartable init containers or ephemeral containers (restartable sidecar init containers are supported)
Requires containerd v1.6.9+
How do I know if a resource update succeeded or failed?
Use kubectl get pod <pod-name> -o yaml to inspect:
.status.resize for operation state (e.g., InProgress, Deferred, Infeasible)
.status.containerStatuses[].allocatedResources to verify what was actually applied
If a request exceeds node capacity, Kubernetes marks the resize as Infeasible or defers it.