Consistent Sizing for Memory Volumes in Kubernetes
If you've used `emptyDir` volumes with `medium: Memory`, you might have noticed an inconsistency: the size of the volume wasn't defined by your pod spec but instead defaulted to 50% of the host's memory. This behavior made workload portability fragile, especially across heterogeneous node types. With KEP-1967, Kubernetes addresses this gap.
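For context, a memory-backed volume is declared as in this minimal (hypothetical) pod spec, which mounts the tmpfs at `/dev/shm`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shm-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory      # tmpfs-backed volume; this is the sizing in question
```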
What Changed?
Kubernetes now supports sized memory-backed `emptyDir` volumes with `medium: Memory`. With this feature enabled, volume size is no longer tied to the host node's memory. Instead, it reflects the pod's actual memory limits, ensuring consistent and predictable sizing regardless of the node type.
How it Works
The new behavior is controlled by the `SizeMemoryBackedVolumes` feature gate (enabled via the kubelet). When active, the size of a memory-backed `emptyDir` is calculated as:

```
min(nodeAllocatable[memory], podAllocatable[memory], emptyDir.sizeLimit)
```

If no explicit `sizeLimit` is provided, the volume still respects the pod's actual memory limits, improving alignment between available memory and volume size.
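As a concrete sketch (hypothetical pod, feature gate active, node allocatable well above 2Gi): with a 2Gi container memory limit and a 1Gi `sizeLimit`, the tmpfs is mounted at min(node allocatable, 2Gi, 1Gi) = 1Gi; drop the `sizeLimit` and it becomes 2Gi.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sized-shm-demo      # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      resources:
        limits:
          memory: 2Gi       # pod-level memory limit used in the calculation
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
        sizeLimit: 1Gi      # tmpfs size = min(nodeAllocatable, 2Gi, 1Gi) = 1Gi
```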
Before vs After
Scenario | Previous Behavior | New Behavior |
---|---|---|
Pod with a 2Gi memory limit on a 32Gi node | `/dev/shm` gets 16Gi (50% of node) | `/dev/shm` gets 2Gi (pod limit) |
No `sizeLimit` specified | Defaults to node memory heuristic | Uses actual pod memory limit |
This improvement especially benefits workloads that rely on `/dev/shm` (e.g., AI/ML jobs, IPC-heavy apps), where improper sizing can lead to unpredictable failures or over-provisioning.
Why It Matters
The default behavior (50% of host memory) made workloads hard to port between nodes. For example:
- A pod working fine on a large-memory node might fail due to insufficient `/dev/shm` on a smaller one.
- Resource quotas based on actual pod limits were ignored when sizing tmpfs volumes.
Now, Kubernetes aligns volume sizing with how memory is charged to the pod, making behavior predictable and consistent.
Enabling the Feature
You can enable it by setting the following in the kubelet configuration:

```yaml
featureGates:
  SizeMemoryBackedVolumes: true
```
No node reprovisioning or downtime is required. The change is backward-compatible and can be safely rolled back.
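For reference, in a full kubelet configuration file that setting sits under `featureGates` (all other fields omitted here):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SizeMemoryBackedVolumes: true
```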
Observability and Validation
Operators can:
- Audit pods with `emptyDir.medium: Memory`
- Use `df -h /dev/shm` in running pods to validate expected sizing
For example, with the feature enabled and a pod memory limit of 2Gi:

```console
$ kubectl exec example -- df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           2.0G     0  2.0G   0% /dev/shm
```
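To find the pods to audit, a query along these lines works (a sketch using `kubectl` and `jq`; adapt it to your own tooling):

```sh
# List namespace/name of every pod that declares a memory-backed emptyDir volume
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
      | select(any(.spec.volumes[]?; .emptyDir.medium == "Memory"))
      | "\(.metadata.namespace)/\(.metadata.name)"'
```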
Maturity
Milestone | Status |
---|---|
v1.20 | Alpha |
v1.22 | Beta |
v1.32 | GA (Stable) |
No Drawbacks, No Surprises
- No impact on CPU, RAM, or API calls
- No changes to API types
- No dependencies on other components
- No hidden resource exhaustion risks
Final Thoughts
This feature brings a much-needed level of predictability to memory-backed volumes. For platform teams, it means fewer environment-specific bugs. For developers, it’s one less abstraction leak to deal with. Kubernetes continues to evolve toward workload portability without compromise, and this is another step in that direction.
FAQs
What problem does Kubernetes solve with KEP-1967 for memory-backed emptyDir volumes?
Previously, `emptyDir` volumes with `medium: Memory` defaulted to 50% of host memory, making behavior unpredictable across nodes. Kubernetes now allows the volume size to reflect pod-level memory limits, ensuring consistent behavior regardless of node size.
How is the size of a memory-backed volume determined under the new feature?
With the `SizeMemoryBackedVolumes` feature gate enabled, the size of a `medium: Memory` volume is based on the pod's memory limits, not the node's total memory. This improves consistency and predictability across heterogeneous environments.
How do I enable predictable sizing for memory-backed emptyDir volumes?
Enable the `SizeMemoryBackedVolumes` feature gate on the kubelet, either in its configuration file (as shown above) or via the command-line flag:

```
--feature-gates=SizeMemoryBackedVolumes=true
```
No downtime or node reprovisioning is required.
What types of workloads benefit most from this change?
Workloads that rely on `/dev/shm` or temporary in-memory storage, such as AI/ML jobs, inter-process communication (IPC)-intensive applications, and memory-bound tasks, see improved reliability and portability.
Are there any compatibility or performance concerns with this feature?
No. The change is backward-compatible, introduces no API changes, does not affect CPU or RAM usage, and carries no risk of hidden resource exhaustion. It simply aligns volume sizing with pod memory limits.