Purpose-built for inference, training, and model management. RDMA-native. On bare metal you own, in a jurisdiction you choose.
AI infrastructure runs on GPUs. But those GPUs spend a staggering amount of time doing nothing: waiting for storage to serve cached context, waiting for checkpoints to write, waiting for models to load. These aren't GPU problems. They're storage problems.
Three bottlenecks, one root cause: storage that wasn't designed for AI workloads.
And for a growing number of organizations (sovereign AI programmes, defence, healthcare, regulated industries), the problem goes deeper than performance. Training data, model weights, and inference context contain some of the most sensitive IP an organization produces. When that data lives on someone else's infrastructure, in someone else's jurisdiction, you've outsourced control over your most valuable asset.
Conventional storage was designed around files, directories, and the POSIX API. That works for documents. It doesn't work when 512 GPUs need to write a checkpoint at the same time, or when a single inference request generates 40 GB of KV cache that needs to be read back in microseconds.
✓ Shipping today
Fewer hops, no kernel, direct NVMe access. Data moves at wire speed.
Data stored as objects with direct key-value access. No POSIX file system in the path. KV cache blocks, tensor shards, and model weights are all directly addressable. No metadata contention, no namespace overhead.
The storage engine uses libfabric/UCX, the same transport libraries as NVIDIA NCCL and NIXL. Data moves directly between application memory and NVMe with zero kernel involvement.
Every write tagged with a monotonically increasing epoch. Snapshots are instant, zero cost, no data copy. Read from any point in time, diff between versions, roll back by changing a pointer. This isn't bolted on. It's how the storage works.
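To make the epoch mechanics concrete, here is a toy in-memory sketch (illustration only, not the Enakta API): each sealed epoch is a frozen write layer, a snapshot is just the epoch number, and rollback is a pointer move rather than a data copy.

```python
class EpochStore:
    """Toy model of epoch-based versioning. Writes land in the current
    epoch's layer; snapshots seal that layer; reads at an older epoch
    walk back through frozen layers; rollback drops newer layers."""

    def __init__(self):
        self.epoch = 0
        self.layers = [{}]              # one write layer per epoch

    def put(self, key, value):
        self.layers[self.epoch][key] = value

    def snapshot(self):
        self.layers.append({})          # open a fresh layer; old ones freeze
        self.epoch += 1
        return self.epoch - 1           # snapshot id = the epoch just sealed

    def get(self, key, epoch=None):
        top = self.epoch if epoch is None else epoch
        for layer in reversed(self.layers[: top + 1]):
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def rollback(self, epoch):
        del self.layers[epoch + 1:]     # drop everything newer: pointer change
        self.layers.append({})
        self.epoch = epoch + 1


store = EpochStore()
store.put("model/llama-70b", "weights-v1")
v1 = store.snapshot()                   # instant: no data is copied
store.put("model/llama-70b", "weights-v2")
store.rollback(v1)                      # back to v1 by moving the pointer
assert store.get("model/llama-70b") == "weights-v1"
```

A real log-structured store does this with persistent epochs rather than Python dicts, but the contract is the same: old versions stay readable, and rollback never rewrites data.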
LLM inference generates massive KV caches. A single 128K context on Llama-3.1-70B produces ~40 GB. When cache exceeds GPU memory, it's either evicted and recomputed (expensive) or offloaded to storage. The storage needs to be fast enough that loading cached KV is cheaper than recomputing it.
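The ~40 GB figure follows directly from the model's published shape (80 layers, 8 KV heads, head dimension 128, 16-bit values):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # K and V each store n_layers * n_kv_heads * head_dim values per token
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

# Llama-3.1-70B at a 128K context, fp16/bf16
size = kv_cache_bytes(seq_len=128 * 1024, n_layers=80, n_kv_heads=8, head_dim=128)
print(size / 2**30)  # → 40.0 (GiB), i.e. ~320 KiB of KV cache per token
```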
Sub-20 µs read latency via RDMA, object-granular access matched to KV block sizes (64 KB to 1 MB), and native compatibility with the same RDMA transport as NVIDIA NIXL. No protocol translation required.
GPU memory, host memory, and Enakta storage. As context grows, data automatically moves to the next tier.
Models stored as objects, each version an instant snapshot. LoRA adapters stored alongside base models with separate versioning. 100 LoRA variants of a 70B model: ~160 GB (adapters only) vs ~14 TB (full copies). Rollback is an epoch pointer change, not a file copy.
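A back-of-envelope check on those numbers (the ~1.6 GB average adapter size is inferred from the figures above, not a measured value):

```python
n_variants = 100
full_copy_bytes = 140e9   # 70B params * 2 bytes (fp16/bf16) = 140 GB per copy
adapter_bytes = 1.6e9     # assumed average LoRA adapter size

print(n_variants * full_copy_bytes / 1e12)  # → 14.0  (TB, 100 full copies)
print(n_variants * adapter_bytes / 1e9)     # → 160.0 (GB, adapters only)
```

Storing adapters against a shared base model is roughly a 90x reduction before any deduplication or compression.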
Base model resident in GPU memory after provisioning. Adapter swaps via RDMA in sub-second time. Blue/green deploys with instant rollback via epoch pointer change. Roll new models to a canary set, validate quality, then expand or revert automatically.
Plug Enakta storage into the KV cache tier for vLLM and SGLang via LMCache. Enables KV cache offload, cross-request cache reuse, and TP-agnostic cache sharing.
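As a sketch of how the pieces wire together once the backend ships: the launch below follows current vLLM/LMCache conventions (the `LMCacheConnectorV1` connector is today's LMCache integration point for vLLM), while the config path and its Enakta-specific contents are hypothetical.

```shell
# Point LMCache at its backend config, then let vLLM route KV blocks through it.
export LMCACHE_CONFIG_FILE=/etc/lmcache/enakta.yaml   # hypothetical path
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'
```

Because LMCache sits behind a generic connector interface, swapping the offload target from local disk to Enakta storage is a config change, not an engine change.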
A 700 GB checkpoint on a 512-GPU cluster means 15 minutes of every GPU sitting idle while state writes to storage. Failures happen every ~3 hours at scale. Every recovery reads from storage. The difference between 270+ GB/s sustained writes and a conventional filesystem isn't academic: it's the difference between a checkpoint completing in seconds and one that takes minutes.
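The arithmetic behind that claim:

```python
ckpt_gb = 700
gpus = 512

fast_s = ckpt_gb / 270   # checkpoint time at 270 GB/s sustained writes
slow_s = 15 * 60         # the 15-minute conventional case above, in seconds

print(round(fast_s, 1))       # → 2.6 (seconds per checkpoint)
print(gpus * slow_s / 3600)   # → 128.0 (GPU-hours idle per slow checkpoint)
```

At a failure every ~3 hours, those 128 GPU-hours are paid again and again; the fast path reduces each payment to under half a GPU-hour.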
Every checkpoint save means idle GPUs. Faster storage = less waste.
Reference: On the same storage architecture at exascale (Argonne Aurora, 128 storage nodes), full LLaMA3-405B checkpoints complete in under 10 seconds at ~1 TB/s. That's Aurora's deployment, not ours, but the architecture is identical.
Native distributed checkpoint backend for PyTorch. Each rank writes directly to Enakta storage over RDMA. Atomic snapshots guarantee consistency. Enables faster checkpoints, automatic versioning, instant rollback, and changed-shard-only writes.
270+ GB/s sustained reads keep data loaders saturated. Random access without metadata bottleneck for multimodal training. Dataset versioning via snapshots for reproducible runs and regulatory compliance. Existing PyTorch Dataset integration (pydaos.torch) ships today.
We're extending Enakta from AI-native storage into a complete bare-metal AI platform. Train models, serve them, and manage everything from one place. No SLURM to learn. No Kubernetes YAML to write.
Coming 2026
PXE boot, immutable OS, GPU driver auto-detection, storage mount, model pull, health check. Eight stages, fully automated, all driven by a dedicated HA management block.
PXE boot, immutable OS, auto-configure GPUs and storage. No Kubernetes, no VMs.
Blue/green model deploys with automatic rollback on quality regression. Roll new models to a canary set, validate quality, then expand or roll back automatically.
Honest status labels on everything. We'd rather you trust the table than discover a gap in production.
| Integration | What it enables | Status |
|---|---|---|
| PyTorch Dataset / IterableDataset | Training data loading from Enakta storage | ✓ Shipping |
| PyTorch Checkpoint (torch.save/load) | Basic model checkpointing | ✓ Shipping |
| vLLM / SGLang | LLM inference engines (model loading from storage) | ✓ Compatible |
| LMCache KV Cache Backend | KV cache offload to Enakta storage | In Development |
| PyTorch DCP StorageWriter | Native distributed checkpointing | In Development |
| NVIDIA NIXL | Direct RDMA data transfer for inference | Roadmap |
| SGLang HiCache Backend | Hierarchical KV caching | Roadmap |
| Enakta CLI + Web UI | Training / inference job management | Coming 2026 |
Enakta's storage engine is built on the open-source DAOS project (Linux Foundation). We're founding members of the DAOS Foundation alongside Argonne National Laboratory, Google Cloud, HPE, and Intel. Google Cloud's Parallelstore service runs on the same core. So does the Aurora exascale supercomputer.
Enakta runs entirely on your infrastructure. No data leaves your premises. No cloud dependency. No external API calls. Full auditability from storage to GPU. Suitable for sovereign AI programmes, defence, healthcare, financial services, and any environment where data residency and operational control are non-negotiable.
The AI Platform is in active development. We're building it with a small number of infrastructure operators who run real GPU workloads, because the only way to get this right is to design it against real requirements, not assumptions.
Nothing. We're not selling early access. We're looking for operators whose real-world problems make the product better.
Tell us what's broken in your current stack. Share your pain points. Give us feedback on what we build. That's it.
Already running GPU infrastructure and need better storage?
Explore the Storage Platform →
Whether you're exploring AI-native storage today or interested in the full platform as it develops, we'd love to hear what you're building.