AI-native storage infrastructure

AI-native storage for every GPU workload.

Purpose-built for inference, training, and model management. RDMA-native. On bare metal you own, in a jurisdiction you choose.

Become a Design Partner → See How It Works
<20µs storage latency via RDMA
270+ GB/s sustained read/write to a single client
#3 on the IO500 10-Node Production list (SC25, on TCP)
Hardware, data, jurisdiction: yours

Your GPUs are expensive. They shouldn't wait for storage.

AI infrastructure runs on GPUs. But those GPUs spend a staggering amount of time doing nothing: waiting for storage to serve cached context, waiting for checkpoints to write, waiting for models to load. These aren't GPU problems. They're storage problems.

GPU Utilization

Where your GPU time actually goes

Three bottlenecks, one root cause: storage that wasn't designed for AI workloads.

[Timeline: 43% utilization over one hour of GPU time, split across active compute, waiting on storage, checkpoint / model swap, and failure recovery]
40-60% typical GPU utilization with unoptimized data pipelines
~40 GB KV cache for a single 128K context on Llama-3.1-70B, recomputed every time
7% of organizations achieve >85% GPU utilization

And for a growing number of organizations (sovereign AI programmes, defence, healthcare, regulated industries), the problem goes deeper than performance. Training data, model weights, and inference context contain some of the most sensitive IP an organization produces. When that data lives on someone else's infrastructure, in someone else's jurisdiction, you've outsourced control over your most valuable asset.

Built different. On purpose.

Conventional storage was designed around files, directories, and the POSIX API. That works for documents. It doesn't work when 512 GPUs need to write a checkpoint at the same time, or when a single inference request generates 40 GB of KV cache that needs to be read back in microseconds.

✓ Shipping today
Data Path

Conventional vs Enakta

Fewer hops, no kernel, direct NVMe access. Data moves at wire speed.

Conventional: 6 hops, kernel overhead at every stage.
Enakta: 3 hops, kernel bypass, sub-20µs end-to-end.
<20µs time-to-first-byte via RDMA, on the same transport libraries as NVIDIA NCCL and NIXL (libfabric/UCX).

Native objects, not files

Data is stored as objects with direct key-value access; no POSIX file system sits in the path. KV cache blocks, tensor shards, and model weights are all directly addressable, with no metadata contention and no namespace overhead.

RDMA from the ground up

The storage engine uses libfabric/UCX, the same transport libraries as NVIDIA NCCL and NIXL. Data moves directly between application memory and NVMe with zero kernel involvement.

Built-in versioning

Every write tagged with a monotonically increasing epoch. Snapshots are instant, zero cost, no data copy. Read from any point in time, diff between versions, roll back by changing a pointer. This isn't bolted on. It's how the storage works.
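The epoch semantics above can be sketched as a toy in-memory model. This is an illustration only, assuming nothing about the real engine: `EpochStore` and its methods are invented names, not the Enakta/DAOS API.

```python
class EpochStore:
    """Toy model of epoch-tagged writes: snapshots are pointers, not copies."""

    def __init__(self):
        self.epoch = 0        # monotonically increasing write epoch
        self.history = {}     # key -> list of (epoch, value), append-only
        self.snapshots = {}   # snapshot name -> epoch pointer (zero copy)

    def write(self, key, value):
        self.epoch += 1       # every write gets a fresh epoch tag
        self.history.setdefault(key, []).append((self.epoch, value))
        return self.epoch

    def snapshot(self, name):
        # A snapshot records only the current epoch: instant, no data moved.
        self.snapshots[name] = self.epoch

    def read_at(self, key, epoch):
        # Latest value written at or before `epoch` (point-in-time read).
        versions = [v for e, v in self.history.get(key, []) if e <= epoch]
        return versions[-1] if versions else None

    def read(self, key, snapshot=None):
        epoch = self.snapshots[snapshot] if snapshot else self.epoch
        return self.read_at(key, epoch)


store = EpochStore()
store.write("weights", "v1")
store.snapshot("before-finetune")      # rollback target: just a pointer
store.write("weights", "v2")
assert store.read("weights") == "v2"
assert store.read("weights", snapshot="before-finetune") == "v1"
```

Rolling back is the same pointer operation in reverse: make the snapshot's epoch the read point, and no data is copied in either direction.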

KV cache, model weights, adapters. All on storage that keeps up.

KV Cache Tiering

In Development

LLM inference generates massive KV caches. A single 128K context on Llama-3.1-70B produces ~40 GB. When cache exceeds GPU memory, it's either evicted and recomputed (expensive) or offloaded to storage. The storage needs to be fast enough that loading cached KV is cheaper than recomputing it.
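The ~40 GB figure can be sanity-checked from Llama-3.1-70B's published architecture (80 transformer layers, 8 KV heads under grouped-query attention, head dimension 128) with an fp16 cache. A quick back-of-envelope:

```python
# KV cache size for a 128K-token context on Llama-3.1-70B, fp16.
layers, kv_heads, head_dim = 80, 8, 128   # published model architecture
bytes_per_elem = 2                        # fp16
tokens = 128 * 1024                       # 128K context

# Both K and V are cached, hence the leading factor of 2.
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
total_gib = per_token * tokens / 2**30
print(f"{per_token:,} B per token, {total_gib:.0f} GiB for 128K tokens")
```

This works out to 327,680 bytes per token, or 40 GiB for the full context, which is what makes recompute-on-eviction so expensive and offload economics so sensitive to storage latency.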

Enakta offers sub-20µs read latency via RDMA, object-granular access matched to KV block sizes (64 KB-1 MB), and native compatibility with the same RDMA transport as NVIDIA NIXL. No protocol translation required.

KV Cache Tiering

Extend Context Windows Beyond GPU Memory

GPU memory, host memory, and Enakta storage. As context grows, data automatically moves to the next tier.

Context window size: 8 GB to 2 TB (example shown: 32 GB)
Model Management

In Development

Models stored as objects, each version an instant snapshot. LoRA adapters stored alongside base models with separate versioning. 100 LoRA variants of a 70B model: ~160 GB (adapters only) vs ~14 TB (full copies). Rollback is an epoch pointer change, not a file copy.
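The arithmetic behind those numbers, as a rough sketch: a 70B-parameter model in fp16 is ~140 GB per full copy, and the ~1.6 GB-per-adapter figure is back-solved from the page's ~160 GB total; actual adapter size depends on LoRA rank and which modules are adapted.

```python
# Storage cost: 100 full model copies vs one base model plus 100 adapters.
params = 70e9
full_copy_gb = params * 2 / 1e9          # fp16, 2 bytes per parameter -> ~140 GB
variants = 100

full_copies_tb = full_copy_gb * variants / 1e3   # naive: copy per variant
adapter_gb = 1.6                         # illustrative, rank-dependent
adapters_gb = adapter_gb * variants      # adapters stored alongside one base

print(f"{variants} full copies: {full_copies_tb:.0f} TB; "
      f"one base + {variants} adapters: ~{adapters_gb:.0f} GB of adapters")
```

The ratio, not the exact adapter size, is the point: adapter-level versioning turns a ~14 TB problem into a few hundred GB.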

Model Activation and Blue/Green Deploys

In Development

Base model resident in GPU memory after provisioning. Adapter swaps via RDMA in sub-second time. Blue/green deploys with instant rollback via epoch pointer change. Roll new models to a canary set, validate quality, then expand or revert automatically.
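The canary flow described above can be sketched in a few lines. This is a hypothetical illustration of the control logic only; `canary_rollout`, the 5% slice, and the `quality_ok` callback are invented for this sketch and are not an Enakta API.

```python
def canary_rollout(fleet, current, candidate, quality_ok, canary_frac=0.05):
    """Simulate one blue/green attempt: canary phase, then expand or revert.

    Returns (canary_phase, final): node -> model-version maps. In the real
    system each 'flip' would be an epoch pointer change, not a data copy.
    """
    n = max(1, int(len(fleet) * canary_frac))
    # Phase 1: candidate serves only the canary slice; everyone else unchanged.
    canary_phase = {node: (candidate if i < n else current)
                    for i, node in enumerate(fleet)}
    # Phase 2: validate quality on canary traffic, then expand or roll back.
    healthy = quality_ok(candidate, fleet[:n])
    final = {node: (candidate if healthy else current) for node in fleet}
    return canary_phase, final


fleet = [f"gpu-node-{i:02d}" for i in range(20)]
phase, final = canary_rollout(fleet, "llama-70b@epoch41", "llama-70b@epoch42",
                              quality_ok=lambda model, nodes: True)
assert list(phase.values()).count("llama-70b@epoch42") == 1  # 5% of 20 nodes
assert set(final.values()) == {"llama-70b@epoch42"}          # expanded
```

Passing a `quality_ok` that returns `False` yields a `final` map identical to the pre-rollout state, which is the instant-rollback case.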

LMCache connector for DAOS core

In Development

Plug Enakta storage into the KV cache tier for vLLM and SGLang via LMCache. Enables KV cache offload, cross-request cache reuse, and TP-agnostic cache sharing.

Checkpoints in seconds, not minutes. Data loading that keeps pace.

Checkpoint Storage

✓ Shipping

A 700 GB checkpoint on a 512-GPU cluster means 15 minutes of every GPU sitting idle while state writes to storage. Failures happen every ~3 hours at scale. Every recovery reads from storage. The difference between 270+ GB/s sustained writes and a conventional filesystem isn't academic: it's the difference between a checkpoint completing in seconds and one that takes minutes.
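The GPU-time cost of a checkpoint stall is straightforward to work out. The bandwidth figures below are back-solved from the page's ~15 min and ~30 s examples (roughly 0.8 GB/s and 23 GB/s effective aggregate write bandwidth); real numbers depend on the deployment.

```python
def stall(checkpoint_gb, agg_bw_gbps, gpus=512):
    """Wall-clock stall and idle GPU-hours for one synchronous checkpoint."""
    seconds = checkpoint_gb / agg_bw_gbps
    return seconds, seconds * gpus / 3600

# Assumed bandwidths, back-solved from the page's example durations.
for label, bw in [("conventional (~0.78 GB/s)", 0.78),
                  ("fast object store (~23.3 GB/s)", 23.3)]:
    secs, gpu_hours = stall(700, bw)
    print(f"{label}: {secs:,.0f} s stall, {gpu_hours:,.0f} idle GPU-hours")
```

At a failure interval of ~3 hours, the stall cost recurs on both the save and the recovery-read path, so the idle GPU-hours compound over a long training run.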

Checkpoint I/O

Training Checkpoint: Conventional vs Enakta

Every checkpoint save means idle GPUs. Faster storage = less waste.

CONVENTIONAL STORAGE ~15 min checkpoint
512 GPUs idle during each checkpoint save
ENAKTA STORAGE (270+ GB/s) ~30s checkpoint
GPUs back to training 30x faster

Reference: On the same storage architecture at exascale (Argonne Aurora, 128 storage nodes), full LLaMA3-405B checkpoints complete in under 10 seconds at ~1 TB/s. That's Aurora's deployment, not ours, but the architecture is identical.

PyTorch DCP StorageWriter

In Development

Native distributed checkpoint backend for PyTorch. Each rank writes directly to Enakta storage over RDMA. Atomic snapshots guarantee consistency. Enables faster checkpoints, automatic versioning, instant rollback, and changed-shard-only writes.

Data Pipelines

✓ Shipping

270+ GB/s sustained reads keep data loaders saturated. Random access without metadata bottleneck for multimodal training. Dataset versioning via snapshots for reproducible runs and regulatory compliance. Existing PyTorch Dataset integration (pydaos.torch) ships today.

Storage is the foundation. The platform is what you build on it.

We're extending Enakta from AI-native storage into a complete bare-metal AI platform: train models, serve them, and manage everything from one place. No SLURM to learn. No Kubernetes YAML to write.

Coming 2026
enakta-platform PLATFORM PREVIEW
Training and inference on the same hardware, dynamically allocated
Checkpoints, KV cache, and models on the same storage, all versioned
Your entire AI workflow on hardware you own, managed from one CLI
Nothing leaves your premises. No telemetry. Complete operational sovereignty

From empty rack to serving inference.

PXE boot, immutable OS, GPU driver auto-detection, storage mount, model pull, health check. Eight stages, fully automated, all driven by a dedicated HA management block.

Bare-Metal Provisioning

Rack to Production in ~1 Hour

PXE boot, immutable OS, auto-configure GPUs and storage. No Kubernetes, no VMs.

GPU Nodes: stateless, any node replaceable
HA Management Block: logging · metrics · alerting

Ship with confidence.

Blue/green model deploys with automatic rollback on quality regression. Roll new models to a canary set, validate, then expand or revert.

Canary Rollouts

Blue/Green Deploys with Auto-Rollback

Roll new models to a canary set, validate quality, then expand or roll back automatically.

What works with what. No surprises.

Honest status labels on everything. We'd rather you trust the table than discover a gap in production.

Integration | What it enables | Status
PyTorch Dataset / IterableDataset | Training data loading from Enakta storage | ✓ Shipping
PyTorch Checkpoint (torch.save/load) | Basic model checkpointing | ✓ Shipping
vLLM / SGLang | LLM inference engines (model loading from storage) | ✓ Compatible
LMCache KV Cache Backend | KV cache offload to Enakta storage | In Development
PyTorch DCP StorageWriter | Native distributed checkpointing | In Development
NVIDIA NIXL | Direct RDMA data transfer for inference | Roadmap
SGLang HiCache Backend | Hierarchical KV caching | Roadmap
Enakta CLI + Web UI | Training / inference job management | Coming 2026

The numbers. The foundation. The partners.

#3 on the IO500 10-Node Production list (SC25, on TCP)
270+ GB/s sustained read/write to a single client
<20µs time-to-first-byte via RDMA

The storage core

Enakta's storage engine is built on the open-source DAOS project (Linux Foundation). We're founding members of the DAOS Foundation alongside Argonne National Laboratory, Google Cloud, HPE, and Intel. Google Cloud's Parallelstore service runs on the same core. So does the Aurora exascale supercomputer.

DAOS FOUNDATION Argonne National Laboratory Enakta Labs Google Cloud HPE Intel

Enakta runs entirely on your infrastructure. No data leaves your premises. No cloud dependency. No external API calls. Full auditability from storage to GPU. Suitable for sovereign AI programmes, defence, healthcare, financial services, and any environment where data residency and operational control are non-negotiable.

Shape what we build next.

The AI Platform is in active development. We're building it with a small number of infrastructure operators who run real GPU workloads, because the only way to get this right is to design it against real requirements, not assumptions.

WHAT YOU GET
  • Early access to AI Platform features as they reach testable state
  • Direct access to the engineering team building the platform
  • Your requirements influence the roadmap: actual design input, not a feature-request queue
WHAT IT COSTS

Nothing. We're not selling early access. We're looking for operators whose real-world problems make the product better.

WHAT WE ASK

Tell us what's broken in your current stack. Share your pain points. Give us feedback on what we build. That's it.

Start a Conversation →

Already running GPU infrastructure and need better storage?

Explore the Storage Platform →

Your GPUs deserve better storage.
Let's talk about your workload.

Whether you're exploring AI-native storage today or interested in the full platform as it develops, we'd love to hear what you're building.