Purpose-built for inference, training, and model management. RDMA-native. On bare metal you own, in a jurisdiction you choose.
AI infrastructure runs on GPUs. But those GPUs spend a staggering amount of time doing nothing: waiting for storage to serve cached context, waiting for checkpoints to write, waiting for models to load. These aren't GPU problems. They're storage problems.
Three bottlenecks, one root cause: storage that wasn't designed for AI workloads.
And for a growing number of organizations (sovereign AI programmes, defence, healthcare, regulated industries), the problem goes deeper than performance. Training data, model weights, and inference context contain some of the most sensitive IP an organization produces. When that data lives on someone else's infrastructure, in someone else's jurisdiction, you've outsourced control over your most valuable asset.
Conventional storage was designed around files, directories, and the POSIX API. That works for documents. It doesn't work when 512 GPUs need to write a checkpoint at the same time, or when a single inference request generates 40 GB of KV cache that needs to be read back in microseconds.
✓ Shipping today
Fewer hops, no kernel, direct NVMe access. Data moves at wire speed.
Data stored as objects with direct key-value access. No POSIX file system in the path. KV cache blocks, tensor shards, and model weights are all directly addressable. No metadata contention, no namespace overhead.
The storage engine uses libfabric/UCX, the same transport libraries as NVIDIA NCCL and NIXL. Data moves directly between application memory and NVMe with zero kernel involvement.
Every write tagged with a monotonically increasing epoch. Snapshots are instant, zero cost, no data copy. Read from any point in time, diff between versions, roll back by changing a pointer. This isn't bolted on. It's how the storage works.
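To make the epoch mechanics concrete, here is a toy in-memory sketch (illustration only, not the Enakta API): each sealed epoch is a frozen write layer, a snapshot is just the epoch number, and rollback is a pointer move rather than a data copy.

```python
class EpochStore:
    """Toy model of epoch-based versioning. Writes land in the current
    epoch's layer; snapshots seal that layer; reads at an older epoch
    walk back through frozen layers; rollback drops newer layers."""

    def __init__(self):
        self.epoch = 0
        self.layers = [{}]              # one write layer per epoch

    def put(self, key, value):
        self.layers[self.epoch][key] = value

    def snapshot(self):
        self.layers.append({})          # open a fresh layer; old ones freeze
        self.epoch += 1
        return self.epoch - 1           # snapshot id = the epoch just sealed

    def get(self, key, epoch=None):
        top = self.epoch if epoch is None else epoch
        for layer in reversed(self.layers[: top + 1]):
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def rollback(self, epoch):
        del self.layers[epoch + 1:]     # drop everything newer: pointer change
        self.layers.append({})
        self.epoch = epoch + 1


store = EpochStore()
store.put("model/llama-70b", "weights-v1")
v1 = store.snapshot()                   # instant: no data is copied
store.put("model/llama-70b", "weights-v2")
store.rollback(v1)                      # back to v1 by moving the pointer
assert store.get("model/llama-70b") == "weights-v1"
```

A real log-structured store does this with persistent epochs rather than Python dicts, but the contract is the same: old versions stay readable, and rollback never rewrites data.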
LLM inference generates massive KV caches. A single 128K context on Llama-3.1-70B produces ~40 GB. When cache exceeds GPU memory, it's either evicted and recomputed (expensive) or offloaded to storage. The storage needs to be fast enough that loading cached KV is cheaper than recomputing it.
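The ~40 GB figure follows directly from the model's published shape (80 layers, 8 KV heads, head dimension 128, 16-bit values):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # K and V each store n_layers * n_kv_heads * head_dim values per token
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

# Llama-3.1-70B at a 128K context, fp16/bf16
size = kv_cache_bytes(seq_len=128 * 1024, n_layers=80, n_kv_heads=8, head_dim=128)
print(size / 2**30)  # → 40.0 (GiB), i.e. ~320 KiB of KV cache per token
```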
Sub-20 µs read latency via RDMA, object-granular access matched to KV block sizes (64 KB to 1 MB), and native compatibility with the same RDMA transport as NVIDIA NIXL. No protocol translation required.
GPU memory, host memory, and Enakta storage. As context grows, data automatically moves to the next tier.
Models stored as objects, each version an instant snapshot. LoRA adapters stored alongside base models with separate versioning. 100 LoRA variants of a 70B model: ~160 GB (adapters only) vs ~14 TB (full copies). Rollback is an epoch pointer change, not a file copy.
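A back-of-envelope check on those numbers (the ~1.6 GB average adapter size is inferred from the figures above, not a measured value):

```python
n_variants = 100
full_copy_bytes = 140e9   # 70B params * 2 bytes (fp16/bf16) = 140 GB per copy
adapter_bytes = 1.6e9     # assumed average LoRA adapter size

print(n_variants * full_copy_bytes / 1e12)  # → 14.0  (TB, 100 full copies)
print(n_variants * adapter_bytes / 1e9)     # → 160.0 (GB, adapters only)
```

Storing adapters against a shared base model is roughly a 90x reduction before any deduplication or compression.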
Base model resident in GPU memory after provisioning. Adapter swaps via RDMA in sub-second time. Blue/green deploys with instant rollback via epoch pointer change. Roll new models to a canary set, validate quality, then expand or revert automatically.
Plug Enakta storage into the KV cache tier for vLLM and SGLang via LMCache. Enables KV cache offload, cross-request cache reuse, and TP-agnostic cache sharing.
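As a sketch of how the pieces wire together once the backend ships: the launch below follows current vLLM/LMCache conventions (the `LMCacheConnectorV1` connector is today's LMCache integration point for vLLM), while the config path and its Enakta-specific contents are hypothetical.

```shell
# Point LMCache at its backend config, then let vLLM route KV blocks through it.
export LMCACHE_CONFIG_FILE=/etc/lmcache/enakta.yaml   # hypothetical path
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'
```

Because LMCache sits behind a generic connector interface, swapping the offload target from local disk to Enakta storage is a config change, not an engine change.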
A 700 GB checkpoint on a 512-GPU cluster means 15 minutes of every GPU sitting idle while state writes to storage. Failures happen every ~3 hours at scale. Every recovery reads from storage. The difference between 270+ GB/s sustained writes and a conventional filesystem isn't academic: it's the difference between a checkpoint completing in seconds and one that takes minutes.
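The arithmetic behind that claim:

```python
ckpt_gb = 700
gpus = 512

fast_s = ckpt_gb / 270   # checkpoint time at 270 GB/s sustained writes
slow_s = 15 * 60         # the 15-minute conventional case above, in seconds

print(round(fast_s, 1))       # → 2.6 (seconds per checkpoint)
print(gpus * slow_s / 3600)   # → 128.0 (GPU-hours idle per slow checkpoint)
```

At a failure every ~3 hours, those 128 GPU-hours are paid again and again; the fast path reduces each payment to under half a GPU-hour.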
Every checkpoint save means idle GPUs. Faster storage = less waste.
Reference: On the same storage architecture at exascale (Argonne Aurora, 128 storage nodes), full LLaMA3-405B checkpoints complete in under 10 seconds at ~1 TB/s. That's Aurora's deployment, not ours, but the architecture is identical.
Native distributed checkpoint backend for PyTorch. Each rank writes directly to Enakta storage over RDMA. Atomic snapshots guarantee consistency. Enables faster checkpoints, automatic versioning, instant rollback, and changed-shard-only writes.
270+ GB/s sustained reads keep data loaders saturated. Random access without metadata bottleneck for multimodal training. Dataset versioning via snapshots for reproducible runs and regulatory compliance. Existing PyTorch Dataset integration (pydaos.torch) ships today.
We're extending Enakta from AI-native storage into a complete bare-metal AI platform. Train models, serve them, and manage everything from one place. No SLURM to learn. No Kubernetes YAML to write.
Coming 2026
PXE boot, immutable OS, GPU driver auto-detection, storage mount, model pull, health check. Eight stages, fully automated, all driven by a dedicated HA management block.
PXE boot, immutable OS, auto-configure GPUs and storage. No Kubernetes, no VMs.
Blue/green model deploys with automatic rollback on quality regression. Roll new models to a canary set, validate quality, then expand or roll back automatically.
Honest status labels on everything. We'd rather you trust the table than discover a gap in production.
| Integration | What it enables | Status |
|---|---|---|
| PyTorch Dataset / IterableDataset | Training data loading from Enakta storage | ✓ Shipping |
| PyTorch Checkpoint (torch.save/load) | Basic model checkpointing | ✓ Shipping |
| vLLM / SGLang | LLM inference engines (model loading from storage) | ✓ Compatible |
| LMCache KV Cache Backend | KV cache offload to Enakta storage | In Development |
| PyTorch DCP StorageWriter | Native distributed checkpointing | In Development |
| NVIDIA NIXL | Direct RDMA data transfer for inference | Roadmap |
| SGLang HiCache Backend | Hierarchical KV caching | Roadmap |
| Enakta CLI + Web UI | Training / inference job management | Coming 2026 |
Enakta's storage engine is built on the open-source DAOS project (Linux Foundation). We're founding members of the DAOS Foundation alongside Argonne National Laboratory, Google Cloud, HPE, and Intel. Google Cloud's Parallelstore service runs on the same core. So does the Aurora exascale supercomputer.
Enakta runs entirely on your infrastructure. No data leaves your premises. No cloud dependency. No external API calls. Full auditability from storage to GPU. Suitable for sovereign AI programmes, defence, healthcare, financial services, and any environment where data residency and operational control are non-negotiable.
The AI Platform is in active development. We're building it with a small number of infrastructure operators who run real GPU workloads, because the only way to get this right is to design it against real requirements, not assumptions.
Nothing. We're not selling early access. We're looking for operators whose real-world problems make the product better.
Tell us what's broken in your current stack. Share your pain points. Give us feedback on what we build. That's it.
Already running GPU infrastructure and need better storage?
Explore the Storage Platform →
Whether you're exploring AI-native storage today or interested in the full platform as it develops, we'd love to hear what you're building.