Architecture

How SEDIM works

The complete technical architecture behind source-attributed intelligence.

Core Equation

The formula

SEDIM Compositional Formula
CENTO = FACIES + Σᵢ STEMMAᵢ(x) · VARVEᵢ
Base Layer
W₀
FACIES
Frozen base weights W₀. The immutable foundation from pre-training. All knowledge accumulates on top of FACIES without ever modifying it.
Knowledge Layer
UΣVᵀ
VARVE
Knowledge layer UᵢΣᵢVᵢᵀ. Low-rank domain-specific stratum. Each trained independently on its own source corpus.
Source Residual
ΔW
CLAST
Source residual ΔW = Wsource − Wfacies. The knowledge delta extracted from a source model via SVD decomposition.
Routing Function
α(x)
STRIA
Per-block routing α(x). Computes which VARVE serves each query. 262K params vs 768M for per-layer routing — extremely lightweight.
Attribution Vector
{v:w}
STEMMA
Attribution vector {varve: weight}. Mechanistic contribution distribution for every output. Shows exactly which sources shaped the response.
Composed Output
Σ
CENTO
Composed output. The final intelligence combining FACIES and weighted VARVEs. A patchwork of attributed knowledge sources, unified at inference.
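The composition above can be sketched as a single linear block. This is a minimal illustration, not the published implementation: the function name `cento_forward`, the softmax over STRIA logits, and the per-block routing callable are all assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cento_forward(x, w0, varves, stria_logits):
    """CENTO = FACIES + sum_i alpha_i(x) * VARVE_i, for one linear block.

    w0           : frozen FACIES weights, shape (d_out, d_in), never updated
    varves       : {name: (A, B)} low-rank pairs, A (r, d_in), B (d_out, r)
    stria_logits : callable x -> per-VARVE routing logits (the STRIA role)
    """
    names = list(varves)
    alpha = softmax(stria_logits(x))      # per-query routing weights
    y = w0 @ x                            # immutable base contribution
    stemma = {}                           # attribution vector {varve: weight}
    for w, name in zip(alpha, names):
        a, b = varves[name]
        y = y + w * (b @ (a @ x))         # weighted low-rank stratum
        stemma[name] = float(w)
    return y, stemma
```

Because every VARVE enters the sum through an explicit weight, the STEMMA dictionary falls out of the forward pass for free; attribution is mechanistic rather than post-hoc.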
Training Pipeline

Four phases to production intelligence

Phase 1
CLAST Initialization — CONFLUX
No gradient computation. Extract source residuals via SVD decomposition of pre-trained weights. Asymmetric init: A-matrix receives decomposed structure at scale 0.01, B stays zeros. Quality gate: CKA > 0.8 required for transfer to proceed.
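Phase 1 can be sketched in a few lines of NumPy. The helper name `conflux_init` and the square-root split of the singular values into the A-matrix are assumptions; the scale-0.01 A-init, zero B, and the residual extraction follow the description above.

```python
import numpy as np

def conflux_init(w_source, w_facies, rank, scale=0.01):
    """Phase-1 sketch: extract the CLAST residual and seed a VARVE from it.

    Asymmetric init: A receives the top-`rank` SVD structure at `scale`,
    B stays zeros, so the VARVE contributes nothing at step 0 but its
    input subspace is already aligned with the source residual.
    """
    clast = w_source - w_facies                  # ΔW = Wsource − Wfacies
    u, s, vt = np.linalg.svd(clast, full_matrices=False)
    a = scale * (np.sqrt(s[:rank])[:, None] * vt[:rank])  # (rank, d_in)
    b = np.zeros((clast.shape[0], rank))                  # (d_out, rank)
    evr = (s[:rank] ** 2).sum() / (s ** 2).sum()  # explained variance ratio
    return a, b, evr
```

The returned EVR is the same quantity reported in the Fehm A/B results below the CKA gate: the fraction of the residual's variance the truncated rank captures.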
Phase 2
Parallel VARVE Training
Each VARVE trains independently with stop-gradient isolation. VARVEs cannot influence each other during training — ensuring clean knowledge boundaries. Overhead: approximately 1.3x vs single LoRA. All VARVEs can train in parallel across GPUs.
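Stop-gradient isolation can be illustrated on a single linear block with hand-derived MSE gradients. This is a toy sketch under assumptions (plain gradient descent, one block, my function name `train_varve`): every VARVE contributes to the forward pass, but only the trained VARVE's parameters receive gradient.

```python
import numpy as np

def train_varve(w0, varves, i, x, t, lr=0.05, steps=200):
    """Phase-2 sketch: fit VARVE `i` to targets `t` on its own corpus `x`.

    All VARVEs contribute to the forward pass, but gradients are taken
    only w.r.t. (A_i, B_i); every other VARVE (and frozen FACIES) is
    treated as a constant -- the NumPy analogue of stop-gradient.
    """
    for _ in range(steps):
        a_i, b_i = varves[i]
        y = w0 @ x + sum(b @ (a @ x) for a, b in varves)  # full composition
        g = 2.0 * (y - t) / t.size                        # dLoss/dY for MSE
        grad_b = g @ (a_i @ x).T                          # flows only into i
        grad_a = b_i.T @ g @ x.T
        varves[i] = (a_i - lr * grad_a, b_i - lr * grad_b)
    return varves
```

Because the other VARVEs never appear in a gradient expression, their weights are bit-for-bit unchanged after training, which is exactly the clean knowledge boundary the phase description claims.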
Phase 3
STRIA Routing Training
All VARVEs frozen. Only the lightweight STRIA routing network trains. ARS entropy floor prevents routing collapse into a single dominant VARVE. The router learns which knowledge source serves each query pattern optimally.
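The ARS entropy floor can be sketched as a hinge penalty on the routing distribution. The hinge form and the `floor=0.5` default are illustrative assumptions; the document specifies only that the floor prevents collapse onto a single VARVE.

```python
import numpy as np

def routing_entropy(alpha, eps=1e-12):
    """Shannon entropy (nats) of a routing distribution over VARVEs."""
    return float(-np.sum(alpha * np.log(alpha + eps)))

def ars_entropy_penalty(alpha, floor=0.5):
    """Sketch of the ARS entropy floor: zero while the routing distribution
    stays above `floor` nats, growing as STRIA collapses toward a single
    dominant VARVE. Added to the Phase-3 routing loss."""
    return max(0.0, floor - routing_entropy(alpha))
```

A uniform router over four VARVEs has entropy ln 4 ≈ 1.386 nats and pays no penalty; a fully collapsed one-hot router pays the full floor.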
Phase 4
Optional Joint Fine-tune
All components unfrozen with extremely conservative learning rate (LR=1e-6). Light joint optimization refines the composition without disrupting established VARVE knowledge boundaries. This phase is optional and typically yields marginal gains.
Inference Modes

Four modes, one architecture

SEDIM supports multiple inference configurations, each trading latency for attribution depth. Choose the mode that fits your deployment.

Full
+56% latency
All VARVEs active. Complete STEMMA attribution. Maximum quality, full source transparency.
Use case: Research, compliance, audit
Top-K
+17% latency
Only the highest-scoring STEMMA VARVE activates. Best tradeoff between quality and speed.
Use case: Production default
Static
+5% latency
Fixed routing weights baked at deploy time. No runtime STRIA computation. Minimal overhead.
Use case: Edge, mobile, embedded
Resonance
+9% avg latency
FACIES confidence gate. When base weights are confident, skip VARVEs entirely. Adaptive per-query.
Use case: High-throughput APIs
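The four modes differ only in how raw STRIA weights become the active-VARVE set, which a small dispatcher can sketch. The function name, the `k` parameter, and the resonance `gate` threshold are assumptions; the document does not specify exact values.

```python
import numpy as np

def select_varves(alpha, mode, base_confidence=0.0, k=1, gate=0.9):
    """Map raw STRIA weights onto active-VARVE weights for each mode."""
    if mode == "full":                 # every VARVE active, complete STEMMA
        return alpha
    if mode == "top-k":                # keep only the k highest-scoring VARVEs
        out = np.zeros_like(alpha)
        keep = np.argsort(alpha)[-k:]
        out[keep] = alpha[keep]
        return out / out.sum()
    if mode == "static":               # weights fixed at deploy time; alpha is
        return alpha                   # precomputed, no runtime STRIA pass
    if mode == "resonance":            # FACIES confident => skip VARVEs
        return np.zeros_like(alpha) if base_confidence >= gate else alpha
    raise ValueError(f"unknown mode: {mode}")
```

The latency ordering in the cards above follows directly: "full" evaluates every VARVE, "top-k" evaluates k of them, "static" drops the router entirely, and "resonance" pays for VARVEs only on low-confidence queries.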
Cross-Architecture Transfer

CONFLUX initialization

CONFLUX enables cross-architecture SVD initialization. Source model weights are decomposed via truncated SVD, and the resulting structure initializes the target VARVE's A-matrix at scale 0.01 while B remains zeros.


The CKA quality gate determines whether transfer proceeds. High alignment means the source structure is meaningfully compatible with the target — low alignment means direct SFT is a better strategy.
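A minimal version of the gate, assuming linear CKA computed on activation matrices (the document does not say which CKA variant is used; the helper names are mine):

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_samples, features).
    Values near 1 mean the two representations span closely aligned subspaces."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    hsic = np.linalg.norm(x.T @ y, "fro") ** 2
    return hsic / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

def transfer_decision(cka):
    """The quality gate, with the thresholds stated below."""
    if cka > 0.8:
        return "conflux"      # strong alignment: initialize via CONFLUX
    if cka < 0.3:
        return "direct-sft"   # minimal alignment: skip transfer entirely
    return "judgment-call"    # intermediate zone, not specified here
```

CKA is invariant to isotropic scaling and offsets of the activations, which is what makes it usable across architectures with different weight magnitudes.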


CKA > 0.8
Strong Transfer
Initiate CONFLUX. Expect roughly 2× faster convergence and a measurable validation-loss improvement.
CKA < 0.3
Minimal Transfer
Skip CONFLUX. Use direct SFT instead. Transfer overhead not justified.
Fehm A/B Results
CKA Score: 0.817
Val Loss: 0.717 → 0.671 (−6.4%)
Convergence: 2× faster
EVR: 99.46%
Transparency Note
Proof of concept (N=1). Multi-seed validation in progress. Results from Fehm A/B experiment only. Bilge A/B (CKA 0.235) confirmed low-CKA yields minimal benefit.
Comparison

SEDIM vs existing approaches

| Capability | SEDIM | LoRA | MoE | RAG |
|---|---|---|---|---|
| Source attribution (can every output be traced to its knowledge source?) | ✓ STEMMA vector | ✗ none | ~ expert ID only | ~ document level |
| Knowledge isolation (are knowledge sources trained independently?) | ✓ stop-gradient | ✗ single adapter | ✓ per expert | ✓ per document |
| Continuous learning (can new knowledge be added without retraining?) | ✓ new VARVE | ~ new adapter | ✗ full retrain | ✓ new docs |
| Parameter efficiency (overhead per knowledge source?) | ✓ low-rank VARVE | ✓ low-rank | ✗ full expert | ✓ no params |
| Routing overhead (cost of dynamic source selection?) | ✓ 262K STRIA | ~ manual selection | ~ gating network | ~ retrieval latency |
| Forgetting resistance (does new knowledge overwrite old?) | ✓ FACIES frozen | ✗ base drifts | ~ depends | ✓ no training |
Evaluation Framework

SEDIM-Bench: five dimensions

A purpose-built benchmark for evaluating sedimentary architectures. Five dimensions capture what generic LLM benchmarks miss.

SB-1
VarveIQ
Per-VARVE domain expertise. Measures each knowledge layer's accuracy on its own domain without routing assistance.
SB-2
RoutingIQ
STRIA accuracy and consistency. Does the router correctly identify which VARVE should serve each query?
SB-3
FusionIQ
Multi-VARVE synergy score. Values above 1.0 indicate positive composition — the combination outperforms individual VARVEs.
SB-4
IsolationIQ
Cross-VARVE contamination metric. 1.0 means perfect isolation. Measures whether training one VARVE leaks into another.
SB-5
AttributionIQ
STEMMA faithfulness. Do attribution vectors accurately reflect the actual computational contribution of each VARVE?
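The SB-3 and SB-4 descriptions suggest simple ratio metrics. The exact SEDIM-Bench formulas are not published, so both helpers below are assumptions that merely match the stated semantics (FusionIQ > 1.0 means the composition beats the best single VARVE; IsolationIQ of 1.0 means no contamination).

```python
def fusion_iq(composed_acc, varve_accs):
    """SB-3 sketch (assumed formula): composed accuracy relative to the
    best individual VARVE; values above 1.0 indicate positive composition."""
    return composed_acc / max(varve_accs)

def isolation_iq(acc_before, acc_after):
    """SB-4 sketch (assumed formula): a held-out VARVE's domain accuracy
    after another VARVE trains, relative to before; 1.0 = perfect isolation."""
    return min(acc_after, acc_before) / acc_before
```

A composed model scoring 0.90 where its best VARVE scores 0.80 would report FusionIQ 1.125, i.e. measurable synergy rather than interference.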

Open source benchmark coming May 2026