MEMORY ARCHITECTURE

How CogniWeave remembers

CogniWeave's memory layer is informed by published research on lifelong memory for LLM agents (Liu et al., 2026). The results on this page are from that published research. Your production performance will depend on your data and configuration.

The problem with AI memory

Long-running agents face a finite context window. Two approaches dominate and both fail. The first retains the full history, which accumulates low-value filler alongside useful information and leads to middle-context degradation, where the model attends poorly to information buried in a long context. The second filters continuously using iterative reasoning loops, which improves relevance but adds latency and token cost. Neither allocates memory and computation efficiently.

The approach

CogniWeave's memory engine uses a three-stage pipeline designed to maximise information density: retain meaning, discard noise, retrieve only what the question needs. The design draws on Complementary Learning Systems theory, the same framework used to describe how the brain consolidates short-term experience into long-term memory.

Compress at source

Interactions are scored for information value before storage. Low-value content is discarded at the gate. What passes is transformed into self-contained, timestamped memory units with resolved entity references.
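
The gate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not CogniWeave's implementation: the filler-word heuristic, the 0.5 threshold, the alias table, and the names MemoryUnit, information_score, resolve_references, and gate are all hypothetical. A production gate would presumably use a learned scorer and a proper coreference step.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical filler vocabulary; stands in for a learned information-value scorer.
FILLER = {"ok", "okay", "thanks", "thank", "you", "hi", "hello", "sure", "great"}


@dataclass(frozen=True)
class MemoryUnit:
    """A self-contained, timestamped memory with its resolved entities."""
    text: str
    timestamp: datetime
    entities: tuple = ()


def information_score(text: str) -> float:
    """Toy proxy for information value: share of tokens that are not filler."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t not in FILLER for t in tokens) / len(tokens)


def resolve_references(text: str, aliases: dict) -> str:
    """Replace whole-word aliases (for example pronouns) with entity names."""
    for alias, entity in aliases.items():
        text = re.sub(rf"\b{re.escape(alias)}\b", entity, text)
    return text


def gate(text: str, timestamp: datetime, aliases: dict,
         threshold: float = 0.5) -> Optional[MemoryUnit]:
    """Discard low-value content; otherwise emit a resolved memory unit."""
    if information_score(text) < threshold:
        return None  # dropped at the gate, never stored
    resolved = resolve_references(text, aliases)
    entities = tuple(sorted({e for e in aliases.values() if e in resolved}))
    return MemoryUnit(resolved, timestamp, entities)


if __name__ == "__main__":
    now = datetime(2026, 1, 15, tzinfo=timezone.utc)
    print(gate("ok thanks", now, {}))
    print(gate("She moved to Berlin in March", now, {"She": "Alice"}))
```

In this sketch the first call is dropped because every token is filler, while the second is stored with the pronoun replaced, so the memory can be read later without the surrounding conversation.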

Consolidate over time

An asynchronous process clusters related memories by semantic similarity and temporal proximity, synthesising them into compact abstractions. The active index stays small. Detail is archived, not lost.
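
One way to picture the consolidation step is a greedy pass that groups memories when they are both semantically close and close in time. The sketch below is an assumption-laden illustration: the cosine threshold of 0.8, the seven-day window, the Memory class, and the string join used in place of a real summariser are all hypothetical, and the asynchronous scheduling is omitted.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    text: str
    timestamp: datetime
    embedding: list


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(memories, min_sim=0.8, max_gap=timedelta(days=7)):
    """Greedy clustering: join a cluster only if similar to its first member
    AND within max_gap of its most recent member."""
    clusters = []
    for m in sorted(memories, key=lambda mem: mem.timestamp):
        for c in clusters:
            similar = cosine(m.embedding, c[0].embedding) >= min_sim
            recent = m.timestamp - c[-1].timestamp <= max_gap
            if similar and recent:
                c.append(m)
                break
        else:
            clusters.append([m])
    return clusters


def consolidate(memories):
    """Return (active_index, archive). Multi-member clusters are replaced in the
    active index by one abstraction; their originals move to the archive."""
    active, archive = [], []
    for c in cluster(memories):
        if len(c) == 1:
            active.append(c[0].text)
        else:
            # Placeholder for a real summariser (assumed, e.g. an LLM call).
            active.append(" | ".join(m.text for m in c))
            archive.extend(c)
    return active, archive
```

The archive list is what keeps detail from being lost: the active index holds one entry per cluster, while the original memories remain retrievable.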

Retrieve by relevance

Query complexity is estimated at runtime and retrieval scope adjusts accordingly, drawing on semantic embeddings, sparse lexical features, and structured metadata.
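
A rough sketch of adaptive retrieval follows. It estimates query complexity from a simple heuristic, widens or narrows the number of memories returned, and ranks candidates with a weighted blend of the three signal types named above. Everything specific here is an assumption made for illustration: the cue-word list, the weights, the bounds k_min and k_max, the dictionary layout of a memory, and the use of Jaccard overlap as a stand-in for sparse lexical features.

```python
import math
import re

# Hypothetical cue words hinting at temporal or multi-hop questions.
CUES = {"before", "after", "when", "since", "between", "compare", "both", "and"}


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_complexity(query: str) -> float:
    """Heuristic in [0, 1]: longer queries with more cue words score higher."""
    toks = tokens(query)
    length_part = min(len(toks) / 20, 1.0)
    cue_part = min(len(toks & CUES) / 2, 1.0)
    return 0.5 * length_part + 0.5 * cue_part


def retrieval_scope(query: str, k_min: int = 3, k_max: int = 20) -> int:
    """Map estimated complexity onto how many memories to retrieve."""
    return k_min + round(estimate_complexity(query) * (k_max - k_min))


def hybrid_score(query, q_emb, memory, filters, weights=(0.6, 0.3, 0.1)) -> float:
    """Blend semantic, lexical, and metadata signals into one score."""
    semantic = cosine(q_emb, memory["embedding"])
    q, m = tokens(query), tokens(memory["text"])
    lexical = len(q & m) / len(q | m) if q | m else 0.0  # Jaccard overlap
    meta = 1.0 if all(memory["meta"].get(k) == v for k, v in filters.items()) else 0.0
    w_sem, w_lex, w_meta = weights
    return w_sem * semantic + w_lex * lexical + w_meta * meta


def retrieve(query, q_emb, memories, filters=None):
    """Return the top-k memories, where k depends on query complexity."""
    filters = filters or {}
    k = retrieval_scope(query)
    ranked = sorted(memories,
                    key=lambda mem: hybrid_score(query, q_emb, mem, filters),
                    reverse=True)
    return ranked[:k]
```

Under these assumptions a short factual question retrieves only a handful of memories, while a question that chains several temporal cues retrieves a wider set, which is the trade-off between token cost and recall that the paragraph above describes.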

Results from the published research

In the published research, the architecture reaches an average F1 of 43.24 on the LoCoMo benchmark using GPT-4.1-mini, against 34.20 for Mem0 and 18.70 for a full-context baseline. Token use averages roughly 550 per query, about one-thirtieth of what full-context methods consume. The largest gains are in temporal reasoning. These are the researchers' benchmark figures, not measurements of CogniWeave's production system.
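
To put the published figures in proportion, the short calculation below derives the absolute and relative F1 differences and the full-context token budget implied by the "roughly 550" and "about 30 times" figures. The inputs are the numbers quoted above; the derived values are back-of-envelope arithmetic on rounded inputs, not additional results from the research.

```python
# Figures quoted from the published research (LoCoMo, GPT-4.1-mini).
f1 = {"architecture": 43.24, "Mem0": 34.20, "full-context": 18.70}


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of new over base, in percent."""
    return (new - base) / base * 100


for name in ("Mem0", "full-context"):
    points = f1["architecture"] - f1[name]
    pct = relative_gain(f1["architecture"], f1[name])
    print(f"vs {name}: +{points:.2f} F1 points ({pct:.1f}% relative)")

# Implied full-context budget, derived from two approximate figures.
tokens_per_query = 550
implied_full_context_tokens = tokens_per_query * 30
print(f"implied full-context tokens per query: ~{implied_full_context_tokens}")
```

On these inputs the gap is about 9 F1 points over Mem0 (roughly 26% relative) and about 24.5 points over the full-context baseline (roughly 131% relative), with an implied full-context budget in the region of 16,500 tokens per query.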

Explore the technical detail