ptrnsai

Episodic Memory

Advanced | 💾 Memory Patterns | Stanford (Park et al., 2023)

Intent

Store and retrieve specific past experiences — complete interaction episodes — so the agent can learn from what worked and what didn't.

Problem

General long-term memory stores facts and preferences but drops the circumstances around them. Knowing 'the user prefers Python' is different from remembering 'last Tuesday when we tried to debug that async issue, we discovered the problem was in the event loop configuration.' The episode records what was tried and what actually fixed the problem, which a general summary discards.

Solution

Store complete interaction episodes (or rich summaries of them) with temporal and contextual metadata. When a new situation resembles a past episode, retrieve the relevant experience and use it to inform the current approach. Episodic memory mimics how humans recall 'that time when...' — specific experiences that provide directly applicable lessons.
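
As an illustration of what "an episode with temporal and contextual metadata" might look like, here is a minimal sketch. The `Episode` class and its field names are hypothetical, chosen to mirror the context/actions/outcome framing above; they are not a schema prescribed by the pattern.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    """One stored interaction episode (hypothetical schema for illustration)."""
    context: str          # what the situation was
    actions: list[str]    # what was tried, in order
    outcome: str          # what happened in the end
    success: bool         # whether the outcome solved the problem
    tags: list[str] = field(default_factory=list)
    # Temporal metadata: UTC creation time as an ISO-8601 string.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_text(self) -> str:
        """Flatten the episode into one string for embedding or prompting."""
        status = "succeeded" if self.success else "failed"
        return (
            f"Context: {self.context}\n"
            f"Actions: {'; '.join(self.actions)}\n"
            f"Outcome ({status}): {self.outcome}"
        )


episode = Episode(
    context="debugging async issue",
    actions=["tried X", "tried Y", "tried Z"],
    outcome="Z worked",
    success=True,
    tags=["async", "debugging"],
)
print(episode.to_text())
```

Keeping failed attempts in `actions` is a deliberate choice in this sketch: knowing that X and Y did not work can be as useful to the agent as knowing that Z did.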

Diagram

Past Episode: [Context: debugging async] [Actions: tried X, Y, Z] [Outcome: Z worked]
                                    ↓ [Store with metadata]
                              [Episodic Memory DB]
                                    ↓ [Retrieve when similar context detected]
New Episode: [Context: another async issue] → [Retrieved: 'Last time, Z worked'] → Informed response

When to Use

  • Agents that handle recurring types of tasks and should improve over time
  • When past problem-solving strategies are directly applicable to new situations
  • Support agents that encounter similar issues repeatedly
  • Long-running projects where past decisions inform future ones

When NOT to Use

  • Truly novel tasks with no relevant past episodes
  • When storage and retrieval overhead isn't justified
  • Privacy-sensitive contexts where interaction logs can't be stored

Pros & Cons

Pros

  • Rich, contextual learning from past experiences
  • Can recall specific strategies that worked
  • Improves agent performance over time on recurring tasks
  • Provides explainable reasoning ('I'm doing X because it worked last time')

Cons

  • Storage costs grow over time
  • Retrieval relevance is hard to get right
  • Past episodes may not be applicable to new contexts
  • Privacy implications of storing detailed interaction logs
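
One way to soften the relevance problem is to rank candidates by more than raw similarity. The sketch below blends similarity with recency and outcome, loosely in the spirit of the recency, importance, and relevance scoring described by Park et al. (2023). The function name, the weights, the 30-day half-life, and the similarity cutoff are all assumptions for illustration and would need tuning on real data.

```python
from datetime import datetime, timedelta, timezone


def retrieval_score(
    similarity: float,
    episode_time: datetime,
    succeeded: bool,
    now: datetime,
    half_life_days: float = 30.0,  # assumed: recency weight halves every 30 days
    min_similarity: float = 0.3,   # assumed cutoff; weaker matches are ignored
) -> float:
    """Combine relevance, recency, and outcome into one ranking score."""
    if similarity < min_similarity:
        return 0.0
    age_days = max((now - episode_time).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)   # 1.0 when new, decays toward 0
    outcome_weight = 1.0 if succeeded else 0.5     # failed episodes count for less
    # Similarity dominates; recency can only scale it between 70% and 100%.
    return similarity * (0.7 + 0.3 * recency) * outcome_weight


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
fresh = retrieval_score(0.8, now, True, now)
month_old = retrieval_score(0.8, now - timedelta(days=30), True, now)
unrelated = retrieval_score(0.1, now, True, now)
print(fresh, month_old, unrelated)
```

The cutoff also addresses the case where no past episode applies: returning nothing is usually safer than injecting a weak match into the agent's context.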

Implementation Steps

  1. Define what constitutes an 'episode' worth storing
  2. Extract key elements: context, actions taken, outcome, lessons learned
  3. Store episodes with rich metadata (timestamps, tags, success/failure)
  4. Build similarity matching: given a new situation, find relevant past episodes
  5. Inject relevant episodes into the agent's context
  6. Implement episode curation: merge similar episodes, archive old ones
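
Step 6 is the least obvious of these, so here is a minimal curation sketch. It assumes each episode is a dict with an ISO-8601 `timestamp` and that an embedding vector is already available for each one. The `curate` function, the 180-day age limit, and the 0.95 similarity threshold are hypothetical, and the "merge" here is the simplest possible stand-in: keep the newest of each group of near-duplicates and archive the rest.

```python
from datetime import datetime, timedelta, timezone

import numpy as np


def curate(episodes, embeddings, now, max_age_days=180, merge_threshold=0.95):
    """Split episodes into (active, archived).

    episodes:   list of dicts, each with an ISO-8601 "timestamp"
    embeddings: one non-zero vector per episode, in the same order
    now:        datetime; must be timezone-aware iff the timestamps are
    """
    active, active_vecs, archived = [], [], []
    # Visit newest first so the most recent copy of a near-duplicate is kept.
    order = sorted(
        range(len(episodes)),
        key=lambda i: datetime.fromisoformat(episodes[i]["timestamp"]),
        reverse=True,
    )
    for i in order:
        ep = episodes[i]
        age = now - datetime.fromisoformat(ep["timestamp"])
        if age > timedelta(days=max_age_days):
            archived.append(ep)  # too old to stay in the active set
            continue
        vec = np.asarray(embeddings[i], dtype=float)
        vec = vec / np.linalg.norm(vec)  # normalize so dot product = cosine
        if any(float(vec @ kept) >= merge_threshold for kept in active_vecs):
            archived.append(ep)  # near-duplicate of a newer active episode
            continue
        active.append(ep)
        active_vecs.append(vec)
    return active, archived


# Made-up sample data with toy 2-D vectors standing in for real embeddings.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
sample_episodes = [
    {"timestamp": "2025-05-20T00:00:00+00:00", "description": "Redis pool exhausted"},
    {"timestamp": "2025-05-01T00:00:00+00:00", "description": "Redis connection pool exhausted"},
    {"timestamp": "2024-01-01T00:00:00+00:00", "description": "Disk full on build server"},
]
sample_embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
active, archived = curate(sample_episodes, sample_embeddings, now)
print([e["description"] for e in active])
```

Archiving rather than deleting keeps old episodes available for audits while keeping them out of the similarity search.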

Real-World Example

DevOps Incident Response

An agent handling production incidents stores each incident as an episode: symptoms, investigation steps, root cause, resolution. When a similar alert fires, it retrieves relevant past incidents: 'This pattern matches the memory leak incident from January — check the connection pool configuration first.'

Python: Store and Retrieve Past Incidents
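from datetime import datetime

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
EMBEDDING_MODEL = "text-embedding-3-small"

# In-memory store: episodes[i] is indexed by episode_embeddings[i].
episodes: list[dict] = []
episode_embeddings: list[list[float]] = []

def embed(text: str) -> list[float]:
    """Return the embedding vector for a single string."""
    response = client.embeddings.create(model=EMBEDDING_MODEL, input=[text])
    return response.data[0].embedding

def store_episode(description: str, outcome: str, tags: list[str]) -> None:
    """Record an episode and index it by the embedding of its description."""
    episodes.append({
        "timestamp": datetime.now().isoformat(),
        "description": description,
        "outcome": outcome,
        "tags": tags,
    })
    episode_embeddings.append(embed(description))

def recall(situation: str, top_k: int = 3) -> list[dict]:
    """Return up to top_k stored episodes, most similar first."""
    if not episodes:
        return []  # nothing stored yet; avoids a shape error in the product below
    query_emb = np.array(embed(situation))
    # OpenAI embeddings are unit-length, so the dot product is cosine similarity.
    similarities = np.array(episode_embeddings) @ query_emb
    top_idx = np.argsort(similarities)[-top_k:][::-1]
    return [episodes[i] for i in top_idx]

# Store a past incident
store_episode("Redis connection pool exhausted", "Increased max connections to 50", ["redis", "perf"])
# Later, recall episodes similar to a new situation
similar = recall("Database connections timing out during peak traffic")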
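
The sample above stops at retrieval. A possible next step, sketched here, is to render the recalled episodes as text that can be placed in the agent's prompt. The `format_episodes_for_prompt` helper and its `max_chars` budget are hypothetical; the dict keys match those written by `store_episode`.

```python
def format_episodes_for_prompt(recalled: list[dict], max_chars: int = 2000) -> str:
    """Render recalled episodes as a text block to prepend to the agent's prompt."""
    if not recalled:
        return ""  # no relevant history; add nothing to the prompt
    lines = ["Relevant past episodes (most similar first):"]
    for n, ep in enumerate(recalled, start=1):
        tags = ", ".join(ep.get("tags", []))
        day = ep["timestamp"][:10]  # date part of the ISO-8601 timestamp
        lines.append(
            f"{n}. [{day}] {ep['description']} "
            f"-> outcome: {ep['outcome']} (tags: {tags})"
        )
    # Crude character budget so old episodes cannot crowd out the actual task.
    return "\n".join(lines)[:max_chars]


# Made-up episode in the same shape that store_episode() produces.
recalled = [{
    "timestamp": "2025-01-15T09:30:00",
    "description": "Redis connection pool exhausted",
    "outcome": "Increased max connections to 50",
    "tags": ["redis", "perf"],
}]
memory_block = format_episodes_for_prompt(recalled)
print(memory_block)
```

The resulting string can be passed as part of a system or user message. The character cap is a blunt instrument and may cut an entry mid-line; a token-based budget would be more precise.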

References