Embeddings

Three tiers of embedding storage with content-hash provenance. Pick the one that matches your traffic and budget.

Embedding storage

Semantic search needs vectors somewhere. InferaGraph supports three storage tiers behind a single contract — start simple, upgrade in place. Embeddings are generated on demand the first time a node is searched semantically, then reused indefinitely until the model or the source text changes.

Tier 1 — None. Keyword search only. Zero LLM calls for embeddings.
Tier 2 — Cache as store. Pass a cache with a provider that supports embed(). Embeddings live next to completions.
Tier 3 — Dedicated store. Pass embeddingStore for a vector-native backend with its own eviction policy.

Tier 1 — None

Default. useInferaGraphSearch() falls back to keyword scoring (name 10, alias 3, tag 2, content 1). Free, fast, predictable — and often plenty for single-token lookups.

// Tier 1 — keyword search only, zero embedding calls
<InferaGraph data={data} llm={llm} />
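
To make the weights concrete, here is a minimal sketch of how they might combine into a ranking. The node shape (SearchableNode) and the matching rule (each field scores once if it contains the query, case-insensitively) are assumptions for illustration; the actual matching logic inside useInferaGraphSearch() is not described here and may differ.

```typescript
// Hypothetical node shape, for illustration only.
interface SearchableNode {
  id: string;
  name: string;
  aliases: string[];
  tags: string[];
  content: string;
}

// Field weights quoted in the text above.
const WEIGHTS = { name: 10, alias: 3, tag: 2, content: 1 } as const;

// Assumption: a field contributes its weight once if it contains the query.
function keywordScore(node: SearchableNode, query: string): number {
  const q = query.trim().toLowerCase();
  if (q === "") return 0;
  const has = (s: string): boolean => s.toLowerCase().includes(q);
  let score = 0;
  if (has(node.name)) score += WEIGHTS.name;
  if (node.aliases.some((a) => has(a))) score += WEIGHTS.alias;
  if (node.tags.some((t) => has(t))) score += WEIGHTS.tag;
  if (has(node.content)) score += WEIGHTS.content;
  return score;
}

// Rank nodes by score, dropping those that do not match at all.
function keywordSearch(nodes: SearchableNode[], query: string): SearchableNode[] {
  return nodes
    .map((node) => ({ node, score: keywordScore(node, query) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.node);
}
```

Under these assumptions a name match always outranks a content-only match, which is why keyword search works well for lookups by name.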

Tier 2 — Cache as store

Pass any CacheProvider; the engine writes embeddings under composite keys and computes cosine similarity in-process on each read. This is the easiest upgrade: you can switch between lruCache and @inferagraph/redis without touching anything else.

import { lruCache } from '@inferagraph/core/data';

// Tier 2 — reuse the response cache as embedding storage.
// Easiest upgrade path. Embeddings live alongside completions
// and are evicted by the same LRU/TTL rules.
<InferaGraph
  data={data}
  llm={llm}
  cache={lruCache({ maxEntries: 1000 })}
/>

Caveat: similarity is computed in-process, so very large vector sets benefit from Tier 3. For graphs with a few thousand nodes, Tier 2 is usually sufficient.
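
The cost behind this caveat is easier to see in code. The sketch below shows a brute-force cosine-similarity scan of the kind an in-process tier has to perform: every query touches every stored vector. The VectorEntry shape and the rankBySimilarity name are hypothetical and are not part of the InferaGraph API.

```typescript
// Hypothetical entry shape, for illustration only.
interface VectorEntry {
  nodeId: string;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force scan: O(n * d) work per query for n vectors of dimension d.
function rankBySimilarity(
  query: number[],
  entries: VectorEntry[],
  topK = 5,
): { nodeId: string; score: number }[] {
  return entries
    .map((e) => ({ nodeId: e.nodeId, score: cosineSimilarity(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The work grows linearly with the number of stored vectors, so past some size an indexed vector store is likely to be the better fit.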

Tier 3 — Dedicated EmbeddingStore

An EmbeddingStore is a typed contract that decouples vector storage from the response cache. Core ships a built-in inMemoryEmbeddingStore(). Implement the contract directly to wrap the vector database of your choice.

import { inMemoryEmbeddingStore } from '@inferagraph/core/data';

// Tier 3 — dedicated EmbeddingStore. Same API, but separate
// from the completion cache so eviction policies can differ.
<InferaGraph
  data={data}
  llm={llm}
  embeddingStore={inMemoryEmbeddingStore()}
/>

A vector-native external store (e.g. a Redis Search adapter) is on the roadmap; track progress in the core repo.

EmbeddingStore contract

Five methods, all async. Implement them and you can plug in any vector store: pgvector, Pinecone, Qdrant, a remote service, anything.

interface EmbeddingStore {
  // Look up by composite key: nodeId + ':' + sourceHash
  get(key: string): Promise<StoredEmbedding | undefined>;

  // Persist a new embedding with provenance metadata
  set(key: string, value: StoredEmbedding): Promise<void>;

  // Remove a stale entry (model version mismatch, content change)
  delete(key: string): Promise<void>;

  // Iterate all stored vectors for a given node (rare — used by
  // cleanup tooling)
  listForNode(nodeId: string): Promise<StoredEmbedding[]>;

  // Cosine-similarity search across all stored vectors
  search(
    vector: number[],
    opts?: { topK?: number; minScore?: number },
  ): Promise<EmbeddingHit[]>;
}

interface StoredEmbedding {
  nodeId: string;
  vector: number[];
  model: string;        // e.g. 'text-embedding-3-small'
  modelVersion: string; // provider-stamped version
  sourceHash: string;   // hash of the embedded text
  createdAt: number;
}
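
As a starting point for a custom adapter, the sketch below implements the contract on top of a plain Map, which stands in for a real database client. Several things here are assumptions: the EmbeddingHit shape (this section does not define it), the default values for topK and minScore, and the MapEmbeddingStore name. The types are restated only so the example compiles on its own.

```typescript
// Types restated from the contract above so the sketch compiles on its own.
interface StoredEmbedding {
  nodeId: string;
  vector: number[];
  model: string;
  modelVersion: string;
  sourceHash: string;
  createdAt: number;
}

// Assumption: EmbeddingHit is not defined in this section; this shape is a guess.
interface EmbeddingHit {
  key: string;
  nodeId: string;
  score: number;
}

interface EmbeddingStore {
  get(key: string): Promise<StoredEmbedding | undefined>;
  set(key: string, value: StoredEmbedding): Promise<void>;
  delete(key: string): Promise<void>;
  listForNode(nodeId: string): Promise<StoredEmbedding[]>;
  search(vector: number[], opts?: { topK?: number; minScore?: number }): Promise<EmbeddingHit[]>;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical adapter: a Map stands in for the real database client.
class MapEmbeddingStore implements EmbeddingStore {
  private readonly entries = new Map<string, StoredEmbedding>();

  async get(key: string): Promise<StoredEmbedding | undefined> {
    return this.entries.get(key);
  }

  async set(key: string, value: StoredEmbedding): Promise<void> {
    this.entries.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key);
  }

  async listForNode(nodeId: string): Promise<StoredEmbedding[]> {
    return Array.from(this.entries.values()).filter((e) => e.nodeId === nodeId);
  }

  async search(
    vector: number[],
    opts: { topK?: number; minScore?: number } = {},
  ): Promise<EmbeddingHit[]> {
    const { topK = 10, minScore = -1 } = opts; // assumed defaults
    const hits: EmbeddingHit[] = [];
    this.entries.forEach((entry, key) => {
      if (entry.vector.length !== vector.length) return; // skip other dimensions
      const score = cosine(vector, entry.vector);
      if (score >= minScore) hits.push({ key, nodeId: entry.nodeId, score });
    });
    return hits.sort((a, b) => b.score - a.score).slice(0, topK);
  }
}
```

A real adapter would presumably replace the Map operations with queries against the backing store and push the similarity search down to the database rather than scanning in process.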

Provenance and invalidation

Every stored embedding carries the model id, model version, and a hash of the source text. The engine never trusts a stored vector blindly: on every read it checks that the model, model version, and content hash still match. If anything has drifted, the entry is dropped and regenerated.

// When AIEngine generates an embedding it stamps:
//   - model        — the embedding model name
//   - modelVersion — provider-reported version (or model name)
//   - sourceHash   — SHA of the canonicalized embedding text
//
// On read, the engine verifies that the stored hash matches the
// hash of the current node text. If the node was edited or the
// embedding model changed, the entry is invalidated and
// regenerated. Unchanged content is never re-embedded.
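
The read-time check described above can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual code: the canonicalization rule (trim and collapse whitespace), the choice of SHA-256 with hex output, and the helper names (canonicalize, hashSource, compositeKey, isFresh) are all hypothetical. Only the key layout, nodeId + ':' + sourceHash, comes from the contract.

```typescript
import { createHash } from "node:crypto";

// Restated from the contract so the sketch compiles on its own.
interface StoredEmbedding {
  nodeId: string;
  vector: number[];
  model: string;
  modelVersion: string;
  sourceHash: string;
  createdAt: number;
}

// Assumption: canonicalization trims and collapses whitespace.
function canonicalize(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// Assumption: SHA-256 with hex output; the text only says "SHA".
function hashSource(text: string): string {
  return createHash("sha256").update(canonicalize(text), "utf8").digest("hex");
}

// Key layout from the contract: nodeId + ':' + sourceHash.
function compositeKey(nodeId: string, sourceHash: string): string {
  return `${nodeId}:${sourceHash}`;
}

// True only if model, model version, and content hash all still match.
function isFresh(
  stored: StoredEmbedding,
  current: { model: string; modelVersion: string },
  currentText: string,
): boolean {
  return (
    stored.model === current.model &&
    stored.modelVersion === current.modelVersion &&
    stored.sourceHash === hashSource(currentText)
  );
}
```

When isFresh returns false, the caller would delete the stale entry and request a new embedding; when it returns true, the stored vector is reused and no embedding call is made.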