Datasources

Connect to server-side data backends—graph databases (Gremlin), document stores (Cosmos DB), SQL, file systems, and Azure Log Analytics.

Server-Side Datasources

Datasource packages are server-side implementations of the DataAdapter interface. They connect directly to the backend and expose the same 7-method contract, plus a managed connection lifecycle (connect / disconnect / isConnected). Each package exports a factory function, the recommended entry point, which constructs the underlying SDK client from neutral options such as endpoint and key. The underlying class is also exported as an escape hatch for hosts that need to share a pre-built SDK client.

Install datasource packages (pick one or more)

$ pnpm add @inferagraph/gremlin # Apache TinkerPop / Gremlin servers
$ pnpm add @inferagraph/cosmosdb # Azure Cosmos DB (NoSQL)
$ pnpm add @inferagraph/sql # SQL via Knex
$ pnpm add @inferagraph/file # CSV / TSV / Markdown
$ pnpm add @inferagraph/log-analytics # Azure Log Analytics

DataSource Base Class

All datasource packages extend the abstract DataSource class, which adds a connection lifecycle on top of the DataAdapter interface.

import { DataSource } from '@inferagraph/core/data';

// Shape of the imported base class, shown for reference only.
// All datasources share this lifecycle contract
abstract class DataSource implements DataAdapter {
  abstract connect(): Promise<void>;
  abstract disconnect(): Promise<void>;
  abstract isConnected(): boolean;

  // Plus all 7 DataAdapter methods...
}
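
Because every datasource shares the connect / disconnect / isConnected lifecycle, short-lived scripts can wrap their work in a small helper that guarantees cleanup. The sketch below is one way to do that. The Lifecycle interface, the withConnection helper, and the FakeSource class are hypothetical names used for illustration; they are not exports of any @inferagraph package.

```typescript
// Hypothetical minimal shape: only the three lifecycle methods named above.
interface Lifecycle {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  isConnected(): boolean;
}

// Hypothetical helper: connects if needed, runs the callback,
// and always disconnects afterwards, even when the callback throws.
async function withConnection<T extends Lifecycle, R>(
  ds: T,
  work: (ds: T) => Promise<R>,
): Promise<R> {
  if (!ds.isConnected()) await ds.connect();
  try {
    return await work(ds);
  } finally {
    await ds.disconnect();
  }
}

// Stand-in datasource used only to exercise the helper.
class FakeSource implements Lifecycle {
  private connected = false;
  async connect(): Promise<void> { this.connected = true; }
  async disconnect(): Promise<void> { this.connected = false; }
  isConnected(): boolean { return this.connected; }
}
```

A long-running server would more likely connect once at startup and disconnect on shutdown; the helper suits one-off jobs such as migrations or indexing runs.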

Gremlin DataSource

@inferagraph/gremlin

Connect to any TinkerPop-compatible Gremlin server, including Apache TinkerPop, JanusGraph, Amazon Neptune, and Azure Cosmos DB Gremlin API. Defaults are tuned for unpartitioned servers; two optional hooks (getCompositeKey, getType) adapt the same datasource to partitioned containers and property-based type fields without forking the package.

import { gremlinDataSource } from '@inferagraph/gremlin';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

// TinkerPop / JanusGraph / unpartitioned servers — defaults work as-is.
// Factory owns SDK construction; class form (new GremlinDataSource) is the escape hatch.
const ds = gremlinDataSource({
  endpoint: 'wss://<hostname>:8182/gremlin',
  database: '<database>',
  container: 'graph',
});
await ds.connect();

// Use as a DataAdapter
<GraphProvider adapter={ds}>
  <InferaGraph />
</GraphProvider>

Partitioned containers (e.g. Cosmos DB Gremlin API)

Cosmos DB partitions every container, so g.V(id) needs both the partition value and the id. Pass getCompositeKey to map an id to the values your container expects. The function is host-defined—the partition strategy is your choice. getType is similarly optional and lets you read the semantic type from a property instead of vertex.label. Both options are non-breaking; existing consumers don't need to set them.

import { gremlinDataSource } from '@inferagraph/gremlin';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

// Partitioned containers (e.g. Azure Cosmos DB Gremlin API) need a composite key.
// The shape of the partition is host-defined; below is ONE example.
const ds = gremlinDataSource({
  endpoint: 'wss://<hostname>.gremlin.cosmos.azure.com:443/',
  key: process.env.GREMLIN_KEY,
  database: '<database>',
  container: 'graph',

  // Resolve a vertex id to the value(s) passed to g.V(...).
  // Default: identity (returns the id alone) — fine for unpartitioned servers.
  // Override when your container is partitioned. Examples:
  //   partition key == vertex id:   (id) => [id, id]
  //   partition key == type field:  (id) => [getTypeFor(id), id]
  //   constant partition:           (id) => ['unit', id]
  //   tenant partition:             (id) => [tenantId, id]
  getCompositeKey: (id) => [id, id],

  // Resolve the semantic type of a vertex.
  // Default: (v) => v.label. Override when type lives in a property:
  getType: (v) => v.properties?.type?.[0]?.value ?? v.label,
});
await ds.connect();

<GraphProvider adapter={ds}>
  <InferaGraph />
</GraphProvider>
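
The partition strategies listed in the comments above are all plain functions from an id to a [partitionValue, id] pair, so they can be written and unit-tested without a server. The following is a minimal sketch under stated assumptions: byTenant, byIdPrefix, typeFromProperty, and the VertexLike shape are hypothetical names, and the "<type>:<rest>" id format is an assumption made for the example, not something the package requires.

```typescript
// A composite key here is [partitionValue, id], matching the examples above.
type CompositeKey = [string, string];
type KeyResolver = (id: string) => CompositeKey;

// Hypothetical factory: every vertex of one tenant lives in the tenant's partition.
const byTenant = (tenantId: string): KeyResolver => (id) => [tenantId, id];

// Hypothetical resolver: assumes ids look like "<type>:<rest>", e.g. "person:42".
const byIdPrefix: KeyResolver = (id) => {
  const sep = id.indexOf(':');
  if (sep <= 0) throw new Error(`id "${id}" has no type prefix`);
  return [id.slice(0, sep), id];
};

// Minimal vertex shape assumed from the getType example above.
interface VertexLike {
  label: string;
  properties?: { type?: Array<{ value: string }> };
}

// Prefer the "type" property when present, otherwise fall back to the label.
const typeFromProperty = (v: VertexLike): string =>
  v.properties?.type?.[0]?.value ?? v.label;
```

Functions like these would be passed as getCompositeKey and getType; keeping them pure makes the partition strategy easy to test in isolation.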

Cosmos DB DataSource

@inferagraph/cosmosdb

Connect to Azure Cosmos DB using the NoSQL (document) API. Nodes and edges are stored as JSON documents with automatic partition key management. getContent(id) returns the stored document body so chat and drilldown UIs can render full content. Wire the chat engine against the same adapter the rest of the app uses: importing a static seed alongside a live datasource lets the two drift apart.

Data safety — upgrade to 0.3.2+.
Through 0.3.1 the embedding writer ran a read-merge-upsert against the units container. When the read returned 404 (e.g., partition-key mismatch on a per-tenant container) the merge fell back to writing a fresh document containing only embedding fields, wiping pre-existing fields like content, title, and slug. 0.3.2 switched to Cosmos's atomic container.item(id, pk).patch([...]) so embedding writes never touch unrelated fields. The same fix landed in CosmosConversationStore.appendTurn. Anyone running 0.3.0 / 0.3.1 against live data should upgrade and re-hydrate any nodes whose content was lost.
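
The failure mode described above can be reproduced with a toy in-memory store. The sketch below is a simulation only: it does not use the Cosmos SDK, and readMergeUpsert, patchFields, and simulate are hypothetical names. It shows why a read that misses turns read-merge-upsert into a destructive write, while a patch either touches only the named fields or fails.

```typescript
type Doc = Record<string, unknown>;
type Store = Map<string, Doc>;

// Legacy pattern: read, merge, upsert. When the read misses (simulated 404),
// the merge base is a fresh document, and the upsert replaces the stored one.
function readMergeUpsert(
  store: Store,
  id: string,
  read: (id: string) => Doc | undefined,
  fields: Doc,
): void {
  const base = read(id) ?? { id };
  store.set(id, { ...base, ...fields });
}

// Patch pattern: update only the named fields; a missing document is an error.
function patchFields(store: Store, id: string, fields: Doc): void {
  const existing = store.get(id);
  if (!existing) throw new Error(`404: ${id} not found`);
  store.set(id, { ...existing, ...fields });
}

function simulate(): { legacy: Doc; patched: Doc } {
  const original: Doc = { id: 'u1', title: 'Intro', content: 'Full text' };
  const embedding = { embedding: [0.1, 0.2] };

  // The read always misses, standing in for a partition-key mismatch.
  const read404 = (_id: string): Doc | undefined => undefined;
  const legacyStore: Store = new Map<string, Doc>([['u1', { ...original }]]);
  readMergeUpsert(legacyStore, 'u1', read404, embedding);

  const patchStore: Store = new Map<string, Doc>([['u1', { ...original }]]);
  patchFields(patchStore, 'u1', embedding);

  return { legacy: legacyStore.get('u1') ?? {}, patched: patchStore.get('u1') ?? {} };
}
```

In the simulation the legacy document ends up with only id and embedding, while the patched document keeps title and content alongside the new embedding.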

import { cosmosDataSource } from '@inferagraph/cosmosdb';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

// Factory owns SDK construction; class form (new CosmosDataSource) is the escape hatch.
const ds = cosmosDataSource({
  endpoint: 'https://<hostname>.documents.azure.com',
  key: process.env.COSMOS_KEY,
  database: '<database>',
  container: 'nodes',
});
await ds.connect();

// Use as a DataAdapter
<GraphProvider adapter={ds}>
  <InferaGraph />
</GraphProvider>

Vector + RAG + Cache + Conversation building blocks

The package ships factories that turn a Cosmos NoSQL account into the persistence layer for @inferagraph/core's RAG pipeline:

  • provisionVectorContainers({endpoint, key, database, unitsContainer, inferredEdgesContainer?, embeddingDimensions?, dataType?}) — one-time, idempotent setup of the units container's vector index policy plus the inferred_edges container with its own vector policy. Safe to re-run on every deploy; rejects unsupported live alters with a clear error.
  • cosmosVectorEmbeddingStore({endpoint, key, database, container}) — implements core's EmbeddingStore. Backed by the units container with a vector index on /embedding; powers semantic retrieval via searchVector.
  • cosmosInferredEdgeStore({endpoint, key, database}) — implements core's InferredEdgeStore. Backed by the inferred_edges container; powers inferred-relationship retrieval at chat time.
  • cosmosConversationStore({endpoint, key, database, container?}) — implements core's ConversationStore. Per-conversation TTL, shareable across processes.
  • cosmosCacheProvider({endpoint, key, database, container?}) — implements core's CacheProvider. Per-call TTL + delete(key); pairs with the chat engine for completion + embedding caching.
  • CosmosDataSource.searchVector(queryEmbedding, opts?) — low-level escape hatch for hosts that want to bypass cosmosVectorEmbeddingStore and query a vector index directly. Targets the units container by default; pass {container: 'inferred_edges'} for the inferred-edge index.

See RAG → Indexing for how to wire cosmosVectorEmbeddingStore + cosmosInferredEdgeStore into a GraphIndexer, and the package README for the full provisionVectorContainers options (vector index type, distance function, embedding path, data type).

Provisioning auto-includes & excludes (0.3.3 / 0.3.4)

Cosmos rejects an indexing policy that defines a vector index on a path which is not excluded from the regular indexer, and it rejects any policy missing the mandatory / root path among includedPaths. provisionVectorContainers handles both automatically:

  • 0.3.3: the embedding path (default /embedding) is auto-added to excludedPaths on every container the function provisions. Pre-0.3.3 the policy was rejected with a misleading "capability not enabled" error.
  • 0.3.4: the inferred_edges container's policy gains the catch-all includedPaths: [{path: '/*'}] automatically. Pre-0.3.4 container creation failed with "the special mandatory indexing path '/' is not provided". mergeVectorPolicy also defaults includedPaths for safety when callers pass a partial policy.

Net effect: pass { endpoint, key, database, unitsContainer, inferredEdgesContainer? } and the function produces a working policy. No manual includedPaths / excludedPaths wiring required.
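
For readers who want to see what those two defaults amount to, the sketch below builds a policy fragment with the catch-all included path, the embedding path excluded, and a vector index on that path. It is an illustration, not the package's mergeVectorPolicy: withVectorDefaults and the IndexingPolicySketch type are hypothetical, and the "/embedding/*" exclusion form and the 'quantizedFlat' index type are assumptions based on Cosmos DB's documented policy format.

```typescript
interface PathEntry { path: string }
interface VectorIndex { path: string; type: string }
interface IndexingPolicySketch {
  includedPaths: PathEntry[];
  excludedPaths: PathEntry[];
  vectorIndexes: VectorIndex[];
}

// Hypothetical helper: fills in the defaults described above without
// duplicating entries, so it is safe to apply to its own output.
function withVectorDefaults(
  partial: Partial<IndexingPolicySketch> = {},
  embeddingPath = '/embedding',
  indexType = 'quantizedFlat', // assumed index type for the example
): IndexingPolicySketch {
  // Default to the catch-all path when the caller supplied none.
  const includedPaths = partial.includedPaths?.length
    ? [...partial.includedPaths]
    : [{ path: '/*' }];

  // Exclude the embedding subtree from the regular indexer.
  const excludedPaths = [...(partial.excludedPaths ?? [])];
  const excluded = `${embeddingPath}/*`;
  if (!excludedPaths.some((p) => p.path === excluded)) {
    excludedPaths.push({ path: excluded });
  }

  // Ensure there is a vector index on the embedding path.
  const vectorIndexes = [...(partial.vectorIndexes ?? [])];
  if (!vectorIndexes.some((v) => v.path === embeddingPath)) {
    vectorIndexes.push({ path: embeddingPath, type: indexType });
  }

  return { includedPaths, excludedPaths, vectorIndexes };
}
```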


SQL DataSource

@inferagraph/sql

Connect to PostgreSQL, MySQL, SQLite, or MSSQL via Knex. Supports automatic table creation with autoMigrate and stores node attributes as JSON columns.

import { sqlDataSource } from '@inferagraph/sql';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

const ds = sqlDataSource({
  dialect: 'postgres',         // or 'mysql', 'sqlite', 'mssql'
  connection: process.env.DATABASE_URL,
  autoMigrate: true,           // auto-create tables
});
await ds.connect();

// Use as a DataAdapter
<GraphProvider adapter={ds}>
  <InferaGraph />
</GraphProvider>

Vector + RAG + Cache + Conversation building blocks

The package now ships SQL-backed implementations of every RAG storage interface, useful when your app already runs against a relational database:

  • provisionSqlSchemas({dialect, connection, tables?}) — idempotent setup of the embedding, inferred-edge, conversation, and cache tables in one call.
  • sqlVectorEmbeddingStore({dialect, connection, table?}) — implements EmbeddingStore. Vector data persisted as a float blob; in-process cosine for ranking. Suitable up to low-millions of nodes.
  • sqlInferredEdgeStore({dialect, connection, table?}) — implements InferredEdgeStore.
  • sqlConversationStore({dialect, connection, table?}) — implements ConversationStore with per-row TTL.
  • sqlCacheProvider({dialect, connection, table?}) — implements CacheProvider; pairs with the chat engine for completion + embedding caching.

See Caching for the cache contract and RAG → Storage for the embedding / inferred-edge / conversation contracts.
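
The "float blob plus in-process cosine" approach mentioned for sqlVectorEmbeddingStore can be sketched in a few lines. This is an illustration of the general technique, not the package's implementation: toBlob, fromBlob, cosine, rank, and the StoredRow shape are hypothetical names, and the float32 encoding is an assumption.

```typescript
interface StoredRow { id: string; blob: Uint8Array }

// Encode a vector as raw float32 bytes, suitable for a binary column.
function toBlob(vec: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(vec).buffer);
}

// Decode the bytes back into floats. The copy guarantees 4-byte alignment.
function fromBlob(blob: Uint8Array): Float32Array {
  return new Float32Array(blob.slice().buffer);
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Score every row against the query in process and keep the best topK.
function rank(query: number[], rows: StoredRow[], topK: number): Array<{ id: string; score: number }> {
  return rows
    .map((row) => ({ id: row.id, score: cosine(query, fromBlob(row.blob)) }))
    .sort((l, r) => r.score - l.score)
    .slice(0, topK);
}
```

Ranking this way scans every stored vector per query, which is consistent with the guidance above that the approach suits graphs up to the low millions of nodes.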


File DataSource

@inferagraph/file

Load nodes from CSV, TSV, or a folder of Markdown files; the host supplies edges directly at construction time. Markdown bodies are exposed via getContent as contentType: 'markdown', and CSV/TSV columns named in contentFields are surfaced the same way for tooltip and node-template injection.

import { fileDataSource } from '@inferagraph/file';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

const ds = fileDataSource({
  type: 'markdown',           // Or 'csv' / 'tsv' with header rows or explicit column names
  path: './data/units',        // folder of .md files (or a .csv/.tsv file)
  edges: relationships,        // host-supplied EdgeData[]
  frontmatter: ['id', 'name', 'type'], // strict-match required keys
  idField: 'id',
});
await ds.connect();

// Use as a DataAdapter
<GraphProvider adapter={ds}>
  <InferaGraph />
</GraphProvider>
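
To make the Markdown path concrete, the sketch below shows one plausible reading of "strict-match required keys": split a file into a frontmatter block and a body, then reject the file if any required key is absent. The parseUnit function and ParsedUnit type are hypothetical, the parser handles only flat "key: value" lines, and the package's actual parsing rules may differ.

```typescript
interface ParsedUnit {
  attributes: Record<string, string>;
  body: string;
}

// Hypothetical parser: expects a leading block delimited by "---" lines.
function parseUnit(source: string, required: string[]): ParsedUnit {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) throw new Error('missing frontmatter block');
  const [, head = '', body = ''] = match;

  const attributes: Record<string, string> = {};
  for (const line of head.split(/\r?\n/)) {
    const sep = line.indexOf(':');
    if (sep === -1) continue; // skip lines that are not "key: value"
    attributes[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }

  // Strict check: every required key must be present.
  const missing = required.filter((key) => !(key in attributes));
  if (missing.length > 0) {
    throw new Error(`missing frontmatter keys: ${missing.join(', ')}`);
  }
  return { attributes, body };
}
```

With the configuration above, the attributes would supply the node fields (id, name, type) and the body is what getContent would return as Markdown.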

Log Analytics DataSource

@inferagraph/log-analytics

Query Azure Log Analytics via @azure/monitor-query. The host supplies KQL queries plus a column-mapping config, and rows are mapped into NodeData / EdgeData. Three auth modes are supported: app-registration (tenant / client / secret), managed-identity (for Azure-hosted apps), and apim (a plain fetch against an Azure API Management endpoint that handles auth and rate-limiting upstream). The first two are SSR-only—secrets and managed identities never ship to the browser—while APIM mode is safe for client-side use. findPath and getNeighbors fall back to application-level BFS when queries.neighbors is unset.

import { logAnalyticsDataSource } from '@inferagraph/log-analytics';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

const ds = logAnalyticsDataSource({
  workspaceId: '<workspace-guid>',
  workspaceName: '<workspace-name>',
  auth: {
    kind: 'app-registration',  // Or { kind: 'apim', endpoint, headers, buildRequest }
    tenantId: process.env.AZURE_TENANT_ID,
    clientId: process.env.AZURE_CLIENT_ID,
    clientSecret: process.env.AZURE_CLIENT_SECRET,
  },
  queries: {
    nodes: 'AppDependencies | summarize by AppRoleName | project EntityId = AppRoleName, EntityName = AppRoleName, EntityType = "service"',
    edges: 'AppDependencies | project EdgeId = Id, SourceId = AppRoleName, TargetId = Target, EdgeType = Type',
    // optional: node, neighbors, search, filter, content
  },
  mapping: {
    nodes: { idColumn: 'EntityId', typeColumn: 'EntityType' },
    edges: { idColumn: 'EdgeId', sourceColumn: 'SourceId', targetColumn: 'TargetId', typeColumn: 'EdgeType' },
  },
  timespan: { duration: 'P1D' },
});
await ds.connect();

// Use as a DataAdapter (SSR only — secrets must not ship to the browser)
<GraphProvider adapter={ds}>
  <InferaGraph />
</GraphProvider>
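
The application-level BFS fallback mentioned above can be pictured as a breadth-first search over the already-loaded edge list. The sketch below is a generic version of that technique, not the package's code: the Edge shape and findPath signature are hypothetical, and treating edges as undirected is an assumption made for the example.

```typescript
interface Edge { sourceId: string; targetId: string }

// Breadth-first search over an in-memory edge list.
// Returns the node ids along a shortest path, or null when none exists.
function findPath(edges: Edge[], from: string, to: string): string[] | null {
  // Build an adjacency list, adding both directions (undirected assumption).
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = adjacency.get(a);
    if (list) list.push(b);
    else adjacency.set(a, [b]);
  };
  for (const edge of edges) {
    link(edge.sourceId, edge.targetId);
    link(edge.targetId, edge.sourceId);
  }

  // previous doubles as the visited set and the back-pointer table.
  const previous = new Map<string, string | null>([[from, null]]);
  const queue: string[] = [from];
  for (const node of queue) { // the iterator also visits items pushed below
    if (node === to) {
      const path: string[] = [];
      for (let cur: string | null | undefined = to; cur != null; cur = previous.get(cur)) {
        path.unshift(cur);
      }
      return path;
    }
    for (const next of adjacency.get(node) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null;
}
```

Because the search runs over rows fetched by the nodes and edges queries, its cost grows with the size of the loaded graph; supplying queries.neighbors moves that work into KQL.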

Composing Datasources

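Mix and match datasources by starting from one adapter and overriding specific methods. For example, use Gremlin for graph traversal and Cosmos DB for rich content. Note that object spread copies only own enumerable properties, so the spread below works only when the adapter's methods are own properties of the object; methods that live on a class prototype are not copied and must be delegated explicitly.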

import { gremlinDataSource } from '@inferagraph/gremlin';
import { cosmosDataSource } from '@inferagraph/cosmosdb';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

// Compose datasources: graph from Gremlin, content from Cosmos DB
const graphDs = gremlinDataSource({ /* ... */ });
const contentDs = cosmosDataSource({ /* ... */ });

const composedAdapter: DataAdapter = {
  ...graphDs,
  getContent: (id) => contentDs.getContent(id),
};

<GraphProvider adapter={composedAdapter}>
  <InferaGraph />
</GraphProvider>
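
When a datasource is a class instance, its methods sit on the prototype and object spread does not copy them. One way to compose in that case is a thin Proxy that forwards everything to the base adapter except the overridden methods. The sketch below uses stand-in classes to show the idea: MiniAdapter, GraphSource, ContentSource, and overrideMethods are hypothetical names, and MiniAdapter is a two-method slice rather than the real 7-method DataAdapter.

```typescript
// Hypothetical two-method slice of an adapter, for illustration only.
interface MiniAdapter {
  getInitialView(): Promise<{ nodes: string[] }>;
  getContent(id: string): Promise<string>;
}

class GraphSource implements MiniAdapter {
  private nodes = ['a', 'b'];
  async getInitialView(): Promise<{ nodes: string[] }> { return { nodes: this.nodes }; }
  async getContent(id: string): Promise<string> { return `graph:${id}`; }
}

class ContentSource {
  async getContent(id: string): Promise<string> { return `content:${id}`; }
}

// Prototype methods are not own properties, so spread drops them.
const spreadKeepsMethods = 'getInitialView' in { ...new GraphSource() };

// Forward every member to `base` (bound so `this` stays correct),
// except the members supplied in `overrides`.
function overrideMethods<T extends object>(base: T, overrides: Partial<T>): T {
  return new Proxy(base, {
    get(target, prop) {
      if (Object.prototype.hasOwnProperty.call(overrides, prop)) {
        return overrides[prop as keyof T];
      }
      const value = Reflect.get(target, prop, target);
      return typeof value === 'function' ? value.bind(target) : value;
    },
  });
}

const graphSource = new GraphSource();
const contentSource = new ContentSource();
const composed: MiniAdapter = overrideMethods<MiniAdapter>(graphSource, {
  getContent: (id) => contentSource.getContent(id),
});
```

Here composed.getInitialView() is served by the graph source and composed.getContent(id) by the content source. Writing out each delegated method by hand is an equally valid and more explicit alternative.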

Custom DataSource

Extend the DataSource base class to build your own server-side data source for any backend.

import { DataSource } from '@inferagraph/core/data';
import { GraphProvider, InferaGraph } from '@inferagraph/core/react';

class BibleDataSource extends DataSource {
  // Track connection state so isConnected() reflects reality
  private connected = false;

  async connect() { /* open connection */ this.connected = true; }
  async disconnect() { /* close connection */ this.connected = false; }
  isConnected() { return this.connected; }

  async getInitialView() {
    // Return the initial graph for your data source
    return { nodes: [/* ... */], edges: [/* ... */] };
  }
  // Implement remaining DataAdapter methods...
}

const ds = new BibleDataSource();
await ds.connect();

<GraphProvider adapter={ds}>
  <InferaGraph />
</GraphProvider>