Sitemap Connector
Ingest a site from its sitemap when you want a cleaner, more deterministic web import than an open-ended crawl.
Use the sitemap connector when the target site already exposes a useful sitemap.xml and you want RetainDB to follow that explicit inventory instead of crawling links dynamically.
This is usually the best web connector for documentation sites with a maintained sitemap.
Use this connector when
- the site has a valid sitemap
- you want more control than a crawler gives you
- you want to avoid crawling navigation dead ends or irrelevant linked pages
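Before creating the source, it can help to confirm the sitemap really lists the pages you expect. A minimal stdlib-only sketch that extracts every `<loc>` entry from a sitemap document (fetching the XML, e.g. with urllib, is left out; the sample content is illustrative):

```python
# Sketch: list the URLs a sitemap actually advertises before importing it.
import xml.etree.ElementTree as ET

# Standard sitemap namespace, per the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def list_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> entry in a sitemap document, in order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

# Illustrative sample; in practice this is the body of GET /sitemap.xml.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://acme.com/docs/intro</loc></url>
  <url><loc>https://acme.com/docs/api</loc></url>
</urlset>"""

print(list_sitemap_urls(sample))
```

If the list is empty or missing key sections of the site, the crawler connector may be a better fit.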
Create the source
curl -X POST "https://api.retaindb.com/v1/projects/proj_123/sources" \
-H "Authorization: Bearer $RETAINDB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Acme Sitemap",
"connector_type": "sitemap",
"config": {
"sitemap_url": "https://acme.com/sitemap.xml",
"limit": 2000
}
}'

Why teams choose sitemap over crawl
Sitemaps are more predictable.
They help when you want:
- a known page inventory
- fewer accidental pages
- simpler debugging if expected content is missing
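If you script the import, the create call from the section above can be built with the Python standard library alone. A sketch (the request mirrors the curl example; it is constructed but not sent here, and `proj_123` is a placeholder project ID):

```python
# Sketch: build the same create-source request from the curl example in Python.
import json
import os
import urllib.request

def build_create_source_request(project_id: str, api_key: str) -> urllib.request.Request:
    # Payload fields mirror the curl example above.
    payload = {
        "name": "Acme Sitemap",
        "connector_type": "sitemap",
        "config": {
            "sitemap_url": "https://acme.com/sitemap.xml",
            "limit": 2000,
        },
    }
    return urllib.request.Request(
        f"https://api.retaindb.com/v1/projects/{project_id}/sources",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_create_source_request("proj_123", os.environ.get("RETAINDB_API_KEY", ""))
# urllib.request.urlopen(req)  # uncomment to actually send the request
```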
Start sync and check status
curl -X POST "https://api.retaindb.com/v1/sources/src_123/sync" \
-H "Authorization: Bearer $RETAINDB_API_KEY"

curl "https://api.retaindb.com/v1/sources/src_123/status" \
-H "Authorization: Bearer $RETAINDB_API_KEY"

Common mistakes
Bad sitemap assumptions
Some sites expose a sitemap that does not actually contain the pages you care about. Inspect the sitemap's entries before troubleshooting the connector.
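One way to do that check is to diff the pages you expect against the sitemap's `<loc>` entries. A stdlib-only sketch (the URLs are examples):

```python
# Sketch: report which expected pages are absent from a sitemap document.
import xml.etree.ElementTree as ET

def missing_from_sitemap(xml_text: str, expected_urls: list[str]) -> list[str]:
    """Return the expected URLs that the sitemap does not list."""
    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    listed = {loc.text.strip() for loc in ET.fromstring(xml_text).iter(f"{ns}loc")}
    return [url for url in expected_urls if url not in listed]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://acme.com/docs/intro</loc></url>
</urlset>"""

print(missing_from_sitemap(sample, [
    "https://acme.com/docs/intro",
    "https://acme.com/docs/changelog",
]))
```

Anything in the missing list is a sitemap problem, not a connector problem.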
Oversized imports
If the sitemap is huge, start with a smaller limit and validate quality first.
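When trialing a small limit, you can poll the status endpoint until the sync settles before deciding to raise it. A sketch of that loop; the status field names and terminal values here are assumptions, not documented RetainDB responses (`fetch_status` stands in for a GET to `/v1/sources/{id}/status`):

```python
# Sketch: poll a source's sync status until it reaches a terminal state.
import time

def wait_for_sync(fetch_status, poll_seconds: float = 5, timeout_seconds: float = 600) -> dict:
    """fetch_status() returns the parsed status response as a dict.

    The terminal states below ("completed", "failed") are assumptions.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("sync did not reach a terminal state in time")
```

Once the trial sync completes, review the ingested pages before raising the limit toward the full sitemap.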
Using sitemap for one page
If you only need one document, the URL connector is simpler.
Next step
If the site does not have a good sitemap, use the web crawler connector instead. If you want to validate the project and source lifecycle first, review projects and sources.