Sitemap Connector
Ingest a site from its sitemap when you want a cleaner, more deterministic web import than an open-ended crawl.
Use the sitemap connector when the target site already exposes a useful sitemap.xml and you want RetainDB to follow that explicit inventory instead of crawling links dynamically.
This is usually the best web connector for documentation sites with a maintained sitemap.
Use this connector when
- the site has a valid sitemap
- you want more control than a crawler gives you
- you want to avoid crawling navigation dead ends or irrelevant linked pages
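Before creating the source, it can help to confirm the sitemap really lists the pages you expect. A minimal stdlib-only sketch that extracts every `<loc>` entry from a sitemap document (fetching the XML, e.g. with urllib, is left out; the sample content is illustrative):

```python
# Sketch: list the URLs a sitemap actually advertises before importing it.
import xml.etree.ElementTree as ET

# Standard sitemap namespace, per the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def list_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> entry in a sitemap document, in order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

# Illustrative sample; in practice this is the body of GET /sitemap.xml.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://acme.com/docs/intro</loc></url>
  <url><loc>https://acme.com/docs/api</loc></url>
</urlset>"""

print(list_sitemap_urls(sample))
```

If the list is empty or missing key sections of the site, the crawler connector may be a better fit.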
Create the source
curl -X POST "https://api.retaindb.com/v1/projects/proj_123/sources" \
-H "Authorization: Bearer $RETAINDB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Acme Sitemap",
"connector_type": "sitemap",
"config": {
"sitemap_url": "https://acme.com/sitemap.xml",
"limit": 2000
}
}'

Why teams choose sitemap over crawl
Sitemaps are more predictable.
They help when you want:
- a known page inventory
- fewer accidental pages
- simpler debugging if expected content is missing
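If you script the import, the create call from the section above can be built with the Python standard library alone. A sketch (the request mirrors the curl example; it is constructed but not sent here, and `proj_123` is a placeholder project ID):

```python
# Sketch: build the same create-source request from the curl example in Python.
import json
import os
import urllib.request

def build_create_source_request(project_id: str, api_key: str) -> urllib.request.Request:
    # Payload fields mirror the curl example above.
    payload = {
        "name": "Acme Sitemap",
        "connector_type": "sitemap",
        "config": {
            "sitemap_url": "https://acme.com/sitemap.xml",
            "limit": 2000,
        },
    }
    return urllib.request.Request(
        f"https://api.retaindb.com/v1/projects/{project_id}/sources",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_create_source_request("proj_123", os.environ.get("RETAINDB_API_KEY", ""))
# urllib.request.urlopen(req)  # uncomment to actually send the request
```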
Start sync and check status
curl -X POST "https://api.retaindb.com/v1/sources/src_123/sync" \
-H "Authorization: Bearer $RETAINDB_API_KEY"

curl "https://api.retaindb.com/v1/sources/src_123/status" \
-H "Authorization: Bearer $RETAINDB_API_KEY"

Common mistakes
Bad sitemap assumptions
Some sites expose a sitemap that does not actually contain the pages you care about. Inspect the sitemap's entries before troubleshooting the connector.
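One way to do that check is to diff the pages you expect against the sitemap's `<loc>` entries. A stdlib-only sketch (the URLs are examples):

```python
# Sketch: report which expected pages are absent from a sitemap document.
import xml.etree.ElementTree as ET

def missing_from_sitemap(xml_text: str, expected_urls: list[str]) -> list[str]:
    """Return the expected URLs that the sitemap does not list."""
    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    listed = {loc.text.strip() for loc in ET.fromstring(xml_text).iter(f"{ns}loc")}
    return [url for url in expected_urls if url not in listed]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://acme.com/docs/intro</loc></url>
</urlset>"""

print(missing_from_sitemap(sample, [
    "https://acme.com/docs/intro",
    "https://acme.com/docs/changelog",
]))
```

Anything in the missing list is a sitemap problem, not a connector problem.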
Oversized imports
If the sitemap is huge, start with a smaller limit and validate quality first.
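When trialing a small limit, you can poll the status endpoint until the sync settles before deciding to raise it. A sketch of that loop; the status field names and terminal values here are assumptions, not documented RetainDB responses (`fetch_status` stands in for a GET to `/v1/sources/{id}/status`):

```python
# Sketch: poll a source's sync status until it reaches a terminal state.
import time

def wait_for_sync(fetch_status, poll_seconds: float = 5, timeout_seconds: float = 600) -> dict:
    """fetch_status() returns the parsed status response as a dict.

    The terminal states below ("completed", "failed") are assumptions.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("sync did not reach a terminal state in time")
```

Once the trial sync completes, review the ingested pages before raising the limit toward the full sitemap.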
Using sitemap for one page
If you only need one document, the URL connector is simpler.
Next step
If the site does not have a good sitemap, use the web crawler connector instead. If you want to validate the project and source lifecycle first, review projects and sources.