GitHub Connector
Sync GitHub repositories, issues, and documentation into RetainDB for intelligent code search and documentation retrieval.
The GitHub connector indexes your repository content, enabling AI-powered search across code, issues, PRs, and documentation.
Use Cases
- Code Search — Find relevant code snippets across repositories
- Documentation Retrieval — Query internal docs alongside code
- Issue Context — Ground AI responses in existing issues and discussions
- Onboarding — Help new developers find relevant code and docs
Prerequisites
- GitHub Personal Access Token with these scopes:
  - repo (full repository access)
  - read:org (if using organization repos)
  - read:user (for user-specific content)
- Repository Access: the token must have access to the target repositories
Configuration
Creating a GitHub Source
curl -X POST "https://api.retaindb.com/v1/projects/proj_abc123/sources" \
  -H "Authorization: Bearer $RETAINDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Company Docs",
    "connectorType": "github",
    "config": {
      "owner": "acme-corp",
      "repo": "platform-api",
      "branch": "main",
      "paths": ["docs/", "src/"],
      "include_issues": true,
      "include_prs": true,
      "include_wiki": false
    }
  }'
Configuration Options
| Option | Type | Description | Default |
|---|---|---|---|
| owner | string | GitHub organization or username | Required |
| repo | string | Repository name | Required |
| branch | string | Branch to sync | main |
| paths | array | Paths to include (glob patterns) | ["**/*"] |
| exclude_paths | array | Paths to exclude | ["node_modules/", "dist/"] |
| include_issues | boolean | Index GitHub issues | true |
| include_prs | boolean | Index pull requests | true |
| include_wiki | boolean | Index wiki pages | false |
| sync_mode | string | full or incremental | incremental |
Syncing
Trigger Initial Sync
# Create source first, then trigger sync
SOURCE_ID="src_xyz789"
curl -X POST "https://api.retaindb.com/v1/sources/$SOURCE_ID/sync" \
  -H "Authorization: Bearer $RETAINDB_API_KEY"
Response
{
  "id": "job_abc123",
  "source_id": "src_xyz789",
  "status": "queued",
  "created_at": "2026-03-07T12:00:00Z"
}
Check Sync Status
curl "https://api.retaindb.com/v1/sync-jobs/job_abc123" \
  -H "Authorization: Bearer $RETAINDB_API_KEY"
Response
{
  "id": "job_abc123",
  "source_id": "src_xyz789",
  "status": "completed",
  "progress": {
    "files_indexed": 450,
    "issues_indexed": 125,
    "prs_indexed": 78
  },
  "started_at": "2026-03-07T12:00:00Z",
  "completed_at": "2026-03-07T12:05:00Z",
  "error": null
}
Status Values
| Status | Description |
|---|---|
| queued | Waiting to start |
| running | Currently syncing |
| completed | Successfully finished |
| failed | Error occurred |
| cancelled | Cancelled by user |
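The status values above lend themselves to a simple polling loop. The following is a minimal sketch rather than an official client: the helper names (job_status, is_terminal, wait_for_sync), the 5-second default interval, and the attempt cap are assumptions, and the status is extracted with sed on the assumption that the job response contains a single "status" key, as in the example response.

```shell
# Extract the "status" string from a job JSON document read on stdin.
# Assumes the document contains exactly one "status" key.
job_status() {
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Return 0 if the status is terminal, meaning no further polling is needed.
is_terminal() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll GET /v1/sync-jobs/<job_id> until the job reaches a terminal status.
# Usage: wait_for_sync <job_id> [interval_seconds] [max_attempts]
# Prints the final status (or "timeout"); returns 0 only for "completed".
wait_for_sync() {
  local job_id interval max_attempts attempt status
  job_id="$1"
  interval="${2:-5}"        # hypothetical default, tune to your sync sizes
  max_attempts="${3:-120}"  # hypothetical cap so the loop cannot run forever
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failed request yields an empty status and is retried on the next pass.
    status="$(curl -fsS "https://api.retaindb.com/v1/sync-jobs/${job_id}" \
      -H "Authorization: Bearer ${RETAINDB_API_KEY}" | job_status)"
    if is_terminal "$status"; then
      echo "$status"
      [ "$status" = "completed" ]
      return
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}
```

For example, `wait_for_sync job_abc123 10` would poll every 10 seconds, print the final status, and return a non-zero exit code unless the job completed.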
Incremental Sync
By default, the connector uses incremental sync, only fetching changes since the last sync:
# Trigger incremental sync (usually happens automatically)
curl -X POST "https://api.retaindb.com/v1/sources/src_xyz789/sync" \
  -H "Authorization: Bearer $RETAINDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "incremental"}'
Manual Full Sync
Force a full re-sync:
curl -X POST "https://api.retaindb.com/v1/sources/src_xyz789/sync" \
  -H "Authorization: Bearer $RETAINDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"sync_mode": "full"}'
Searching Synced Content
After the sync completes, search the indexed content:
curl -X POST "https://api.retaindb.com/v1/memory/search" \
  -H "Authorization: Bearer $RETAINDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user": "developer@example.com",
    "query": "authentication implementation",
    "filters": {
      "source": "github:acme-corp/platform-api"
    },
    "topK": 10
  }'
Search Results
{
  "results": [
    {
      "id": "mem_abc123",
      "content": "async function authenticateUser(email: string, password: string) {\n // Implementation here\n}",
      "source": "github:acme-corp/platform-api",
      "source_type": "code",
      "file_path": "src/auth/login.ts",
      "score": 0.94
    }
  ]
}
Filtering by Source Type
Filter search results by content type:
curl -X POST "https://api.retaindb.com/v1/memory/search" \
  -H "Authorization: Bearer $RETAINDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user": "developer@example.com",
    "query": "API endpoint",
    "filters": {
      "source": "github:acme-corp/platform-api",
      "source_type": "issue"
    }
  }'
Available Source Types
| Type | Description |
|---|---|
| code | Source code files |
| issue | GitHub issues |
| pr | Pull requests |
| wiki | Wiki pages |
| readme | README files |
Webhooks
Configure webhooks to trigger syncs on GitHub events:
# Create webhook
curl -X POST "https://api.retaindb.com/v1/webhooks" \
  -H "Authorization: Bearer $RETAINDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-server.com/github-webhook",
    "events": ["github.push", "github.pull_request"],
    "source_id": "src_xyz789"
  }'
Then configure GitHub to send webhook events to your endpoint.
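On the GitHub side, a repository webhook can be registered through GitHub's REST API (POST /repos/{owner}/{repo}/hooks). The sketch below is one possible way to do that. It assumes that the delivery URL is the one registered above, that the RetainDB events github.push and github.pull_request correspond to GitHub's push and pull_request events, and that GITHUB_TOKEN is a token allowed to manage hooks on the repository. The function names are hypothetical.

```shell
# Build the JSON body for GitHub's "create a repository webhook" endpoint.
# Usage: github_hook_payload <delivery_url>
# Note: the URL is inserted verbatim, so it must not contain quotes or backslashes.
github_hook_payload() {
  printf '{"name":"web","active":true,"events":["push","pull_request"],"config":{"url":"%s","content_type":"json"}}\n' "$1"
}

# Register the webhook on <owner>/<repo>; the payload is piped to curl via stdin.
# Usage: create_github_hook <owner> <repo> <delivery_url>
create_github_hook() {
  local owner repo url
  owner="$1"
  repo="$2"
  url="$3"
  github_hook_payload "$url" |
    curl -fsS -X POST "https://api.github.com/repos/${owner}/${repo}/hooks" \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer ${GITHUB_TOKEN}" \
      -d @-
}
```

With the values used on this page, the call would be `create_github_hook acme-corp platform-api https://your-server.com/github-webhook`.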
Troubleshooting
401/403 Errors
Cause: Token is invalid or lacks permissions
Solution:
- Verify token has required scopes
- Check token hasn't expired
- Ensure repo is accessible to token owner
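For classic personal access tokens, GitHub reports the granted scopes in the X-OAuth-Scopes response header, which makes the first check easy to script. This is a minimal sketch, assuming GITHUB_TOKEN holds the token used by the connector. The function names are hypothetical, fine-grained tokens do not return this header, and the comparison is literal, so a broader scope that implies a required one (for example admin:org implying read:org) is still reported as missing.

```shell
# Print the scopes GitHub reports for the token, one per line.
token_scopes() {
  curl -fsS -I "https://api.github.com/user" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" |
    tr -d '\r' |
    awk 'tolower($0) ~ /^x-oauth-scopes:/ {
      sub(/^[^:]*:[ \t]*/, "")               # drop the header name
      n = split($0, parts, /,[ \t]*/)        # scopes are comma-separated
      for (i = 1; i <= n; i++) if (parts[i] != "") print parts[i]
    }'
}

# Read granted scopes (one per line) on stdin and print every required
# scope, given as arguments, that is not in that list.
missing_scopes() {
  local granted required
  granted="$(cat)"
  for required in "$@"; do
    printf '%s\n' "$granted" | grep -qxF -- "$required" || printf '%s\n' "$required"
  done
}
```

Running `token_scopes | missing_scopes repo read:org read:user` prints nothing when all three scopes from the prerequisites are present.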
Empty Sync Results
Cause: No matching files found
Solution:
- Check paths configuration includes correct directories
- Verify exclude_paths isn't too aggressive
- Confirm branch name is correct
Rate Limiting
Cause: GitHub API rate limit exceeded
Solution:
- Use a GitHub App with higher rate limits
- Reduce sync frequency
- Contact RetainDB for dedicated rate limits
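Before changing the sync schedule, it can help to confirm that the token's quota is actually exhausted. GitHub returns x-ratelimit-remaining and x-ratelimit-reset headers on REST API responses; the sketch below parses them and computes how long to wait. The function names are hypothetical, and GITHUB_TOKEN is assumed to be the token used by the connector.

```shell
# Read HTTP response headers on stdin; print "<remaining> <reset_epoch>".
parse_rate_limit() {
  tr -d '\r' | awk -F': *' '
    tolower($1) == "x-ratelimit-remaining" { remaining = $2 }
    tolower($1) == "x-ratelimit-reset"     { reset = $2 }
    END { print remaining, reset }'
}

# Fetch the current limits for the token (headers only).
github_rate_limit() {
  curl -fsS -I "https://api.github.com/rate_limit" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" | parse_rate_limit
}

# Print the number of seconds until the limit resets (0 if already reset).
# Usage: seconds_until_reset <reset_epoch> [now_epoch]
seconds_until_reset() {
  local reset now delta
  reset="$1"
  now="${2:-$(date +%s)}"
  delta=$((reset - now))
  [ "$delta" -gt 0 ] || delta=0
  echo "$delta"
}
```

If the first number printed by `github_rate_limit` is 0, passing the second number to `seconds_until_reset` gives a reasonable delay before the next sync.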
Large Repository Performance
For large repos (more than 10,000 files), narrow the synced paths and cap the number of files:
{
  "config": {
    "paths": ["src/", "lib/"],
    "exclude_paths": ["node_modules/", "dist/", "*.test.ts"],
    "max_files": 5000
  }
}
Best Practices
1. Use Path Filters
Sync only the paths you need to search:
{
  "paths": ["src/", "docs/", "README.md"],
  "exclude_paths": ["node_modules/", "dist/", ".git/"]
}
2. Schedule Regular Syncs
Set up automatic incremental syncs:
# In your CI/CD or cron
curl -X POST "https://api.retaindb.com/v1/sources/src_xyz789/sync" \
  -H "Authorization: Bearer $RETAINDB_API_KEY"
3. Monitor Sync Status
Track sync health:
# Get recent sync jobs
curl "https://api.retaindb.com/v1/sources/src_xyz789/sync-jobs?limit=10" \
  -H "Authorization: Bearer $RETAINDB_API_KEY"