Data sources

Overview

Data sources are how your chatbot learns. There are two types:

Website crawls — Automatically fetch and index content from your website
Q&A entries — Manually curated question-and-answer pairs

Both feed into the same knowledge base and are used together when answering questions. You can use one or both depending on your needs.

Website crawls

Creating a crawl

Navigate to Data Sources in your chatbot sidebar and click Create crawl. Configure the following fields:

Field	Description
Name	A label for this crawl (e.g., "Main docs", "Blog posts")
Start URL	The page to begin crawling from
Include paths	Only crawl URLs matching these paths (e.g., `/docs`, `/help`). Leave empty to crawl everything under the domain.
Exclude paths	Skip URLs matching these paths (e.g., `/docs/internal`, `/admin`)
Depth (1-5)	How many link levels to follow from the start URL. Depth 1 crawls only the start page; depth 2 also crawls the pages it links to. The default is 2.
Max pages per run	The maximum number of pages to crawl in one run. This value cannot exceed your remaining monthly crawl credits.

Tip: Start with a small depth (such as 2) and a narrow set of include paths. You can always re-crawl with broader settings once you've verified the results.
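To make the interaction between include paths, exclude paths, depth, and the page limit concrete, here is a minimal sketch in Python. It is not HyperHelp's crawler: the function names `path_allowed` and `plan_crawl` are hypothetical, the link graph is an in-memory stand-in for real fetching, and it assumes that a path "matches" when the URL path starts with it.

```python
from collections import deque
from urllib.parse import urlparse


def path_allowed(url, include_paths, exclude_paths):
    """Return True if the URL passes the include/exclude filters.

    Assumption: a path matches when the URL path starts with it.
    """
    path = urlparse(url).path or "/"
    if any(path.startswith(p) for p in exclude_paths):
        return False
    # An empty include list means every path is eligible.
    return not include_paths or any(path.startswith(p) for p in include_paths)


def plan_crawl(start_url, links, depth, include_paths=(), exclude_paths=(),
               max_pages=100):
    """Breadth-first walk over a link graph with depth and page limits.

    `links` maps each URL to the URLs it links to (hypothetical test data).
    Depth 1 visits only the start page; depth 2 adds the pages it links to.
    Assumption: the start page itself is always crawled.
    """
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 1)])
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        visited.append(url)
        if level == depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(url, []):
            if target not in seen and path_allowed(target, include_paths,
                                                   exclude_paths):
                seen.add(target)
                queue.append((target, level + 1))
    return visited


links = {
    "https://example.com/docs": [
        "https://example.com/docs/setup",
        "https://example.com/docs/internal/keys",
        "https://example.com/blog",
    ],
    "https://example.com/docs/setup": ["https://example.com/docs/setup/advanced"],
}
pages = plan_crawl("https://example.com/docs", links, depth=2,
                   include_paths=["/docs"], exclude_paths=["/docs/internal"])
print(pages)
# ['https://example.com/docs', 'https://example.com/docs/setup']
```

With depth 2, the `/docs/setup/advanced` page is not reached because it sits one link level further out; raising the depth to 3 would add it.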

Understanding crawl stages

Each crawl run progresses through a series of stages:

Stage	What's happening
Crawling	Fetching pages from your website
Processing	Extracting text and splitting into chunks
Indexing	Generating embeddings and building the search index
Finished	Ready — your chatbot can now answer questions using this content
Failed	Something went wrong (check the failure reason in the run details)
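One way to picture the stages is as a small state machine. The sketch below is illustrative only: the `Stage` enum and `advance` helper are hypothetical names, and it assumes that stages advance strictly in the order listed and that any active stage can end in Failed.

```python
from enum import Enum


class Stage(Enum):
    CRAWLING = "Crawling"
    PROCESSING = "Processing"
    INDEXING = "Indexing"
    FINISHED = "Finished"
    FAILED = "Failed"


# Assumed transitions: each active stage leads to the next one in the table.
NEXT_STAGE = {
    Stage.CRAWLING: Stage.PROCESSING,
    Stage.PROCESSING: Stage.INDEXING,
    Stage.INDEXING: Stage.FINISHED,
}


def is_terminal(stage):
    """Finished and Failed are the only stages a run can end in."""
    return stage in (Stage.FINISHED, Stage.FAILED)


def advance(stage, failed=False):
    """Return the stage that follows `stage`, or Failed if an error occurred."""
    if is_terminal(stage):
        raise ValueError(f"{stage.value} is a terminal stage")
    return Stage.FAILED if failed else NEXT_STAGE[stage]


stage = Stage.CRAWLING
while not is_terminal(stage):
    stage = advance(stage)
print(stage.value)  # Finished
```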

Monitoring a crawl

While a crawl is running, you can track its progress in real time:

Progress counters — Pages discovered, fetched, processed, embedded, and failed
Insights — Duration, fetch rate, average text length, and HTTP status breakdown
Crawled pages table — Every page with its URL, title, HTTP status code, and text statistics
Run history — View all previous runs for a crawl to compare results over time
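The insights listed above are simple aggregates over the crawled pages table. As a rough sketch of how such figures could be derived, the following Python uses a hypothetical `PageRecord` type and `summarize_run` function; treating any non-2xx response as a failed page is an assumption, not documented behaviour.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageRecord:
    url: str
    status: int       # HTTP status code returned for the page
    text_length: int  # characters of extracted text


def summarize_run(pages, duration_seconds):
    """Aggregate per-page records into run-level insights."""
    fetched = len(pages)
    # Assumption: only 2xx responses count as successfully fetched pages.
    ok_pages = [p for p in pages if 200 <= p.status < 300]
    return {
        "fetched": fetched,
        "failed": fetched - len(ok_pages),
        "fetch_rate_per_second": (
            fetched / duration_seconds if duration_seconds > 0 else 0.0
        ),
        "average_text_length": (
            sum(p.text_length for p in ok_pages) / len(ok_pages)
            if ok_pages else 0.0
        ),
        "status_breakdown": dict(Counter(p.status for p in pages)),
    }


run = [
    PageRecord("https://example.com/docs", 200, 1000),
    PageRecord("https://example.com/docs/setup", 200, 500),
    PageRecord("https://example.com/docs/old", 404, 0),
]
summary = summarize_run(run, duration_seconds=6)
print(summary["fetch_rate_per_second"], summary["average_text_length"])
# 0.5 750.0
```

A high share of 4xx or 5xx codes in the status breakdown is usually a sign that the include or exclude paths need adjusting before the next run.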

Re-crawling

Click Refresh on a crawl to start a new run. This re-fetches all pages and rebuilds the knowledge base for that crawl. Use this after you update your website content to keep your chatbot's answers current.

Crawl credits

Each page fetched consumes one crawl credit from your monthly allowance. Credits are shared across all chatbots in your organization, so plan your crawl settings accordingly.

See Plans for credit limits on each plan.
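Because credits are shared, the number of pages a run can actually fetch depends on what the whole organization has already used. The arithmetic can be sketched as follows; the function name and the example numbers are hypothetical, and actual allowances depend on your plan.

```python
def pages_allowed(max_pages_per_run, monthly_allowance, credits_used):
    """Pages a run may fetch, given one credit per page.

    `credits_used` is the total consumed this month by every chatbot
    in the organization, since the allowance is shared.
    """
    remaining = max(monthly_allowance - credits_used, 0)
    return min(max_pages_per_run, remaining)


# Example: 1,000 credits per month, 850 already used across all chatbots.
print(pages_allowed(max_pages_per_run=200, monthly_allowance=1000,
                    credits_used=850))  # 150
```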

Q&A entries

What are Q&A entries?

Q&A entries are manually curated question-and-answer pairs that you add directly to your chatbot's knowledge base. Use them to:

Handle FAQs your website doesn't cover
Override or supplement crawled content with more precise answers
Add responses for common sales or support questions
Fill gaps while your site content catches up

Creating a Q&A entry

Navigate to Data Sources and open the Q&A tab. Click Add entry and fill in the following:

Field	Required	Description
Question	Yes	The question visitors might ask
Answer	Yes	The response the chatbot should give
Additional context	No	Background information that helps the chatbot understand when to use this answer

Q&A entries are indexed immediately after you save them — no crawl run needed.
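If you draft entries outside the dashboard before adding them, it can help to model the three fields explicitly. The sketch below is one possible representation, not a HyperHelp API: the `QAEntry` class and its field names are hypothetical and only mirror the required and optional fields in the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAEntry:
    question: str                             # required
    answer: str                               # required
    additional_context: Optional[str] = None  # optional background

    def __post_init__(self):
        # Reject blank required fields, mirroring the "Required" column.
        if not self.question.strip():
            raise ValueError("question is required")
        if not self.answer.strip():
            raise ValueError("answer is required")


entry = QAEntry(
    question="Do you offer a free trial?",
    answer="Yes, every plan starts with a trial period.",
    additional_context="Use for pricing and sign-up questions.",
)
print(entry.question)  # Do you offer a free trial?
```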

Limits

The number of Q&A entries you can create depends on your plan. See Plans for details.

How crawls and Q&A work together

When a visitor asks a question, HyperHelp searches both crawled content and Q&A entries to find the best answer. Q&A entries receive a slight relevance boost, so they're preferred when both sources match equally well.

This means you can use Q&A entries to fine-tune your chatbot's responses without modifying your website content. Crawled pages provide broad coverage, and Q&A entries handle the specifics.
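The effect of a relevance boost can be illustrated with a small ranking sketch. This is not HyperHelp's actual scoring: the `Match` type, the `rank` function, the additive form of the boost, and the value 0.05 are all assumptions chosen to show how a slight boost breaks ties in favour of Q&A entries without overriding a clearly better crawled match.

```python
from dataclasses import dataclass

QA_BOOST = 0.05  # hypothetical value; the real boost is not documented


@dataclass
class Match:
    source: str   # "crawl" or "qa"
    text: str
    score: float  # similarity score, higher is better


def rank(matches, qa_boost=QA_BOOST):
    """Sort matches by score, adding a small bonus to Q&A entries."""
    def boosted(match):
        return match.score + (qa_boost if match.source == "qa" else 0.0)
    return sorted(matches, key=boosted, reverse=True)


tie = rank([
    Match("crawl", "Pricing page excerpt", 0.80),
    Match("qa", "Curated pricing answer", 0.80),
])
print(tie[0].source)  # qa

clear_winner = rank([
    Match("crawl", "Exact setup instructions", 0.95),
    Match("qa", "Loosely related answer", 0.80),
])
print(clear_winner[0].source)  # crawl
```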