Data sources

Overview

Data sources supply the content your chatbot draws on when answering questions. There are two types:

  • Website crawls — Automatically fetch and index content from your website
  • Q&A entries — Manually curated question-and-answer pairs

Both feed into the same knowledge base and are used together when answering questions. You can use one or both depending on your needs.

Website crawls

Creating a crawl

Navigate to Data Sources in your chatbot's sidebar and click Create crawl. Configure the following fields:

| Field | Description |
|---|---|
| Name | A label for this crawl (e.g., "Main docs", "Blog posts") |
| Start URL | The page to begin crawling from |
| Include paths | Only crawl URLs matching these paths (e.g., /docs, /help). Leave empty to crawl everything under the domain. |
| Exclude paths | Skip URLs matching these paths (e.g., /docs/internal, /admin) |
| Depth (1-5) | How many link levels to follow from the start URL. Depth 1 = only the start page. Depth 2 = start page + pages it links to. Default is 2. |
| Max pages per run | The maximum number of pages to crawl in one run. Limited by your remaining monthly crawl credits. |
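The following Python sketch illustrates how the Depth, Include paths, Exclude paths, and Max pages per run settings could combine to decide which pages a run visits. It is an illustration, not HyperHelp's crawler: the in-memory `links` dictionary stands in for fetching pages, and the function names, the same-host restriction, the whole-segment prefix matching, and the rule that exclude paths win over include paths are all assumptions.

```python
from collections import deque
from urllib.parse import urlparse


def _matches(path: str, prefix: str) -> bool:
    """Assumed rule: a path rule matches whole path segments, so /docs
    matches /docs and /docs/setup but not /docsx."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def in_scope(url: str, start_url: str, include: list[str], exclude: list[str]) -> bool:
    """True if url is on the start URL's host, matches no exclude path,
    and matches an include path (or include is empty)."""
    parsed = urlparse(url)
    if parsed.netloc != urlparse(start_url).netloc:
        return False
    path = parsed.path or "/"
    if any(_matches(path, p) for p in exclude):
        return False  # assumption: exclude wins over include
    return not include or any(_matches(path, p) for p in include)


def plan_crawl(start_url: str, links: dict[str, list[str]], depth: int = 2,
               include=(), exclude=(), max_pages: int = 100) -> list[str]:
    """Breadth-first walk over a link graph (hypothetical helper).

    Depth 1 visits only the start page; each extra level follows one more
    hop of links. Stops once max_pages URLs have been selected.
    """
    include, exclude = list(include), list(exclude)
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 1)])  # (url, link level of that url)
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        if level >= depth:
            continue  # do not follow links past the configured depth
        for link in links.get(url, []):
            if link in seen or not in_scope(link, start_url, include, exclude):
                continue
            seen.add(link)
            visited.append(link)
            if len(visited) >= max_pages:
                break
            queue.append((link, level + 1))
    return visited
```

Under these assumptions, a walk from `https://example.com/docs` with `depth=2`, `include=["/docs"]`, and `exclude=["/admin"]` returns the start page plus the `/docs/...` pages it links to, skipping `/admin` and links to other hosts.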

Tip: Start with a small depth (2) and narrow include paths. You can always re-crawl with broader settings once you've verified the results.

Understanding crawl stages

Each crawl run progresses through a series of stages:

| Stage | What's happening |
|---|---|
| Crawling | Fetching pages from your website |
| Processing | Extracting text and splitting into chunks |
| Indexing | Generating embeddings and building the search index |
| Finished | Ready: your chatbot can now answer questions using this content |
| Failed | Something went wrong (check the failure reason in the run details) |
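As a rough mental model, the stages can be treated as a one-way state machine. The Python sketch below assumes that a run moves forward one stage at a time, in the order listed above, and can drop to Failed from any in-progress stage; the `Stage` enum and these transition rules are illustrative, not a documented API.

```python
from enum import Enum


class Stage(Enum):
    CRAWLING = "Crawling"
    PROCESSING = "Processing"
    INDEXING = "Indexing"
    FINISHED = "Finished"
    FAILED = "Failed"


# Assumed forward order of a successful run.
_ORDER = [Stage.CRAWLING, Stage.PROCESSING, Stage.INDEXING, Stage.FINISHED]
TERMINAL = {Stage.FINISHED, Stage.FAILED}


def next_stages(stage: Stage) -> set[Stage]:
    """Stages a run may move to from `stage` under the assumed model."""
    if stage in TERMINAL:
        return set()  # a finished or failed run does not change again
    following = _ORDER[_ORDER.index(stage) + 1]
    return {following, Stage.FAILED}


def advance(stage: Stage, target: Stage) -> Stage:
    """Move to `target`, rejecting transitions the model does not allow."""
    if target not in next_stages(stage):
        raise ValueError(f"cannot move from {stage.value} to {target.value}")
    return target
```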

Monitoring a crawl

While a crawl is running, you can track its progress in real time:

  • Progress counters — Pages discovered, fetched, processed, embedded, and failed
  • Insights — Duration, fetch rate, average text length, and HTTP status breakdown
  • Crawled pages table — Every page with its URL, title, HTTP status code, and text statistics
  • Run history — View all previous runs for a crawl to compare results over time
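To show how run-level insights could be derived from the per-page rows in the crawled pages table, here is a hypothetical Python sketch. The `CrawledPage` fields, measuring fetch rate in pages per second, and averaging text length over 2xx responses only are assumptions made for illustration; the dashboard's exact formulas are not documented here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CrawledPage:
    url: str
    title: str
    status: int       # HTTP status code returned for the page
    text_length: int  # characters of extracted text


def insights(pages: list[CrawledPage], duration_s: float) -> dict:
    """Summarize a run from its page rows (illustrative formulas only)."""
    fetched = len(pages)
    ok = [p for p in pages if 200 <= p.status < 300]  # successful responses
    return {
        "duration_s": duration_s,
        # Assumed unit: pages per second; guard against a zero duration.
        "fetch_rate_per_s": fetched / duration_s if duration_s > 0 else 0.0,
        # Assumed: error pages are left out of the text-length average.
        "avg_text_length": sum(p.text_length for p in ok) / len(ok) if ok else 0.0,
        # Count of pages per HTTP status code, e.g. {200: 2, 404: 1}.
        "status_breakdown": dict(Counter(p.status for p in pages)),
    }
```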

Re-crawling

Click Refresh on a crawl to start a new run. This re-fetches all pages and rebuilds the knowledge base for that crawl. Use this after you update your website content to keep your chatbot's answers current.

Crawl credits

Each page fetched consumes one crawl credit from your monthly allowance. Credits are shared across all chatbots in your organization, so a large crawl for one chatbot leaves fewer credits for the others; choose depth and page limits with that in mind.

See Plans for credit limits on each plan.
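Because one fetched page costs one credit and the pool is shared, the number of pages a run can actually fetch is the smaller of its Max pages per run setting and the organization's remaining credits. A minimal Python sketch of that arithmetic follows; the `CreditPool` class is hypothetical, and any numbers you plug in are illustrative rather than real plan limits.

```python
class CreditPool:
    """Organization-wide monthly crawl credits shared by every chatbot
    (hypothetical model: one credit per fetched page)."""

    def __init__(self, monthly_allowance: int, used: int = 0) -> None:
        self.monthly_allowance = monthly_allowance
        self.used = used

    @property
    def remaining(self) -> int:
        return max(self.monthly_allowance - self.used, 0)

    def page_cap(self, max_pages_per_run: int) -> int:
        """A run can fetch at most the smaller of its own limit and the credits left."""
        return min(max_pages_per_run, self.remaining)

    def charge(self, pages_fetched: int) -> None:
        """Deduct one credit per fetched page, whichever chatbot ran the crawl."""
        self.used += pages_fetched
```

For example, with an allowance of 1,000 credits, a 700-page run for one chatbot leaves 300, so a second chatbot's run set to 500 pages would be capped at 300.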

Q&A entries

What are Q&A entries?

Q&A entries are manually curated question-and-answer pairs that you add directly to your chatbot's knowledge base. Use them to:

  • Handle FAQs your website doesn't cover
  • Override or supplement crawled content with more precise answers
  • Add responses for common sales or support questions
  • Fill gaps while your site content catches up

Creating a Q&A entry

Navigate to Data Sources and open the Q&A tab. Click Add entry and fill in the following:

| Field | Required | Description |
|---|---|---|
| Question | Yes | The question visitors might ask |
| Answer | Yes | The response the chatbot should give |
| Additional context | No | Background information that helps the chatbot understand when to use this answer |

Q&A entries are indexed immediately after you save them — no crawl run needed.
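A Q&A entry can be pictured as a small record with two required fields and one optional field. This hypothetical Python dataclass mirrors the fields listed above and rejects entries with a blank question or answer; it is a sketch for illustration, not HyperHelp's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAEntry:
    question: str
    answer: str
    additional_context: Optional[str] = None  # optional hint about when to use the answer

    def __post_init__(self) -> None:
        # Question and Answer are required: reject missing or whitespace-only values.
        for name in ("question", "answer"):
            value = getattr(self, name)
            if not value or not value.strip():
                raise ValueError(f"{name} is required")
```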

Limits

The number of Q&A entries you can create depends on your plan. See Plans for details.

How crawls and Q&A work together

When a visitor asks a question, HyperHelp searches both crawled content and Q&A entries to find the best answer. Q&A entries receive a slight relevance boost, so they're preferred when both sources match equally well.

This means you can use Q&A entries to fine-tune your chatbot's responses without modifying your website content. Crawled pages provide broad coverage, and Q&A entries handle the specifics.
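The relevance boost can be illustrated with a short Python sketch. It assumes both sources return a similarity score on the same scale and that the boost is a small additive constant (`QA_BOOST`, with a made-up value of 0.05); HyperHelp's actual scoring and boost size are not documented here.

```python
from dataclasses import dataclass

QA_BOOST = 0.05  # hypothetical value; the real boost is not documented


@dataclass
class Candidate:
    text: str
    source: str        # "crawl" or "qa"
    similarity: float  # raw relevance score from the search index


def ranked(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by relevance, nudging Q&A entries up by a small boost."""
    def score(c: Candidate) -> float:
        return c.similarity + (QA_BOOST if c.source == "qa" else 0.0)
    return sorted(candidates, key=score, reverse=True)
```

Under these assumptions, a Q&A entry and a crawled page that both score 0.80 rank the Q&A entry first, while a crawled page scoring 0.90 still outranks a Q&A entry at 0.80. The boost breaks ties without letting a weak Q&A match override a clearly better page.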