TheSkinnyAI Crawler · Bot Documentation
Public documentation of our crawler's identity, behavior, and operations for Cloudflare verified bot review and for site owners.
Bot Purpose & Use Case
- Purpose: Discover and index public website content for customers who embed TheSkinnyAI assistant on their own domains. The indexed content is used to answer end‑user questions about the customer’s products and services.
- Scope: Public pages only; no login‑gated or paywalled content. We honor robots.txt and customer‑provided exclusions (e.g., “Do Not Crawl URLs”).
- Intent: Benign and helpful. We do not perform competitive scraping, price scraping, or any activity intended to harm site performance or business interests.
- Targets: Customer websites that have explicitly onboarded to TheSkinnyAI. Discovery begins from a customer‑provided starting URL and/or /sitemap.xml.
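For site owners who want to see how sitemap‑based discovery combined with “Do Not Crawl” exclusions might look, here is a minimal sketch. It is an illustration under stated assumptions, not our production code: the function name discover_urls, the path‑prefix form of the exclusion list, and the example.com URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def discover_urls(sitemap_xml: str, do_not_crawl: List[str]) -> List[str]:
    """Return sitemap URLs whose path does not start with an excluded prefix.

    Assumption: exclusions are expressed as path prefixes such as "/internal/".
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if not url:
            continue
        path = urlsplit(url).path or "/"
        if any(path.startswith(prefix) for prefix in do_not_crawl):
            continue  # customer asked us not to crawl this path
        urls.append(url)
    return urls


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/team</loc></url>
  <url><loc>https://example.com/internal/notes</loc></url>
</urlset>"""

allowed = discover_urls(SAMPLE_SITEMAP, do_not_crawl=["/internal/"])
```

In this sketch, the URL under /internal/ is dropped before any request is made, so excluded pages are never fetched.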
User‑Agent String
Our crawler identifies itself with the following User‑Agent string (or newer minor versions). Deployments may override this via the CRAWLER_USER_AGENT environment variable; the value below is the default.
TheSkinnyAI-Crawler/1.0 (+https://theskinnyai.com/bot-docs/)
Site owners and WAF rules can match on the product token TheSkinnyAI-Crawler in the User-Agent header. Custom deployments may use a different string, set via CRAWLER_USER_AGENT and agreed with the site owner.
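Matching on the product token rather than the full string keeps a rule valid across minor version bumps. The following is a minimal sketch of such a match in application code; the regular expression and the function name match_crawler are illustrative assumptions, not an official matching rule.

```python
import re
from typing import Optional

# Product token taken from the documented default User-Agent.
# The optional "/<version>" suffix is captured when present.
TOKEN_RE = re.compile(
    r"(?<![\w-])TheSkinnyAI-Crawler(?:/(?P<version>\d+(?:\.\d+)*))?"
)


def match_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler version if the token is present.

    Returns "" when the token appears without a version, and None when the
    User-Agent does not carry the token at all.
    """
    m = TOKEN_RE.search(user_agent or "")
    if m is None:
        return None
    return m.group("version") or ""


default_ua = "TheSkinnyAI-Crawler/1.0 (+https://theskinnyai.com/bot-docs/)"
version = match_crawler(default_ua)
```

A User-Agent header can be set by any client, so a token match identifies a claimed identity only; pairing it with published egress IPs, once available, gives a stronger check.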
IP Addresses & ASN
For verified‑bot review requiring static egress, we can operate through a fixed IP set dedicated to TheSkinnyAI crawler.
- Status: Static egress IPs will be published here when enabled. If you require IP allow‑listing sooner, contact https://theskinnyai.com/contact or email support@theskinnyai.com to obtain a dedicated IP (or small CIDR) reserved for crawling.
- ASN: We do not operate our own ASN. If fixed egress is used, the ASN will correspond to our hosting/provider and will be documented alongside the published IPs.
Expected Behavior
- Robots compliance: We fetch and honor /robots.txt (allow/disallow; crawl‑delay if provided, with a minimum 1‑second delay when specified).
- Crawl rate: 0.5 requests/second per domain by default (one request every 2 seconds). Rates can be lowered to 0.1 requests/second (one request every 10 seconds) upon site‑owner request.
- Discovery: From a customer’s starting URL and, if permitted, from /sitemap.xml. Explicit “Do Not Crawl” rules are enforced.
- Data handling & retention: We store only public page text, title, and structural metadata to support question‑answering for that customer. Data is encrypted at rest, retained for as long as the customer account is active, and removed within 30 days of contract termination.
- No evasion: We do not rotate identities to bypass bot protection. If blocked, we request allow‑listing or fall back to customer‑provided content.
- Managed fetch (opt‑in): For sites whose own WAF blocks our default crawler, with the customer's knowledge, requests for that site may originate from a managed‑fetch provider's IP space (currently Bright Data or ScrapingBee) rather than ours. This is activated on a per‑site basis by our operations team only when a block is confirmed, and the customer is informed. All such requests still identify as TheSkinnyAI-Crawler via User‑Agent where the provider supports it.
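The robots and crawl‑rate rules above can be sketched with Python's standard urllib.robotparser. This is an illustration, not our production implementation. The helper names are hypothetical, and the way the default rate, a site's Crawl-delay, and the 1‑second floor are combined (taking the largest of the three) is one assumed reading of the policy.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "TheSkinnyAI-Crawler"  # product token from the default User-Agent
DEFAULT_RATE = 0.5                  # requests/second per domain (documented default)
MIN_CRAWL_DELAY = 1.0               # seconds; floor applied when Crawl-delay is set


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Apply the allow/disallow rules for our product token."""
    return parser.can_fetch(USER_AGENT, url)


def delay_seconds(parser: RobotFileParser, rate: float = DEFAULT_RATE) -> float:
    """Seconds to wait between requests to one domain.

    Assumption: the effective delay is the largest of the rate-derived
    interval, the site's Crawl-delay, and the 1-second floor.
    """
    base = 1.0 / rate
    crawl_delay = parser.crawl_delay(USER_AGENT)
    if crawl_delay is None:
        return base
    return max(base, float(crawl_delay), MIN_CRAWL_DELAY)


SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

policy = build_policy(SAMPLE_ROBOTS)
wait = delay_seconds(policy)
```

A crawler loop would call may_fetch before each request and sleep for delay_seconds between requests to the same domain. Passing rate=0.1 models the lowered rate available on site‑owner request.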
Contact & Verification
Compliance & Policy
- We adhere to Cloudflare’s Verified Bots policy and site owners’ robots and rate‑limit preferences. Breaches lead to immediate remediation and potential removal of a domain from crawling.
- We keep this page updated to reflect any changes to identity (User‑Agent), egress IPs, or behavior.
Cloudflare Allow‑List Guidance
To allow our bot in Cloudflare (Security > WAF > Firewall Rules or Security > Bots), use:
Expression: (http.user_agent contains "TheSkinnyAI-Crawler")
Action: Allow
Features: Bypass Bot Management, Skip Managed Challenges
Optionally, allow‑list /sitemap.xml and specific content paths (e.g., /team, /advisors) to ensure discovery and ingestion succeed.
Change Log
- 2026‑04‑21: Added managed‑fetch opt‑in disclosure (Bright Data, ScrapingBee) for WAF‑blocked sites.
- 2026‑04‑21: User-Agent aligned with production default (TheSkinnyAI-Crawler/1.0); documented optional CRAWLER_USER_AGENT override.
- 2025‑10‑05: Initial publication of bot documentation.
This page is the official Bot Documentation URL for TheSkinnyAI crawler and may be referenced in Cloudflare’s verified bot submission.