The mechanism: how ChatGPT picks sources
ChatGPT's search functionality is powered partly by Bing's index and partly by pages fetched by OpenAI's own bots. OpenAI documents three: OAI-SearchBot (indexes pages for search results), ChatGPT-User (fetches pages on a user's behalf during a conversation), and GPTBot (collects training data). When a user asks a question that triggers search mode, the system retrieves a small set of candidate pages, summarizes them, and surfaces 3-5 with citations.
Three things determine whether your page is in that candidate set:
First, whether OpenAI's bots can crawl your site at all. If your CDN blocks them (Cloudflare's default in 2026) or your robots.txt disallows them, you're invisible from the start.
Second, whether your content matches the query intent semantically. ChatGPT doesn't keyword-match the way Google does; it scores semantic similarity over chunked content. Pages that directly answer the question, in clean prose, win.
Third, whether your domain is considered trustworthy. ChatGPT factors in Bing's trust signals (which roughly mirror Google's), citation patterns from existing trusted sources, and on-page signals like author identity, publication date, and structured data.
Step 1: Make sure GPTBot, OAI-SearchBot, and ChatGPT-User can crawl
Check your robots.txt. It should explicitly allow these user agents, or at minimum not block them.
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
If you're on Cloudflare, also check the AI Audit settings. The default in 2026 is to block AI bots site-wide via the "Managed robots.txt" feature. Turn that off, or whitelist these three bots specifically.
Verify the change by fetching your robots.txt from a fresh browser tab. If you see the AI bot Disallow rules still showing, the CDN-level block hasn't been disabled.
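If you'd rather script that check than eyeball it, here is a minimal sketch using Python's standard-library robots.txt parser. The function name `blocked_agents` and the `example.com` URL are placeholders chosen for illustration. It only evaluates the robots.txt rules themselves, so it will not catch a firewall rule that blocks bots by other means.

```python
from urllib.robotparser import RobotFileParser

# The three OpenAI user agents named in this article.
OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the OpenAI agents that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in OPENAI_AGENTS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks one bot and allows everything else.
    sample = "\n".join(
        [
            "User-agent: GPTBot",
            "Disallow: /",
            "",
            "User-agent: *",
            "Allow: /",
        ]
    )
    print(blocked_agents(sample))
```

To test a live site, paste in the robots.txt text you fetched, or call `set_url()` and `read()` on a `RobotFileParser` instead of `parse()`. An empty list means none of the three bots is disallowed for that URL.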
Step 2: Structure your content for semantic extraction
ChatGPT performs best at extracting clear, declarative statements with concrete details. Marketing prose with vague claims gets ignored.
What works:
- Definitive statements with specific numbers ("A Shopify Basic store with under 50 products costs $29 per month plus 2.9 percent payment processing")
- Question-and-answer formatting (FAQPage schema paired with visible Q-and-A on the page)
- Tables with clear comparisons
- Numbered lists where order matters
- Direct first-paragraph answers (the inverted pyramid)
What doesn't work:
- Storytelling intros that delay the answer
- Vague pricing ("competitive rates")
- Conclusions hidden in long paragraphs
- Hedging language ("it depends," "results may vary")
Pages that read like Wikipedia entries get cited more than pages that read like brochures.
Step 3: Add the schema ChatGPT actually uses
The structured data fields ChatGPT and OpenAI's training pipelines pay attention to:
- Article / BlogPosting with `author` (Person), `datePublished`, `dateModified`, `wordCount`, `articleSection`
- FAQPage with `mainEntity` array of Question + Answer pairs that exactly match visible Q-and-A
- Organization with `sameAs` links to social profiles, Crunchbase, LinkedIn, Wikipedia (if applicable)
- Person with `jobTitle`, `knowsAbout`, `sameAs`, `worksFor` references
- BreadcrumbList so the system understands your site hierarchy
Validate every schema block against https://search.google.com/test/rich-results. ChatGPT honors the same rules Google does for valid versus invalid markup.
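As a rough illustration of two of the schema types listed above, the sketch below builds FAQPage and BlogPosting markup as JSON-LD from plain Python data. The function names and the sample values are placeholders, not a required structure. Generating the markup from the same question-and-answer pairs you render on the page is one way to keep the two in sync.

```python
import json


def faq_page(pairs):
    """Build FAQPage data from (question, answer) pairs copied from the visible page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def blog_posting(headline, author_name, job_title, published, modified):
    """Build BlogPosting data with a Person author and ISO 8601 date strings."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "jobTitle": job_title},
        "datePublished": published,
        "dateModified": modified,
    }


def as_script_tag(data):
    """Serialize to a JSON-LD script tag, escaping "</" so the tag cannot close early."""
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Placeholder content for illustration only.
    faq = faq_page([("How much does the Basic plan cost?", "$29 per month.")])
    print(as_script_tag(faq))
```

Paste the printed tag into the page's HTML, then run the page through the validator as described above.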
Step 4: Build citation footprint on sources ChatGPT trusts
ChatGPT weighs citations from sources it considers authoritative. The major ones for B2B and small business:
- Clutch.co (verified B2B service reviews)
- LinkedIn (especially company pages and personal profiles with consistent posting)
- Crunchbase (for startups and growing businesses)
- Wikipedia (extremely high trust, very hard to get on)
- Industry trade publications and association directories
- Local Chambers of Commerce
- Yelp and Google Business Profile (for local businesses)
- Reddit (surprisingly high trust for niche subreddits)
- Stack Overflow and GitHub (for technical content)
Get listed on the ones relevant to your category. Each citation increases the probability of being pulled into ChatGPT's candidate set for related queries.
Step 5: Wait, then measure
Realistic timeline:
- 1-2 weeks for ChatGPT's index to pick up your pages after crawler access is granted
- 4-8 weeks for the first citations to appear in answer results for your target queries
- 3-6 months for citation volume to be material if you keep publishing
Measure by manually testing queries you'd want to rank for. Open ChatGPT, ask the question, see if you get cited. Tools like Otterly, Profound, and HubSpot's AI Search Grader are emerging to automate this monitoring, but the simple manual test still works.
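To make the manual test comparable across the weeks in that timeline, it helps to log each check and look at the share of checks where you were cited. A minimal sketch, assuming you record the date, the query, and a yes/no result by hand; the `ManualCheck` record and `citation_rate_by_month` function are hypothetical names, not part of any tool mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ManualCheck:
    """One manual test: a query asked in ChatGPT and whether your site was cited."""

    day: date
    query: str
    cited: bool


def citation_rate_by_month(checks):
    """Return {"YYYY-MM": share of checks that month where the site was cited}."""
    totals = defaultdict(lambda: [0, 0])  # month -> [cited count, total count]
    for check in checks:
        month = check.day.strftime("%Y-%m")
        totals[month][0] += int(check.cited)
        totals[month][1] += 1
    return {month: cited / total for month, (cited, total) in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up log entries for illustration only.
    log = [
        ManualCheck(date(2026, 1, 5), "best crm for small law firms", False),
        ManualCheck(date(2026, 1, 19), "best crm for small law firms", True),
        ManualCheck(date(2026, 2, 2), "best crm for small law firms", True),
    ]
    for month, rate in citation_rate_by_month(log).items():
        print(month, f"{rate:.0%}")
```

Answers vary from run to run, so repeating the same query several times a month gives a steadier number than a single check.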
What doesn't work (don't waste time on these)
Some things people try that don't move the needle:
- Forcing keywords into content unnaturally: semantic extraction doesn't reward this
- Buying backlinks: ChatGPT's trust signals are different from Google's link graph
- Submitting your URL directly to OpenAI: there's no submission portal; the bots find you
- Stuffing FAQPage schema with questions that don't appear visibly on the page: Google flags this as spam, and ChatGPT ignores it
- Hiding text from users while showing it to bots (cloaking): modern AI engines detect and penalize this
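The FAQPage point is easy to check mechanically. The sketch below uses only Python's standard library to list schema questions that never appear in the page's visible text. The name `hidden_faq_questions` is a placeholder, and the check is deliberately simple: it assumes top-level FAQPage objects (not nested under `@graph`) and compares text after collapsing whitespace and case.

```python
import json
from html.parser import HTMLParser


class _PageScanner(HTMLParser):
    """Collects visible text and raw JSON-LD blocks from an HTML page."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.jsonld = []
        self._mode = None  # "jsonld", "skip", or None for visible text

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_jsonld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_jsonld else "skip"
            if is_jsonld:
                self.jsonld.append("")
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self.jsonld[-1] += data
        elif self._mode is None:
            self.visible.append(data)


def _normalize(text):
    """Collapse whitespace and case so minor formatting differences don't matter."""
    return " ".join(text.split()).casefold()


def hidden_faq_questions(html):
    """Return FAQPage question names that do not appear in the visible page text."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()
    visible = _normalize("".join(scanner.visible))
    missing = []
    for block in scanner.jsonld:
        data = json.loads(block)
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "FAQPage":
                continue
            entities = item.get("mainEntity", [])
            if isinstance(entities, dict):
                entities = [entities]
            for question in entities:
                name = question.get("name", "")
                if _normalize(name) not in visible:
                    missing.append(name)
    return missing
```

Run it on a page's saved HTML; any question it returns exists only in the markup and should either be added to the page or removed from the schema.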
The case for starting now
ChatGPT passed 400 million weekly active users in early 2025, according to OpenAI, and has kept growing since. A meaningful fraction of those users are doing product research, local recommendations, and how-to lookups that previously went to Google. Being cited in the AI answer is the new equivalent of ranking on page one of search.
Most of your competitors haven't started this work. The window is open. We offer it as a stand-alone service; see AEO for pricing and process.