Training Example: AfterQuery – Review the Data, Give Your Score & Compare to the Real AI Evaluation

Industry Context — Common BS Fingerprints in Science, Research & Laboratories
Generic Claims: world-class research, pioneering scientific breakthroughs, advancing knowledge, trusted by leading institutions…
Red Flags: accreditation claims without certificate numbers, no publication record for research claims, unnamed scientists or researchers, breakthrough claims without peer review…
Semantic Drift Patterns: homepage claims cutting-edge but equipment list is dated, claims accredited but no accreditation schedule or scope shown, research claims but no publication list, claims GLP but no regulatory inspection history…
Proof Expectations: accreditation certificate numbers and scope (ISO 17025, GLP), publication list with peer-reviewed journal citations, named principal investigators with verifiable track records, specific equipment list with calibration status…

AfterQuery

(https://afterquery.com) 📸 Data Snapshot: June 21, 2026

Analyze the raw signals below. How would a machine score this business’s credibility?

Here are the exact signals captured from up to six pages of the site — the same raw inputs the evaluation engine analyzed. They are grouped by signal type so you can weigh each the way the machine does.

🏗️ Semantic Structure — heading hierarchy & page identity (Info Density · Commodity Fingerprint)
HOMEPAGE AfterQuery (https://afterquery.com)
Title

AfterQuery

Meta

AfterQuery is an applied research lab curating data solutions to accelerate foundation model development.

H1 We teach machines how experts think.
H2 AI researchers and enterprises are hitting walls with suboptimal data solutions.
H2 We turn real-world work into training data.
H2 Research
H2 Lab
H2 Company
H2 Social
H2 Terms & Policies
H3 How We Improved Terminal-Bench 2.0 Scores by Over 5x Using Tinker and Harbor
H3 Human expertise, reimagined
H3 Solving the Last Mile Problem in Partnership with The Raine Group
H5 Careers
NAV_HEADER_HEADING_REPEATED_BODY_FOOTER AfterQuery (https://afterquery.com/research/)
Title

AfterQuery

Meta

AfterQuery is an applied research lab curating data solutions to accelerate foundation model development.

H1 Data quality makes all the difference.
H2 SpreadsheetBench 2
H2 IDE-Bench
H2 How we achieved a net win-loss margin of +21.4% on GDPval with on-policy distillation
H2 Why DeployCo and ServiceCo Are Betting on the Last Mile
H2 Solving the Last Mile Problem in Partnership with The Raine Group
H2 Human expertise, reimagined
H2 How AfterQuery Expert Data Drives Model Performance on τ²-bench
H2 How We Improved Terminal-Bench 2.0 Scores by Over 5x Using Tinker and Harbor
H2 IDE-Bench: Evaluating Large Language Models as IDE Agents
H2 Market-Bench: Evaluating LLMs on Introductory Quantitative Trading
H2 App-Bench: Evaluating Coding Agents on Generating Economically Useful Web-Apps
H2 The AfterQuery Thesis
H2 UI-Bench: A Benchmark for Evaluating User Interface Understanding
H2 FinanceQA: A Benchmark for Assumption-Based Financial Analysis
H2 Core Research Areas
H2 Lab
H2 Company
H2 Social
H2 Terms & Policies
H3 Computer Use
NAV_HEADER_HEADING_REPEATED_BODY_FOOTER AfterQuery (https://afterquery.com/careers/)
Title

AfterQuery

Meta

AfterQuery is an applied research lab curating data solutions to accelerate foundation model development.

H1 Shape how AI learns.
H2 Lab
H2 Company
H2 Social
H2 Terms & Policies
H3 Engineering
H3 Growth
H3 Internal Ops
H3 Operations
H3 Research
H3 Revenue
H4 Benefits
NAV_HEADER_HEADING_REPEATED_FOOTER AfterQuery (https://afterquery.com/leaderboard/)
Title

AfterQuery

Meta

AfterQuery is an applied research lab curating data solutions to accelerate foundation model development.

H1 Rigorous benchmarks, not cherry-picked results.
H2 Evaluating AI Agents on Software Engineering
H2 Introductory Quantitative Trading
H2 AI Web App Generation
H2 FinanceQA, Assumption-Based
H2 Lab
H2 Company
H2 Social
H2 Terms & Policies
📝 The Narrative — clean text per page (Info Density · Semantic Coherence)
HOMEPAGE (https://afterquery.com) AfterQuery
[H1] We teach machines how experts think.
The future of AI won’t be trained on more data, it will be trained on better thinking. Get data · Explore research · Backed by angels from · Powering every frontier AI research lab · Problem
[H2] AI researchers and enterprises are hitting walls with suboptimal data solutions.
Today’s models can generate answers. But they struggle with real work. Because real work isn’t just outputs. It’s decisions, tradeoffs, and context. That knowledge doesn’t live on the internet — it lives inside experts.
Expertise has never been captured. Until now.
The most valuable knowledge isn’t written down. It exists in how professionals think — not just answers, but reasoning, decisions, tradeoffs, and context. We work with domain experts to capture that thinking, then structure it into training data models can learn from.
Our solution
[H2] We turn real-world work into training data.
AfterQuery is an applied research lab curating data solutions for frontier foundation model development. Models trained on outputs plateau. Models trained on reasoning improve. We build datasets that reflect how experts actually solve problems — step by step, decision by decision.
Our data includes:
Supervised Fine-Tuning (SFT): High-quality prompt–response pairs and chain-of-thought reasoning traces — teaching models how to behave across complex tasks.
Reinforcement Learning + Rubrics: Expert-designed prompts with grading frameworks for reasoning and code generation — turning subjective judgment into scalable reward signals.
Agent Environments (API / MCP): Custom environments across APIs, tools, and services — enabling training and evaluation of agents in real workflows.
Computer Use Trajectories: Human-demonstrated interactions across browser and desktop environments — teaching models to navigate and operate software end-to-end.
[H2] Research
Our approach starts with research: where exactly do models break down in real professional contexts? Why do these failure modes exist? We take a proactive stance — every domain has its own failure patterns. More research
[H3] How We Improved Terminal-Bench 2.0 Scores by Over 5x Using Tinker and Harbor
How expert-curated trajectories and tooling lifted Terminal-Bench 2.0 scores more than 5x — and what it says about training agents. Blog · Mar 31, 2026
[H3] Human expertise, reimagined
Capturing how experts think — turning real-world decisions, judgment, and workflows into training data models can learn from. Blog · Apr 9, 2026
[H3] Solving the Last Mile Problem in Partnership with The Raine Group
Encoding domain-specific excellence into forms machines can learn — so agents think and execute like real-world experts. Blog · Apr 28, 2026
[H5] Careers
We’re hiring for engineering, operations, and research roles to help us accelerate AI training data solutions. Join the team revolutionizing AI Research and Training. See open roles↗
2876 chars
SUB-PAGE (https://afterquery.com/research/) AfterQuery
Research
[H1] Data quality makes all the difference.
We’re driven by the conviction that model performance is fundamentally bounded by training data quality. Through expert collaboration, rigorous curation methodologies, and deep domain expertise, we research datasets that power tomorrow’s models.
[H2] SpreadsheetBench 2
Evaluating LLM agents on challenging, expert-curated, end-to-end spreadsheet tasks — financial modeling, debugging, and visualization in complex multi-sheet workbooks. View benchmark↗
[H2] IDE-Bench
A comprehensive framework for evaluating AI IDE agents on real-world software engineering tasks through an IDE-native tool interface. Read paper↗
[H2] How we achieved a net win-loss margin of +21.4% on GDPval with on-policy distillation
Michael E., Spencer M. · Jun 8, 2026 · Read blog↗
[H2] Why DeployCo and ServiceCo Are Betting on the Last Mile
Sam J., Agustin G., Drew · Jun 3, 2026 · Read blog↗
[H2] Solving the Last Mile Problem in Partnership with The Raine Group
Carlos G., Sam J. · Apr 28, 2026 · Read blog↗
[H2] Human expertise, reimagined
Spencer M. · Apr 9, 2026 · Read blog↗
[H2] How AfterQuery Expert Data Drives Model Performance on τ²-bench
Michael E., Spencer M., Arya F. · Apr 8, 2026 · Read blog↗
[H2] How We Improved Terminal-Bench 2.0 Scores by Over 5x Using Tinker and Harbor
Spencer M., Michael E., Carlos G. · Mar 31, 2026 · Read blog↗
[H2] IDE-Bench: Evaluating Large Language Models as IDE Agents
Spencer M., Jeff Y., Tiana C. · Jan 20, 2026 · Read paper↗
[H2] Market-Bench: Evaluating LLMs on Introductory Quantitative Trading
Abhay S., Sam J., Spencer M. · Dec 13, 2025 · Read paper↗
[H2] App-Bench: Evaluating Coding Agents on Generating Economically Useful Web-Apps
Andrew Z., Sam J., Spencer M. · Oct 25, 2025 · Read paper↗
[H2] The AfterQuery Thesis
Spencer M. · Oct 20, 2025 · Read blog↗
[H2] UI-Bench: A Benchmark for Evaluating User Interface Understanding
Sam J., Agustin G., Spencer M. · Aug 28, 2025 · Read paper↗
[H2] FinanceQA: A Benchmark for Assumption-Based Financial Analysis
Spencer M., Sam J. · Jan 30, 2025 · Read paper↗
[H2] Core Research Areas
1
[H3] Computer Use
We’ve created training data and reinforcement learning environments that teach AI agents to navigate real software workflows end-to-end, capturing judgment calls and edge cases that only experienced practitioners recognize.
Computer Use · Multimodal · AI Safety & Security · Data Quality & Curation · Model Evaluation
2351 chars
SUB-PAGE (https://afterquery.com/careers/) AfterQuery
Careers
[H1] Shape how AI learns.
Behind our exceptional training data is an equally exceptional team. With a founding team that left jobs at Goldman Sachs, McKinsey, Jane Street, Palantir, NVIDIA, Google, and top startups, we’re seeking exceptional talent to own one of the defining problems of our generation.
All Jobs · 25 jobs
[H3] Engineering
6 jobs
Software Engineer - Platform/Applied AI (Fullstack) · San Francisco · Full-time · Apply now↗
Software Engineering Intern · San Francisco · Intern · Apply now↗
Senior Software Engineer - Infrastructure & Platform · San Francisco · Full-time · Apply now↗
Software Engineer - Security/Infrastructure · San Francisco · Full-time · Apply now↗
Software Engineer - RL Environments · San Francisco · Full-time · Apply now↗
Engineering Manager · San Francisco · Full-time · Apply now↗
[H3] Growth
2 jobs
Growth Associate · San Francisco · Full-time · Apply now↗
Growth Intern · San Francisco · Intern · Apply now↗
[H3] Internal Ops
5 jobs
Business Operations Generalist · San Francisco · Full-time · Apply now↗
People Programs Lead · San Francisco · Full-time · Apply now↗
Business Talent Acquisition Lead · San Francisco · Full-time · Apply now↗
Technical Recruiter · San Francisco · Full-time · Apply now↗
Talent Coordinator · San Francisco · Full-time · Apply now↗
[H3] Operations
6 jobs
Strategic Projects Lead · San Francisco · Full-time · Apply now↗
Strategic Projects - Coding Intern · San Francisco · Intern · Apply now↗
Strategic Projects Associate · San Francisco · Full-time · Apply now↗
Strategic Projects Lead - Coding · San Francisco · Full-time · Apply now↗
Strategic Projects Associate - Coding · San Francisco · Full-time · Apply now↗
Strategic Projects Intern · San Francisco · Intern · Apply now↗
[H3] Research
2 jobs
Research Scientist - Frontier Data · San Francisco · Full-time · Apply now↗
Research Scientist - Post Training · San Francisco · Full-time · Apply now↗
[H3] Revenue
4 jobs
Marketing Lead · San Francisco · Full-time · Apply now↗
Business Development · San Francisco · Full-time · Apply now↗
Strategic Finance · San Francisco · Full-time · Apply now↗
Customer Engagement · San Francisco · Full-time · Apply now↗
[H4] Benefits
Health insurance: Medical, vision & dental
401(k): With employer match
Daily meals: Daily UberEats stipend
Commute covered: Uber stipend for safe transportation
Wellness stipend: Monthly — covers Equinox membership
2135 chars
SUB-PAGE (https://afterquery.com/leaderboard/) AfterQuery
Leaderboards
[H1] Rigorous benchmarks, not cherry-picked results.
Design custom evaluations that measure your specified model capabilities. Collaborate with us
IDE-Bench
[H2] Evaluating AI Agents on Software Engineering
Assessing AI agents across real-world software engineering workflows—measuring how models navigate, reason, and execute complex development tasks. IDE-Bench · Jan 20, 2026
Market-Bench
[H2] Introductory Quantitative Trading
Evaluating AI models on real-world market scenarios—measuring how they reason, predict, and make decisions under dynamic conditions. Market-Bench · Dec 13, 2025
App-Bench
[H2] AI Web App Generation
A benchmark for evaluating how well AI coding agents can generate real web apps from a single natural language prompt. One-shot generations. Zero human edits. App-Bench · Oct 25, 2025
FinanceArena
[H2] FinanceQA, Assumption-Based
Analyzing AI models on real-world financial analysis—measuring how they reason, interpret data, and make decisions under uncertainty. FinanceArena · Jan 30, 2025
1016 chars
🛡️ Trust Signals — reviews, proof links, trust-theatre flag (Trust & Proof)
38 review mentions (all pages)
0 external proof links (all pages)
Page            Reviews   Proof links
/ (home)        10        0
/research/      12        0
/careers/       8         0
/leaderboard/   8         0
🔗 Identity & Technical Layer — schema JSON-LD: identity chains, entity gaps (Identity & Authority)
Homepage — no schema detected (entity gap)
/research/ — no schema detected (entity gap)
/careers/ — no schema detected (entity gap)
/leaderboard/ — no schema detected (entity gap)

Your Diagnosis

Before revealing the machine’s verdict, predict the BS score for each signal. Higher = more BS (more fluff, less verifiable substance). Drag each slider, then submit to compare your judgment against the engine.

Information Density 0 / 30
Read the Narrative & headings: do hard facts (prices, dates, numbers) outweigh fluff power-words?
Semantic Coherence 0 / 20
Compare the homepage promise against the sub-page reality. Do they hold the same line?
Trust & Proof 0 / 20
Weigh review mentions against actual external proof links. Claims without verification = theatre.
Commodity Fingerprint 0 / 15
Check headings & narrative against the industry clichés in the setup above.
Identity & Authority 0 / 15
Inspect the schema: is there real Organization/Person identity with sameAs links, or gaps?
Your predicted BS score 0 / 100
💡 Stuck? Reveal the heuristic lens — how the deterministic page-auditor reads each signal (no AI, pure pattern rules)

These are the structural rules a local, deterministic auditor applies — the same lens you can use to judge each signal. They describe what to look for, not this company’s result.

Information Density

Classify each sentence as substantive or hollow. Grounding markers — numbers, currencies, dates, technical units, named entities — outweigh marketing adjectives. When fluff sits right next to hard evidence, the fluff is forgiven.
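A minimal sketch of the rule above, under stated assumptions: the fluff word list is hypothetical (the auditor's real vocabulary is not given here), and named-entity detection is left out, so only digits, currency symbols, and percent signs count as grounding markers.

```python
import re

# Hypothetical fluff vocabulary; the auditor's actual list is not published here.
FLUFF_WORDS = {"exceptional", "world-class", "cutting-edge",
               "revolutionizing", "pioneering", "leading"}

# Grounding markers covered by this sketch: digits (numbers, dates, prices),
# currency symbols, and percent signs. Named entities are not detected.
GROUNDING_RE = re.compile(r"\d|[$€£%]")


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def classify(sentence):
    """Label a sentence 'substantive', 'hollow', or 'neutral'."""
    if GROUNDING_RE.search(sentence):
        # Fluff sitting next to hard evidence is forgiven.
        return "substantive"
    words = re.findall(r"[a-z][a-z-]*", sentence.lower())
    if any(w in FLUFF_WORDS for w in words):
        return "hollow"
    return "neutral"


def hollow_ratio(text):
    """Share of sentences that are fluff with no grounding marker."""
    labels = [classify(s) for s in split_sentences(text)]
    return labels.count("hollow") / len(labels) if labels else 0.0
```

A sentence such as "An exceptional team with 25 open jobs." is labelled substantive here because the number sits next to the adjective, which mirrors the forgiveness rule.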

Semantic Alignment

Pull the main entities out of the H1, then check whether they actually recur through the body. A page that announces one thing and then talks about another drifts. Headings with no real sentences underneath read as pseudo-substance.
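One way to approximate this check is plain keyword overlap. The sketch below assumes that "main entities" can be stood in for by non-stopword terms from the H1, which is a simplification; the stopword list and function names are hypothetical.

```python
import re

# Hypothetical stopword list, just large enough for the headings in this audit.
STOPWORDS = {"the", "a", "an", "and", "of", "how", "we", "all",
             "not", "to", "in", "is", "our"}


def key_terms(heading):
    """Lowercased non-stopword terms longer than two letters."""
    words = re.findall(r"[a-z]+", heading.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}


def h1_recurrence(h1, body):
    """Fraction of H1 key terms that reappear anywhere in the body text."""
    terms = key_terms(h1)
    if not terms:
        return 0.0
    body_words = set(re.findall(r"[a-z]+", body.lower()))
    return len(terms & body_words) / len(terms)


def empty_headings(sections):
    """Headings with no text underneath; sections is a list of (heading, body)."""
    return [heading for heading, body in sections if not body.strip()]
```

A low recurrence value suggests drift between the heading and the body, and any heading returned by `empty_headings` is a candidate for the pseudo-substance pattern.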

Trust & Proof

Count trust words (review, testimonial, rating, verified) against real outbound proof links (Google, Trustpilot, Clutch, G2, Yelp). Lots of trust language with zero verification links is trust theatre. Unlinked logo galleries count against it.
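The count can be sketched as follows. The trust words come from the rule above; the domain list is an assumption that maps the named platforms to their usual hostnames, and the function names are hypothetical.

```python
import re
from urllib.parse import urlparse

TRUST_WORDS = ("review", "testimonial", "rating", "verified")

# Assumed hostnames for the platforms named in the rule.
PROOF_DOMAINS = ("google.com", "trustpilot.com", "clutch.co", "g2.com", "yelp.com")


def count_trust_words(text):
    """Count words that start with a trust term (review, reviews, reviewed, ...)."""
    lowered = text.lower()
    return sum(len(re.findall(r"\b" + word, lowered)) for word in TRUST_WORDS)


def count_proof_links(hrefs):
    """Count outbound links whose host is a proof platform or a subdomain of one."""
    count = 0
    for href in hrefs:
        host = urlparse(href).netloc.lower()
        if any(host == d or host.endswith("." + d) for d in PROOF_DOMAINS):
            count += 1
    return count


def is_trust_theatre(text, hrefs):
    """Trust language present, but no link that lets a reader verify it."""
    return count_trust_words(text) > 0 and count_proof_links(hrefs) == 0
```

Because the match is prefix-based, a phrase like "peer-reviewed" also counts as a review mention, which may help explain how a research site can accumulate review mentions without hosting any customer reviews.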

Commodity Fingerprint

Look at how much sentence length varies. Natural writing varies its rhythm; templated or mass-produced copy is statistically uniform. Very low variation reads as commodity content — unless unique named entities break the pattern.
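A simple way to measure this is the coefficient of variation of sentence length (standard deviation divided by mean). The sketch below assumes word counts as the length unit and does not set a cut-off, since none is given here; the named-entity exception is also left out.

```python
import re
import statistics


def sentence_lengths(text):
    """Word count of each sentence, splitting after ., ! or ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_variation(text):
    """Coefficient of variation of sentence length; 0.0 means perfectly uniform."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Three four-word sentences in a row give a variation of 0.0, while a two-word sentence followed by a thirteen-word one scores well above that, so lower values point toward templated copy.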

Identity & Authority

Inspect the JSON-LD. Is there an Organization or Person schema, and does it carry sameAs links to real external profiles (LinkedIn, socials)? Missing schema or no identity declaration signals an anonymous entity.
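The inspection could look roughly like this. It is a sketch under assumptions: JSON-LD is pulled out with a regular expression rather than a full HTML parser, nested `@graph` containers are not unpacked, and the function names are hypothetical.

```python
import json
import re

# Simplified extraction; a production auditor would use a real HTML parser.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_jsonld(html):
    """Return every parseable top-level JSON-LD item found in the page."""
    items = []
    for raw in JSONLD_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than fail the audit
        items.extend(data if isinstance(data, list) else [data])
    return items


def identity_report(html):
    """Check for an Organization/Person declaration and for sameAs links."""
    items = [i for i in extract_jsonld(html) if isinstance(i, dict)]
    identities = [i for i in items if i.get("@type") in ("Organization", "Person")]
    return {
        "has_identity": bool(identities),
        "has_same_as": any(i.get("sameAs") for i in identities),
    }
```

A page with no JSON-LD at all returns `False` for both keys, which corresponds to the "no schema detected (entity gap)" readout in the Identity & Technical Layer.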

Want to apply this lens yourself? The free BS Indicator Chrome extension runs these heuristic checks live on any page. Bear in mind it is a single-page, deterministic tool — it relies only on pattern rules for the page in front of it and does not perform the cross-page semantic correlation this audit uses, so its readout is a starting lens, not the full verdict.

B
BS Level
Science, Research & Laboratories
34.3 Avg BS

Based on 126 businesses audited.

BS Detector

Science, Research & Laboratories BS: AfterQuery (afterquery.com)

https://afterquery.com 📍 Industry: Science, Research & Laboratories
30 BS / 100

AfterQuery is a legitimate technical powerhouse suffering from significant technical debt in its digital authority signals. While the research substance is granular and current (dated June 2026), the reliance on trust theatre counts and the complete absence of structured data creates a surface-level ‘BS’ signature that belies its actual intellectual depth.

Info Density (power-words vs. substance ratio): 5 / 30 · 17% BS
Semantic Coherence (homepage promise vs. sub-page reality): 0 / 20 · 0% BS
Trust & Proof (verifiable evidence vs. trust theatre): 11 / 20 · 55% BS
Commodity Fingerprint (detection of industry clichés/templates): 4 / 15 · 27% BS
Identity & Authority (expert verifiability & schema depth): 10 / 15 · 67% BS
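The pillar scores above can be cross-checked against the headline figure with a few lines of arithmetic. The sketch assumes the pillar maxima shown on the prediction sliders (30, 20, 20, 15, 15) and that each percentage is the pillar score divided by its maximum, rounded to the nearest whole number.

```python
# (score, maximum) per pillar; maxima taken from the prediction sliders.
PILLARS = {
    "Info Density": (5, 30),
    "Semantic Coherence": (0, 20),
    "Trust & Proof": (11, 20),
    "Commodity Fingerprint": (4, 15),
    "Identity & Authority": (10, 15),
}

total = sum(score for score, _ in PILLARS.values())
max_total = sum(maximum for _, maximum in PILLARS.values())
percentages = {
    name: round(100 * score / maximum)
    for name, (score, maximum) in PILLARS.items()
}

print(f"{total} BS / {max_total}")
for name, pct in percentages.items():
    print(f"{name}: {pct}% BS")
```

Under these assumptions the pillar scores sum to 30 out of 100 and the rounded percentages come out as 17, 0, 55, 27, and 67, matching the reported breakdown.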

1. Deploy comprehensive Organization and Person schema to link named researchers to their academic and professional profiles.
2. Substantiate the ‘every frontier AI lab’ claim with a logo wall or specific, anonymized enterprise case studies.
3. Convert the ‘Read paper’ buttons into crawlable outbound links to arXiv or DOI repositories to increase proof_links_count.
4. Reduce template-heavy language in the Benefits and Social sections to maintain the high-science tone.
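For the first recommendation, the markup could be generated along these lines. This is a sketch only: the helper names are hypothetical, the profile URL is a placeholder rather than a real AfterQuery profile, and just a handful of schema.org properties (`name`, `url`, `sameAs`, `worksFor`) are shown.

```python
import json


def organization_schema(name, url, same_as):
    """Minimal schema.org Organization with external profile links."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }


def person_schema(name, works_for, same_as):
    """Minimal schema.org Person tied to an employer and external profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "worksFor": {"@type": "Organization", "name": works_for},
        "sameAs": list(same_as),
    }


def to_script_tag(schema):
    """Wrap a schema dict in the script tag that crawlers read."""
    body = json.dumps(schema, indent=2)
    return '<script type="application/ld+json">' + body + "</script>"


org = organization_schema(
    "AfterQuery",
    "https://afterquery.com",
    ["https://www.linkedin.com/company/PLACEHOLDER"],  # placeholder, not a real profile
)
tag = to_script_tag(org)
```

Each named researcher would get a `person_schema` entry with `sameAs` pointing at profiles the lab can actually vouch for; the values are left to the site owner and are not filled in here.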

The site aligns strongly with the Science, Research & Laboratories category, specifically as an applied AI research lab. The content focuses on benchmarking, technical data curation methodologies, and reinforcement learning environments rather than generic software services.

“The score of 30 is primarily driven by technical BS markers in the Trust and Proof (11) and Identity and Authority (10) pillars. The site's failure to implement structured data and its use of unverified review counts significantly outweigh its high-density technical content in the forensic calculation.”

Verified Analysis Date: June 21, 2026 © 1EuroSEO Independent Evaluator — Non-Sponsored Result