Anyone can hit a fast number on one Tuesday afternoon. The question that matters is whether your last deploy made things faster, slower, or about the same.
What a baseline is in Site Check
A baseline is a previous L2 or L3 cert you’ve marked as the reference. Mark it once via the “Set baseline” button on the cert page, and every future run on the same URL gets compared against it.
What the comparison view shows
Three columns side by side:
- Then — baseline numbers (date, build, commit if you tagged it)
- Now — latest run
- Δ — delta, colour-coded green (improvement), amber (within tolerance), red (regression)
Five metrics get compared automatically: score, average response, P95, error rate, slowest page. You can add custom ones — “checkout average”, “search P95” — by tagging pages in your test config.
What to look at first
In order:
- P95 delta. If P95 is 30% slower than baseline, you have a real regression even if the average looks fine. The slow tail is where users churn.
- Slowest page. If your slowest page now is a different page from the one at baseline, something specific has degraded, so narrow your investigation to that page.
- Error rate. Even +0.5% is significant if your baseline was effectively zero.
- Score. Last because it’s a composite — useful as a top-level indicator, less useful for diagnosis.
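If you copy the numbers out and check them yourself, the same ordering can be written as a small script. This is a minimal sketch in Python, not Site Check's own logic: the `Run` record and its field names are hypothetical, and the 15% cut-off is borrowed from the tolerance figure quoted in this article.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record; field names are illustrative, not a Site Check export format.
    score: float
    p95_ms: float
    error_rate: float  # fraction of requests, so 0.005 means 0.5%
    slowest_page: str


def triage(then: Run, now: Run, tolerance: float = 0.15) -> list[str]:
    """Return findings in the order suggested above: P95, slowest page, errors, score."""
    findings = []
    # Relative P95 change; positive means the tail got slower.
    p95_change = (now.p95_ms - then.p95_ms) / then.p95_ms
    if p95_change > tolerance:
        findings.append(f"P95 is {p95_change:.0%} slower than baseline")
    if now.slowest_page != then.slowest_page:
        findings.append(f"Slowest page changed from {then.slowest_page} to {now.slowest_page}")
    if now.error_rate > then.error_rate:
        findings.append(f"Error rate rose from {then.error_rate:.2%} to {now.error_rate:.2%}")
    if now.score < then.score:
        findings.append(f"Score dropped from {then.score:g} to {now.score:g}")
    return findings


baseline = Run(score=92, p95_ms=800, error_rate=0.0, slowest_page="/search")
latest = Run(score=88, p95_ms=1040, error_rate=0.005, slowest_page="/checkout")
for line in triage(baseline, latest):
    print(line)
```

The findings come back in reading order, so the first line printed is the first thing to investigate.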
What “within tolerance” means
We mark a delta amber when it falls inside the envelope of typical run-to-run variance, usually ±15%. A change inside that band is noise; a change outside it is real. Don’t spend time investigating amber unless the trend is consistent over three or more runs.
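To make the rule concrete, here is a rough sketch of how a delta could be bucketed and how a "consistent over three runs" check might look. It assumes a flat ±15% band and a simple same-direction test; the function names are made up for illustration, and Site Check's actual calculation may differ.

```python
def relative_delta(baseline: float, latest: float) -> float:
    """Fractional change from baseline; +0.30 means 30% higher."""
    if baseline == 0:
        raise ValueError("relative delta is undefined for a zero baseline")
    return (latest - baseline) / baseline


def classify(baseline: float, latest: float, tolerance: float = 0.15,
             lower_is_better: bool = True) -> str:
    """Bucket a change as green, amber, or red (assumed flat tolerance band)."""
    delta = relative_delta(baseline, latest)
    if abs(delta) <= tolerance:
        return "amber"  # inside run-to-run noise
    improved = delta < 0 if lower_is_better else delta > 0
    return "green" if improved else "red"


def consistent_trend(deltas: list[float], min_runs: int = 3) -> bool:
    """True if the most recent min_runs deltas all point the same way."""
    recent = deltas[-min_runs:]
    if len(recent) < min_runs:
        return False
    return all(d > 0 for d in recent) or all(d < 0 for d in recent)


print(classify(400, 520))                        # P95 went from 400 ms to 520 ms
print(classify(400, 430))                        # small change, inside the band
print(classify(80, 95, lower_is_better=False))   # score, where higher is better
print(consistent_trend([0.05, 0.08, 0.11]))      # three amber deltas, all slower
```

A relative delta breaks down when the baseline is zero, which is why a metric such as error rate with a near-zero baseline is better judged on its absolute change.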
Common gotchas
Time-of-day effects. If your baseline ran at 3 AM and your latest run at 3 PM, the comparison includes traffic-shape differences your code didn’t cause. Run baselines at the time of day you actually care about.
Page set drift. If your sitemap changed since baseline (new pages, removed pages), the pages we walked differ. We try to align by structure_hash but some drift is unavoidable.
Third-party changes. A 200 ms regression might be a deploy you shipped or it might be a Stripe/Auth0/Segment outage that day. Cross-reference the third-party status pages before blaming yourself.
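For the page set drift gotcha above, it can help to see which pages are truly comparable between two runs. The sketch below assumes each run is available as a mapping from a page's structure_hash to its response time in milliseconds; that shape is an assumption for illustration, and it does not describe how Site Check computes or matches hashes internally.

```python
def align_pages(baseline: dict[str, float], latest: dict[str, float]):
    """Split pages into shared, added, and removed, keyed by structure_hash.

    Both arguments map a structure_hash string to a response time in ms
    (an assumed shape, not a documented Site Check format).
    """
    shared = sorted(baseline.keys() & latest.keys())
    added = sorted(latest.keys() - baseline.keys())
    removed = sorted(baseline.keys() - latest.keys())
    return shared, added, removed


then_pages = {"a1": 120.0, "b2": 300.0, "c3": 95.0}
now_pages = {"a1": 130.0, "c3": 90.0, "d4": 410.0}

shared, added, removed = align_pages(then_pages, now_pages)
print("comparable:", shared)
print("new since baseline:", added)
print("gone since baseline:", removed)
```

Only the shared pages give a like-for-like comparison; a page that is new since baseline can shift the averages without any existing page having changed.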
A note on Assure publishing
If you publish to Assure (our SaaS dashboard), the baseline comparison auto-syncs there too. Set a baseline once in Site Check, and your dev team sees the delta on every subsequent run without re-pulling SQL.
For specific comparison gotchas, see the Reading results Community board.