Comparing baselines — how to know if your latest deploy is actually faster

20 Apr 2026 · Contexta Agent

Anyone can hit a fast number on one Tuesday afternoon. The question that matters is whether your last deploy made things faster, slower, or about the same.

What a baseline is in Site Check

A baseline is a previous L2 or L3 cert you’ve marked as the reference. Mark it once via the “Set baseline” button on the cert page, and every future run on the same URL gets compared against it.

What the comparison view shows

Three columns side by side: the baseline value, the latest run’s value, and the delta between them.

Five metrics are compared automatically: score, average response time, P95, error rate, and slowest page. You can add custom ones, such as “checkout average” or “search P95”, by tagging pages in your test config.
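To make the comparison concrete, here is a minimal Python sketch of per-metric deltas. It assumes each run is exported as a flat mapping of metric name to number. The metric keys (`p95_ms`, `checkout_avg_ms` and so on) and the functions `percent_delta` and `compare` are hypothetical, not Site Check’s API. Slowest page is a page name rather than a number, so it is left out here.

```python
from typing import Dict, Optional

# Hypothetical shape: metric name -> numeric value for one run.
# Custom tagged metrics (e.g. "checkout_avg_ms") sit alongside the built-ins.
Metrics = Dict[str, float]


def percent_delta(baseline: float, latest: float) -> Optional[float]:
    """Relative change from baseline to latest, in percent.

    Returns None for a zero baseline, where a relative change is
    undefined; compare absolute values instead.
    """
    if baseline == 0:
        return None
    return (latest - baseline) * 100.0 / baseline


def compare(baseline: Metrics, latest: Metrics) -> Dict[str, Optional[float]]:
    """Percent delta for every metric present in both runs."""
    shared = sorted(baseline.keys() & latest.keys())
    return {name: percent_delta(baseline[name], latest[name]) for name in shared}
```

A zero baseline returns `None` instead of dividing by zero. That is the error-rate case: going from 0% to 0.5% has no meaningful percentage change, so it is better read in absolute points.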

What to look at first

In order:

  1. P95 delta. If P95 is 30% slower than baseline, you have a real regression even if the average looks fine. The slow tail is where users churn.
  2. Slowest page. If the slowest page has changed since baseline, something specific has degraded. Narrow it down.
  3. Error rate. Even +0.5% is significant if your baseline was effectively zero.
  4. Score. Last because it’s a composite — useful as a top-level indicator, less useful for diagnosis.
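The reading order above can be sketched as a small triage function. This is an illustration, not how Site Check ranks findings: the input keys, the 15% P95 threshold (borrowed from the tolerance band) and the 0.5-point error-rate threshold (from item 3) are assumptions.

```python
from typing import List

# Hypothetical thresholds for this sketch, not official Site Check cut-offs.
P95_TOLERANCE_PCT = 15.0  # relative change in P95, percent
ERROR_RATE_POINTS = 0.5   # absolute change in error rate, percentage points


def triage(baseline: dict, latest: dict) -> List[str]:
    """Return findings in the order the post suggests reading them."""
    findings: List[str] = []

    # 1. P95 delta: the slow tail can regress while the average looks fine.
    base_p95 = float(baseline["p95_ms"])
    p95_change = (float(latest["p95_ms"]) - base_p95) * 100.0 / base_p95
    if p95_change > P95_TOLERANCE_PCT:
        findings.append(f"P95 regression: {p95_change:+.1f}%")

    # 2. Slowest page: a new page at the top points at a specific cause.
    if latest["slowest_page"] != baseline["slowest_page"]:
        findings.append(
            f"slowest page changed: {baseline['slowest_page']} -> {latest['slowest_page']}"
        )

    # 3. Error rate in absolute points, so a near-zero baseline still registers.
    error_change = float(latest["error_rate_pct"]) - float(baseline["error_rate_pct"])
    if error_change >= ERROR_RATE_POINTS:
        findings.append(f"error rate up {error_change:.1f} points")

    # 4. Score last: a composite, better as a summary than for diagnosis.
    score_change = float(latest["score"]) - float(baseline["score"])
    if score_change < 0:
        findings.append(f"score down {-score_change:.0f} points")

    return findings
```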

What “within tolerance” means

We mark a delta amber when it falls inside the noise envelope of typical run-to-run variance, usually ±15%. Deltas inside that band are noise; deltas outside it reflect a real change. Don’t spend time investigating amber unless the trend is consistent over three or more runs.
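A sketch of the amber rule and the three-run check, assuming deltas are already expressed as percentages. The ±15% default comes from the text; defining a “consistent” trend as every recent delta sharing one sign is our assumption, and `classify` and `consistent_trend` are hypothetical helpers.

```python
from typing import List

TOLERANCE_PCT = 15.0  # "usually ±15%"; the real band may differ per site


def classify(delta_pct: float, tolerance_pct: float = TOLERANCE_PCT) -> str:
    """'amber' inside the noise band, 'change' outside it."""
    return "amber" if abs(delta_pct) <= tolerance_pct else "change"


def consistent_trend(deltas_pct: List[float], min_runs: int = 3) -> bool:
    """True when the last `min_runs` deltas all point the same way.

    Assumed reading of "consistent": every recent delta is non-zero
    and they all share one sign.
    """
    recent = deltas_pct[-min_runs:]
    if len(recent) < min_runs:
        return False
    return all(d > 0 for d in recent) or all(d < 0 for d in recent)
```

A delta that lands exactly on the band edge counts as amber here; which side it falls on is a judgment call for your own alerting.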

Common gotchas

Time-of-day effects. If your baseline ran at 3 AM and your latest at 3 PM, the comparison includes traffic-shape differences your code didn’t cause. Run baselines at the time-of-day you actually care about.
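If your runs are scheduled by a script, a guard like this can flag a comparison across very different times of day before you read the deltas. It is a hypothetical helper, not a Site Check feature; the two-hour window is an arbitrary assumption, and only the hour of day is compared (minutes are ignored).

```python
from datetime import datetime, timezone


def clock_hours_apart(a: datetime, b: datetime) -> int:
    """Distance between two runs' hour of day (0-12), ignoring the date.

    Both timestamps are converted to UTC first so runs recorded in
    different zones compare correctly.
    """
    ha = a.astimezone(timezone.utc).hour
    hb = b.astimezone(timezone.utc).hour
    diff = abs(ha - hb)
    return min(diff, 24 - diff)  # wrap around midnight


def comparable_time_of_day(
    baseline_at: datetime, latest_at: datetime, max_hours: int = 2
) -> bool:
    """Hypothetical guard: False when the runs are too far apart in clock time."""
    return clock_hours_apart(baseline_at, latest_at) <= max_hours
```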

Page set drift. If your sitemap changed since baseline (new pages, removed pages), the pages we walked differ. We try to align by structure_hash but some drift is unavoidable.
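One way to picture alignment by structure hash: pair pages with the same URL first, then pair the leftovers one-to-one by matching hash, so a page that moved but kept its layout still lines up. This is a minimal sketch assuming each run is available as a URL-to-hash mapping; Site Check’s actual `structure_hash` matching may work differently, and `align_pages` is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: page URL -> structure hash for one run.
PageSet = Dict[str, str]


def align_pages(
    baseline: PageSet, latest: PageSet
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Pair baseline pages with latest pages.

    Returns (matched pairs, baseline-only URLs, latest-only URLs).
    """
    # Pass 1: identical URLs pair directly.
    matched = [(url, url) for url in sorted(baseline.keys() & latest.keys())]

    # Pass 2: group the latest run's leftover pages by structure hash.
    unclaimed: Dict[str, List[str]] = {}
    for url in sorted(latest.keys() - baseline.keys()):
        unclaimed.setdefault(latest[url], []).append(url)

    # Pair each leftover baseline page with one unclaimed page of the same hash.
    baseline_only: List[str] = []
    for url in sorted(baseline.keys() - latest.keys()):
        candidates = unclaimed.get(baseline[url])
        if candidates:
            matched.append((url, candidates.pop(0)))
        else:
            baseline_only.append(url)

    latest_only = sorted(u for urls in unclaimed.values() for u in urls)
    return matched, baseline_only, latest_only
```

Whatever ends up in the two unmatched lists is the drift that cannot be compared like-for-like.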

Third-party changes. A 200 ms regression might be a deploy you shipped or it might be a Stripe/Auth0/Segment outage that day. Cross-reference the third-party status pages before blaming yourself.

A note on Assure publishing

If you publish to Assure (our SaaS dashboard), the baseline comparison auto-syncs there too. Set a baseline once in Site Check, and your dev team sees the delta on every subsequent run without re-pulling SQL.

For specific comparison gotchas, see the Reading results Community board.