Why Your AI Grammar Checker Says "Perfect" When It's Not
The grammar tool that said "pass" while listing 8 errors
A grammar tool that says "your writing is clean" while listing 8 errors is not a grammar tool. It's a liar. Here's how we stopped doing that.
CheckApp's grammar skill shipped in v1.1 with a real verdict bug. The LLM-fallback path scored the article, subtracted 3 points per finding, and then applied this logic:
// v1.1 — the broken version
const score = Math.max(0, 100 - findings.length * 3);
const verdict = score >= 75 ? "pass" : score >= 50 ? "warn" : "fail";
With 8 findings, score = 76. Verdict: pass. The findings list was right there — 8 items, surfaced correctly. But the top-level verdict said everything was fine. A user who looked at the verdict and moved on would ship an article with 8 unaddressed grammar issues.
That's what v1.2 fixed. But the verdict bug wasn't the only one. While fixing it, we found four more. This post is about all five.
Bug 1 — The wrong-occurrence replace
LanguageTool returns findings as (offset, length, replacements). A finding for a misspelled word in a sentence includes the sentence text, the position within it, and the suggested replacement. The v1.1 code did this:
// v1.1 — broken
const rewrite = best ? m.sentence.replace(bad, best) : undefined;
String.replace replaces the first occurrence. If "the" appears twice in a sentence and LanguageTool flags the second one, sentence.replace("the", "a") fixes the wrong "the". The rewrite is incorrect while looking plausible.
The fix is to use the offset directly — splice at the exact character position instead of searching for the string:
// v1.2 — correct
function spliceReplace(text: string, offset: number, length: number, replacement: string): string {
  return text.slice(0, offset) + replacement + text.slice(offset + length);
}

// usage
const sentenceOffsetInText = text.indexOf(m.sentence);
const localOffset = sentenceOffsetInText >= 0 ? m.offset - sentenceOffsetInText : -1;
const rewrite = best && localOffset >= 0
  ? spliceReplace(m.sentence, localOffset, m.length, best)
  : undefined;
The offset is the ground truth. The string search was always a shortcut that happened to work on simple cases.
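To make the failure concrete, here's a self-contained sketch with a made-up sentence (spliceReplace is repeated from above so the snippet runs on its own):

```typescript
// Splice-based replacement, as in v1.2.
function spliceReplace(text: string, offset: number, length: number, replacement: string): string {
  return text.slice(0, offset) + replacement + text.slice(offset + length);
}

// A sentence where the flagged word appears twice.
const sentence = "the cat ate the hat";

// Suppose the checker flags the SECOND "the" (offset 12, length 3) → "a".
// v1.1: String.replace hits the FIRST occurrence — the wrong word changes.
const wrong = sentence.replace("the", "a"); // "a cat ate the hat"

// v1.2: splice at the exact offset — the flagged word changes.
const right = spliceReplace(sentence, 12, 3, "a"); // "the cat ate a hat"
```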
Bug 2 — Offset drift on multi-fix rewrites
The LLM-fallback path generates rewrites and then double-checks them through LanguageTool — if the LLM's rewrite itself contains a grammar mistake, we want to catch that before showing it to the user. In v1.1, the recheck loop looked like this:
// v1.1 — broken
const recheckMatches = await ltCheck({ text: f.rewrite });
let corrected = f.rewrite;
for (const m of recheckMatches.matches) {
  const best = m.replacements[0]?.value;
  if (best) corrected = corrected.replace(m.context.text, best);
}
The problem: the loop processes matches in ascending-offset order. The first replace changes the string length. Every match after the first has an offset that was computed against the original string — now it's wrong. The second correction splices at the wrong position, the third is worse, and the result is garbled text.
The fix: sort matches by offset descending, then splice from back to front. Later offsets don't affect earlier ones.
// v1.2 — correct
const sorted = [...recheck.matches].slice(0, 5).sort((a, b) => b.offset - a.offset);
let corrected = f.rewrite;
for (const m of sorted) {
  const best = m.replacements[0]?.value;
  if (!best) continue;
  corrected = spliceReplace(corrected, m.offset, m.length, best);
}
This is a standard in-place editing pattern. Work from right to left, and earlier positions stay valid. We missed it in v1.1.
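To see why direction matters, here's a self-contained sketch with two made-up matches, their offsets computed against the original string the way LanguageTool reports them (spliceReplace repeated for completeness):

```typescript
function spliceReplace(text: string, offset: number, length: number, replacement: string): string {
  return text.slice(0, offset) + replacement + text.slice(offset + length);
}

// Two LanguageTool-style matches against the ORIGINAL string:
// "dont" at offset 3 → "doesn't", "have" at offset 17 → "has".
const original = "He dont know she have it";
const matches = [
  { offset: 3, length: 4, replacement: "doesn't" },
  { offset: 17, length: 4, replacement: "has" },
];

// Ascending order would drift: "dont" → "doesn't" grows the string by 3,
// so offset 17 would no longer point at "have". Descending order stays valid.
const sorted = [...matches].sort((a, b) => b.offset - a.offset);
let corrected = original;
for (const m of sorted) {
  corrected = spliceReplace(corrected, m.offset, m.length, m.replacement);
}
// corrected === "He doesn't know she has it"
```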
Bug 3 — Concurrency burst into rate-limit throttling
The recheck loop in v1.1 ran all corrections in parallel:
// v1.1 — broken
findings = await Promise.all(findings.map(async (f) => {
  const recheck = await ltCheck({ text: f.rewrite });
  // ...
}));
For an article with 10 grammar findings, this fires 10 LanguageTool POSTs at once. The managed LanguageTool tier allows 20 requests per minute, so a single article check could burn half the minute's quota in one burst, causing subsequent requests to hit 429s and their findings to silently drop.
The fix: cap concurrency at 3.
// v1.2 — correct
findings = await mapWithConcurrency(findings, 3, async (f) => {
  if (!f.rewrite) return f;
  // recheck logic
});
Three concurrent requests are enough for throughput. They don't exhaust the rate limit on a single article, and they leave headroom for other article checks running at the same time.
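The post doesn't show mapWithConcurrency itself. One minimal way to implement a bounded-concurrency map looks like this — a sketch of the idea, not CheckApp's actual helper:

```typescript
// Minimal bounded-concurrency map: at most `limit` tasks in flight,
// results returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker pulls the next unclaimed index. JS is single-threaded,
  // so `next++` between awaits is race-free.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```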
Bug 4 — The verdict that lied
Back to the verdict bug. The LLM-fallback path used a score threshold with no regard for whether findings actually existed:
// v1.1 — broken
const score = Math.max(0, 100 - findings.length * 3);
const verdict = score >= 75 ? "pass" : score >= 50 ? "warn" : "fail";
Eight findings deduct 24 points: 100 - 24 = 76. Verdict: pass. The user sees a green badge and a list of 8 issues. That's incoherent. The fix is one line:
// v1.2 — correct
const verdict: SkillResult["verdict"] =
  findings.length === 0 ? "pass"
  : score >= 50 ? "warn"
  : "fail";
If there are any findings at all, the verdict is warn at minimum. A pass requires zero findings. This applies to both the LanguageTool path and the LLM-fallback path.
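Condensed into a single function for illustration (verdictFor is a hypothetical name, using the same 3-points-per-finding formula):

```typescript
type Verdict = "pass" | "warn" | "fail";

// v1.2 logic: the verdict can only be "pass" when there are zero findings.
function verdictFor(findingsCount: number): Verdict {
  const score = Math.max(0, 100 - findingsCount * 3);
  return findingsCount === 0 ? "pass" : score >= 50 ? "warn" : "fail";
}

// 8 findings: score 76 — v1.1 said "pass", v1.2 says "warn".
// 17 findings: score 49 — "fail" under both versions.
```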
Bug 5 — The 20KB wall
LanguageTool's managed tier enforces a 20KB per-request cap. In v1.1, ltCheck posted the full article text as a single request. Articles over ~18KB would return a 400 error with no useful information — no findings, no error message explaining the size limit.
The fix: measure the text in bytes before posting, and if it exceeds 18KB (conservative, to leave room for URL-encoded overhead), split it at sentence boundaries and aggregate the results:
// v1.2 — ltCheck in src/providers/languagetool.ts
const LT_CHUNK_SIZE = 18_000;

export async function ltCheck(opts: LTCheckOpts): Promise<LTResponse> {
  if (Buffer.byteLength(opts.text, "utf8") <= LT_CHUNK_SIZE) {
    return ltCheckOne(opts);
  }
  const sentences = splitIntoSentences(opts.text);
  const chunks: string[] = [];
  let cur = "";
  for (const s of sentences) {
    if (Buffer.byteLength(cur + " " + s, "utf8") > LT_CHUNK_SIZE && cur) {
      chunks.push(cur);
      cur = s;
    } else {
      cur = cur ? cur + " " + s : s;
    }
  }
  if (cur) chunks.push(cur);
  // Adjust offsets so findings reference positions in the original text
  const results = await Promise.all(
    chunks.map((c, i) => {
      const priorLen = chunks.slice(0, i).reduce((n, ch) => n + ch.length + 1, 0);
      return ltCheckOne({ ...opts, text: c }).then((r) => ({
        ...r,
        matches: r.matches.map((m) => ({ ...m, offset: m.offset + priorLen })),
      }));
    })
  );
  return { matches: results.flatMap((r) => r.matches) };
}
The sentence-boundary split matters: cutting at an arbitrary byte position can break a word mid-token and confuse LanguageTool's parser. splitIntoSentences uses Intl.Segmenter — which also handles Hebrew correctly.
Hebrew support
Intl.Segmenter is the right primitive for sentence splitting in both English and Hebrew. Regex-based sentence splitters (/[.!?]+\s/) fail on Hebrew because punctuation rules differ and right-to-left text needs a different tokenization strategy.
Every sentence-boundary operation in the grammar skill — chunking, offset calculation, recheck scope — now routes through splitIntoSentences, which uses Intl.Segmenter("und") (language-agnostic segmentation) with sentence granularity. This gives correct splits for Hebrew-language articles without a separate code path.
What the grammar skill does today
- LanguageTool first. Deterministic, free on the managed tier, catches mechanical errors with documented rule IDs. No API key required for the free tier.
- LLM fallback for style. When LanguageTool misses a style issue — passive voice, awkward phrasing, register mismatch — the LLM path catches it. Rewrites are double-checked through LanguageTool before they're surfaced (concurrency cap 3, descending-sort splice).
- Honest verdicts. Any finding → at minimum warn. Zero findings → pass. The score still communicates severity, but it doesn't override the finding count.
- Hebrew via Intl.Segmenter. Correct sentence splitting for both English and Hebrew. CJK and Arabic are Phase 8.
- 18KB chunking. Articles of any length work against the managed LanguageTool tier.
What it still doesn't do
The grammar skill finds mechanical errors and some style issues. It doesn't reshape your voice, normalize dialect (US vs. UK English coexisting in one article), or handle code-switching — a sentence that switches mid-way between Hebrew and English will confuse any single-language rule set. Those are Phase 8 problems.
It's also not an auto-editor. It returns rewrites. You decide whether to apply them. That's intentional: the system finds the problems, you own the fixes.
CheckApp v1.2.0 is shipped. Install it, run your next article through it, and the grammar skill will tell you what it actually found.
npm install -g checkapp
checkapp --setup
checkapp article.md
Grammar uses LanguageTool free tier by default — no API key needed to start. The dashboard (checkapp --ui, opens at localhost:3000) shows the full findings list with per-sentence rewrites and severity breakdowns.