N-Gram Precision Metrics
1-Gram Precision (Unigram)
Original (PDF): "The quick brown fox jumps over the lazy dog"
Extracted: "The quick brown fox jumps over the dog"
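1-gram precision: 8/8 = 100% (all 8 extracted words appear in the original; the omitted "lazy" does not lower 1-gram precision, it shows up as a missing 1-gram and in the unmatched bigram "the dog")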
2-Gram Precision (Bigram)
Original (PDF): "The quick brown fox"
Extracted: "The brown quick fox"
1-gram precision: 100% (all 4 words present)
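2-gram precision: 0% (0 of 3 bigrams match; the extracted "The brown", "brown quick", and "quick fox" never occur in the original, whose bigrams are "The quick", "quick brown", and "brown fox")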
3-Gram Precision (Trigram)
Original (PDF): "The board of directors meets quarterly"
Extracted: "The board meets quarterly"
3-gram precision: 0% (0 of 2 trigrams match; neither "The board meets" nor "board meets quarterly" occurs in the original, because dropping "of directors" breaks every three-word window)
4-Gram Precision (Quadgram)
Original (PDF): "Please submit your assignment by Friday morning"
Extracted: "Please submit your assignment by Friday morning"
4-gram precision: 100% (all quadgrams match exactly)
How Metrics Are Calculated
N-Gram Extraction Process & BLEU Proxy
- Tokenization: Original PDF text and extracted value are split into individual words and punctuation marks (tokens).
- N-Gram Generation: For each value of n (1, 2, 3, 4), all possible sequences of n consecutive tokens are extracted.
  Example: "The quick brown" generates:
  1-grams: [The, quick, brown]
  2-grams: [(The, quick), (quick, brown)]
  3-grams: [(The, quick, brown)]
- Overlap Calculation: Count how many n-grams from the extracted value appear in the original PDF text.
- Precision Computation: Precision = (Matching n-grams) / (Total n-grams in extracted value), computed separately for n=1..4.
- BLEU Proxy: When no gold reference exists, the system computes a BLEU-like proxy as the arithmetic mean of the 1-gram through 4-gram precisions:
  BLEU_proxy = (p1 + p2 + p3 + p4) / 4
  This proxy is simple, interpretable, and stable for short extracted values typical in form extraction.
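The steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the function names (`tokenize`, `ngrams`, `ngram_precision`, `bleu_proxy`), the regex tokenizer, case-sensitive matching, clipped counting of repeated n-grams, and returning 0 when the value has fewer than n tokens are all assumptions. The sample strings are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: words and single punctuation marks are separate tokens; case is kept.
    return re.findall(r"\w+|[^\w\s]", text)


def ngrams(tokens, n):
    # All sequences of n consecutive tokens, in order.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(original, extracted, n):
    ref = Counter(ngrams(tokenize(original), n))
    cand = Counter(ngrams(tokenize(extracted), n))
    total = sum(cand.values())
    if total == 0:
        # Assumption: a value shorter than n tokens scores 0 for that n.
        return 0.0
    # Assumption: each extracted n-gram is credited at most as often as it occurs in the original.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def bleu_proxy(original, extracted):
    # Arithmetic mean of the 1-gram through 4-gram precisions.
    precisions = [ngram_precision(original, extracted, n) for n in range(1, 5)]
    return sum(precisions) / 4, precisions


# Hypothetical field value that drops the word "the" and the final word "date".
original = "payment is due within thirty days of the invoice date"
extracted = "payment is due within thirty days of invoice"
score, precisions = bleu_proxy(original, extracted)
print(precisions)  # p1 = 8/8, p2 = 6/7, p3 = 5/6, p4 = 4/5
print(round(score, 2))  # 0.87
```

Note how the dropped word leaves 1-gram precision at 100% while every higher-order precision falls, because each n-gram window that spans the gap no longer matches.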
Why N-Grams Matter for This Task
- 1-grams: Detect if the LLM used completely foreign vocabulary not in the PDF.
- 2-grams: Detect if words are reordered or the sequence is altered.
- 3-grams: Detect if larger phrases are compressed or rewritten.
- 4-grams: Detect if the overall sentence structure and idiom are preserved (most critical for assignments).
Assignment Compliance Guide
Requirement: "Retain exact original wording, sentence structure, and phrasing"
What the assignment is asking for:
- Extract information from the PDF as-is, without paraphrasing or rewriting.
- Preserve the original author's voice, style, and phrasing.
- Maintain sentence structure and grammatical choices.
- Do not compress, simplify, or improve upon the original text.
How to use metrics to verify compliance:
Example with p1 = 0.80, p2 = 0.60, p3 = 0.40, p4 = 0.20: BLEU_proxy = (0.80 + 0.60 + 0.40 + 0.20) / 4 = 0.50 (50%)
Red Flags to Watch For:
- 1-gram < 80%: LLM introduced new vocabulary. Possible hallucination.
- High 1-gram, Low 4-gram: Same words used but reordered/compressed. Likely paraphrasing.
- Multiple PARAPHRASED fields: LLM is not suitable for this task; consider rule-based extraction.
- Values not in PDF: LLM hallucinated; definitely reject.
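The red flags above could be turned into an automated check along these lines. This is a sketch under stated assumptions: the 80% floor for 1-gram precision comes from the list, but the cutoff for a "low" 4-gram score, the flag labels, the `found_in_pdf` argument, and the count that qualifies as "multiple" paraphrased fields are hypothetical choices, not values defined by the tool.

```python
ONE_GRAM_FLOOR = 0.80  # from the list above: 1-gram < 80% is a red flag
LOW_FOUR_GRAM = 0.50   # assumption: the guide does not define "low"


def field_flags(p1, p4, found_in_pdf=True):
    """Return red-flag labels for one field, given precisions in [0, 1]."""
    flags = []
    if not found_in_pdf:
        flags.append("NOT_IN_PDF")          # value absent from the PDF: reject
    if p1 < ONE_GRAM_FLOOR:
        flags.append("NEW_VOCABULARY")      # possible hallucination
    elif p4 < LOW_FOUR_GRAM:
        flags.append("LIKELY_PARAPHRASE")   # right words, altered order or length
    return flags


def too_many_paraphrases(flags_per_field, limit=2):
    # Assumption: "multiple" means at least `limit` paraphrased fields.
    hits = sum("LIKELY_PARAPHRASE" in flags for flags in flags_per_field)
    return hits >= limit


print(field_flags(1.0, 0.0))  # ['LIKELY_PARAPHRASE']
print(field_flags(0.5, 0.0))  # ['NEW_VOCABULARY']
```

Keeping the cutoffs as named constants makes it easy to tighten them when the rubric demands exact wording.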
Tips & Best Practices
For Best Results:
- Use digital PDFs: Scanned PDFs may have OCR errors, leading to mismatches even if extraction is correct.
- Review SOMEWHAT MATCH fields immediately: Don't submit extractions with low BLEU_proxy scores without manual review.
- Check context: If a field has high 1-gram but low 4-gram, read the top "missing n-grams" to see what changed.
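- Start from the BLEU_proxy score: For assignment compliance, use BLEU_proxy as the overall indicator of wording preservation, then confirm with 4-gram precision, which the metrics summary table treats as the primary compliance indicator.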
- Manual override: If the extraction is functionally correct but has low n-gram scores due to necessary abbreviation, you can manually approve it.
Understanding Missing & Extra N-Grams:
Missing N-Grams: Sequences present in the original PDF but absent from the extraction. Usually indicates:
- Words/phrases were omitted (compression)
- Synonyms were used (paraphrasing)
- Word order was changed (restructuring)
Extra N-Grams: Sequences in the extraction not found in the original PDF. Usually indicates:
- LLM added new content (hallucination or context blending)
- Different phrasing was used (paraphrasing)
- Formatting or punctuation was altered (usually minor)
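Missing and extra n-grams can be computed as the two directions of a multiset difference. The sketch below is illustrative only: the function names, whitespace tokenization, and the sample strings are assumptions, and the tool's own report may order or truncate the lists differently.

```python
from collections import Counter


def ngram_counts(text, n):
    tokens = text.split()  # assumption: simple whitespace tokenization
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def missing_and_extra(original, extracted, n):
    ref = ngram_counts(original, n)
    cand = ngram_counts(extracted, n)
    missing = ref - cand   # in the PDF text, absent from the extraction
    extra = cand - ref     # in the extraction, absent from the PDF text
    return sorted(missing), sorted(extra)


# Hypothetical field where one word was swapped for a synonym.
missing, extra = missing_and_extra(
    "net amount due on receipt",
    "total amount due on receipt",
    2,
)
print(missing)  # [('net', 'amount')]
print(extra)    # [('total', 'amount')]
```

A single substituted word produces one missing and one extra n-gram at the same position, which is the typical signature of paraphrasing rather than omission.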
Reference
Summary of All Metrics
| Metric | What It Measures | Ideal Range | Relevance to Assignment |
|---|---|---|---|
| 1-Gram | Word-level vocabulary match | ≥ 85% | Detects hallucination |
| 2-Gram | Word-pair sequence match | ≥ 80% | Detects word reordering |
| 3-Gram | Phrase-level match | ≥ 80% | Detects phrase rewording |
| 4-Gram | Sentence structure & idiom match | ≥ 90% | PRIMARY COMPLIANCE INDICATOR |
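As a rough illustration, the ideal ranges in the table can be encoded as a lookup and applied to a field's four precisions. The minimum values are taken from the table; the names `IDEAL_MINIMUM` and `below_ideal` and the input format (a dict from n to a precision in [0, 1]) are hypothetical.

```python
# Minimum ideal precision per n-gram order, as listed in the summary table.
IDEAL_MINIMUM = {1: 0.85, 2: 0.80, 3: 0.80, 4: 0.90}


def below_ideal(precisions):
    """Return the n-gram orders whose precision falls under the ideal range."""
    return [n for n, floor in IDEAL_MINIMUM.items() if precisions[n] < floor]


print(below_ideal({1: 1.0, 2: 0.85, 3: 0.80, 4: 0.80}))  # [4]
```

A field that passes every check except n = 4 still fails the metric the table marks as the primary compliance indicator, so it deserves manual review.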
Common Questions
Q: Why is 4-gram the most important metric?
A: Because your assignment specifically requires "exact original
wording, sentence structure, and phrasing." 4-gram precision
directly measures whether sentence structure and multi-word
phrases are preserved, making it the best single indicator of
compliance.
Q: What if 1-gram is high but 4-gram is low?
A: This means the LLM used all the right words but rearranged them
or rewrote the sentences. This is still a form of paraphrasing and
does NOT meet the requirement. Reject or manually fix.
Q: Can I use extractions with "GOOD MATCH" status?
A: Only if your assignment allows minor rewording or if you
manually verify the changes are acceptable. If the rubric demands
exact wording, prefer "EXCELLENT MATCH".
Q: Why do I see 0% on some n-grams?
A: This typically means the value was significantly rewritten or
is entirely different from the PDF. The LLM may have hallucinated
or misunderstood the field. Reject and investigate.