Overview
TypeStats counts characters accurately across multiple languages. Its counting rules are designed to handle the characteristics of each writing system.
Language Detection Algorithm
Our system automatically detects the dominant language in your text using regular expressions over Unicode code point ranges, combined with statistical analysis:
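// Korean Detection (Hangul Syllables, Hangul Jamo, Hangul Compatibility Jamo)
const koreanRegex = /[\uAC00-\uD7AF\u1100-\u11FF\u3130-\u318F]/g;
// Chinese Detection (CJK Unified Ideographs)
const chineseRegex = /[\u4E00-\u9FFF]/g;
// Japanese Detection (Hiragana, Katakana, and kanji)
// Note: the kanji range is the same CJK Unified Ideographs block that chineseRegex matches.
const japaneseRegex = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/g;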
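The patterns above only match characters; they do not say how a dominant language is chosen. A minimal sketch of one possible approach follows. It assumes that the script with the most matching characters wins, and that Han characters count as Japanese only when kana is also present. The names `SCRIPT_PATTERNS`, `countMatches`, and `detectDominantLanguage` are hypothetical and not part of TypeStats.

```javascript
// Hypothetical sketch: picks the script with the most matching characters.
// The kana/Han split is an assumption, because Han ideographs are shared
// between Chinese and Japanese text.
const SCRIPT_PATTERNS = {
  korean: /[\uAC00-\uD7AF\u1100-\u11FF\u3130-\u318F]/g,
  kana: /[\u3040-\u309F\u30A0-\u30FF]/g,
  han: /[\u4E00-\u9FFF]/g,
  latin: /[A-Za-z]/g,
};

// Number of characters in `text` matched by a global regex.
function countMatches(text, regex) {
  const matches = text.match(regex);
  return matches ? matches.length : 0;
}

function detectDominantLanguage(text) {
  const korean = countMatches(text, SCRIPT_PATTERNS.korean);
  const kana = countMatches(text, SCRIPT_PATTERNS.kana);
  const han = countMatches(text, SCRIPT_PATTERNS.han);
  const latin = countMatches(text, SCRIPT_PATTERNS.latin);

  // Assumption: if any kana is present, Han characters are treated as kanji.
  const scores = {
    korean,
    japanese: kana > 0 ? kana + han : 0,
    chinese: kana > 0 ? 0 : han,
    english: latin,
  };

  let best = 'unknown';
  let bestScore = 0;
  for (const [language, score] of Object.entries(scores)) {
    if (score > bestScore) {
      best = language;
      bestScore = score;
    }
  }
  return best;
}
```

With this sketch, text that contains no letters from any of the four scripts (for example, only digits) is reported as `'unknown'`. Ties are resolved in favor of the script listed first in `scores`.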
Counting Rules by Language
Korean-Dominant Text
When Korean is the dominant language, we apply special rules:
- English words are counted by individual characters (e.g., "MMORPG" = 6 characters)
- Total count includes all characters excluding spaces
- Punctuation and special characters are included in the total
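The Korean-dominant rules can be sketched as a single pass over graphemes. This is an illustration under stated assumptions, not the TypeStats implementation: it treats "spaces" as any whitespace (including tabs and line breaks), and the names `koSegmenter` and `countKoreanDominant` are hypothetical.

```javascript
// Hypothetical sketch of the Korean-dominant rules:
// every grapheme counts once, except whitespace.
const koSegmenter = new Intl.Segmenter('ko', { granularity: 'grapheme' });

function countKoreanDominant(text) {
  let total = 0;
  for (const { segment } of koSegmenter.segment(text)) {
    // Assumption: "spaces" means all whitespace, not only U+0020.
    if (/^\s+$/.test(segment)) continue;
    // Hangul, Latin letters, digits, and punctuation each count as 1.
    total += 1;
  }
  return total;
}

console.log(countKoreanDominant('MMORPG'));       // 6
console.log(countKoreanDominant('게임 MMORPG!')); // 9
```

In the second call, the two Hangul syllables, six Latin letters, and the exclamation mark are counted, while the space is skipped.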
Non-Korean Languages
For other languages, we use unified multilingual rules:
- English words are counted as complete words
- Total count = sum of all detected language characters + digits
- Punctuation is excluded from the total count
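One way to read the unified rules is shown below. The sketch assumes that each English word contributes 1 to the total, each character matched by the detection ranges contributes 1, and each ASCII digit contributes 1; punctuation and whitespace match none of the patterns and are therefore left out. The function name `countUnified` and the word pattern are hypothetical, and the actual TypeStats rules may differ in edge cases such as hyphenated words.

```javascript
// Hypothetical sketch of the unified multilingual rules.
function countUnified(text) {
  // English words: runs of ASCII letters, allowing inner apostrophes ("don't").
  const englishWords = (text.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) || []).length;

  // Characters from the Korean, Japanese, and Chinese detection ranges.
  const scriptChars = (
    text.match(/[\uAC00-\uD7AF\u1100-\u11FF\u3130-\u318F\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/g) || []
  ).length;

  // Assumption: only ASCII digits are counted.
  const digits = (text.match(/[0-9]/g) || []).length;

  return {
    englishWords,
    scriptChars,
    digits,
    total: englishWords + scriptChars + digits,
  };
}

console.log(countUnified('Hello, world! 你好 42'));
// { englishWords: 2, scriptChars: 2, digits: 2, total: 6 }
```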
Grapheme Counting
TypeStats uses the modern Intl.Segmenter API for accurate grapheme counting:
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const segments = Array.from(segmenter.segment(text));
const graphemeCount = segments.length;
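To illustrate why grapheme counting matters, the sketch below compares `String.prototype.length` (UTF-16 code units) with the grapheme count for a few inputs. The helper name `countGraphemes` and the sample strings are chosen for illustration only.

```javascript
// Illustrative comparison of UTF-16 length and grapheme count.
const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

function countGraphemes(text) {
  let count = 0;
  // Iterate lazily instead of building an array of segments.
  for (const _segment of graphemeSegmenter.segment(text)) count += 1;
  return count;
}

const samples = [
  'abc',                  // plain ASCII
  'e\u0301',              // "e" followed by a combining acute accent
  '\u{1F44D}\u{1F3FD}',   // thumbs-up emoji with a skin-tone modifier
  '\u1112\u1161\u11AB',   // Hangul syllable written as three conjoining jamo
];

for (const sample of samples) {
  console.log(sample, 'length:', sample.length, 'graphemes:', countGraphemes(sample));
}
```

The last three samples have lengths of 2, 4, and 3 code units respectively, yet each is a single grapheme.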
Special Character Handling
- Emojis - Counted as single graphemes
- Combining Characters - Properly handled with Unicode normalization
- Zero-Width Characters - Excluded from counts
- Line Breaks - Counted separately from character counts
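The handling rules above could be combined in one pass, as in the following sketch. It segments first and filters afterwards, so that a zero-width joiner inside an emoji sequence stays part of its grapheme. The set of zero-width code points (U+200B to U+200D, U+2060, U+FEFF) and the set of line-break characters are assumptions, as is the function name `countWithSpecialHandling`.

```javascript
// Hypothetical sketch: graphemes and line breaks are tallied separately,
// and segments made up only of zero-width characters are skipped.
const specialSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// Assumed zero-width set: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
const ZERO_WIDTH_ONLY = /^[\u200B-\u200D\u2060\uFEFF]+$/;
// CRLF is a single grapheme, so it counts as one line break.
const LINE_BREAK = /^(?:\r\n|[\n\r\u2028\u2029])$/;

function countWithSpecialHandling(text) {
  let characters = 0;
  let lineBreaks = 0;
  for (const { segment } of specialSegmenter.segment(text)) {
    if (LINE_BREAK.test(segment)) {
      lineBreaks += 1;
    } else if (ZERO_WIDTH_ONLY.test(segment)) {
      continue; // excluded from every count
    } else {
      characters += 1;
    }
  }
  return { characters, lineBreaks };
}

console.log(countWithSpecialHandling('a\u200Bb\r\nc'));
// { characters: 3, lineBreaks: 1 }
```

This sketch still counts ordinary spaces as characters; whether spaces are included depends on the language-specific rules described earlier.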
Performance Optimization
Our algorithms are optimized for real-time processing:
- Efficient regex patterns for language detection
- Cached segmenter instances for repeated use
- Minimal DOM manipulation for smooth user experience
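Caching segmenter instances might look like the sketch below, which keeps one `Intl.Segmenter` per locale in a `Map` so that repeated counts (for example, on every keystroke) do not construct a new instance each time. The names `segmenterCache`, `getSegmenter`, and `countGraphemesCached` are hypothetical.

```javascript
// Hypothetical sketch: one cached grapheme segmenter per locale.
const segmenterCache = new Map();

function getSegmenter(locale = 'en') {
  let segmenter = segmenterCache.get(locale);
  if (!segmenter) {
    segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
    segmenterCache.set(locale, segmenter);
  }
  return segmenter;
}

function countGraphemesCached(text, locale = 'en') {
  let count = 0;
  for (const _segment of getSegmenter(locale).segment(text)) count += 1;
  return count;
}

// Both calls reuse the same 'ko' segmenter instance.
console.log(countGraphemesCached('한글', 'ko'));
console.log(countGraphemesCached('안녕하세요', 'ko'));
```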
Accuracy Standards
TypeStats maintains high accuracy standards:
- ✅ Unicode 15.0 compliance
- ✅ Cross-browser compatibility
- ✅ Real-time processing under 100ms
- ✅ 99.9% accuracy for common text patterns