AI Counting Rules

Technical Documentation

Overview

TypeStats uses AI-powered algorithms to count characters accurately across multiple languages. The counting rules are designed to handle the characteristics of each writing system.

Language Detection Algorithm

Our system automatically detects the dominant language in your text using regular expressions over Unicode code-point ranges, combined with statistical analysis:

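    // Korean detection: Hangul Syllables, Hangul Jamo, Hangul Compatibility Jamo
    const koreanRegex = /[\uAC00-\uD7AF\u1100-\u11FF\u3130-\u318F]/g;

    // Chinese detection: CJK Unified Ideographs
    const chineseRegex = /[\u4E00-\u9FFF]/g;

    // Japanese detection: hiragana, katakana, and kanji
    // (kanji share the CJK Unified Ideographs range with Chinese, so the two patterns overlap)
    const japaneseRegex = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/g;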
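
The ranges above only say which characters belong to which script; they do not say how a dominant language is chosen. The following is a minimal sketch of one way to do that, not TypeStats' actual implementation. The names detectDominantLanguage and countMatches are hypothetical, and two rules are assumptions made for illustration: ideographs are credited to Japanese only when kana is also present, and the comparison is made only among the three CJK scripts, with no threshold against Latin or other text.

```javascript
// Illustrative sketch only; helper names and tie-breaking rules are assumptions.
const HANGUL = /[\uAC00-\uD7AF\u1100-\u11FF\u3130-\u318F]/g;
const KANA = /[\u3040-\u309F\u30A0-\u30FF]/g; // hiragana and katakana
const HAN = /[\u4E00-\u9FFF]/g;               // CJK Unified Ideographs

// Count how many characters in `text` match a global regex.
function countMatches(text, regex) {
  const matches = text.match(regex);
  return matches === null ? 0 : matches.length;
}

// Return 'korean', 'japanese', 'chinese', or 'other' (no CJK characters found).
function detectDominantLanguage(text) {
  const hangul = countMatches(text, HANGUL);
  const kana = countMatches(text, KANA);
  const han = countMatches(text, HAN);

  // Assumption: ideographs count toward Japanese only if kana appears,
  // because the ideograph range alone cannot separate kanji from hanzi.
  const counts = {
    korean: hangul,
    japanese: kana > 0 ? kana + han : 0,
    chinese: kana > 0 ? 0 : han,
  };

  let best = 'other';
  let bestCount = 0;
  for (const [language, count] of Object.entries(counts)) {
    if (count > bestCount) {
      best = language;
      bestCount = count;
    }
  }
  return best;
}

console.log(detectDominantLanguage('안녕하세요 hello'));
console.log(detectDominantLanguage('日本語のテキスト'));
```

Splitting kana from ideographs avoids the double counting that would occur if the Chinese and Japanese patterns were both applied to the same kanji.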

Counting Rules by Language

Korean-Dominant Text

When Korean is the dominant language, we apply special rules:

Non-Korean Languages

For other languages, we use unified multilingual rules:

Grapheme Counting

TypeStats uses the modern Intl.Segmenter API for accurate grapheme counting:

    const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
    const segments = Array.from(segmenter.segment(text));
    const graphemeCount = segments.length;
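
To show why grapheme counting differs from reading a string's length, here is a small sketch that wraps the same Intl.Segmenter call in a function. The name countGraphemes is hypothetical, and the code-point fallback for runtimes without Intl.Segmenter is an assumption for illustration, not something TypeStats is documented to do.

```javascript
// Illustrative sketch only; countGraphemes is a hypothetical helper.
function countGraphemes(text, locale = 'en') {
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
    let count = 0;
    for (const _segment of segmenter.segment(text)) {
      count += 1;
    }
    return count;
  }
  // Assumed fallback: count code points. This over-counts combining
  // sequences and emoji joined with zero-width joiners.
  return Array.from(text).length;
}

const samples = [
  'e\u0301',                                   // "e" plus combining acute accent
  '\u1112\u1161\u11AB',                        // Hangul syllable written as three jamo
  '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}',   // family emoji joined with ZWJ
];

for (const sample of samples) {
  // Prints the UTF-16 length next to the grapheme count.
  console.log(sample.length, countGraphemes(sample));
}
```

Where Intl.Segmenter is available, each sample counts as a single grapheme even though the .length values are 2, 3, and 8, which is the gap that grapheme-based counting closes.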

Special Character Handling

Performance Optimization

Our algorithms are optimized for real-time processing:

Accuracy Standards

TypeStats maintains high accuracy standards: