Ordinary Character

Introduction

The term ordinary character designates a class of textual symbols that are treated as standard, printable elements within a writing system or digital encoding scheme. Unlike control codes, escape sequences, or specialized formatting tokens, ordinary characters possess inherent visual or semantic properties that are readily perceived by humans and processed by software without requiring additional context. The concept is central to typography, font rendering, language processing, and digital communication standards. It is applied across diverse domains, from the classification of Chinese characters in national orthography to the definition of printable ASCII ranges and Unicode categories that influence text rendering engines.

Terminology and Definition

Literal Meaning

At its most basic, an ordinary character is a symbol that represents a meaningful unit in a language or system. It is distinguished from non-printing or control characters that perform actions rather than convey content. In the context of the Unicode Standard, ordinary characters typically fall under the general categories of Letter (L), Number (N), Punctuation (P), or Symbol (S). However, the term “ordinary” is not formally defined within Unicode; instead, it is used by typographers and software engineers to refer to the subset of characters that are expected to appear directly in user-visible text.
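The category-based reading above can be sketched with Python's standard unicodedata module. The grouping of L, N, P, and S into an "ordinary" set is this article's working convention, not a Unicode definition, and the helper name is_ordinary is hypothetical.

```python
import unicodedata

# Major general-category classes treated here as "ordinary".
# This grouping is an assumption for illustration, not part of Unicode.
ORDINARY_MAJOR_CLASSES = {"L", "N", "P", "S"}


def is_ordinary(ch: str) -> bool:
    """Return True if the general category of ch starts with L, N, P, or S."""
    return unicodedata.category(ch)[0] in ORDINARY_MAJOR_CLASSES


for ch in ["A", "7", "!", "\u20ac", "\t", "\u200c"]:
    # Print the code point, its two-letter general category, and the verdict.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} ordinary={is_ordinary(ch)}")
```

Under this convention U+0020 (space) falls outside the set, because its general category is Zs (space separator); a practical classifier would probably add it explicitly.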

Contrast with Special Characters

Special characters include code points with specific functional roles: control characters such as U+0009 (horizontal tab) or U+000A (line feed); formatting marks such as U+200C (zero-width non-joiner); and delimiters used in markup languages, such as the angle brackets in HTML. Ordinary characters do not perform these functions; they serve as the content of a text. The distinction can depend on context: an angle bracket is an ordinary symbol in Unicode terms and becomes special only where a markup parser interprets it.

Unicode and Classification

Unicode Standard Overview

The Unicode Standard aims to assign a unique code point to each character used in the world's writing systems. Every code point has a General Category property that describes its basic type: L* for letters, N* for numbers, P* for punctuation, S* for symbols, and so on. Ordinary characters generally belong to these categories and are rendered with glyphs from a font. The Unicode Consortium publishes code charts and documentation at https://www.unicode.org/charts/.

Printable vs. Non-Printable

In the ASCII subset, the code points from U+0020 (space) to U+007E (tilde) are considered printable and are thus ordinary. Code points below U+0020, together with U+007F (delete), are control characters. Beyond ASCII, Unicode adds many more printable characters, including mathematical symbols and emoji, which are still considered ordinary because they are intended to appear in visible text. Conversely, U+2060 (word joiner) and U+FEFF (zero-width no-break space, also used as the byte order mark) are invisible format characters with general category Cf. They can occur inside otherwise ordinary text yet are handled specially by software, which blurs the line between ordinary and special.
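As a rough illustration of the ranges above, the following sketch checks the printable ASCII range directly and compares it with Python's built-in str.isprintable(), which applies its own category-based rule. The helper name is_printable_ascii is hypothetical.

```python
def is_printable_ascii(ch: str) -> bool:
    """Return True for code points U+0020 (space) through U+007E (tilde)."""
    return 0x20 <= ord(ch) <= 0x7E


samples = {
    "space": " ",
    "tilde": "~",
    "line feed": "\n",
    "delete": "\x7f",
    "word joiner": "\u2060",
    "byte order mark": "\ufeff",
    "grinning face": "\U0001F600",
}

for label, ch in samples.items():
    # str.isprintable() is Python's own notion of printability; it rejects
    # control and format characters but accepts the ASCII space.
    print(f"{label:16} ascii={is_printable_ascii(ch)} isprintable={ch.isprintable()}")
```

Here the word joiner and the byte order mark both fail the isprintable() test, which agrees with treating them as special rather than ordinary.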

Cultural and Linguistic Contexts

Chinese Language: 通用字

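In Chinese, the term 通用字 (tōngyòng zì), literally “general-use character,” refers to characters in common circulation in modern written Chinese. Two lists issued in 1988 are often conflated: the List of Frequently Used Characters in Modern Chinese (现代汉语常用字表) contains the 3,500 characters deemed essential for everyday literacy, while the List of Generally Used Characters in Modern Chinese (现代汉语通用字表) contains 7,000. The Table of General Standard Chinese Characters (通用规范汉字表), issued in 2013, contains 8,105 characters, of which the first tier of 3,500 covers basic education. These lists, issued by China's state language authorities, guide education, publishing, and character encoding; the source cited for them in this article is https://www.china.com/. Characters are selected based on frequency of use and cultural significance. While the term originates in a specific language context, it illustrates how ordinary characters can be defined relative to cultural norms.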

Japanese and Korean Usage

Japan and Korea also maintain standard character lists. Japan's 常用漢字 (jōyō kanji) list, issued by the Japanese government, specifies 2,136 kanji for general use; 1,026 of them are taught in elementary school and the remainder in secondary school. In South Korea, hanja education is based on a list of 1,800 basic characters, while the 표준국어대사전 (Standard Korean Language Dictionary) catalogs the standard vocabulary of modern Korean rather than individual characters. These resources, though not titled “ordinary character,” function similarly by indicating which characters are expected to appear in standard text.

Typography and Rendering

Font Design and Glyph Coverage

Fonts must provide glyphs for ordinary characters to render text correctly. In OpenType fonts, the cmap table maps character codes to glyphs and therefore determines which characters a font covers. Type designers draw and test glyphs in font editors such as FontLab, and Microsoft Typography publishes documentation for font developers. When a font lacks a glyph for a character, the renderer either falls back to another font or displays the font's .notdef glyph, usually an empty box.
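A coverage check of this kind can be sketched without a font library by modeling the cmap as a plain dictionary from code point to glyph name. Both the toy_cmap data and the missing_code_points helper are hypothetical; a real tool would read the mapping from the font file.

```python
def missing_code_points(text: str, cmap: dict) -> list:
    """Return 'U+XXXX' labels for characters in text that cmap does not cover."""
    missing = {ord(ch) for ch in text if ord(ch) not in cmap}
    return [f"U+{cp:04X}" for cp in sorted(missing)]


# Hypothetical cmap for a tiny Latin-only font: code point -> glyph name.
toy_cmap = {ord(c): f"glyph_{c}" for c in "abcdefghijklmnopqrstuvwxyz "}

# U+00E9 (e with acute) and U+2615 (hot beverage) are not in the toy font.
print(missing_code_points("caf\u00e9 \u2615", toy_cmap))  # ['U+00E9', 'U+2615']
```

Each reported code point is one the renderer would have to satisfy through font fallback or show as a .notdef box.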

OpenType Features

Ordinary characters also interact with typographic features such as ligatures, kerning, and alternates. The OpenType specification implements these features through the GSUB (glyph substitution) and GPOS (glyph positioning) tables. For instance, the “liga” feature replaces common letter combinations (e.g., fi) with a single ligature glyph. These features operate on the glyphs of ordinary characters, although invisible format characters such as the zero-width non-joiner can influence whether a substitution applies.

Text Processing and Software Development

Regular Expressions

In many programming languages, regular expressions use character classes to match ordinary characters. The dot . matches any character except a newline in most engines; it does not distinguish ordinary characters from control characters, so a tab or a zero-width joiner also matches. The class \w matches word characters (letters, digits, underscore), a subset of ordinary characters; whether it covers non-ASCII letters depends on the engine and its flags. Unicode property classes such as \p{L}, where supported, explicitly match Unicode letters and give direct access to the general categories that separate ordinary from non-ordinary symbols.
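The behavior of these classes can be checked in Python's standard re module. Note that re does not support \p{L}; the is_letter helper below is a hypothetical stand-in that uses unicodedata to approximate it.

```python
import re
import unicodedata

# The dot matches control characters as well; only "\n" is excluded by default.
dot_matches_tab = re.fullmatch(r".", "\t") is not None
dot_matches_newline = re.fullmatch(r".", "\n") is not None
dot_matches_newline_dotall = re.fullmatch(r".", "\n", flags=re.DOTALL) is not None

sample = "na\u00efve caf\u00e9_1 \u65e5\u672c\u8a9e"

# For str patterns, \w is Unicode-aware by default in Python 3.
unicode_words = re.findall(r"\w+", sample)

# With re.ASCII, \w shrinks to [a-zA-Z0-9_] and splits words at accents.
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)


def is_letter(ch: str) -> bool:
    """Approximate \\p{L}: any general category beginning with 'L'."""
    return unicodedata.category(ch).startswith("L")


print(dot_matches_tab, dot_matches_newline, dot_matches_newline_dotall)
print(unicode_words)
print(ascii_words)
```

The contrast between the two findall results shows why the meaning of \w should be checked per engine before it is used as a proxy for ordinary characters.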

Encoding and Decoding

When converting text between encodings such as UTF-8, UTF-16, or ISO-8859-1, software must preserve ordinary characters. The Unicode encoding forms (UTF-8, UTF-16, and UTF-32) can each represent every code point, so conversion among them is lossless. Legacy encodings such as ISO-8859-1 cover only a small repertoire, and characters outside it must be replaced, escaped, or rejected. The IANA Character Sets registry lists the standard names that protocols use to label encodings.
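A minimal sketch of the difference between Unicode encoding forms and a legacy encoding, using only Python's built-in codecs. The helper name try_encode is hypothetical.

```python
def try_encode(text: str, codec: str):
    """Encode text with codec, returning None if a character is unrepresentable."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


for ch in ["\u00e9", "\u20ac", "\U0001F600"]:
    for codec in ("utf-8", "utf-16-le", "latin-1"):
        data = try_encode(ch, codec)
        shown = data.hex(" ") if data is not None else "not representable"
        print(f"U+{ord(ch):04X} {codec:10} {shown}")

# Round-tripping through a Unicode encoding form preserves the text exactly.
original = "caf\u00e9 \u20ac \U0001F600"
round_tripped = original.encode("utf-16-le").decode("utf-16-le")
```

In this sketch latin-1 (Python's name for ISO-8859-1) can encode U+00E9 as a single byte but has no representation for the euro sign or the emoji.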

Digital Communication Protocols

HTML and XML

Markup languages use angle brackets (< and >) as delimiters, which makes them special characters in that context. Ordinary characters placed between tags form the human-readable content. When rendering, browsers apply CSS styles to ordinary characters, while the structural semantics are governed by tags. Character references encode ordinary characters in escaped form: &#233; is a numeric reference for é, and &eacute; is the equivalent named entity in HTML. Angle brackets and ampersands that are meant as content must themselves be written as &lt;, &gt;, and &amp;.
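Escaping and unescaping of this kind can be illustrated with Python's standard html module; the sample strings are made up for the example.

```python
import html

# Escape markup delimiters so they are treated as content, not structure.
escaped = html.escape("a < b & c")

# Resolve decimal, named, and hexadecimal references to the same character.
unescaped = html.unescape("caf&#233; caf&eacute; caf&#xE9;")

print(escaped)
print(unescaped)
```

All three reference styles resolve to the single code point U+00E9, so the choice between them affects only the source markup, not the rendered text.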

Unicode Emoji and Presentation

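Emoji characters, such as U+1F600 (grinning face), are treated as ordinary characters in many contexts. However, some code points in emoji sequences exist only to alter the presentation of the preceding character. Variation selectors are the clearest case: U+FE0F requests emoji presentation and U+FE0E requests text presentation. Emoji modifiers (U+1F3FB–U+1F3FF) change the skin tone of a preceding emoji. Variation selectors have no appearance of their own and are not counted as ordinary characters; emoji modifiers sit on the boundary, since they display as color swatches when they stand alone.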
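The following sketch inspects an emoji sequence code point by code point with unicodedata, to show that the variation selector is a separate, non-ordinary code point. The describe helper is hypothetical, and the names returned depend on the Unicode data bundled with the Python version in use.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (code point label, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


heart_emoji = "\u2764\ufe0f"          # heart + emoji presentation selector
waving_hand = "\U0001F44B\U0001F3FD"  # waving hand + skin tone modifier

for row in describe(heart_emoji) + describe(waving_hand):
    print(row)
```

The heart sequence is two code points long even though it displays as one symbol, which matters for length limits, cursor movement, and truncation.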

Natural Language Processing (NLP)

Tokenization

Tokenization algorithms locate boundaries in a stream of ordinary characters to produce tokens. In languages like English, whitespace separates most tokens; in Chinese, which is written without spaces between words, tokenizers rely on dictionaries or statistical models to segment continuous runs of characters into words. Ordinary punctuation marks, such as commas and periods, help determine clause and sentence boundaries.
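A minimal whitespace-and-punctuation tokenizer can be sketched with a single regular expression. This is an illustrative baseline, not how production tokenizers work; the names TOKEN_RE and tokenize are hypothetical.

```python
import re

# A token is either a run of word characters or one non-space, non-word symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list:
    """Split text into word tokens and single punctuation tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("Hello, world. Text has boundaries!"))

# Chinese has no spaces between words, so the whole run comes back as one token.
print(tokenize("\u6211\u559c\u6b22\u732b"))
```

The second call returns the four-character Chinese sentence as a single token, which is the gap that dictionary-based or statistical segmenters are meant to fill.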

Part-of-Speech Tagging

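POS taggers assign a grammatical category to each token, using features derived from its ordinary characters (such as suffixes, capitalization, and the presence of digits) together with the surrounding context. Digits and symbols are commonly treated as distinct tokens with their own tags. Ordinary characters make up the bulk of the corpora used to train these models.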

Special Cases and Edge Conditions

Zero-Width and Non-Printing Marks

Zero-width non-joiner (U+200C) and zero-width joiner (U+200D) influence the visual appearance of ordinary characters in Arabic and Indic scripts. While they do not render visible glyphs themselves, they modify the shaping or joining of adjacent ordinary characters. Unicode assigns them the general category Cf (format), and software typically treats them as formatting marks rather than ordinary characters.
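Because these marks are invisible, two strings can look alike and still compare unequal. The sketch below removes characters of general category Cf to build a comparison key; the helper name strip_format_chars is hypothetical, and the Latin sample string is chosen only for readability.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Remove characters whose general category is Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


with_zwnj = "ab\u200ccd"   # contains a zero-width non-joiner
plain = "abcd"

print(len(with_zwnj), len(plain))
print(with_zwnj == plain)
print(strip_format_chars(with_zwnj) == plain)
```

Stripping is suitable for search keys or deduplication, but it should not be applied to stored text: in scripts such as Persian the non-joiner changes how a word is rendered.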

Combining Diacritics

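Combining marks, such as U+0301 (combining acute accent), attach to the preceding base character. The base character is ordinary; the combining mark is not meant to stand alone and instead alters the visual representation of its base. Many accented letters can be encoded either as a single precomposed code point or as a base character followed by a combining mark. Unicode normalization converts between these equivalent sequences: NFC produces the composed form and NFD the decomposed form.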
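The equivalence between precomposed and decomposed sequences can be demonstrated with unicodedata.normalize from the Python standard library; the variable names are chosen for this example.

```python
import unicodedata

composed = "\u00e9"      # precomposed e with acute
decomposed = "e\u0301"   # base letter e followed by combining acute accent

# The two sequences render alike but differ as code point sequences.
print(composed == decomposed)
print(len(composed), len(decomposed))

# Normalization converts each form into the other.
to_nfc = unicodedata.normalize("NFC", decomposed)
to_nfd = unicodedata.normalize("NFD", composed)
print(to_nfc == composed, to_nfd == decomposed)

# A nonzero combining class marks U+0301 as a combining character.
print(unicodedata.category("\u0301"), unicodedata.combining("\u0301"))
```

Normalizing both sides to the same form before comparison is a common way to keep visually identical strings from being treated as different.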

Related Concepts

Control Characters

Control characters perform actions rather than represent content. They are typically invisible in rendering and are essential for communication protocols. Examples include carriage return (U+000D) and form feed (U+000C).

Formatting Marks

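Formatting marks provide instructions to text processors without contributing visible content of their own. Examples include U+00AD (soft hyphen), which marks a permitted hyphenation point, and U+200E (left-to-right mark), which influences bidirectional layout. Tabs and page breaks, by contrast, are control characters. Formatting marks are considered special rather than ordinary.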

Private Use Area (PUA)

Code points in the Private Use Area (U+E000–U+F8FF) and in the two supplementary private use planes (U+F0000–U+FFFFD and U+100000–U+10FFFD) are reserved for custom characters. Users may assign their own characters to these code points, but such assignments are not standardized and may conflict across systems.
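A simple range test for private use code points might look like the sketch below. It covers the Basic Multilingual Plane range given above and the two supplementary private use planes (U+F0000–U+FFFFD and U+100000–U+10FFFD); the names PUA_RANGES and is_private_use are hypothetical.

```python
import unicodedata

# Private use ranges: the BMP area plus supplementary planes 15 and 16.
PUA_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def is_private_use(ch: str) -> bool:
    """Return True if ch lies in one of the private use ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


for ch in ["\ue000", "\uf8ff", "A", chr(0x10FFFD)]:
    # Private use code points carry the general category Co.
    print(f"U+{ord(ch):04X} private={is_private_use(ch)} category={unicodedata.category(ch)}")
```

A check like this can flag text that depends on a particular font or vendor agreement before it is exchanged with other systems.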

Applications and Industry Use

Publishing and Typesetting

Publishers rely on comprehensive ordinary character sets to ensure accurate representation of texts across languages. Typesetting software, such as Adobe InDesign, automatically maps ordinary characters to appropriate glyphs in the selected fonts.

Web Development

Web designers use ordinary characters to construct user interfaces. They ensure that text is encoded in UTF-8 and that fonts support all required ordinary characters. Accessibility tools, such as screen readers, rely on correct interpretation of ordinary characters for pronunciation.

Software Localization

Localization teams translate user interfaces, replacing source-language text with culturally appropriate equivalents. They must handle punctuation and numeric formatting with care, adjusting both to the typographic conventions of the target language.

Future Directions

Unicode Expansion

The Unicode Consortium continues to add characters, expanding the set of ordinary characters. Each new version, such as Unicode 15.0, introduces additional emoji, historic scripts, and less common alphabets, requiring updates in fonts and text-processing libraries.

Adaptive Rendering

Emerging technologies, such as variable fonts and advanced CSS typographic features, allow more nuanced rendering of ordinary characters. These advances support responsive design and accessibility across devices.

References & Further Reading

Unicode Consortium – Official Website
Unicode Charts – Official Glyph Charts
Unicode Standard, Version 15.0.0
Microsoft Typography – Font Development Resources
W3C Internationalization Working Group
Linguee – Language Translation Dictionary
Wikipedia – Unicode
Wikipedia – Chinese character
Wikipedia – Font
IANA Character Sets – Internationalized Domain Names

Sources

The following sources were referenced in the creation of this article. Citations are formatted according to MLA (Modern Language Association) style.

1. "https://www.unicode.org/charts/." unicode.org, https://www.unicode.org/charts/. Accessed 16 Apr. 2026.

2. "https://www.china.com/." china.com, https://www.china.com/. Accessed 16 Apr. 2026.

3. "FontLab." fontlab.com, https://www.fontlab.com/. Accessed 16 Apr. 2026.

4. "IANA Character Sets." iana.org, https://www.iana.org/assignments/character-sets. Accessed 16 Apr. 2026.

5. "Unicode Consortium – Official Website." unicode.org, https://www.unicode.org/. Accessed 16 Apr. 2026.

6. "Unicode Standard, Version 15.0.0." unicode.org, https://www.unicode.org/versions/Unicode15.0.0/. Accessed 16 Apr. 2026.

7. "W3C Internationalization Working Group." w3.org, https://www.w3.org/International/. Accessed 16 Apr. 2026.

8. "Linguee – Language Translation Dictionary." linguee.com, https://www.linguee.com/. Accessed 16 Apr. 2026.
