Phase 0: Unicode Support for PDP-10
Purpose
- Establish Unicode encoding and decoding infrastructure as a foundation for higher-level language porting (e.g., Python and APL).
- Ensure compatibility with standard UTF-8 and with the UTF-9 scheme devised by Mark Crispin (RFC 4042).
Tasks
- Leverage Mark Crispin’s work:
- Review and validate existing UTF-8 and UTF-9 implementations
- Adapt to PDP-10 word and byte structure
- Implement a Locale Module for Localization
- Licensing
- Verify compliance with the licensing conditions of all external resources used.
- Select appropriate licenses for distributing the deliverables.
- Create UNICODE.MAC:
- Hold Unicode tables, constants, and bitmask macros
- Include case-folding, character width, category tables as needed
- Design new System Calls:
- Provide kernel-level access to Unicode conversion routines
- Define interface contract (register usage, flags, monitor symbols)
- Assign a UUO number for UTFC% functions in TOPS-10
- Assign a JSYS number for UTFC% in TOPS-20
- Modify existing monitor code to be Unicode-compliant (using conditional assembly).
- TOPS-20 only: allocate a separate monitor section for new code and data.
- Testing & Verification:
- Write MACRO-10 unit tests for encoding/decoding
- Compare outputs against standard Unicode test vectors
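As a reference point for the encoding/decoding unit tests, the UTF-9 rule can be modeled in a few lines of Python before it is written in MACRO-10. The sketch below assumes the rule as RFC 4042 describes it: the octets of the code point are emitted most significant first, one per 9-bit nonet, with the high (continuation) bit set on every nonet except the last. The function names `utf9_encode` and `utf9_decode` are hypothetical, and rejecting the whole surrogate range is an assumption of this sketch rather than something this plan specifies.

```python
CONTINUATION = 0o400  # high bit of a 9-bit nonet
OCTET_MASK = 0o377    # low 8 bits of a nonet


def utf9_encode(code_point):
    """Return the list of nonets (ints in 0..0o777) for one code point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption of this sketch: surrogates are rejected outright.
        raise ValueError("surrogate code points are not encoded")
    octets = []
    while True:
        octets.append(code_point & OCTET_MASK)
        code_point >>= 8
        if code_point == 0:
            break
    octets.reverse()  # most significant octet first
    # Every nonet except the last carries the continuation bit.
    return [o | CONTINUATION for o in octets[:-1]] + [octets[-1]]


def utf9_decode(nonets):
    """Return the list of code points encoded by a sequence of nonets."""
    code_points = []
    value = 0
    in_progress = False
    for nonet in nonets:
        if not 0 <= nonet <= 0o777:
            raise ValueError("nonet out of 9-bit range")
        value = (value << 8) | (nonet & OCTET_MASK)
        if nonet & CONTINUATION:
            in_progress = True
        else:
            code_points.append(value)
            value = 0
            in_progress = False
    if in_progress:
        raise ValueError("input ends in the middle of a character")
    return code_points
```

Under these assumptions, `utf9_encode(0x41)` gives the single nonet 101 (octal) and `utf9_encode(0x0391)` gives 403 221 (octal). These values are derived by hand from the rule above and should be checked against the examples in RFC 4042 before being adopted as test vectors. The decoder does not reject non-shortest forms, which a production version would probably need to do.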
Future developments
Unicode support lays the groundwork for future extensions such as SSL and an X11 Window System implementation with audio, high-resolution color graphics, and vector fonts.
Deliverables
- UNICODE.MAC source file
- Documentation for the UTFC% JSYS interface
- Sample utilities: Unicode string echo, UTF-8/9 encoder/decoder
- Emulator scripts for automated test harness
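For the UTF-8/9 encoder/decoder utility, it helps to fix early how UTF-9 nonets are laid out in PDP-10 memory. The following Python sketch models one plausible layout: four 9-bit bytes per 36-bit word, filled from the high-order end, in the order a 9-bit byte pointer would step through them. The function names are hypothetical, and padding the final word with zero nonets is an assumption of this sketch. Because a zero nonet is itself a valid encoding of U+0000, the caller has to track the real nonet count.

```python
NONET_BITS = 9
NONET_MASK = 0o777
NONETS_PER_WORD = 4  # 4 * 9 = 36 bits


def pack_nonets(nonets):
    """Pack a list of nonets into 36-bit words, high-order byte first."""
    words = []
    for start in range(0, len(nonets), NONETS_PER_WORD):
        chunk = nonets[start:start + NONETS_PER_WORD]
        word = 0
        for slot in range(NONETS_PER_WORD):
            # Assumption: unused trailing slots are padded with zero.
            nonet = chunk[slot] if slot < len(chunk) else 0
            if not 0 <= nonet <= NONET_MASK:
                raise ValueError("nonet out of 9-bit range")
            word = (word << NONET_BITS) | nonet
        words.append(word)
    return words


def unpack_nonets(words, count):
    """Extract the first `count` nonets from a list of 36-bit words."""
    if count < 0 or count > len(words) * NONETS_PER_WORD:
        raise ValueError("count does not fit the given words")
    nonets = []
    for word in words:
        for slot in range(NONETS_PER_WORD):
            shift = NONET_BITS * (NONETS_PER_WORD - 1 - slot)
            nonets.append((word >> shift) & NONET_MASK)
    return nonets[:count]
```

With this layout, `pack_nonets([0o403, 0o221])` yields the single word 403221000000 (octal), and `unpack_nonets` with a count of 2 recovers the original two nonets.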
References
- Unicode Consortium
- RFC 4042: UTF-9 and UTF-18 Efficient Transformation Formats of Unicode
This RFC, authored by Mark Crispin, provides a comprehensive description of UTF-9 and UTF-18 encodings, including sample routines in PDP-10 assembly and C. It serves as the primary reference for these encodings.
- GitHub Repository: enricobacis/utf9
For a practical implementation, there's a Python module that encodes and decodes text using UTF-9. While not authored by Mark Crispin, it is based on the specifications provided in RFC 4042 and can serve as a reference or starting point for our implementation.
- Plain Text, NDC Copenhagen 2022 by Dylan Beattie
- Introduction to DIGITAL Standard RUNOFF
- TOPS-20 DIGITAL Standard RUNOFF User's Guide