apng-i18n
From chon@cosmos.kaist.ac.kr Wed Apr 28 10:12:49 1993
Return-Path: <chon@cosmos.kaist.ac.kr>
Received: from cosmos.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)
id AA08284; Wed, 28 Apr 93 10:12:49 KST
Errors-To: Postmaster@cosmos.kaist.ac.kr
Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)
id AA00552; Wed, 28 Apr 93 10:17:33 KST
Date: Wed, 28 Apr 93 10:17:33 KST
From: chon@cosmos.kaist.ac.kr (Kilnam Chon)
Message-Id: <9304280117.AA00552@cosmos.kaist.ac.kr>
Errors-To: Postmaster@cosmos.kaist.ac.kr
To: apccirn-i18n@nic.nm.kr
Subject: first mail
Will you acknowledge this mail to form the mailing list on internationalization
and localization?
The goal of this group is to make networking friendly to non-English
speakers. Current networking does not support Asian languages properly,
and we need to do the following:
Internationalization to provide the framework
Localization to provide the local language/culture support
Kilnam Chon
PS: the mailing list will be (minimally) moderated by apccirn-sec initially
until the moderator/chair is elected.
From mohta@necom830.cc.titech.ac.jp Wed Apr 28 11:47:12 1993
Received: from kum.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)
id AA08935; Wed, 28 Apr 93 11:47:12 KST
Errors-To: Postmaster@necom830.cc.titech.ac.jp
Received: from necom830.cc.titech.ac.jp by kum.kaist.ac.kr (4.1/KUM-0.1)
id AA05450; Wed, 28 Apr 93 11:53:53 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 28 Apr 93 11:43:55 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9304280244.AA21562@necom830.cc.titech.ac.jp>
Subject: Re: first mail
To: chon@cosmos.kaist.ac.kr (Kilnam Chon)
Date: Wed, 28 Apr 93 11:43:53 JST
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9304280117.AA00552@cosmos.kaist.ac.kr>; from "Kilnam Chon" at Apr 28, 93 10:17 am
X-Mailer: ELM [version 2.3 PL11]
> Will you acknowledge this mail to form the mailing list on internationalization
> and localization?
Yes, I will.
> The goal of this group is to make the networking friendly to non-English
> speakers. The current networking does not support Asian languages properly,
> and we need to do the following;
>
> Internationalization to provide the framework
> Localization to provide the local language/culture support
I'm also interested in the possibility of
Internationalization to provide the local language/culture
support without localization
Masataka Ohta
From trin@nwg.nectec.or.th Wed Apr 28 16:25:53 1993
Return-Path: <trin@nwg.nectec.or.th>
Received: from munnari.oz.au by mani.kaist.ac.kr (4.1/SMI-4.1)
id AA10404; Wed, 28 Apr 93 16:25:53 KST
Errors-To: Postmaster@nwg.nectec.or.th
Received: from [192.150.251.31] by munnari.oz.au with SMTP (5.83--+1.3.1+0.50)
id AA10547; Wed, 28 Apr 1993 17:29:05 +1000 (from trin@nwg.nectec.or.th)
From: trin@nwg.nectec.or.th (Trin Tantsetthi)
Message-Id: <9304290554.AA26274@nwg.nectec.or.th>
To: apccirn-i18n@nic.nm.kr
Subject: Re: first mail
In-Reply-To: Your message of Wed, 28 Apr 93 10:17:33 T.
<9304280117.AA00552@cosmos.kaist.ac.kr>
Date: Wed, 28 Apr 93 12:54:08 -1700
Hello,
This is an acknowledgement per request by chon@cosmos.kaist.ac.kr
(Kilnam Chon).
I am the secretariat of the internationalization and international
standards coexistence working group of the Thai national standards body
(TISI/TC536/SC2/WG2). TISI is an O-member of ISO. The Thai character set is
registered with ECMA as ISO-IR-166; the corresponding local version
is called the TIS 620-2533 standard.
Regards,
-Trin
From htk@ipied.tu.ac.th Wed Apr 28 23:31:18 1993
Return-Path: <htk@ipied.tu.ac.th>
Received: from kum.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)
id AA11120; Wed, 28 Apr 93 23:31:18 KST
Errors-To: Postmaster@ipied.tu.ac.th
Received: from chulkn.chula.ac.th by kum.kaist.ac.kr (4.1/KUM-0.1)
id AA17562; Wed, 28 Apr 93 23:37:57 KST
Received: by chulkn.chula.ac.th (Smail3.1.28.1 #12)
id m0noD7y-0003QmC; Wed, 28 Apr 93 21:28 BKK
Received: by ipied.tu.ac.th (4.1/SMI-3.2A+08)
id AA03321; Wed, 28 Apr 93 21:30:28+0700
Date: Wed, 28 Apr 1993 21:30:03 +0700 (GMT+0700)
From: Hugh Thaweesak Koanantakool <htk@ipied.tu.ac.th>
Subject: Re: first mail
To: Kilnam Chon <chon@cosmos.kaist.ac.kr>
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9304280117.AA00552@cosmos.kaist.ac.kr>
Message-Id: <Pine.3.07.9304282100.B3308-8100000@ipied.tu.ac.th>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Wed, 28 Apr 1993, Kilnam Chon wrote:
> Will you acknowledge this mail to form the mailing list on internationalization
> and localization?
Here it is!
Thaweesak.
From chon@cosmos.kaist.ac.kr Fri Apr 30 12:40:47 1993
Return-Path: <chon@cosmos.kaist.ac.kr>
Received: from cosmos.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)
id AA05266; Fri, 30 Apr 93 12:40:47 KST
Errors-To: Postmaster@cosmos.kaist.ac.kr
Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)
id AA17172; Fri, 30 Apr 93 12:45:33 KST
Date: Fri, 30 Apr 93 12:45:33 KST
From: chon@cosmos.kaist.ac.kr (Kilnam Chon)
Message-Id: <9304300345.AA17172@cosmos.kaist.ac.kr>
Errors-To: Postmaster@cosmos.kaist.ac.kr
To: apccirn-i18n@nic.nm.kr
Subject: this group and JWCC session and first activity
Internationalization/localization (i18n/l10n) is one of the most important issues
for the Asian networking community. I would like to see orderly delivery of
networking software with appropriate local language/culture support to the Asian
community in a timely manner. The APCCIRN and this group are to help and guide
the realization of such capability.
As the first step, I would like to propose a special session on local language
support at the JWCC (Joint Workshop on Computer Communications) in Taipei on Dec.
12-14, immediately after the planned APCCIRN Meeting in the same city. The
deadline for papers is June 30. My proposal is a panel discussion with
brief descriptions of what is available now in Chinese, Japanese, Korean and
other languages, and of the major issues we are facing now (such as
Unicode).
Can you recommend a panel/session chair and panelists/speakers, and comment
on the content?
The above activity would give us the current status in the region, and we can
start working on what to do next.
From mohta@necom830.cc.titech.ac.jp Thu May 13 21:47:15 1993
Received: from daiduk.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)
id AA04754; Thu, 13 May 93 21:47:15 KST
Errors-To: Postmaster@nic.nm.kr
Received: from necom830.cc.titech.ac.jp by daiduk.kaist.ac.kr (4.1/KAISTNet-Relay-3.2)
id AA02632; Thu, 13 May 93 21:41:50 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 13 May 93 21:14:27 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta>
Message-Id: <9305131214.AA03128@necom830.cc.titech.ac.jp>
Subject: JWCC and i18n
To: apccirn-i18n@nic.nm.kr
Date: Thu, 13 May 93 21:14:26 JST
X-Mailer: ELM [version 2.3 PL11]
As Kilnam suggested, as part of the activity of the APCCIRN i18n group,
let's promote research on Asian language issues at the next
JWCC.
First of all, I would like to ask how many of you are planning to
participate in the next JWCC (1993 Dec. 12~14, Taipei, Taiwan).
The possible topics are:
1) how local languages are currently supported
2) special features of local languages
3) how local languages should be supported
4) how local language support should be internationalized
As for a formal procedure, we must send papers or a panel proposal to
the program committee before 7/1.
I think topics 1) and 2) are not so much research oriented but could make
an interesting panel session (I could be wrong).
But, if we could submit enough research papers on language
issues to JWCC, a paper session is possible in which topics 1) and 2)
could also be covered in the introductory parts of the papers.
Any suggestions are welcome.
Masataka Ohta
From chon@cosmos.kaist.ac.kr Mon May 17 14:36:52 1993
Return-Path: <chon@cosmos.kaist.ac.kr>
Received: from kum.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)
id AA08887; Mon, 17 May 93 14:36:52 KST
Errors-To: Postmaster@cosmos.kaist.ac.kr
Received: from cosmos.kaist.ac.kr by kum.kaist.ac.kr (4.1/KUM-0.1)
id AA01594; Mon, 17 May 93 14:33:05 KST
Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)
id AA06952; Mon, 17 May 93 14:32:40 KST
From: chon@cosmos.kaist.ac.kr (Kilnam Chon)
Message-Id: <9305170532.AA06952@cosmos.kaist.ac.kr>
Errors-To: Postmaster@cosmos.kaist.ac.kr
Subject: Re: JWCC and i18n
To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Date: Mon, 17 May 93 14:32:39 KST
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9305131214.AA03128@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at May 13, 93 9:14 pm
X-Mailer: ELM [version 2.3 PL11]
the first thing we should do is to recruit apccirn-i18n members from various
countries. for example, we have only one from japan, and none from hong kong
and many other countries.
kilnam chon
PS: Unix International just delivered the following report on April 16.
Guidelines for the Development of Localization Packages, 80 pages.
From uhhyung Fri May 28 04:49:49 1993
Return-Path: <uhhyung>
Received: by nic.nm.kr (4.1/SMI-4.1)
id AA14061; Fri, 28 May 93 04:49:49 KST
From: uhhyung (Uhhyung Choi)
Message-Id: <9305271949.AA14061@nic.nm.kr>
Errors-To: Postmaster@nic.nm.kr
Subject: Status of the Korean Encoding for Internet Messages
To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Date: Fri, 28 May 1993 04:49:49 +0900 (KST)
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9305271345.AA10711@necom830.cc.titech.ac.jp> from "Masataka Ohta" at May 27, 93 10:45:00 pm
X-Mailer: ELM [version 2.4 PL21-h3]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3093
Prof. Ohta,
I have no plans to attend JWCC in Taipei for now. Are you planning
to have apccirn-i18n related meeting in upcoming JWCC?
Before any activities, we'll have to make the charter to define the
issues, goals and milestones for the WG.
The information you requested on the status of the Korean encoding
follows. I'd also like to hear about the situation in Taiwan.
For your background information, we have three nation-wide IP networks
in Korea. KREN (Korea Research and Educational Network) and KREONet (Korea
Research and Educational Open Network) are funded by the government,
by the Ministry of Education and the Ministry of Science and Technology
respectively. (The names don't seem very meaningful to me, though.)
The other network, SDN, which began operation in the early eighties
connecting domestic organizations with UUCP and TCP/IP, has now grown
into a membership-based network which also has a 56Kbps link to the
NASA Science Internet.
The encoding began to be used in SDN in late 1991 and spread to the
other two networks in early 1992. As far as I know, it is the only encoding
used to carry Korean characters. As for how often it is used:
I get several tens of emails every day, about half to two-thirds of which
are in Korean.
The encoding itself plays a somewhat different role than in Japan.
Unlike Japanese practice, we don't recommend that the encoding be used as the
storage code. Actually, my own implementation of the encoding doesn't
allow the encoding to be stored as a file; it is used only as the transit medium.
We don't have any hardware that supports the encoding.
We don't use any encoding in USENET news. We've arranged for all the NNTP
gateways to handle EUC code correctly, so we use bare EUC with USENET news.
Each new organization establishing a new connection to
the network provider is required to be able to handle mail in the encoding.
Coincidentally, the encoding is partially compatible with the encoding
used in the SunOS Korean Language Environment (KLE), but most people don't seem
to even know about it. Moreover, KLE itself lacks easy documentation
for novice users, so it doesn't seem to help users get acquainted
with the Korean email system. Personally, I would prefer that the implementation
used in KLE not be used widely, because it leaves the encoded message in each
user's mailbox even though there is no tool to manipulate the encoding.
I plan to ask Sun to change the encoding and implementation
currently used in KLE after the Internet-Draft is published as an
Informational RFC.
I know several students studying abroad who can read and write in Korean,
but I don't have any statistics about it. It seems people overseas get to
know about the encoding from colleagues in Korea who use the encoding
day to day.
Several months ago, I discussed implementation issues with a person
from SONY, but I don't know whether he was considering adopting the
encoding in their workstation's operating system.
If you would like to hear further information, please let me know.
--
Uhhyung Choi
Korea Network Information Center
uhhyung@nic.nm.kr
From mohta@necom830.cc.titech.ac.jp Thu Jun 17 10:33:01 1993
Received: from daiduk.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)
id AA13545; Thu, 17 Jun 93 10:33:01 KST
Errors-To: Postmaster@necom830.cc.titech.ac.jp
Received: from necom830.cc.titech.ac.jp by daiduk.kaist.ac.kr (4.1/KAISTNet-Relay-3.2)
id AA14380; Thu, 17 Jun 93 10:23:50 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 17 Jun 93 10:20:57 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9306170121.AA10755@necom830.cc.titech.ac.jp>
Subject: internet default character code
To: apccirn-i18n@nic.nm.kr
Date: Thu, 17 Jun 93 10:20:56 JST
X-Mailer: ELM [version 2.3 PL11]
Considering the current practice in Japan and Korea, I think it
is worthwhile to standardize that the default 7bit character encoding
method on the Internet be full 7 bit ISO 2022.
That is, if nothing else is specified on a 7 bit stream, the character code
used should be assumed to be ISO 2022 with the initial designation of
ASCII to G0 and none to G1/2/3.
As much non-MIME 7-bit traffic already exists with Japanese
news/mail and Korean news, it is practical to make it legitimate.
Though MIME has limited capability to specify a character set, isn't
MIME too complex to use only to legitimize currently used character
encodings? Moreover, it is usable only when there is a header part.
Any opinions?
Masataka Ohta
From uhhyung Sat Jun 19 00:26:51 1993
Return-Path: <uhhyung>
Received: by nic.nm.kr (4.1/SMI-4.1)
id AA18355; Sat, 19 Jun 93 00:26:51 KST
From: uhhyung (Uhhyung Choi)
Message-Id: <9306181526.AA18355@nic.nm.kr>
Errors-To: Postmaster
Subject: Re: internet default character code
To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Date: Sat, 19 Jun 1993 00:26:50 +0900 (KST)
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9306170121.AA10755@necom830.cc.titech.ac.jp> from "Masataka Ohta" at Jun 17, 93 10:20:56 am
X-Mailer: ELM [version 2.4 PL21-h3]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 1225
As Masataka Ohta writes:
*
* Considering the current practice in Japan and Korea, I think it
* is worthwhile to standardize that the default 7bit character encoding
* method on the Internet be full 7 bit ISO 2022.
*
* That is, if nothing else is specified on a 7 bit stream, the character code
* used should be assumed to be ISO 2022 with the initial designation of
* ASCII to G0 and none to G1/2/3.
Yes. That is exactly what is being used in Japanese and Korean email
these days.
* As there already exist much non-MIME 7-bit traffic with Japanese
* news/mail and Korean news, it is practical to make them legitimate.
No, we don't have 7-bit news traffic that carries any kind of Hangul
characters encoded.
* Though MIME has limited capability to specify a character set, isn't
* MIME too complex to use only to legislate currently used character
* encoding? Moreover, it is usable only when there can be a header part.
Yes, MIME is a little bit complex, but I think we'd better stick with MIME
rather than introducing another method for extended character encoding.
Do you have any simple idea that will make current practice legitimate?
--
Uhhyung Choi
Korea Network Information Center
From mohta@necom830.cc.titech.ac.jp Thu Jun 24 16:56:31 1993
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)
id AA09636; Thu, 24 Jun 93 16:56:31 KST
Errors-To: Postmaster@necom830.cc.titech.ac.jp
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 24 Jun 93 16:48:00 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9306240748.AA11873@necom830.cc.titech.ac.jp>
Subject: Re: internet default character code
To: uhhyung@nic.nm.kr (Uhhyung Choi)
Date: Thu, 24 Jun 93 16:47:59 JST
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9306181526.AA18355@nic.nm.kr>; from "Uhhyung Choi" at Jun 19, 93 12:26 am
X-Mailer: ELM [version 2.3 PL11]
Sorry for confusing mail and news in Korea.
> * Though MIME has limited capability to specify a character set, isn't
> * MIME too complex to use only to legislate currently used character
> * encoding? Moreover, it is usable only when there can be a header part.
>
> Yes, MIME is a little bit complex, but I think we'd better stick with MIME
> rather than introducing another method for extended character encoding.
My proposal can interoperate with MIME and is applicable to non-mail
traffic.
> Do you have any simple idea that will make current practice legitimate?
Simple. Write an Internet-Draft saying it is legitimate because it is the
current practice of 100,000 or 1,000,000 people on the Internet.
And that is what I'm proposing.
Masataka Ohta
From mohta@necom830.cc.titech.ac.jp Fri Jul 16 19:07:19 1993
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)
id AA07696; Fri, 16 Jul 93 19:07:19 KST
Errors-To: Postmaster@necom830.cc.titech.ac.jp
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Fri, 16 Jul 93 18:59:24 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9307160959.AA26139@necom830.cc.titech.ac.jp>
Subject: IETF BOF
To: apccirn-i18n@nic.nm.kr
Date: Fri, 16 Jul 93 18:59:23 JST
X-Mailer: ELM [version 2.3 PL11]
At the last IETF, a BOF on character encoding was held.
The discussion will continue on the mailing list, and a WG will
perhaps be formed.
All of you who are interested in this issue should register
your mail address with:
ietf-charsets-request@innosoft.com.
MO
From mohta@necom830.cc.titech.ac.jp Wed Jul 21 18:17:01 1993
Received: from necom830.cc.titech.ac.jp ([131.112.4.4]) by nic.nm.kr (4.1/SMI-4.1)
id AA08624; Wed, 21 Jul 93 18:17:01 KST
Errors-To: Postmaster@necom830.cc.titech.ac.jp
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 21 Jul 93 16:50:35 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta>
Message-Id: <9307210750.AA14623@necom830.cc.titech.ac.jp>
Subject: JWCC paper
To: apccirn-i18n@nic.nm.kr
Date: Wed, 21 Jul 93 16:50:33 JST
X-Mailer: ELM [version 2.3 PL11]
Following is the paper I submitted for the next JWCC. The paper
is somewhat revised to reflect the discussion at the last
IETF.
Any comments?
Masataka Ohta
PS
Does anyone on this list know the contact person for the KCS committee?
.TL
Character Encoding Method for Internationalized Plain Text Processing
.AU
Masataka Ohta
.AI
Computer Center, Tokyo Institute of Technology
2-12-1, O-okayama, Meguro-ku, Tokyo 152, JAPAN
Tel: +81-3-5499-7084, Fax: +81-3-3729-1940
.br
.ce 0
.AB
Encoding, decoding and comparison of text are the most basic operations
of plain text processing.
By inspecting various aspects of these operations in a multilingual
environment, the operational requirements for internationalized
character encoding methods become clear.
Finiteness properties, such as finite state machine operation and
finite length resynchronization, are also requirements for
the encoding methods.
As existing or proposed encoding methods such as ISO 2022 and
ISO 10646 cannot fulfill several basic requirements,
they are not useful for internationalized plain text processing.
Thus, a new encoding system, ICODE/IUTF, is proposed based on ISO 10646.
ICODE is a 21 bit code suitable for simple processing of plain, but
possibly bidirectional, text.
IUTF is a compact information interchange form for ICODE and is
upward compatible with UTF2, the proposed information interchange
code for ISO 10646.
.AE
.LP
KEYWORDS: Text Processing, Character Encoding, Multilingual
.bp
.ls 2
.ds CF "
.NH
Introduction
.PP
Plain text processing is the most basic form of text processing.
Moreover, for various applications, plain text
processing is often enough.
One of the most successful plain text processing systems is
UNIX.
UNIX text files are assumed to consist of lines
separated by newline characters.
Without assuming further structure, various tools to generate,
filter and consume plain text have been designed, such as
cat, grep, wc, ls, echo, sed, tee, sort, diff and so on.
Moreover, these simple but powerful tools are combined through the pipe
mechanism to perform more complicated processing.
It should also be noted that the command language of the shell is
plain text, so the above tools can also be used for meta
level processing.
.PP
On classic UNIX, ASCII was the only available character code with
which English was represented.
Plain text processing with ASCII has been quite simple because
with ASCII and English:
.IP
there are only 128 characters.
.IP
all characters can be input directly from the keyboard
.IP
case correspondence is regular and simple
.IP
all characters are represented by a single 8 bit byte
.IP
text is written left to right
.LP
.PP
Unfortunately, in an internationalized text processing
environment, most of these favorable properties are lost.
That is,
.IP
there are more than 65536 characters even in a single language
such as Chinese.
.IP
some input mechanism is necessary to construct Latin characters
with diacritics.
Even worse, for Japanese language input,
a complex and interactive input mechanism is necessary to map the
typed pronunciation or shape hint to actual characters.
.IP
case correspondence is complex and differs from language to language.
For example, the capital form of 'y' with diaeresis is 'IJ' in
Dutch but 'Y' or 'Y' with diaeresis in French.
.IP
Even with a single 16 bit byte, not all characters can be encoded,
so multibyte representation is practically inevitable.
.IP
In Arabic, text may be written left to right, right to left or in
mixed direction.
.LP
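The irregular case correspondence listed above can be observed with a modern tool. The following Python sketch (Python and its Unicode-based `str.upper()` are assumptions of this illustration, not tools used in the paper) applies one fixed, language-independent case mapping: it maps 'y' with diaeresis only to 'Y' with diaeresis, never to the two-letter 'IJ' form, and it turns the single German letter sharp s into two letters, so even the character count changes.

```python
# A single, language-independent case mapping (Python 3 str.upper()).
# It has no way of knowing which language the text is written in.
samples = [
    "\u00ff",  # 'y' with diaeresis
    "\u00df",  # German sharp s
]

for ch in samples:
    upper = ch.upper()
    print(repr(ch), "->", repr(upper), f"({len(ch)} char -> {len(upper)} chars)")
```

A language-aware mapping would need the language as an extra input, which is exactly the kind of out-of-band information plain text does not carry.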
.PP
While there is an endless debate on what a character is,
what we actually need is
not a character encoding but a convenient method of encoding plain text.
For that purpose, it is enough to define characters
as some unit of text encoding.
Thus, the implications of the differences stated above are examined
in section 2 for three basic operations of plain
text processing: encoding, decoding and equality comparison.
.NH
Requirements for the Internationalized Plain Text Processing
.PP
Text is a visible medium.
Thus, translation between the graphical representation and
the coded representation is essential for plain
text processing.
So, the most basic operations of plain text processing
are encoding of graphically represented text to the coded representation
and decoding of the coded representation to the graphical representation.
Various kinds of plain text processing, such as information interchange,
concatenation, counting and simple sorting, become possible with encoding
and decoding alone.
.PP
The second most important operation is the equality comparison of text,
which enables search operations.
.NH 2
Universality
.PP
In this paper, a text encoding method is said to be universal for
some family of languages if all the decoding information is self-contained
in the encoding and neither profiling nor negotiation is necessary to
correctly decode text in that family of languages.
Universality does not mean that all the languages in the world
can be encoded.
.PP
To make universal encoding/decoding possible, different characters
(whatever 'a character' means) should have different coded representations,
which does not mean that a single character cannot have
more than one coded representation.
At the same time, it is also desirable that the encoded
representation is compact, which means that
a single character should have as few
representations as possible.
.PP
As far as encoding and decoding are concerned, it is not
necessary to assign multiple code points to a single
graphic form.
Thus, the letter 'A' of English and the letter 'A' of French, both
in Gothic font, can share
a single code point.
The problem is that a character can have several different graphic
representations.
If all the variants are allowed and regarded as having the same semantics
in plain text, the distinction is not necessary.
For example, font information is not encoded in plain text.
.PP
The distinction between uppercase and lowercase
characters is a qualitative, aesthetic and subjective matter.
It is perfectly legal to express English text with uppercase characters only.
On old
computers, all characters were represented in uppercase,
because, in the bad old days, someone thought the case difference was not
significant.
But, on UNIX, the case distinction has been considered
essential.
In general these days, type-written or LBP-printed (laser beam printer)
quality is expected for the computer output of plain text.
.PP
But, sometimes, the distinction is objectively necessary for
universality.
That is, in some contexts in some languages, some graphical
representations of a character are disallowed.
So, it is necessary to select the appropriate shapes
allowed by the context.
If such selection cannot be performed mechanically, different code
points must be assigned to different graphic representations at the
time of encoding.
For example, in German written with case distinction, the first
character of a sentence or of a noun is in capital form. While the first
character of a sentence can, in general, be identified mechanically,
it is not possible to identify a noun mechanically. Thus,
case distinction information must be encoded.
.NH 2
Causality
.PP
Because of the law of causality, the decoding process cannot depend on
an event that has not yet happened.
Thus, for interactive processing, as immediate output is required,
the shape of a character cannot depend on the possibly not-yet-typed
next character.
.PP
For example, Arabic characters have different forms depending on
whether the character is at the end of a word or not.
Then, if the end-of-word information is not encoded in the
character code, correct display of an Arabic character
is impossible until the next character arrives.
In an interactive environment, the next character might not be
typed by the user, so the waiting period is indeterminate.
So, for interactive processing, it must be possible to produce
the image of a character without looking ahead at the possibly nonexistent
next character.
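The cost of missing end-of-word information can be sketched with a toy model. In this hypothetical Python example, ASCII letters stand in for Arabic letters, a space ends a word, and a letter is displayed in either a "final" or a "nonfinal" form. In `CausalShaper`, an uppercase letter plays the role of a separate code point for the word-final form. The class names and these conventions are illustrative assumptions only, not part of any real shaping system.

```python
from typing import List, Optional, Tuple

Glyph = Tuple[str, str]  # (letter, displayed form)


class LookaheadShaper:
    """End-of-word is NOT encoded: a letter's form depends on the next input."""

    def __init__(self) -> None:
        self.pending: Optional[str] = None  # received but not yet displayable

    def feed(self, ch: str) -> List[Glyph]:
        out: List[Glyph] = []
        if self.pending is not None:
            form = "final" if ch == " " else "nonfinal"
            out.append((self.pending, form))
            self.pending = None
        if ch == " ":
            out.append((ch, "space"))
        else:
            self.pending = ch  # must wait for the next character
        return out

    def close(self) -> List[Glyph]:
        """End of input: only now can the last letter be shaped."""
        out = [(self.pending, "final")] if self.pending is not None else []
        self.pending = None
        return out


class CausalShaper:
    """Hypothetical encoding in which an uppercase letter is the final form."""

    def feed(self, ch: str) -> List[Glyph]:
        if ch == " ":
            return [(ch, "space")]
        if ch.isupper():
            return [(ch.lower(), "final")]
        return [(ch, "nonfinal")]


def run(shaper, text: str) -> List[List[Glyph]]:
    """Return what the shaper can display after each keystroke."""
    return [shaper.feed(ch) for ch in text]


print(run(LookaheadShaper(), "ab c"))
print(run(CausalShaper(), "aB C"))
```

After the keystroke 'c', the lookahead shaper still displays nothing for it and can shape it only when the input is closed, while the causal shaper displays every keystroke immediately.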
.NH 2
Finite state recognition
.PP
Causality does not prohibit the display of characters from being affected
by previous characters.
That is, the decoding process can be controlled by a stateful automaton.
Such state dependence is inevitable to detect the boundaries
of multibyte characters.
But, as far as plain text processing is concerned, the state
transitions should be representable by a finite state automaton;
otherwise, many plain text processing algorithms do not work.
Thus, if some text has a more complex structure represented by, say, a context
sensitive grammar, it is not plain text.
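A minimal sketch of such a finite state recognizer, in Python, for the ISO 2022 subset used in Japanese mail (ISO-2022-JP): the escape sequence ESC $ B switches to the two-byte JIS X 0208 set and ESC ( B switches back to ASCII. Only these two designations are handled, and error handling is deliberately left out, so this illustrates the automaton rather than a complete decoder. Python's built-in `iso2022_jp` codec is used only to produce test input.

```python
from typing import List

ESC = b"\x1b"
TO_TWO_BYTE = ESC + b"$B"  # designate JIS X 0208 (two bytes per character)
TO_ASCII = ESC + b"(B"     # designate ASCII (one byte per character)


def split_characters(data: bytes) -> List[bytes]:
    """Split a 7-bit ISO 2022 stream into characters with a two-state automaton."""
    state = "ascii"  # initial state: ASCII designated
    chars: List[bytes] = []
    i = 0
    while i < len(data):
        if data[i:i + 3] == TO_TWO_BYTE:
            state, i = "two_byte", i + 3
        elif data[i:i + 3] == TO_ASCII:
            state, i = "ascii", i + 3
        else:
            width = 2 if state == "two_byte" else 1
            chars.append(data[i:i + width])
            i += width
    return chars


encoded = "a\u65e5\u672cb".encode("iso2022_jp")  # "a", two kanji, "b"
print(split_characters(encoded))
```

The state set by an escape sequence persists until the next escape sequence, so the automaton stays finite but its state can depend on octets arbitrarily far back.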
.NH 2
Finite resynchronizability
.PP
When displaying characters backward or when performing binary search on
sorted text, the state of the displaying automaton
is, in general, unknown.
Moreover, in an interactive environment, octets are often lost because
of communication errors.
User interruption might also cause synchronization errors.
Finite resynchronizability means that, by reading a fixed, finite number of
bytes, the state of the automaton can be determined uniquely.
It should be noted that this requirement automatically implies the
finiteness of the state machine.
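As an illustration of finite resynchronizability, consider UTF-8, which grew out of the UTF2 byte format mentioned in the abstract (choosing UTF-8 here is an assumption of this sketch, not the paper's proposal). Every continuation byte has the bit pattern 10xxxxxx, so from an arbitrary byte position the start of the enclosing character is found by looking back at most three bytes, without knowing anything that came earlier. The function name `char_start` is hypothetical.

```python
def char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the UTF-8 character containing data[pos].

    Continuation bytes look like 10xxxxxx, and a character has at most
    three of them, so at most three bytes are examined going backward.
    """
    start = pos
    while start > 0 and pos - start < 3 and (data[start] & 0xC0) == 0x80:
        start -= 1
    return start


data = "a\u00e9\u65e5".encode("utf-8")  # 1-, 2- and 3-byte characters
for pos in range(len(data)):
    s = char_start(data, pos)
    print(pos, s, data[s:].decode("utf-8"))
```

An ISO 2022 stream offers no such bound: the escape sequence that determines how the current octet must be read may lie any distance back.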
.NH 2
Equality
.PP
Equality of two texts should, of course, be defined unambiguously.
.PP
As a character might have several different coded representations,
when searching text it is sometimes convenient that all the possible
representations compare equal, or that there is a handy representation
for the set of all the related characters.
But it is not a strict requirement, as one can, in theory, list all the
possible encoded forms by hand.
For example, a notation to specify case-insensitive
comparison is sometimes useful.
But one can also specify a search pattern containing two code points,
one for each case.
Thus,
.DS
% grep -i abc
.DE
could be
.DS
% grep '[Aa][Bb][Cc]'
.DE
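The rewriting of a case-insensitive search into an explicit pattern can be done mechanically. This Python sketch (the function name `case_expanded` is hypothetical) turns each letter into a bracket expression of its two case forms. Letters whose case mapping is not one-to-one, such as the irregular cases discussed in section 1, are simply matched literally, which is a simplifying assumption of the sketch.

```python
import re


def case_expanded(pattern: str) -> str:
    """Rewrite e.g. 'abc' as '[Aa][Bb][Cc]', the explicit form of grep -i.

    Only letters with a single-character upper and lower form are expanded;
    every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        up, low = ch.upper(), ch.lower()
        if up != low and len(up) == 1 and len(low) == 1:
            parts.append("[" + up + low + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


pat = case_expanded("abc")
print(pat)
print(bool(re.search(pat, "xxaBcxx")))
```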
.NH 2
Summary
.PP
To summarize, the requirements for the internationalized
character encoding methods for the minimal text processing are:
.IP
Universality
.IP
Causality
.IP
Finite stateness
.IP
Finite resynchronizability
.IP
Equality
.LP
.NH
Existing Encoding Methods
.PP
Considering the requirements in section 2,
the existing encoding methods for multilingual processing
are not sufficient.
.NH 2
ISO 2022
.PP
ISO 2022 gives the framework to switch between different encoding systems
by escape sequences.
Each encoding system uses one or multiple, but a fixed number of, octets
to represent a different set of characters.
One of the major problems with ISO 2022 is that there is no unified
policy on encoding systems.
.PP
As for the requirements in section 2,
.IP Universality
Satisfied
.IP Causality
Some encoding systems do not satisfy causality
.IP "Finite stateness
Satisfied
.IP "Finite resynchronizability
Not satisfied. The standard has several long-term states
.IP Equality
Equality between different encoding systems is not defined
.LP
It should be noted that ISO 2022 is a large standard containing
so many states that
some profiling is necessary to specify the initial state
and the allowable combinations of escape sequences.
Finite resynchronizability could also be satisfied by profiling,
but the resulting encoding method is rather lengthy.
.PP
In general, ISO 2022 is actually used widely to represent a limited number
of languages within which the encoding policy is
unified, but it is not so useful as a general framework
for internationalization.
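The consequence of long-term states can be shown with Python's built-in `iso2022_jp` codec, used here only as a convenient ISO 2022 implementation. If a reader starts after the designating escape sequence, for example because those octets were lost, the two-byte codes of the kanji are read in the initial ASCII state and decode silently into unrelated ASCII characters; nothing in the remaining octets reveals the error.

```python
text = "\u65e5\u672c"                # two kanji
encoded = text.encode("iso2022_jp")  # ESC $ B, four octets, ESC ( B

# Simulate losing the leading escape sequence (and drop the trailing one).
body = encoded[3:-3]

recovered = body.decode("iso2022_jp")  # decoder starts in the ASCII state
print(encoded)
print(repr(recovered), recovered == text)
```

Recovering the original text would require knowing the designation in force, which only a profile restricting the use of escape sequences could guarantee.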
.NH 2
ISO 10646
.PP
ISO 10646 was designed to be a universal coded character set (UCS).
It is actually universal in some sense. That is:
.IP 1)
it contains a large number of characters
.IP 2)
it intends to represent all the characters in the world
by a simple 16 bit or 32 bit integer.
.IP 3)
Along with the effort to develop the standard, character mnemonics have been
developed to define the equality of characters in different
encoding systems such as those in ISO 2022.
.LP
It has three implementation levels.
In level 1, all the characters are represented by a single
16 or 32 bit integer.
In level 3, to represent complex combinations of several graphic
elements, combining characters are introduced,
which effectively yields a multibyte representation.
In level 2, a limited number of combining characters are available
to specify representations of a limited number of languages.
Still, the problem with ISO 10646 is that the standard is not so universal.
That is, prior negotiation or external profiling is sometimes
required.
For example, in some cases the standard is not useful unless
information on which language is encoded is also available.
Specifically,
corresponding Han characters in China, Japan and Korea are
assigned single common code points (called Han unification)
in ISO 10646.
As the graphical forms of Han characters in China, Japan and Korea
have developed somewhat independently,
some Han characters are now so different that a form used in
one country is considered to be a wrong form in the
other countries.
So, to construct the correct graphical shapes of some Han
characters, the language information is necessary.
The language information is also necessary
to construct the graphical shapes of almost all Han
characters if the required font quality is that
actually used now in each nation for plain text processing.
.PP
As the way combining characters interact graphically
is unspecified, it may differ from language to language.
.PP
ISO 10646 fares against the requirements in section 2 as follows.
.IP Universality
Not satisfied.
At level 2 or 3,
the rules for how combining characters affect
the shape of characters are not specified.
Han unification makes it impossible to restore the correct forms
of some Han characters.
.IP Causality
Mostly satisfied at level 1.
Level 1 contains code points for Arabic characters without
the end-of-word information mentioned in section 2,
but, as code points with the end-of-word
information are also included, those could be used instead.
Not satisfied at level 2 or 3, as a combining character
affects the shape of the preceding character.
.IP "Finite stateness
Satisfied at level 1.
Apparently not satisfied at level 2 or 3, as some combining
characters might require a push-down
automaton to restore a graphic form.
.IP "Finite resynchronizability
Satisfied at level 1 if a 16-bit or 32-bit unit is used as a byte.
Not satisfied at level 2 or 3, as a single base
character can be followed by any
number of combining characters.
.IP Equality
Satisfied at level 1.
Not satisfied at level 2 or 3:
though equality between single 16-bit or 32-bit
characters is specified, equality between sequences of
multiple characters is not. So, equality
between two texts is undefined.
.LP
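The following Python sketch illustrates why levels 2 and 3 fail Causality, Finite resynchronizability and Equality. It assumes the standard `unicodedata` module and treats every character in a Unicode "Mark" category as a combining character; the function and constant names are hypothetical. The two Thai strings are the two encodings of the same word (a base consonant, a below-vowel and a tone mark) that ISO 10646 permits.

```python
import unicodedata
from typing import Iterable, Iterator


def is_combining(ch: str) -> bool:
    # Assumption: any Unicode "Mark" category (Mn, Mc, Me) counts as combining.
    return unicodedata.category(ch).startswith("M")


def cells_as_received(chars: Iterable[str]) -> Iterator[str]:
    """Yield display cells (a base character plus its combining characters)
    while characters arrive one at a time."""
    pending = ""
    for ch in chars:
        if pending and is_combining(ch):
            # A combining character changes a cell that was already received,
            # so that cell could not have been displayed yet (Causality).
            # The cell may grow without limit (Finite resynchronizability).
            pending += ch
        else:
            if pending:
                yield pending  # only now is the previous cell known to be complete
            pending = ch
    if pending:
        yield pending  # end of input also completes the last cell


# Two orderings of the same Thai word, both allowed at level 2 or 3.
RECOVER_A = "\u0e01\u0e39\u0e49"
RECOVER_B = "\u0e01\u0e49\u0e39"

if __name__ == "__main__":
    print(len(list(cells_as_received(RECOVER_A + "\u0e01"))))  # 2
    # Code point comparison treats the two spellings as different texts.
    print(RECOVER_A == RECOVER_B)  # False
```

Nothing can be displayed when only U+0E01 has arrived: the cell is complete only once the next base character, or the end of input, is seen.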
Clearly, the combining characters of levels 2 and 3
have made the standard as a whole rather useless.
ISO 10646 level 1 alone fares against the requirements in section 2 as follows.
.IP Universality
Not satisfied
.IP Causality
Satisfied if the unnecessary code points for Arabic are removed
.IP "Finite stateness
Satisfied
.IP "Finite resynchronizability
Satisfied
.IP Equality
Satisfied
.LP
.PP
Thus, ISO 10646 level 1 is good enough to serve as a base for
an internationalized character code.
The problem is that the combining characters of level 2
are necessary to represent some languages.
While the free combination of combining characters allowed at level 2
is quite harmful,
combining characters might not be so harmful if their use is strictly profiled.
Some countries actually have ISO 2022-based encoding
systems with limited combinations of combining characters.
The problem here is that, as such profiling will differ
from language to language, there may be no
universal way to handle all the
characters in the world.
Alternatively, the set of characters in level 1 could be extended to contain
all the necessary combination results.
But, as such profiling of combining characters or the enumeration of
the required precombined characters
is highly language specific and beyond the author's knowledge,
how to actually support level 2 characters is not discussed in this paper.
.NH
ICODE
.PP
ICODE (Internationalized CODE) is a 21-bit code defined by
adding five bits (four tag bits and a direction bit)
to the coded representation of characters in ISO 10646 level 1
(excluding some duplicated code points for Arabic, to satisfy the Causality
requirement).
Considering that, nowadays, even personal computers have 16MB
of memory or more, a 21-bit encoding space is quite practical even if some
array must be indexed by character code:
a table of one octet per code point occupies 2^21 octets, or 2MB.
.PP
Though ICODE is currently a 21-bit code, on most existing machines it will
actually be stored in 32-bit words. This does not matter at all,
as ICODE is a processing code and won't be used for information
interchange.
IUTF, described in section 5, is provided for interchange purposes
on communication lines or in files.
.PP
While ISO 10646 allows a 32-bit representation of characters (UCS4),
all the code points it actually contains can be represented
in 16 bits.
So, the lower 16 bits of ICODE are identical to ISO 10646.
The added 4 bits are used to extend the set of characters or to
provide language separation information.
.PP
For Han characters,
the values of the four bits identify the source of
the characters, as identified in section 26 of ISO 10646, as follows.
.IP 0
Unused. Reserved for compatibility with ISO 10646
.IP 1
Hanzi used by GB standards
.IP 2
Hanzi used by TCA-CNS standards
.IP 3
Kanji used by JIS standards
.IP 4
Hanja used by KS standards
.IP 5~7
Reserved for further languages
.IP 8~15
Used for extension to represent non-Han characters.
.LP
For characters which do not require language information,
the added four bits are all zero.
If more than 4 bits become necessary to represent a large number
of characters, extra bits could be added on top
of the current MSB to extend ICODE to
22 bits, 32 bits or more.
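A minimal sketch of how a 21-bit ICODE value could be assembled from its fields. The bit positions are an assumption consistent with the description (the 16-bit ISO 10646 code in the low bits, the 4-bit tag above it, the direction bit as the MSB); the tag constant names are hypothetical, and U+76F4 is used only as an example of a unified Han code point.

```python
# Assumed layout of a 21-bit ICODE value:
#   bit 20      direction bit (the MSB)
#   bits 16-19  4-bit tag (source of a Han character, or extension)
#   bits 0-15   ISO 10646 level 1 code
DIR_BIT = 1 << 20
TAG_SHIFT = 16

# Hypothetical names for the tag values listed above.
TAG_NONE, TAG_GB, TAG_CNS, TAG_JIS, TAG_KS = 0, 1, 2, 3, 4


def make_icode(ucs2: int, tag: int = TAG_NONE, reverse_dir: bool = False) -> int:
    """Build an ICODE value from a 16-bit ISO 10646 code, a tag and a direction flag."""
    if not 0 <= ucs2 <= 0xFFFF:
        raise ValueError("lower part must be a 16-bit ISO 10646 code")
    if not 0 <= tag <= 0xF:
        raise ValueError("tag must fit in 4 bits")
    return (DIR_BIT if reverse_dir else 0) | (tag << TAG_SHIFT) | ucs2


def split_icode(code: int) -> tuple[int, int, bool]:
    """Return (16-bit ISO 10646 code, 4-bit tag, direction bit)."""
    return code & 0xFFFF, (code >> TAG_SHIFT) & 0xF, bool(code & DIR_BIT)


# The same unified Han code point, tagged as a JIS Kanji and as a GB Hanzi.
jis_form = make_icode(0x76F4, TAG_JIS)  # 0x376F4
gb_form = make_icode(0x76F4, TAG_GB)    # 0x176F4
```

Both values share the lower 16 bits, so masking off the tag recovers the plain ISO 10646 code, while the tag tells a renderer which national form to draw.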
.PP
The MSB of ICODE is a direction bit used to control bidirectionality.
Support for bidirectionality is absolutely necessary for
some languages such as Arabic.
But, as bidirectional text, in general, has a nested structure,
general treatment is impossible with a finite-state
mechanism.
That is, the mapping between the logical (semantic) order and the
display order of bidirectional text needs a push-down
automaton.
So, for plain text processing, ICODE uses the
display order, and the direction bit (the MSB of ICODE) is used
to reverse the natural directionality of
a character.
.PP
That is, with ICODE, all the characters in a line must have the
same directionality and are encoded in display order.
If characters of the opposite directionality are needed in a line,
their direction bits are set and the words
containing them are spelled backwards.
So, in an English context, English is encoded in its natural
order with the direction bit clear, and Arabic is spelled backwards
with the direction bit set.
But, in an Arabic context, Arabic is encoded in its natural
order with the direction bit clear, and English is spelled backwards
with the direction bit set.
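The display-order convention can be sketched as a single left-to-right pass over the runs of a line. This is an illustration under assumptions: the caller has already split the line into runs in reading order and marked the runs whose directionality opposes the line's, only 16-bit codes with a zero tag are handled, the direction bit is assumed to be bit 20, and the function name is hypothetical.

```python
DIR_BIT = 1 << 20  # assumed position of the ICODE direction bit (the MSB)


def encode_line(runs: list[tuple[str, bool]]) -> list[int]:
    """Encode one line as ICODE values in display order.

    `runs` lists the text of the line in reading order; the flag is True for
    text whose directionality is opposite to that of the line. Such text is
    spelled backwards with the direction bit set; other text is kept as is.
    """
    out: list[int] = []
    for text, opposite in runs:
        chars = reversed(text) if opposite else text
        flag = DIR_BIT if opposite else 0
        for ch in chars:
            if ord(ch) > 0xFFFF:
                raise ValueError("this sketch handles 16-bit codes only")
            out.append(ord(ch) | flag)
    return out


# An English line containing one Arabic word.
line = encode_line([("say ", False), ("\u0633\u0644\u0627\u0645", True)])
```

The same function serves an Arabic line: the Arabic runs are then passed as `False` and the English runs as `True`. Nesting deeper than one level is outside this sketch.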
.PP
The direction bit is also useful for controlling the line directionality
of text whose characters are laid out from top to bottom.
.PP
ICODE fares against the requirements in section 2 as follows.
.IP Universality
Satisfied
.IP Causality
Satisfied
.IP "Finite stateness
Satisfied
.IP "Finite resynchronizability
Satisfied
.IP Equality
Satisfied
.LP
.PP
To maintain full compatibility with future extensions of
ISO 10646, ICODE characters
also have a UCS4 representation.
That is, characters with ICODE values from 0 to 65535 have the same
UCS4 values
(in the BMP), while the other ICODE characters are represented by UCS4
values from
0x7f010000 to 0x7f1fffff (in the private use zone of UCS4), obtained by adding
0x7f000000 to their ICODE values.
Implementors are perfectly free to choose either representation of
characters, ICODE or UCS4.
Here, ICODE or UCS4 is for processing, not for interchange, and
thus the choice is not visible from outside of programs.
It should be noted that the two representations are
equivalent as totally ordered sets: the mapping preserves the order of characters.
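The ICODE/UCS4 correspondence described above is a simple offset, as this sketch (with hypothetical function names) shows.

```python
UCS4_OFFSET = 0x7F000000
ICODE_LIMIT = 1 << 21  # ICODE is currently a 21-bit code


def icode_to_ucs4(code: int) -> int:
    """Map an ICODE value to its UCS4 representation."""
    if not 0 <= code < ICODE_LIMIT:
        raise ValueError("not a 21-bit ICODE value")
    # BMP characters keep their value; the rest move into the private use zone.
    return code if code <= 0xFFFF else code + UCS4_OFFSET


def ucs4_to_icode(ucs4: int) -> int:
    """Inverse mapping; rejects UCS4 values that have no ICODE counterpart."""
    if 0 <= ucs4 <= 0xFFFF:
        return ucs4
    if 0x7F010000 <= ucs4 <= 0x7F1FFFFF:
        return ucs4 - UCS4_OFFSET
    raise ValueError("UCS4 value has no ICODE counterpart")
```

Because every ICODE value above 65535 maps above every BMP value, and adding a constant preserves order, sorting in either representation gives the same result.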
.NH
IUTF
.PP
IUTF (Internationalized UTF) is an interchange form for ICODE
compatible with UTF2 (UCS Transformation Format 2).
.PP
UTF2 is an ASCII-compatible, variable-length, multi-octet
interchange form for ISO 10646 proposed by X/Open.
.PP
UTF2 was designed considering
.IP 1)
compatibility with the UNIX file system
.IP 2)
compatibility with existing programs
.IP 3)
easy conversion between UTF2 and ISO 10646
.IP 4)
that the code length can be determined from the first octet
.IP 5)
that the code length is short
.IP 6)
finite resynchronizability
.PP
In UTF2, each octet is classified as follows.
.DS
C0:0~32,127
A :33~126
Tx:128~191
T1:192~223
T2:224~239
T3:240~247
T4:248~251
T5:252~253
Ty:254~255(unused)
.DE
.PP
Then, the following combinations of octets
.DS
Octet Sequence code of ISO 10646
C0 0~32,127
A 33~126
T1 Tx 128~2047
T2 Tx Tx 2048~2^16-1
T3 Tx Tx Tx 2^16~2^21-1
T4 Tx Tx Tx Tx 2^21~2^26-1
T5 Tx Tx Tx Tx Tx 2^26~2^31-1
.DE
are used to represent characters in ISO 10646.
Resynchronization of character boundaries is possible by scanning
at most 6 octets.
.PP
Note that, with UTF2, all the characters of major European languages
can be represented
in at most two octets, and all the existing characters of ISO 10646
can be represented in at most three octets.
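A sketch of a UTF2 encoder and of the resynchronization step, written directly from the octet-class table above; the function names are hypothetical. Each Tx octet carries six bits of the code, and the lead octet (class T1 to T5) carries the remaining high bits.

```python
def utf2_encode(code: int) -> bytes:
    """Encode one ISO 10646 code (0 .. 2^31-1) as a UTF2 octet sequence."""
    if not 0 <= code < 1 << 31:
        raise ValueError("UTF2 covers codes 0 .. 2^31-1")
    if code < 0x80:
        return bytes([code])  # C0 or A: one octet
    # (exclusive upper bound, lead octet of class T1..T5, number of Tx octets)
    forms = ((1 << 11, 0xC0, 1), (1 << 16, 0xE0, 2), (1 << 21, 0xF0, 3),
             (1 << 26, 0xF8, 4), (1 << 31, 0xFC, 5))
    _, lead, n = next(f for f in forms if code < f[0])
    # Each Tx octet (128..191) carries 6 bits, most significant first.
    tail = [0x80 | ((code >> (6 * k)) & 0x3F) for k in range(n - 1, -1, -1)]
    # The lead octet carries whatever high bits remain.
    return bytes([lead | (code >> (6 * n))] + tail)


def utf2_char_start(buf: bytes, i: int) -> int:
    """Back up from position i to the first octet of the character containing it."""
    start = i
    # A character has at most 5 Tx octets after its lead octet.
    while start > 0 and 0x80 <= buf[start] <= 0xBF and i - start < 5:
        start -= 1
    return start
```

For codes up to 0x10FFFF the output coincides with Python's `str.encode("utf-8")`. Every code below 65536, which covers all characters ISO 10646 currently contains, yields at most three octets.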
.PP
So, IUTF is designed considering
.IP 0)
compatibility with UTF2
.IP 1)
compatibility with the UNIX file system
.IP 2)
compatibility with existing programs as an interchange code
.IP 3)
fast conversion between IUTF and ISO 10646
.IP 4)
that the code length can be determined without looking
ahead at extra octets
.IP 5)
that the code length is short
.IP 6)
finite resynchronizability
.LP
That is, IUTF is upward compatible with UTF2 both in its format
and in its design policy.
Note that 2) is a rather meaningless condition, as programs
are expected to process text internally in a processing code
(ICODE, not IUTF, in this case), which is also the processing model
for multibyte and wide characters in ANSI C and X/Open.
.PP
In IUTF, an octet is classified as follows.
.DS
C0:0~32,127
A :33~126
A':33~46,48~126
C1:128~159
Tx:128~191
T1:192~223
T2:224~239(=S2+S3+S4+S6+S7)
S2:224~229
S3:230~235
S4:236~237
S6:238
S7:239
U1:240~255
.DE
Then, the following combinations of octets
.DS
Octet Sequence code of ISO 10646
C0 0~32,127
A 33~126
T1 Tx 128~2047
T2 Tx Tx 2048~65535
.DE
are used to represent characters exactly as in UTF2.
Thus, IUTF is compatible with UTF2.
Then, the following combinations of octets are
available to represent extra characters.
.DS
Octet Sequence number of code points represented
T1 A' 2976
T2 A' 1488
U1 A' 1488
U1 Tx 1024
T1 T2 512
T1 U1 512
U1 T2 256
S2 Tx A' 35712
S3 Tx A' Tx >2^21
S4 Tx A' Tx Tx >2^25
S6 Tx A' Tx Tx Tx Tx >2^36
S7 Tx A' Tx Tx Tx Tx Tx >2^42
.DE
Thus, every character of the 21-bit ICODE can be represented
in a four-octet form, by a sequence beginning with S3
(6 x 64 x 93 x 64 = 2285568 > 2^21 code points).
Resynchronization of character boundaries is possible by scanning
at most 8 octets.
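The octet-sequence tables imply a simple rule for finding the length of an IUTF sequence, sketched below with hypothetical names and only minimal validation. T1 and U1 always begin two-octet sequences; a T2 octet begins either the UTF2 form T2 Tx Tx or one of the S2 to S7 forms, and the third octet tells them apart. Since every sequence beginning with T2 is at least three octets long, that octet is read anyway, which is consistent with design goal 4).

```python
def is_tx(b: int) -> bool:
    """Tx: 128..191 (this range also contains C1, 128..159)."""
    return 128 <= b <= 191


def is_a_prime(b: int) -> bool:
    """A': 33..126 except '/' (47)."""
    return 33 <= b <= 126 and b != 47


# Length of a sequence whose lead octet is in T2 and whose third octet is in A'.
EXTENDED_LENGTH = {b: n for leads, n in ((range(224, 230), 3),   # S2 Tx A'
                                         (range(230, 236), 4),   # S3 Tx A' Tx
                                         (range(236, 238), 5),   # S4 Tx A' Tx Tx
                                         (range(238, 239), 7),   # S6 Tx A' Tx*4
                                         (range(239, 240), 8))   # S7 Tx A' Tx*5
                   for b in leads}


def iutf_length(buf: bytes, i: int) -> int:
    """Number of octets in the IUTF sequence that starts at buf[i]."""
    b0 = buf[i]
    if b0 < 128:
        return 1  # C0 or A
    if is_tx(b0):
        raise ValueError("a Tx octet cannot begin a sequence")
    if b0 <= 223 or b0 >= 240:
        return 2  # T1 or U1 followed by one octet
    # b0 is in T2: the second octet must be Tx; the third decides the form.
    if not is_tx(buf[i + 1]):
        raise ValueError("T2 must be followed by Tx")
    b2 = buf[i + 2]
    if is_tx(b2):
        return 3  # T2 Tx Tx, the UTF2 form
    if is_a_prime(b2):
        return EXTENDED_LENGTH[b0]
    raise ValueError("invalid third octet after T2 Tx")
```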
.PP
IUTF thus has 8256 (= 2976 + 1488 + 1488 + 1024 + 512 + 512 + 256)
extra two-octet representations and
35712 extra three-octet representations, which
can be used as shorthand notations for characters such as
frequently used non-European characters.
The actual assignment is not yet determined.
Hash tables could be used for fast translation between ICODE and IUTF
for such shorthand notations.
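The counts in the tables above follow from the sizes of the octet classes. This sketch (hypothetical names) derives each class size from its octet range and recomputes the totals.

```python
import math


def size(lo: int, hi: int, *excluded: int) -> int:
    """Number of octet values in lo..hi, minus any excluded values."""
    return sum(1 for b in range(lo, hi + 1) if b not in excluded)


SIZE = {
    "A'": size(33, 126, 47),  # A without '/'
    "Tx": size(128, 191),
    "T1": size(192, 223),
    "T2": size(224, 239),
    "U1": size(240, 255),
    "S2": size(224, 229),
    "S3": size(230, 235),
    "S4": size(236, 237),
    "S6": size(238, 238),
    "S7": size(239, 239),
}


def count(*classes: str) -> int:
    """Number of distinct octet sequences made of the given classes."""
    return math.prod(SIZE[c] for c in classes)


TWO_OCTET_EXTRAS = [("T1", "A'"), ("T2", "A'"), ("U1", "A'"), ("U1", "Tx"),
                    ("T1", "T2"), ("T1", "U1"), ("U1", "T2")]
two_octet_total = sum(count(*seq) for seq in TWO_OCTET_EXTRAS)  # 8256
three_octet_total = count("S2", "Tx", "A'")                     # 35712
four_octet_s3 = count("S3", "Tx", "A'", "Tx")                   # 2285568 > 2^21
```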
.NH
Conclusion
.PP
By using ICODE and IUTF, fully internationalized exchange
of text in the various languages of the world becomes possible in a
unified, universal way.
.PP
International cooperation is still necessary to
extend ICODE to support the characters that ISO 10646 represents
at level 2 or level 3,
and to assign the shorthand notations of IUTF.
From chon@cosmos.kaist.ac.kr Fri Sep 10 13:51:15 1993
Return-Path: <chon@cosmos.kaist.ac.kr>
Received: from han.hana.nm.kr by nic.nm.kr (4.1/SMI-4.1)
id AA08262; Fri, 10 Sep 93 13:51:15 KST
Errors-To: Postmaster@cosmos.kaist.ac.kr
Received: from cosmos.kaist.ac.kr by han.hana.nm.kr (4.1/KUM-0.1)
id AA18408; Fri, 10 Sep 93 13:49:26 KST
Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)
id AA20669; Fri, 10 Sep 93 13:45:20 KST
Date: Fri, 10 Sep 93 13:45:20 KST
From: chon@cosmos.kaist.ac.kr (Kilnam Chon)
Message-Id: <9309100445.AA20669@cosmos.kaist.ac.kr>
Errors-To: Postmaster@cosmos.kaist.ac.kr
To: ap-i18n@nic.nm.kr
Subject: next meeting
i would like to see initial, good discussion on internationalization/
localization(i.e., local language support) at the next apccirn meeting in
taipei in 1993.12.10-11. since this is the first time to address on the
local language support at apccirn, i would like to see comprehensive
presentation on status reports of several leading countries such as
Japanese, Korean, Chinese, Thai
do you have good idea who to make the comprehensive presentation of each
language/country?
the above presentations may be followed by development of the issue list
for us to focus for the next years such as
unicode
internationalized(generic) network software packages
(others)
i am looking forward to seeing good discussions on the above matters.
kilnam chon
From @IBM3090.snu.ac.kr:WSCHEN@TWNMOE10.BITNET Tue Sep 21 15:13:45 1993
Return-Path: <@IBM3090.snu.ac.kr:WSCHEN@TWNMOE10.BITNET>
Received: from ercc.snu.ac.kr by nic.nm.kr (4.1/SMI-4.1)
id AA03505; Tue, 21 Sep 93 15:13:45 KST
Errors-To: Postmaster@IBM3090.snu.ac.kr
Received: from IBM3090.snu.ac.kr by ercc.snu.ac.kr (4.1/SMI-4.1)
id AA11787; Tue, 21 Sep 93 15:13:44 KST
Message-Id: <9309210613.AA11787@ercc.snu.ac.kr>
Received: from KRSNUCC1.BITNET by IBM3090.snu.ac.kr (IBM VM SMTP R1.2.1) with BSMTP id 4010; Tue, 21 Sep 93 15:09:57 EXP
Received: from TWNMOE10.edu.tw by KRSNUCC1.BITNET (Mailer R2.08) with BSMTP id
9684; Tue, 21 Sep 93 14:51:30 EXP
Received: by TWNMOE10 (Mailer R2.10 ptf000) id 0765;
Tue, 21 Sep 93 13:51:16 EST
Date: Tue, 21 Sep 93 13:45:04 EST
From: Wen-Sung Chen <WSCHEN%TWNMOE10@IBM3090.snu.ac.kr>
Subject: Re: next meeting
To: Kilnam Chon <chon@cosmos.kaist.ac.kr>, ap-i18n@nic.nm.kr
In-Reply-To: Your message of Fri, 10 Sep 93 13:45:20 KST
On Fri, 10 Sep 93 13:45:20 KST you said:
>i would like to see initial, good discussion on internationalization/
>localization(i.e., local language support) at the next apccirn meeting in
>taipei in 1993.12.10-11. since this is the first time to address on the
>local language support at apccirn, i would like to see comprehensive
>presentation on status reports of several leading countries such as
> Japanese, Korean, Chinese, Thai
>do you have good idea who to make the comprehensive presentation of each
>language/country?
>the above presentations may be followed by development of the issue list
>for us to focus for the next years such as
> unicode
> internationalized(generic) network software packages
> (others)
>i am looking forward to seeing good discussions on the above matters.
We would like to arrange a chinese localization presentation
in APCCIRN(Taipei). This presentation will be prepared by
expert of III, Taiwan.
Topic: Chinese Localization and SUCCESS project
1. What is SUCCESS project
2. The current chinese codes
3. The problem with different chinese codes
4. The problem with chinese input
5. The future ?
Any comments?
Wen-Sung Chen (wschen@twnmoe10.bitnet)
(wschen@twnmoe10.edu.tw)
Computer Center, Ministry of Education Phone #: 011-886-2-7377011
Taipei, Taiwan, R.O.C. Fax #: 011-886-2-7377043
From chon@cosmos.kaist.ac.kr Tue Sep 21 15:26:05 1993
Return-Path: <chon@cosmos.kaist.ac.kr>
Received: from han.hana.nm.kr by nic.nm.kr (4.1/SMI-4.1)
id AA03559; Tue, 21 Sep 93 15:26:05 KST
Errors-To: Postmaster@cosmos.kaist.ac.kr
Received: from cosmos.kaist.ac.kr by han.hana.nm.kr (4.1/KUM-0.1)
id AA21034; Tue, 21 Sep 93 15:24:24 KST
Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)
id AA09592; Tue, 21 Sep 93 15:19:55 KST
From: chon@cosmos.kaist.ac.kr (Kilnam Chon)
Message-Id: <9309210619.AA09592@cosmos.kaist.ac.kr>
Errors-To: Postmaster@cosmos.kaist.ac.kr
Subject: Re: next meeting (fwd)
To: ap-i18n@nic.nm.kr
Date: Tue, 21 Sep 93 15:19:54 KST
X-Mailer: ELM [version 2.3 PL11]
Wen-Sung Chen writes:
>From @IBM3090.snu.ac.kr:WSCHEN@TWNMOE10.BITNET Tue Sep 21 15:07:20 1993
>Errors-To: Postmaster@cosmos.kaist.ac.kr
>Message-Id: <9309210613.AA11787@ercc.snu.ac.kr>
>Date: Tue, 21 Sep 93 13:45:04 EST
>From: Wen-Sung Chen <WSCHEN%TWNMOE10@IBM3090.snu.ac.kr>
>Subject: Re: next meeting
>To: Kilnam Chon <chon@cosmos.kaist.ac.kr>, ap-i18n@nic.nm.kr
>In-Reply-To: Your message of Fri, 10 Sep 93 13:45:20 KST
>
>On Fri, 10 Sep 93 13:45:20 KST you said:
>>i would like to see initial, good discussion on internationalization/
>>localization(i.e., local language support) at the next apccirn meeting in
>>taipei in 1993.12.10-11. since this is the first time to address on the
>>local language support at apccirn, i would like to see comprehensive
>>presentation on status reports of several leading countries such as
>> Japanese, Korean, Chinese, Thai
>>do you have good idea who to make the comprehensive presentation of each
>>language/country?
>>the above presentations may be followed by development of the issue list
>>for us to focus for the next years such as
>> unicode
>> internationalized(generic) network software packages
>> (others)
>>i am looking forward to seeing good discussions on the above matters.
>
>We would like to arrange a chinese localization presentation
>in APCCIRN(Taipei). This presentation will be prepared by
>expert of III, Taiwan.
> Topic: Chinese Localization and SUCCESS project
> 1. What is SUCCESS project
> 2. The current chinese codes
> 3. The problem with different chinese codes
> 4. The problem with chinese input
> 5. The future ?
>
>Any comments?
>
>Wen-Sung Chen (wschen@twnmoe10.bitnet)
> (wschen@twnmoe10.edu.tw)
>Computer Center, Ministry of Education Phone #: 011-886-2-7377011
>Taipei, Taiwan, R.O.C. Fax #: 011-886-2-7377043
>
my idea is to have overview presentation of local language support in Chinese,
Japanese, Korean and other languages as appropriate followed by discussion on
possible cooperation/collaboration in this area. the above presentation would
be a good contribution on this matter.
kilnam chon
From uhhyung Wed Sep 22 16:24:29 1993
Return-Path: <uhhyung>
Received: by nic.nm.kr (4.1/SMI-4.1)
id AA10124; Wed, 22 Sep 93 16:24:29 KST
From: uhhyung (Uhhyung Choi)
Message-Id: <9309220724.AA10124@nic.nm.kr>
Errors-To: Postmaster
Subject: Re: next meeting (fwd)
To: chon@cosmos.kaist.ac.kr (Kilnam Chon)
Date: Wed, 22 Sep 1993 16:24:27 +0900 (KST)
Cc: ap-i18n@nic.nm.kr
In-Reply-To: <9309210619.AA09592@cosmos.kaist.ac.kr> from "Kilnam Chon" at Sep 21, 93 03:19:54 pm
X-Mailer: ELM [version 2.4 PL21-h3]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 644
As Kilnam Chon writes:
*
* my idea is to have overview presentation of local language support
* in Chinese, Japanese, Korean and other languages as appropriate followed
* by discussion on possible cooperation/collaboration in this area.
* the above presentation would be a good contribution on this matter.
*
* kilnam chon
I'm planning to make a presentation on Korean localization efforts and status
in the upcoming APCCIRN meeting in Taipei. It would be a lot more productive
session if every presentation could address cooperation/collaboration issues
such as Unicode etc.
--
Uhhyung Choi
Korea Network Information Center
From uhhyung Wed Sep 22 18:34:03 1993
Return-Path: <uhhyung>
Received: by nic.nm.kr (4.1/SMI-4.1)
id AA11270; Wed, 22 Sep 93 18:34:03 KST
From: uhhyung (Uhhyung Choi)
Message-Id: <9309220934.AA11270@nic.nm.kr>
Errors-To: Postmaster
Subject: Presentation at the next APCCIRN
To: mohta@cc.titech.ac.jp
Date: Wed, 22 Sep 1993 18:34:01 +0900 (KST)
Cc: apccirn-i18n@nic.nm.kr
X-Mailer: ELM [version 2.4 PL21-h3]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 2483
M. Ohta San,
I wonder if you can attend the APCCIRN meeting and present current status
of Japan. And could you please send me(or to the list) the latest draft of
your paper to be presented in upcoming JWCC in Taipei?
Issue list, I think, each presenter should address and we can discuss at the
meeting includes(but not limited to):
localization profile
currently supported localized network softwares
(possibility of joint effort for internationalized
network softwares.)
ongoing efforts and future plans
i.e. strategy for Unicode, ISO/IEC10646
Current list of presenter as follows
?(arranged by Wen-Sung Chen) Taiwan
Uhhyung Choi Korea
Masataka Ohta(?) Japan
? Thai
Any comments?
--
Uhhyung Choi
Korea Network Information Center
P.S. I'm forwarding this mail on Chinese Localization for your information.
--------
As (Sam Shiu) writes:
>From shiu@cs.cuhk.hk Mon Sep 20 19:29:36 1993
>Errors-To: Postmaster@cosmos.kaist.ac.kr
>Message-Id: <9309200859.AA01092@hanzix4.cs.cuhk.hk.cs-sun>
>To: nangu@sm.sony.co.jp, jisyoon@cosmos.kaist.ac.kr,
> johnnie@dascohk.attmail.com
>Subject: Hanzix
>Date: Mon, 20 Sep 1993 16:59:01 +0800
>From: Sam Shiu <shiu@cs.cuhk.hk>
>
>
>Hi, how are you ? My name is Sam Shiu. I am the manager of Hanzix, a
>joint effort by CUHK(Chinese University of Hong Kong), CAS(Chinese Academy
>of Science) of Beijing and III(Institute of Information Industry) of
>Taiwan, dedicated to the development and promotion of a standardised
>Open System for Chinese Computing. Currently, we are working on serveral items
>which may interest you.
> - National Profile of locales & charmap for mainland China
> - National Profile for Taiwan
> - Standard interface to input methods
> - Interim-Hanizx, an operating system built on Unix which supports
> I18N and L10N for Chinese Computing based on ISO 10646.
> Some of the highlights include a file announcement mechanism and
> codeset conversion utilities
>
>We are planning to start a Hanzix work group involving industry and
>research organizations where we can work together on an Open System for
>Chinese Computing.
>I am composing a list of contacts who are interested in our work
>especially those from HK. Would you please let me know if you are
>interested ?
>Regards,
>
>Sam Shiu, email : shiu@cs.cuhk.hk
>Manager, Hanzix Tel : (825) 609-8436
> Fax : (825) 603-5024
From mohta@cc.titech.ac.jp Fri Sep 24 17:12:18 1993
Received: from rc.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)
id AA21720; Fri, 24 Sep 93 17:12:18 KST
Errors-To: Postmaster@nic.nm.kr
Received: from cc.titech.ac.jp (titcce.cc.titech.ac.jp) by rc.cc.titech.ac.jp (5.65+1.5W/r2TM)
id AA13437; Fri, 24 Sep 93 17:10:21 JST
Received: by cc.titech.ac.jp (5.61/cce-1.5/TM)
id AA11056; Fri, 24 Sep 93 17:04:11 +0900
From: Masataka Ohta <mohta@cc.titech.ac.jp>
Return-Path: <mohta@cc.titech.ac.jp>
Message-Id: <9309240804.AA11056@cc.titech.ac.jp>
Subject: Re: Presentation at the next APCCIRN
To: uhhyung@nic.nm.kr (Uhhyung Choi)
Date: Fri, 24 Sep 1993 17:04:07 +0900 (JST)
Cc: mohta@cc.titech.ac.jp, apccirn-i18n@nic.nm.kr
In-Reply-To: <9309220934.AA11270@nic.nm.kr> from "Uhhyung Choi" at Sep 22, 93 06:34:01 pm
X-Mailer: ELM [version 2.4 PL21]
Content-Type: text
Content-Length: 1051
> M. Ohta San,
Sorry for the delayed answer. I have been on vacation.
> I wonder if you can attend the APCCIRN meeting and present current status
> of Japan.
Sure.
> And could you please send me(or to the list) the latest draft of
> your paper to be presented in upcoming JWCC in Taipei?
I think I sent the latest one to the list one or two month ago.
Didn't you received that? Since then, I have changed nothing yet.
> localization profile
My opinion is that while language dependent localization at application
level is necessary, localization of the character set is the major
obstacle to the universality.
> ?(arranged by Wen-Sung Chen) Taiwan
> Uhhyung Choi Korea
> Masataka Ohta(?) Japan
> ? Thai
>
> Any comments?
I'm quite interested in how people in Thai think about the ISO10646,
because the fully duplexed interactive processing of plain Thai text
is, it seems to me, impossible with ISO 10646.
> P.S. I'm forwarding this mail on Chinese Localization for your information.
Thanks. I'll contact him.
Masataka Ohta
From trin@nwg.nectec.or.th Sat Sep 25 01:52:57 1993
Return-Path: <trin@nwg.nectec.or.th>
Received: from munnari.oz.au by nic.nm.kr (4.1/SMI-4.1)
id AA23551; Sat, 25 Sep 93 01:52:57 KST
Errors-To: Postmaster@nic.nm.kr
Received: from nwg.nectec.or.th by munnari.oz.au with SMTP (5.83--+1.3.1+0.50)
id AA01795; Fri, 24 Sep 1993 23:42:11 +1000 (from trin@nwg.nectec.or.th)
From: trin@nwg.nectec.or.th (Trin Tantsetthi)
Message-Id: <9309241340.AA38240@nwg.nectec.or.th>
To: Masataka Ohta <mohta@cc.titech.ac.jp>
Cc: apccirn-i18n@nic.nm.kr
Subject: Re: Presentation at the next APCCIRN
In-Reply-To: Your message of Fri, 24 Sep 93 17:04:07 V.
<9309240804.AA11056@cc.titech.ac.jp>
Date: Fri, 24 Sep 93 20:40:27 +0700
I won't be able to attend the meeting in Taipei. While I'm not positive
that there will be a representative from Thailand, I hope the discussion
won't stop there.
Ohta-san wrote:
>I'm quite interested in how people in Thai think about the ISO10646,
>because the fully duplexed interactive processing of plain Thai text
>is, it seems to me, impossible with ISO 10646.
As far as character set is concerned, it looks okay. Thailand objected
5 "matras" proposed in Unicode 1.0 (U+0E70 thru U+0E74) and ISO10646
dropped these code points. A big issue which has not been resolved is
encoding.
Thai employs combining marks. A cell (graphic character which is bounded
by a rectangular real estate of the output device) may have multiple
"characters" (which is defined as atomic entity in the script). In general,
a cell contains one base character (rendered on the base line) and optional
combining marks (rendered above or below the base character). Since there
might be multiple combining marks, leaving encoding order of them with
a high degree of freedom (i.e. implementation specific) can be dangerous.
For instance, the word "recover" can be encoded as <U+0E01><U+0E39><U+0E49>
or <U+0E01><U+0E49><U+0E39> according to ISO10646 (sect 23.1) and Unicode
1.0 (pages 627-628).
If one performs data entry using one order and another person performs
record search using query key entered with the second order, database
engine might just report "Record not found".
Looking from another angle, this might be classified as input method
issue. IMO, ISO10646 is a done deal. The chance to impose ISO10646 to
include so many Thai-specific information (on encoding) is minimal.
Standard Thai encoding will be announced as a national standard. It is
still in the pipeline of formality. In a few week from now, a new draft
RFC on Thai encoding will be posted to the ietf-charsets mailing list.
This will be an informational RFC, like iso2022-jp. A mapping table for
Thai has been sent to author of the upcoming RFC1345bis. An application
area director of the IETF also suggested that Thailand registers Thai
as part of the ISO 8859 family. This is still under consideration.
Thai keysym has been proposed to the X Consortium.
As far as i18n is concerned, I have a feeling that character set experts
put, perhaps, too much emphasis on code point and encoding. A big missing
piece in order to complete the i18n vision has not been discussed in
the level of detail I wish. That piece is i18n common runtime library.
IMO, XPG4 still have a long way to go to achieve true i18n goal. It does
not seems to handle combining marks and "Indic" well.
Wouldn't it be nice if APCCRIN-I18N could come up with a proposal of
run-time service/API, either new or as an extension of existing API, that
could cover most if not all languages. In my view, Asia/Pacific Rim
has the most diversity in term of script/language requirements. Our
requirements are so different. If we can't settle i18n requirements
among ourselves, let's trash the hope that true i18n environment would
become a reality.
I guess that's all for progress from Thailand. Comments warmly welcome.
Regards,
Trin
From mohta@necom830.cc.titech.ac.jp Sat Sep 25 23:36:39 1993
Received: from daiduk.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)
id AA26183; Sat, 25 Sep 93 23:36:39 KST
Errors-To: Postmaster@nic.nm.kr
Received: from necom830.cc.titech.ac.jp by daiduk.kaist.ac.kr (4.1/KAISTNet-Relay-3.2)
id AA24738; Sat, 25 Sep 93 23:40:45 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 25 Sep 93 23:28:35 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9309251428.AA00882@necom830.cc.titech.ac.jp>
Subject: Re: Presentation at the next APCCIRN
To: trin@nwg.nectec.or.th (Trin Tantsetthi)
Date: Sat, 25 Sep 93 23:28:34 JST
Cc: mohta@cc.titech.ac.jp, apccirn-i18n@nic.nm.kr
In-Reply-To: <9309241340.AA38240@nwg.nectec.or.th>; from "Trin Tantsetthi" at Sep 24, 93 8:40 pm
X-Mailer: ELM [version 2.3 PL11]
> I won't be able to attend the meeting in Taipei. While I'm not positive
> that there will be a representative from Thailand, I hope the discussion
> won't stop there.
We can continue the discussion with mail, of course.
> Ohta-san wrote:
> >I'm quite interested in how people in Thai think about the ISO10646,
> >because the fully duplexed interactive processing of plain Thai text
> >is, it seems to me, impossible with ISO 10646.
>
> As far as character set is concerned, it looks okay. Thailand objected
> 5 "matras" proposed in Unicode 1.0 (U+0E70 thru U+0E74) and ISO10646
> dropped these code points. A big issue which has not been resolved is
> encoding.
Agreed.
> Thai employs combining marks. A cell (graphic character which is bounded
> by a rectangular real estate of the output device) may have multiple
> "characters" (which is defined as atomic entity in the script). In general,
> a cell contains one base character (rendered on the base line) and optional
> combining marks (rendered above or below the base character). Since there
> might be multiple combining marks, leaving encoding order of them with
> a high degree of freedom (i.e. implementation specific) can be dangerous.
As long as we use batch or half duplexed environment, that is the only
problem.
The problem with 10646 for Thai is in fully duplexed interactive processing.
> For instance, the word "recover" can be encoded as <U+0E01><U+0E39><U+0E49>
> or <U+0E01><U+0E49><U+0E39> according to ISO10646 (sect 23.1) and Unicode
> 1.0 (pages 627-628).
What happens if <U+0E01> is received? Should it be displayed immediately?
The problem is identified in my JWCC paper as the causality problem.
> If one performs data entry using one order and another person performs
> record search using query key entered with the second order, database
> engine might just report "Record not found".
>
> Looking from another angle, this might be classified as input method
> issue. IMO, ISO10646 is a done deal. The chance to impose ISO10646 to
> include so many Thai-specific information (on encoding) is minimal.
IMHO, anything which requires so much language specific information
is not universal.
So, I think we develop something new by ourselves based on 10646.
> Standard Thai encoding will be announced as a national standard. It is
> still in the pipeline of formality. In a few week from now, a new draft
> RFC on Thai encoding will be posted to the ietf-charsets mailing list.
Though it will share the same causality problem, it does not matter
for MIME, because mail processing is done as batch.
> As far as i18n is concerned, I have a feeling that character set experts
> put, perhaps, too much emphasis on code point and encoding.
I disagree. Code points can be anything, but, encoding is important.
It's you who said:
> A big issue which has not been resolved is
> encoding.
> A big missing
> piece in order to complete the i18n vision has not been discussed in
> the level of detail I wish. That piece is i18n common runtime library.
> IMO, XPG4 still have a long way to go to achieve true i18n goal. It does
> not seems to handle combining marks and "Indic" well.
>
> Wouldn't it be nice if APCCRIN-I18N could come up with a proposal of
> run-time service/API, either new or as an extension of existing API, that
> could cover most if not all languages. In my view, Asia/Pacific Rim
> has the most diversity in term of script/language requirements. Our
> requirements are so different. If we can't settle i18n requirements
> among ourselves, let's trash the hope that true i18n environment would
> become a reality.
I don't think we can expect such a library contain too much Thai specific
specification. So we need really universal encoding which does not contain
much language specific features.
Also, to be able to figure out a reasonable common runtime library, the
encoding should have several aesthetical properties: those such as
described in my JWCC paper. If you also miss the paper, I will remail
the paper, here.
To my knowledge, Thai and Indic can be processed just as easily as
European languages by encoding them with several thousands of precombined
characters.
To process ancient Hangul characters in the same fashion, about a half
mega precombined characters are necessary.
The encoding space will be 21 or 22 bits.
But, does that matter? The font of most characters can be synthesized at
run time, of course.
Then, we will be able to have unified library routines to process,
to my knowledge, all the characters in the world.
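The idea of giving every combination its own code point, computed arithmetically so that glyphs can be synthesized at run time, can be illustrated with modern Hangul, whose inventory is small and well known. This minimal Python sketch is an illustration only: the counts (19 leading consonants, 21 vowels, 28 trailing slots) and the 0xAC00 base are the layout Unicode adopted in version 2.0, after this thread, not ICODE's; ancient Hangul would need much larger jamo inventories. The helper names `compose` and `decompose` are hypothetical.

```python
import unicodedata

# Modern Hangul inventory as later fixed by Unicode (an illustration, not ICODE):
# 19 leading consonants, 21 vowels, 28 trailing slots (27 consonants + "none").
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT          # syllables per leading consonant
S_BASE = 0xAC00                      # first precombined syllable in Unicode


def compose(l: int, v: int, t: int = 0) -> str:
    """Map jamo indices to one precombined syllable by pure arithmetic."""
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + l * N_COUNT + v * T_COUNT + t)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Recover the jamo indices; a renderer could build the glyph from these."""
    s = ord(syllable) - S_BASE
    if not 0 <= s < L_COUNT * N_COUNT:
        raise ValueError("not a precombined modern Hangul syllable")
    return s // N_COUNT, (s % N_COUNT) // T_COUNT, s % T_COUNT


print(L_COUNT * V_COUNT * T_COUNT)   # 11172 precombined syllables
print(compose(0, 0))                 # 가 (kiyeok + a)
print(decompose("한"))               # (18, 0, 4): hieuh + a + nieun
```

Because the mapping is invertible arithmetic, a processing library needs no per-syllable tables, which is the property the argument above relies on.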
Masataka Ohta
From uhhyung Mon Oct 4 12:40:49 1993
Return-Path: <uhhyung>
Received: by nic.nm.kr (4.1/SMI-4.1)
id AA08041; Mon, 4 Oct 93 12:40:49 KST
From: uhhyung (Uhhyung Choi)
Message-Id: <9310040340.AA08041@nic.nm.kr>
Subject: Comments on your paper
To: mohta@cc.titech.ac.jp
Date: Mon, 4 Oct 1993 12:40:48 +0900 (KST)
Cc: apccirn-i18n@nic.nm.kr
X-Mailer: ELM [version 2.4 PL21-h3]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 1269
Masataka,
Here goes my 2-cent-worth comment on your paper. First of all, there
seems to be a typo in describing the use of four additional bits for
Han characters. The right term for Han characters used by KS standards
is "Hanja".
I understand UCS-2 fails to satisfy the criteria you mentioned in the
paper, but ICODE would not be acceptable unless you have provisions
for supporting at least strict UCS-2 level 2 and enough justification
for the proposed method.
1. Is the bidirectional display mode bit really necessary to be included
in the charset definition? Can't it be treated as a regional matter?
2. As for the rendering problem of Han characters, how about designing
a renderer so that it can display the equivalent shape of the character
from current locale information?
3. What do you think of the comments from the UCS BOF that your solution
is not in the general stream of the development of the standard character
set codes and their applications in the computing systems.
I think we should try to feed back proposed solutions and enhancements on
deployment issues of 10646, and profiling is an inevitable solution to
the presumed weakness of UCS. Possibly a common Asia-Pacific profile?
--
Uhhyung Choi
Korea Network Information Center
From mohta@necom830.cc.titech.ac.jp Mon Oct 4 18:32:21 1993
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)
id AA12511; Mon, 4 Oct 93 18:32:21 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 4 Oct 93 18:24:13 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9310040924.AA04641@necom830.cc.titech.ac.jp>
Subject: Re: Comments on your paper
To: uhhyung@nic.nm.kr (Uhhyung Choi)
Date: Mon, 4 Oct 93 18:24:11 JST
Cc: mohta@cc.titech.ac.jp, apccirn-i18n@nic.nm.kr
In-Reply-To: <9310040340.AA08041@nic.nm.kr>; from "Uhhyung Choi" at Oct 4, 93 12:40 pm
X-Mailer: ELM [version 2.3 PL11]
> Here goes my 2-cent-worth comment on your paper.
Thank you, very much.
> First of all, there
> seems to be a typo in describing the use of four additional bits for
> Han characters. The right term for Han characters used by KS standards
> is "Hanja".
Oops... Sorry.
> I understand UCS-2 fails to satisfy the criteria you mentioned in the
> paper, but ICODE would not be acceptable unless you have provisions
> for supporting at least strict UCS-2 level 2 and enough justification
> for the proposed method.
Provision for UCS-2 level 2 as it stands is, as I have proved, unacceptable
as an internationalized plain text encoding method for interactive use. OK?
Still, text represented with UCS-2 level 2 can be provided for, as ICODE
has plenty of spare encoding space, even in its 21-bit form.
For example, encoding all possible combinations of ancient Hangul
requires only 0.5 million code points.
Encoding Thai and Devanagari characters as precombined characters
requires only several thousand code points, I think. The resulting
encoding will be much shorter than the one in ISO 10646 level 2.
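The code-space claims reduce to simple arithmetic, checked by this minimal Python sketch. The half-million figure comes from the message; the 10,000 for Thai and Devanagari is an assumed stand-in for "several thousand", and the whole 16-bit UCS-2 space is added as a generous upper bound. Nothing here reflects how ICODE actually allocates its code points; `bits_needed` is a hypothetical helper.

```python
def bits_needed(n_code_points: int) -> int:
    """Smallest b such that 2**b >= n_code_points (at least 1)."""
    if n_code_points < 1:
        raise ValueError("need at least one code point")
    return max(1, (n_code_points - 1).bit_length())


ANCIENT_HANGUL = 500_000      # "0.5 million code points", from the message
THAI_DEVANAGARI = 10_000      # ASSUMPTION: a generous reading of "several thousand"
UCS2_LEVEL2 = 2 ** 16         # the whole 16-bit UCS-2 space, as an upper bound

total = ANCIENT_HANGUL + THAI_DEVANAGARI + UCS2_LEVEL2
print(total, bits_needed(total))          # 575536 20
print(2 ** 20 - total)                    # code points still spare within 20 bits
```

Even with these generous figures the repertoire fits in 20 bits, which is consistent with the later remark in this message that dropping the direction bit would leave a 20-bit code.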
As for the justification, I'd be glad if someone could show any other
requirement which an internationalized plain text encoding method for
interactive use should satisfy.
It should be noted that IUTF is upward compatible with UTF-2 and can
provide much shorter representations for frequently used Asian characters.
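For comparison, this minimal Python sketch gives the byte-length rule of UTF-2 (X/Open FSS-UTF, the scheme that became UTF-8): every Han or Hangul character in the 16-bit range costs three bytes, against two in EUC or in the double-byte sets of ISO 2022. IUTF's own layout is not described in this thread, so only the UTF-2 side of the comparison is shown; `utf2_length` is a hypothetical name.

```python
def utf2_length(cp: int) -> int:
    """Bytes used by UTF-2 (FSS-UTF, later UTF-8) for one code point."""
    limits = ((1, 0x80), (2, 0x800), (3, 0x10000),
              (4, 0x200000), (5, 0x4000000), (6, 0x80000000))
    if cp < 0:
        raise ValueError("negative code point")
    for nbytes, limit in limits:
        if cp < limit:
            return nbytes
    raise ValueError("code point needs more than 31 bits")


for ch in ("A", "é", "中", "가"):
    # UTF-2 length next to Python's UTF-8 length, which agrees for these
    print(ch, utf2_length(ord(ch)), len(ch.encode("utf-8")))
```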
> 1. Is the bidirectional display mode bit really necessary to be included
> in the charset definition?
What does "charset definition" mean? I don't know but I don't mind.
The bit is necessary to make text encoding finitely resynchronizable.
I don't mind at all whether you might call the resulting encoding
method "charset definition" or not.
> Can't it be treated as a regional matter?
WHAT!!!!????? Do you think there can be "a regional matter" in an
internationalized plain text encoding?
People who use Arabic use bidirectionality in their plain text.
If there can be, we don't need any common encoding method. Anyone
can use their domestic encoding such as existing ISO 2022 with
implicit announcers and call it "international" because the difference
is "a regional matter".
That's why I defined "universality".
Suppose two Arabic users, A in Korea and B in France, try to communicate
with each other. What encoding should they use? What is the proper "region"
to be used as "a regional matter"? Can we expect every Arabic user, most
of whom are experts in neither computers nor linguistics, to know all the
possible encodings of Arabic? What happens if a third person, C in Brazil,
who cannot read Arabic at all, tries to relay the message adding a short
English comment at the top of the message?
Well, actually, ICODE makes bidirectionality somewhat regional; that
is, if an implementor wants to drop support for it, he can. The
direction bit is necessary only when support for bidirectional
text is necessary. But dropping support for it only to make the code
20 bits is, I think, quite meaningless.
> 2. As for the rendering problem of Han characters, how about designing
> a renderer so that it can display the equivalent shape of the character
> from current locale information?
Locale dependence prevents the encoding from being universal, which an
internationalized encoding must be.
ISO 10646 allows us to have text which contains English, German and French
at the same time, which was impossible with ISO 8859.
Isn't a minimal requirement of an internationalized encoding that it allow
us to have text which contains Chinese, Japanese, Korean and any other
languages at the same time?
Don't you want fairness in international things?
> 3. What do you think of the comments from the UCS BOF that your solution
> is not in the general stream of the development of the standard character
> set codes and their applications in the computing systems.
The comment was in charsets ML, not from UCS BOF.
In the ML, no one has shown the definition of "general stream of the
development".
Though I'm not sure what the definition is, I have also shown that there
can be no single encoding method which could be thought to be in the
"general stream of the development".
Moreover, I have shown, in my paper, that both ISO 2022 and ISO 10646 are
inappropriate as an internationalized plain text encoding method for
interactive use.
So, could you at least tell me what you think the "general stream of
the development" means?
> I think we should try to feed back proposed solutions and enhancements on
> deployment issues of 10646, and profiling is an inevitable solution to
> the presumed weakness of UCS. Possibly a common Asia-Pacific profile?
No. It is as bad as ISO 2022, then.
Instead, there should be a single international profile, which should be
called the universal encoding.
Then, we will be free from specifying profiles.
Masataka Ohta
From mohta@necom830.cc.titech.ac.jp Mon Dec 20 09:27:52 1993
Received: from necom830.cc.titech.ac.jp ([131.112.4.4]) by nic.nm.kr (4.1/SMI-4.1)
id AA01613; Mon, 20 Dec 93 09:27:52 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 20 Dec 93 09:18:54 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9312200019.AA21974@necom830.cc.titech.ac.jp>
Subject: Interoperable Localizaion/Internationalization
To: ietf-822@dimacs.rutgers.edu, ietf-charsets@innosoft.com,
apccirn-i18n@nic.nm.kr
Date: Mon, 20 Dec 93 9:18:51 JST
X-Mailer: ELM [version 2.3 PL11]
Attached is a memo of ISO-2022-JP-2 encoding sent to the RFC editor
just recently.
At the APCCIRN (Asia Pacific CCIRN) meeting in early December in Taiwan,
it was decided to merge
ISO-2022-JP-2 in Japan
ISO-2022-KR in Korea
CNS in Taiwan
to develop
ISO-2022-INT-1
as a standard track text encoding method of the Internet for which
I am acting as a coordinator.
It is an attempt to merge various interoperable localizations.
It is also intended to further develop:
ISO-2022-INT-2
ISO-2022-INT-3
etc. in a timely fashion.
Any comments?
Masataka Ohta
PS
Please reply to appropriate mailing lists only.
------------------------------------------------------------------------
Network Working Group M. Ohta
Request for Comments: nnnn Tokyo Institute of Technology
Category: Informational K. Handa
ETL
28 November 1993
ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP
Status of this Memo
This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution and
translation of this memo is unlimited.
Introduction
This memo describes a text encoding scheme: "ISO-2022-JP-2", which is
used experimentally for electronic mail [RFC822] and network news
[RFC1036] messages in several Japanese networks. The encoding is a
multilingual extension of "ISO-2022-JP", the existing encoding for
Japanese [2022JP]. The encoding is supported by an Emacs based
multilingual text editor: MULE [MULE].
The name, "ISO-2022-JP-2", is intended to be used in the "charset"
parameter field of MIME headers (see [MIME1] and [MIME2]).
Description
The text with "ISO-2022-JP-2" starts in ASCII [ASCII], and switches
to other character sets of ISO 2022 [ISO2022] through limited
combinations of escape sequences. All the characters are encoded
with 7 bits only.
At the beginning of text, the existence of an announcer sequence:
"ESC 2/0 4/1 ESC 2/0 4/6 ESC 2/0 5/10" is (though omitted) assumed.
Thus, characters of 94 character sets are designated to G0 and
invoked as GL. C1 control characters are represented with 7 bits.
Characters of 96 character sets are designated to G2 and invoked with
SS2 (single shift two, "ESC 4/14" or "ESC N").
For example, the escape sequence "ESC 2/4 2/8 4/3" or "ESC $ ( C"
indicates that the bytes following the escape sequence are Korean KSC
characters, which are encoded in two bytes each. The escape sequence
"ESC 2/14 4/1" or "ESC . A" indicates that ISO 8859-1 is designated
to G2. After the designation, the single shifted sequence "ESC 4/14
4/1" or "ESC N A" is interpreted to represent a character "A with
acute".
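The column/row notation used throughout this memo maps directly to byte values: "c/r" is the byte c * 16 + r. This minimal Python sketch converts the notation and builds the single-shifted Latin-1 example above; the helper names `cr` and `latin1_via_g2` are hypothetical. It repeats the G2 designation before every character for simplicity, which the grammar allows but which wastes bytes. Note that "y with diaeresis" (0xFF) lands on 7/15, i.e. DEL.

```python
def cr(notation: str) -> bytes:
    """Turn ISO 2022 column/row notation such as "ESC 2/4 2/8 4/3" into bytes.

    "c/r" is the byte c * 16 + r; "ESC" stands for 1/11 (0x1B).
    """
    out = bytearray()
    for token in notation.split():
        if token == "ESC":
            out.append(0x1B)
            continue
        col, row = (int(part) for part in token.split("/"))
        if not (0 <= col <= 7 and 0 <= row <= 15):
            raise ValueError(f"not a 7-bit column/row pair: {token}")
        out.append(col * 16 + row)
    return bytes(out)


def latin1_via_g2(ch: str) -> bytes:
    """Encode one upper-half ISO 8859-1 character as designation + single shift.

    The 96 graphic characters 0xA0-0xFF are sent as their 7-bit GL position
    (byte & 0x7F) after "ESC N".
    """
    code = ch.encode("latin-1")[0]
    if code < 0xA0:
        raise ValueError("not an upper-half ISO 8859-1 graphic character")
    return cr("ESC 2/14 4/1") + cr("ESC 4/14") + bytes([code & 0x7F])


print(cr("ESC 2/4 2/8 4/3"))   # b'\x1b$(C'
print(latin1_via_g2("Á"))      # b'\x1b.A\x1bNA'
```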
Ohta & Handa [Page 1]
.
RFC nnnn ISO-2022-JP-2 28 November 1993
The following table gives the escape sequences and the character sets
used in "ISO-2022-JP-2" messages. The reg# is the registration number
in ISO's registry [ISOREG].
94 character sets
reg# character set ESC sequence designated to
------------------------------------------------------------------
6 ASCII ESC 2/8 4/2 ESC ( B G0
42 JIS X 0208-1978 ESC 2/4 4/0 ESC $ @ G0
87 JIS X 0208-1983 ESC 2/4 4/2 ESC $ B G0
14 JIS X 0201-Roman ESC 2/8 4/10 ESC ( J G0
58 GB2312-1980 ESC 2/4 4/1 ESC $ A G0
149 KSC5601-1987 ESC 2/4 2/8 4/3 ESC $ ( C G0
159 JIS X 0212-1990 ESC 2/4 2/8 4/4 ESC $ ( D G0
96 character sets
reg# character set ESC sequence designated to
------------------------------------------------------------------
100 ISO8859-1 ESC 2/14 4/1 ESC . A G2
126 ISO8859-7(Greek) ESC 2/14 4/6 ESC . F G2
For further information about the character sets and the escape
sequences, see [ISO2022] and [ISOREG].
If there is any G0 designation in text, there must be a switch to
ASCII or to JIS X 0201-Roman before a space character (but not
necessarily before "ESC 4/14 2/0" or "ESC N ' '") or control
characters such as tab or CRLF. This means that the next line starts
in the character set that was switched to before the end of the
previous line. Though the designation to JIS X 0201-Roman is allowed
for backward compatibility to "ISO-2022-JP", its use is discouraged.
Applications such as pagers and editors which randomly seek within a
text file encoded with "ISO-2022-JP-2" may assume that all the lines
begin with ASCII, not with JIS X 0201-Roman.
At the beginning of a line, information on G2 designation of the
previous line is cleared. New designation must be given before a
character in 96 character sets is used in the line.
The text must end in ASCII designated to G0.
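These line rules are what let a reader resynchronize at every line start. A minimal Python checker for just these rules (not the full Formal Syntax) could look like this; the names are hypothetical, lines are split on CRLF, and any byte up to 0x20 or equal to 0x7F is treated as "space or control".

```python
ESC, SPACE, DEL = 0x1B, 0x20, 0x7F

# Designations from the tables above.
G0_SINGLE = {b"\x1b(B": "ASCII", b"\x1b(J": "JIS X 0201-Roman"}
G0_DOUBLE = (b"\x1b$@", b"\x1b$A", b"\x1b$B", b"\x1b$(C", b"\x1b$(D")
G2_DESIG = (b"\x1b.A", b"\x1b.F")
SS2 = b"\x1bN"


def _escape_at(line: bytes, i: int) -> tuple[str, str, int]:
    """Classify the escape sequence at line[i]; return (kind, G0 name, length)."""
    for seq, name in G0_SINGLE.items():
        if line.startswith(seq, i):
            return "g0", name, len(seq)
    for seq in G0_DOUBLE:
        if line.startswith(seq, i):
            return "g0", "double-byte", len(seq)
    for seq in G2_DESIG:
        if line.startswith(seq, i):
            return "g2", "", len(seq)
    if line.startswith(SS2, i):
        return "ss2", "", 3            # ESC N plus one byte of the 96 set
    return "unknown", "", 1


def check_line_rules(text: bytes) -> list[str]:
    """List violations of the per-line rules; an empty list means none found."""
    problems: list[str] = []
    g0 = "ASCII"                       # text starts in ASCII
    for lineno, line in enumerate(text.split(b"\r\n"), start=1):
        g2_ready = False               # G2 designation is cleared at each line start
        i = 0
        while i < len(line):
            if line[i] == ESC:
                kind, name, length = _escape_at(line, i)
                if kind == "g0":
                    g0 = name
                elif kind == "g2":
                    g2_ready = True
                elif kind == "ss2" and not g2_ready:
                    problems.append(f"line {lineno}: ESC N before any G2 designation")
                elif kind == "unknown":
                    problems.append(f"line {lineno}: unsupported escape sequence")
                i += length
                continue
            if g0 == "double-byte" and (line[i] <= SPACE or line[i] == DEL):
                problems.append(f"line {lineno}: space or control while G0 is double-byte")
            i += 1
        if g0 == "double-byte":
            problems.append(f"line {lineno}: line ends while G0 is double-byte")
    if g0 != "ASCII":
        problems.append("text does not end in ASCII")
    return problems
```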
As the "ISO-2022-JP", and thus, "ISO-2022-JP-2", is designed to
represent English and modern Japanese, left-to-right directionality
is assumed if the text is displayed horizontally.
Users of "ISO-2022-JP-2" must be aware that some common transports,
such as old Bnews, cannot relay the 7-bit value 7/15 (decimal 127),
which is used to encode, say, "y with diaeresis" of ISO 8859-1.
Other restrictions are given in the Formal Syntax section below.
Formal Syntax
The notational conventions used here are identical to those used in
RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
message = headers 1*(CRLF text)
; see also [MIME1] "body-part"
; note: must end in ASCII
text = *(single-byte-char /
g2-desig-seq /
single-shift-char)
[*segment
reset-seq
*(single-byte-char /
g2-desig-seq /
single-shift-char ) ]
; note: g2-desig-seq must
; precede single-shift-char
headers = <see [RFC822] "fields" and [MIME1] "body-part">
segment = single-byte-segment / double-byte-segment
single-byte-segment = single-byte-seq
*(single-byte-char /
g2-desig-seq /
single-shift-char )
double-byte-segment = double-byte-seq
*((one-of-94 one-of-94) /
g2-desig-seq /
single-shift-char )
reset-seq = ESC "(" ( "B" / "J" )
single-byte-seq = ESC "(" ( "B" / "J" )
double-byte-seq = (ESC "$" ( "@" / "A" / "B" )) /
(ESC "$" "(" ( "C" / "D" ))
g2-desig-seq = ESC "." ( "A" / "F" )
single-shift-seq = ESC "N"
single-shift-char = single-shift-seq one-of-96
CRLF = CR LF
; ( Octal, Decimal.)
ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)
SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)
SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)
CR = <ASCII CR, carriage return>; ( 15, 13.)
LF = <ASCII LF, linefeed> ; ( 12, 10.)
one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)
one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)
7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)
single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT
including CRLF, and not including ESC, SI, SO>
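A minimal decoder sketch following the syntax above is shown here in Python; the names are hypothetical and it is not a reference implementation. It decodes double-byte sets by setting the high bit and handing the result to an EUC codec, which carries two stated approximations: JIS X 0201-Roman is read as ASCII, and JIS X 0208-1978 is read as the 1983 set. For real use, CPython's built-in `iso2022_jp_2` codec is the better choice.

```python
ESC = 0x1B

# G0 designations -> (Python codec for the 8-bit form, bytes per char, prefix).
# Double-byte sets are decoded by setting the high bit (GL -> EUC form).
G0_SETS = {
    b"\x1b(B": ("ascii", 1, b""),
    b"\x1b(J": ("ascii", 1, b""),        # approximation: JIS X 0201-Roman as ASCII
    b"\x1b$@": ("euc_jp", 2, b""),       # approximation: 1978 set read as 1983
    b"\x1b$B": ("euc_jp", 2, b""),
    b"\x1b$A": ("gb2312", 2, b""),
    b"\x1b$(C": ("euc_kr", 2, b""),
    b"\x1b$(D": ("euc_jp", 2, b"\x8f"),  # JIS X 0212 via EUC-JP code set 3
}
G2_SETS = {b"\x1b.A": "latin-1", b"\x1b.F": "iso8859_7"}


def _to_high(gl: bytes, codec: str, prefix: bytes = b"") -> str:
    """Decode GL bytes by moving them to GR (setting bit 8) in an 8-bit codec."""
    return (prefix + bytes(b | 0x80 for b in gl)).decode(codec)


def decode_iso2022_jp2(data: bytes) -> str:
    """Decode ISO-2022-JP-2 bytes line by line, following the rules above."""
    codec, width, prefix = G0_SETS[b"\x1b(B"]  # text starts in ASCII
    lines = []
    for line in data.split(b"\r\n"):
        g2 = None                              # G2 designation is cleared per line
        out, i = [], 0
        while i < len(line):
            if line[i] == ESC:
                seq = next((s for s in (*G0_SETS, *G2_SETS)
                            if line.startswith(s, i)), None)
                if seq in G0_SETS:
                    codec, width, prefix = G0_SETS[seq]
                elif seq in G2_SETS:
                    g2 = G2_SETS[seq]
                elif line.startswith(b"\x1bN", i) and g2 and i + 2 < len(line):
                    out.append(_to_high(line[i + 2:i + 3], g2))
                    i += 3
                    continue
                else:
                    raise ValueError(f"unexpected escape sequence at byte {i}")
                i += len(seq)
                continue
            if width == 1:
                out.append(chr(line[i]))
            elif i + 1 < len(line):
                out.append(_to_high(line[i:i + 2], codec, prefix))
            else:
                raise ValueError("truncated double-byte character")
            i += width
        lines.append("".join(out))
    return "\r\n".join(lines)
```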
MIME Considerations
The name given to the character encoding is "ISO-2022-JP-2". This
name is intended to be used in MIME messages as follows:
Content-Type: text/plain; charset=iso-2022-jp-2
The "ISO-2022-JP-2" encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in non-MIME-compliant software.
"ISO-2022-JP-2" may also be used in MIME headers. Both "B" and "Q"
encoding could be useful with "ISO-2022-JP-2" text.
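A minimal Python sketch of both header encodings, assuming the encoded-word syntax of RFC 1522 [MIME2], "=?charset?encoding?encoded-text?=". The helper names are hypothetical; the "Q" variant is deliberately conservative (only ASCII letters and digits pass through), and the 75-character limit on an encoded-word is not enforced here.

```python
import base64


def encoded_word_b(raw: bytes, charset: str = "ISO-2022-JP-2") -> str:
    """Wrap already-encoded text as an RFC 1522 "B" (base64) encoded-word."""
    return f"=?{charset}?B?{base64.b64encode(raw).decode('ascii')}?="


def encoded_word_q(raw: bytes, charset: str = "ISO-2022-JP-2") -> str:
    """Conservative "Q" encoding: keep ASCII letters and digits, "_" for space,
    and =XX (upper-case hex) for everything else, including ESC."""
    parts = []
    for b in raw:
        if b == 0x20:
            parts.append("_")
        elif b < 0x80 and chr(b).isalnum():
            parts.append(chr(b))
        else:
            parts.append(f"={b:02X}")
    return f"=?{charset}?Q?{''.join(parts)}?="


raw = b"\x1b.A\x1bNA"        # "A with acute" via G2, as in the Description section
print(encoded_word_q(raw))   # =?ISO-2022-JP-2?Q?=1B=2EA=1BNA?=
print(encoded_word_b(raw))
```

Because the text is full of ESC bytes, "Q" grows quickly for escape-heavy strings, while "B" costs a fixed one-third overhead.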
References
[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.
[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded character sets
-- Code extension techniques", International Standard, Ref. No. ISO
2022-1986 (E).
[ISOREG] International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used With
Escape Sequences".
[MIME1] N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail
Extensions) Part One: Mechanisms for Specifying and Describing the
Format of Internet Message Bodies", RFC 1521, September 1993.
[MIME2] K. Moore, "MIME (Multipurpose Internet Mail Extensions) Part
Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
September 1993.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, UDEL, August 1982.
[RFC1036] Horton M., and R. Adams, "Standard for Interchange of
USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
Seismic Studies, December 1987.
[2022JP] J. Murai, M. Crispin, E. van der Poel, "Japanese Character
Encoding for Internet Messages", RFC 1468, June 1993.
[MULE] M. Nishikimi, K. Handa, S. Tomura, "Mule: MULtilingual
Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.
Acknowledgements
This memo is the result of discussion among various people in the
news group fj.kanji, and was reviewed on the mailing list
jp-msg@iij.ad.jp. The authors wish to thank in particular Prof. Eiichi
Wada for his suggestions based on profound knowledge of ISO 2022 and
related standards.
Security Considerations
Security issues are not discussed in this memo.
Authors' Addresses
Masataka Ohta
Tokyo Institute of Technology
2-12-1, O-okayama, Meguro-ku,
Tokyo 152, JAPAN
Phone: +81-3-5499-7084
Fax: +81-3-3729-1940
EMail: mohta@cc.titech.ac.jp
Ken'ichi Handa
Electrotechnical Laboratory
Umezono 1-1-4, Tsukuba,
Ibaraki 305, JAPAN
Phone: +81-298-58-5916
Fax: +81-298-58-5918
EMail: handa@etl.go.jp
From mohta@necom830.cc.titech.ac.jp Tue Dec 21 15:44:31 1993
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)
id AA04880; Tue, 21 Dec 93 15:44:31 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 21 Dec 93 15:34:23 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9312210634.AA00144@necom830.cc.titech.ac.jp>
Subject: Re: Proposals for 10646/Unicode in MIME
To: rhys@cs.uq.oz.au, ietf-charsets@innosoft.com, apccirn-i18n@nic.nm.kr
Date: Tue, 21 Dec 93 15:34:21 JST
Cc: dcrocker@mordor.stanford.edu, David_Goldsmith@taligent.com,
ietf-822@dimacs.rutgers.edu, unicored@unicode.org
Reply-To: ietf-charsets@innosoft.com, unicored@unicode.org,
apccirn-i18n@nic.nm.kr
In-Reply-To: <9312202223.AA28439@client>; from "rhys@cs.uq.oz.au" at Dec 21, 93 8:23 am
X-Mailer: ELM [version 2.3 PL11]
Note: As the issue is on text encoding in general, Reply-To: is not
directed to ietf-822.
> I note here that Masataka's proposal for ISO-2022-JP-2 demonstrates what
> we've been arguing all along: it is not enough to just have a character
> encoding.
Recently I have avoided the word "character" as much as possible and
used the phrase "text encoding" instead, because the concept of a
"character" beyond ASCII cannot be well defined. Various units of text
encoding are necessary for different purposes.
Thus, I think names such as "MIME charset" and "ietf-charsets ML" are
no good.
> There also needs to be some form of markup to distinguish
> different usages of the same character encoding. ISO-2022-JP-2 uses
> escape sequences to do markup, whereas a UNICODE version of text/enriched
> would use <...> tags.
ISO-2022-JP-2 does not do any markup. It is for plain text.
It is finite state. It has no nesting.
I don't think anything with nested structure is plain text.
It is, and its successors will be, as stateless as practically possible
with ISO 2022.
That is, at the beginning of a line, the state can be assumed to be unique.
> The main difference I can see is that ISO-2022-JP-2
> requires the use of markup, even when the whole message is in the same
> language, but UNICODE can get away without markup for 99% of messages,
It is a meaningless difference.
Whether it is 1% or 100%, you need the same amount of codings, fonts,
settings of config.sys and such, anyway.
> letting local conventions set the default language.
That is one very important difference.
Unlike UNICODE, ISO-2022-JP-2 is intended to be used in an internationalized
environment. It needs no local conventions. BTW, MIME charsets cannot
depend on local conventions either.
> I still fail to see why Masataka objects to UNICODE since his own proposal has
> to jump through the same markup hoops. The only advantage of ISO-2022-JP-2
> that I can see is that it will work on existing terminals without special
> software in some communities.
Then, you can see nothing.
ISO-2022-JP-2 is produced from long and extensive
localization/internationalization experiences in Japanese computer community
with ISO-2022-JP, EUC, SJIS and such.
First of all, ISO-2022-JP-2 can interoperate with ASCII.
Next, it is 7 bit.
Thus, it can interoperate with any ASCII compatible text encoding such
as EUC (both UJIS and EUC-KR) and SJIS.
More importantly, it can interoperate with the future ultimate ASCII
compatible 8 bit encoding. Of course, UNICODE is NOT the future.
We do know that having two or more uninteroperable encodings such
as EUC and SJIS or ASCII and 16bit-UNICODE is the real pain.
> A specious argument at best, since the rest
> of the world does need special software to view ISO-2022-JP-2 anyway.
ISO-2022-JP-2 is, and ISO-2022-INT-1 will be, designed to aid those
who immediately need localization.
I don't think it is a long term solution.
Both ISO 2022 and ISO 10646/UNICODE have a unified syntax to mix
multilingual characters in the world. ISO 2022 is much better in
letting us separate C/J/K characters.
On the other hand, both ISO 2022 and ISO 10646/UNICODE lack a unified
semantics to mix multilingual characters in the world. ISO 10646/UNICODE
inherits the policy of ISO 2022 to treat characters in different languages
differently. Thus, it is impossible to write a unified text processing
library or application of meaningfully rich functionality.
Thus, for the time being, our solution must be 7 bit ISO 2022.
As a long term solution, I have designed ICODE/IUTF, which has, besides
ASCII compatibility, several useful semantical properties for, as far
as I know, all the characters in the world. With a large enough encoding
space (though not impractically large), the real, semantical, unification
is possible.
> UNICODE has the advantage that if a message gets corrupted and the markup
> is lost, there is still a reasonable character that can be displayed, which
> is close enough not to cause the sky to fall in on the reader. Such corruption
> could easily happen when a message is quoted. What happens with ISO-2022-JP-2?
Misquoting is an issue which MUST be solved by fixing the faulty MTAs and
other faulty transports. Providing workarounds will only delay the real
solution.
Instead, the real state corruption problem is caused in an interactive
environment where individual programs output their own text streams
simultaneously.
With ISO-2022-JP-2, unlike text/enriched, the state is resumed at the
beginning of the next line.
> People have tried time and again to add markup to UNICODE to satisfy Masataka
> (e.g. language tags), but it just doesn't seem to satisfy him. *sigh*
Strange.
I have *ABSOLUTELY* *NO* interest in text/enriched from the beginning.
I and most of the people in the world want to process our natural
languages as plain text in internationalized environment.
We already have a lot of experience to use our languages as plain text.
You can't force us to give up plain text.
Masataka Ohta
PS
For more information on ICODE, on why ISO 10646/UNICODE is no good and on
how it can be improved, see:
"Character Encoding Method for Internationalized Plain
Text Processing", Proceedings of 8th International Joint
Workshop on Computer Communications, Masataka OHTA,
Dec. 1993.
An electronic copy is available from me.
From rong@watson.ibm.com Wed Dec 22 08:31:48 1993
Return-Path: <rong@watson.ibm.com>
Received: from watson.ibm.com by nic.nm.kr (4.1/SMI-4.1)
id AA07503; Wed, 22 Dec 93 08:31:48 KST
Received: from WATSON by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 3129;
Tue, 21 Dec 93 18:26:16 EST
Received: from YKTVMH by watson.vnet.ibm.com with "VAGENT.V1.0"
id 6180; Tue, 21 Dec 1993 18:26:08 EST
Received: from hawpub.watson.ibm.com by yktvmh.watson.ibm.com (IBM VM SMTP V2R3)
with TCP; Tue, 21 Dec 93 18:26:07 EST
Received: by hawpub.watson.ibm.com (AIX 3.2/UCB 5.64/930311)
id AA27752; Tue, 21 Dec 1993 18:26:15 -0500
Date: Tue, 21 Dec 1993 18:26:15 -0500
From: rong@watson.ibm.com (Rong Chang)
Message-Id: <9312212326.AA27752@hawpub.watson.ibm.com>
To: apccirn-i18n@nic.nm.kr
Subject: subscribe
Please add me to the mailing list. I'm sorry for posting this request
to the mailing list because "apccirn-i18n-request@nic.nm.kr" was not
available.
-rong
From mohta@necom830.cc.titech.ac.jp Thu Dec 23 01:30:28 1993
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)
id AA02216; Thu, 23 Dec 93 01:30:28 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 23 Dec 93 01:21:41 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9312221621.AA07900@necom830.cc.titech.ac.jp>
Subject: Re: subscribe
To: rong@watson.ibm.com (Rong Chang)
Date: Thu, 23 Dec 93 1:21:40 JST
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9312212326.AA27752@hawpub.watson.ibm.com>; from "Rong Chang" at Dec 21, 93 6:26 pm
X-Mailer: ELM [version 2.3 PL11]
> Please add me to the mailing list.
Welcome.
Could you introduce yourself to the mailing list?
Who are you and what are you doing in i18n area?
> I'm sorry for posting this request
> to the mailing list because "apccirn-i18n-request@nic.nm.kr" was not
> available.
I was aware of that.
But, formally speaking, apccirn related MLs are not open to the public.
Practically speaking, I, personally, would like to add everyone who
is interested in our activities and gives us technical contributions.
Masataka Ohta
From jinho@iti.gov.sg Fri Dec 24 13:16:22 1993
Return-Path: <jinho@iti.gov.sg>
Received: from iti.gov.sg by nic.nm.kr (4.1/SMI-4.1)
id AA08975; Fri, 24 Dec 93 13:16:22 KST
Received: by iti.gov.sg (4.1/SMI-4.1)
id AA14773; Fri, 24 Dec 93 12:09:02 SST
From: jinho@iti.gov.sg (Tan Jin Ho)
Message-Id: <9312240409.AA14773@iti.gov.sg>
Subject: Re: Proposals for 10646/Unicode in MIME
To: ietf-charsets@innosoft.com, unicored@unicode.org, apccirn-i18n@nic.nm.kr
Date: Fri, 24 Dec 93 12:09:00 WST
In-Reply-To: <9312210634.AA00144@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at Dec 21, 93 3:34 pm
X-Mailer: ELM [version 2.3 PL11]
Hi,
I would like to have a soft copy of the following report.
"Character Encoding Method for Internationalized Plain
Text Processing", Proceedings of 8th International Joint
Workshop on Computer Communications, Masataka OHTA,
Dec. 1993.
Could you email it to me @ jinho@ncb.gov.sg. Thank you.
Regards,
Jin-Ho
From mohta@necom830.cc.titech.ac.jp Sat Dec 25 23:12:37 1993
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)
id AA11484; Sat, 25 Dec 93 23:12:37 KST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 25 Dec 93 23:04:02 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9312251404.AA16696@necom830.cc.titech.ac.jp>
Subject: Re: subscribe
To: rong@watson.ibm.com (Rong Chang)
Date: Sat, 25 Dec 93 23:04:00 JST
Cc: rong@watson.ibm.com, apccirn-i18n@nic.nm.kr
In-Reply-To: <9312221846.AA40392@hawpub.watson.ibm.com>; from "Rong Chang" at Dec 22, 93 1:46 pm
X-Mailer: ELM [version 2.3 PL11]
> I was born in Taiwan, and have been interested in multilingual,
> multimedia mail systems for several years.
Interesting. Can your system handle a single message containing
arbitrary mixed multiple script languages? Or can it only handle multiple
messages each containing a single (or, maybe, double) script language?
> "I18n" is new to me. It would be nice if someone could send me an FAQ
> list regarding i18n.
As I18n is new to everyone, :-) no FAQ list is available.
The currently hot topic of I18n is internationalized text encoding,
which is necessarily related to multilingual issues.
Masataka Ohta
From apccirn-sec Tue Jan 25 18:33:33 1994
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)
id SAA01316; Tue, 25 Jan 1994 18:32:45 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 25 Jan 94 18:23:25 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401250923.AA17095@necom830.cc.titech.ac.jp>
Subject: Re: ISO-2022-INT-1
To: apccirn-i18n@nic.nm.kr
Date: Tue, 25 Jan 94 18:23:23 JST
In-Reply-To: <no.id>; from "mohta" at Dec 24, 93 5:55 pm
X-Mailer: ELM [version 2.3 PL11]
> I'll be happy if the responses will be returned before 1/25 (the
> earlier the better, of course). I expect much earlier response
> on your personal (not the communities) opinions.
It's 1/25.
Based on several comments, I have revised the previous version
of the pre-internet-draft of ISO-2022-INT-1.
Major changes are:
It is now fully compatible with ISO-2022-KR (G1 invocation is
allowed)
Greek characters can be efficiently encoded with G1
It is described as being not only for messages but for everything
Introduction of the aggregated name ISO-2022-INT-*
Bidirectionality is not yet supported.
Any comments?
Masataka Ohta
------------------------------------------------------------------------
INTERNET DRAFT APCCIRN-I18N
draft-filename-01.txt February 1994
Internet Multilingual Text Encoding: ISO-2022-INT-*
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet-
Drafts as reference material or to cite them other than as a
``working draft'' or ``work in progress.''
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.
Abstract
Based on the experience with "ISO-2022-JP-2" (RFC 1554), a
multilingual text encoding scheme, "ISO-2022-INT-1", is designed as
an extension of "ISO-2022-JP" (RFC 1468) and "ISO-2022-KR" (RFC
1557).
The encoding is ASCII-compatible and 7-bit and, thus, can be mixed
with any ASCII-compatible encoding. The encoding is designed to be
as stateless as practically possible with ISO 2022. That is, no state
information needs to be preserved between lines.
"ISO-2022-INT-1" and its successors have an aggregated name: "ISO-
2022-INT-*".
Introduction
This memo describes a text encoding scheme, "ISO-2022-INT-1", which
is intended to be a text encoding scheme for the Internet, including,
but not limited to, electronic mail [RFC822] and network news
[RFC1036]. The encoding is also useful in multilingual text files.
The encoding is a multilingual extension of "ISO-2022-JP" [2022JP]
and "ISO-2022-KR" [2022KR]. The encoding is supported by an Emacs
based multilingual text editor: MULE [MULE].
APCCIRN-I18N Expires on Aug 1, 1994 [Page 1]
.
INTERNET DRAFT Internet Multilingual Text Encoding February 1994
The name, "ISO-2022-INT-1", is intended to be used in the "charset"
parameter field of MIME headers (see [MIME1] and [MIME2]).
Description
The text with "ISO-2022-INT-1" starts in ASCII [ASCII] designated to
G0 invoked as GL and KS C 5601 [KSC5601] designated to G1, and
switches to other character sets of ISO 2022 [ISO2022] through
limited combinations of designation/invocation sequences. All the
characters are encoded with 7 bits only.
At the beginning of the text, an announcer sequence,
"ESC 2/0 4/6 ESC 2/0 5/0 ESC 2/0 5/2 ESC 2/0 5/10", and a
designation/invocation sequence, "ESC 2/8 4/2 SI ESC 2/4 2/9 4/3 ESC
2/10 7/14 ESC 2/11 7/14", are assumed (though omitted). The same
designation/invocation sequence is also assumed (though unnecessary
and thus omitted) at the beginning of each line. Thus, C1 control
characters are represented with 7 bits. Characters of 94 character
sets are designated to G0 or G1 and invoked into GL by SI (shift in,
'0/15') and SO (shift out, '0/14'), respectively. Characters of 96
character sets are designated to G1 and invoked into GL by SO, or they
may be designated to G2 and invoked with SS2 (single shift two,
"ESC 4/14" or "ESC N").
For example, the escape sequence "ESC 2/4 2/8 4/3" or "ESC $ ( C"
indicates that the bytes following the escape sequence are Korean KS C
5601 characters, which are encoded in two bytes each. A double-byte
sequence enclosed by SO and SI also indicates a KS C 5601 string
unless another character set is designated to G1. The escape sequence
"ESC 2/14 4/1" or "ESC . A" indicates that ISO 8859-1 is designated to
G2. After the designation, the single-shifted sequence "ESC 4/14 4/1"
or "ESC N A" is interpreted to represent the character "A with acute".
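The single-shift example above can be made concrete with a short Python sketch. It is a minimal decoder under stated assumptions, not a complete "ISO-2022-INT-1" implementation: the helper names are hypothetical, only ASCII, "ESC . A" (ISO 8859-1 designated to G2) and "ESC N" single shifts are understood, and any other escape sequence is rejected. A code of a 96 character set (2/0 to 7/15) is mapped to the upper half of ISO 8859-1 by setting the eighth bit, so "ESC N A" (4/1) yields 0xC1, "A with acute".

```python
ESC = 0x1B


def latin1_from_96(byte: int) -> str:
    """Map a 96-character-set code (0x20-0x7F) to its ISO 8859-1 character."""
    if not 0x20 <= byte <= 0x7F:
        raise ValueError(f"0x{byte:02X} is outside the 96-character range")
    # The 96-set is the upper half of ISO 8859-1: restore the eighth bit.
    return bytes([byte | 0x80]).decode("latin-1")


def decode_single_shift(line: bytes) -> str:
    """Decode one line of ASCII text that uses only ESC . A and ESC N x.

    Hypothetical sketch: every other escape sequence raises ValueError.
    """
    out = []
    g2_is_latin1 = False  # state is reset at the beginning of every line
    i = 0
    while i < len(line):
        if line.startswith(b"\x1b.A", i):  # designate ISO 8859-1 to G2
            g2_is_latin1 = True
            i += 3
        elif line.startswith(b"\x1bN", i):  # SS2: next code comes from G2
            if not g2_is_latin1:
                raise ValueError("single shift used before G2 designation")
            if i + 2 >= len(line):
                raise ValueError("truncated single-shift sequence")
            out.append(latin1_from_96(line[i + 2]))
            i += 3
        elif line[i] == ESC:
            raise ValueError(f"unsupported escape sequence at offset {i}")
        else:
            out.append(chr(line[i]))  # plain ASCII from G0
            i += 1
    return "".join(out)


print(decode_single_shift(b"caf\x1b.A\x1bNi"))  # café
```

Note that the G2 designation has to appear earlier in the same line than the single shift, since no state carries over from previous lines.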
The following table gives the escape sequences and the character sets
used in "ISO-2022-INT-1" messages. The reg# is the registration
number in ISO's registry [ISOREG].
94 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
6     ASCII               ESC 2/8 4/2      ESC ( B      G0
                          ESC 2/9 4/2      ESC ) B      G1
14    JIS X 0201-Roman    ESC 2/8 4/10     ESC ( J      G0
                          ESC 2/9 4/10     ESC ) J      G1
94*94 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
42    JIS X 0208-1978     ESC 2/4 4/0      ESC $ @      G0
                          ESC 2/4 2/9 4/0  ESC $ ) @    G1
58    GB 2312-80          ESC 2/4 4/1      ESC $ A      G0
                          ESC 2/4 2/9 4/1  ESC $ ) A    G1
87    JIS X 0208-1983     ESC 2/4 4/2      ESC $ B      G0
                          ESC 2/4 2/9 4/2  ESC $ ) B    G1
149   KS C 5601-1987      ESC 2/4 2/8 4/3  ESC $ ( C    G0
                          ESC 2/4 2/9 4/3  ESC $ ) C    G1
159   JIS X 0212-1990     ESC 2/4 2/8 4/4  ESC $ ( D    G0
                          ESC 2/4 2/9 4/4  ESC $ ) D    G1
171   CNS 11643-1986-1    ESC 2/4 2/8 4/7  ESC $ ( G    G0
                          ESC 2/4 2/9 4/7  ESC $ ) G    G1
172   CNS 11643-1986-2    ESC 2/4 2/8 4/8  ESC $ ( H    G0
                          ESC 2/4 2/9 4/8  ESC $ ) H    G1
96 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
100   ISO8859-1           ESC 2/13 4/1     ESC - A      G1
                          ESC 2/14 4/1     ESC . A      G2
126   ISO8859-7(Greek)    ESC 2/13 4/6     ESC - F      G1
                          ESC 2/14 4/6     ESC . F      G2
Handling of code points not specified in each standard is
implementation dependent. For further information about the
character sets and the escape sequences, see [ISO2022] and [ISOREG].
Some Asian standards are also described in chapters 3 and 4 of
[LUNDE].
If there is any G0 designation other than ASCII in the text, there
must be a switch back to ASCII before a space character '2/0' (but not
necessarily before the '2/0' code of a 96 character set, such as "ESC
4/14 2/0" or "ESC N ' '") or before control characters such as tab or
CRLF. If there is any G1 designation other than KS C 5601 [KSC5601] in
the text, there must be a switch back to KS C 5601 before the end of
the line. If there is any G1 invocation in the text, there must be a
switch back to G0 invocation before a space character (but not
necessarily before "ESC 4/14 2/0" or "ESC N ' '") or control characters
such as tab or CRLF. This means that the next line starts in the ASCII
character set that was switched to before the end of the previous line.
Though ISO 2022 [ISO2022] and related standards permit long-term,
persistent states, "ISO-2022-INT-1" is designed so that such states
need not be preserved between lines. Applications such as pagers and
editors that randomly seek within a text file encoded with
"ISO-2022-INT-1" can assume that the state at the beginning of each
line is the same as that at the beginning of the text.
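Python provides no "ISO-2022-INT-1" codec, but its standard iso2022_jp codec covers the "ISO-2022-JP" subset that this encoding extends. Assuming that stand-in, the following sketch illustrates the per-line property: each line is encoded so that it returns to ASCII before its end, so a pager that seeks directly to any line can decode it from the initial state. The sample lines are arbitrary.

```python
# Encode each line separately; Python's iso2022_jp encoder switches back
# to ASCII ("ESC ( B") at the end of every encode() call.
lines = ["日本語", "plain ASCII", "漢字 and ASCII"]
encoded = b"\r\n".join(line.encode("iso2022_jp") for line in lines)

# Every octet is 7-bit, and no CR or LF can occur inside a JIS X 0208
# byte pair (those bytes lie in 0x21-0x7E), so splitting on CRLF is safe.
assert all(octet < 0x80 for octet in encoded)

# A pager that seeks directly to the third line decodes it in isolation,
# starting from the initial state (ASCII in G0).
third = encoded.split(b"\r\n")[2]
print(third.decode("iso2022_jp"))  # 漢字 and ASCII
```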
Thus, in each line containing characters of a 96 character set, the
G2 designation must be given before the set is used.
The text will end in ASCII designated to G0.
Left-to-right directionality is assumed if the text is displayed
horizontally.
Users of "ISO-2022-INT-1" must be aware that some common transports,
such as old B News, cannot relay the 7-bit value 7/15 (decimal 127),
which is used to encode, for example, "y with diaeresis" of ISO 8859-1.
Other restrictions are given in the Formal Syntax section below.
Formal Syntax
The notational conventions used here are identical to those used in
RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
text = *(line CRLF)
line = *(single-byte-char /
single-shift-char /
(*g0-segment reset-desig-seq) /
g1-segment /
g1-desig-seq /
g2-desig-seq )
; note: within a line,
; g2-desig-seq must precede
; single-shift-char
; note2: must end with KS C 5601
; designated to G1
g0-segment = single-byte-g0-segment /
double-byte-g0-segment
single-byte-g0-segment = single-byte-g0-seq
*(single-byte-char / single-shift-char)
double-byte-g0-segment = double-byte-g0-seq
*((one-of-94 one-of-94) / single-shift-char)
g1-segment = single-byte-g1-94-segment /
single-byte-g1-96-segment /
double-byte-g1-segment
; note: an appropriate segment
; should be selected according
; to the current state of G1
; designation
single-byte-g1-94-segment = SO *(one-of-94 / single-shift-char) SI
single-byte-g1-96-segment = SO *(one-of-96 / single-shift-char) SI
double-byte-g1-segment = SO
*((one-of-94 one-of-94) /
single-shift-char )
SI
reset-desig-seq = ESC "(" "B"
single-byte-g0-seq = ESC "(" ( "B" / "J" )
g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq
single-byte-g1-seq = (ESC ")" ( "B" / "J" )) /
(ESC "-" ( "A" / "F" ))
double-byte-g1-seq = ESC "$" "(" ( "@" / "A" / "B" /
"C" / "D" / "G" / "H" )
g2-desig-seq = ESC "." ( "A" / "F" )
single-shift-seq = ESC "N"
single-shift-char = single-shift-seq one-of-96
CRLF = CR LF
; ( Octal, Decimal.)
ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)
SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)
SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)
CR = <ASCII CR, carriage return>; ( 15, 13.)
LF = <ASCII LF, linefeed> ; ( 12, 10.)
one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)
one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)
7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)
single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT
including CRLF, and not including ESC, SI, SO>
Mail System Considerations
"ISO-2022-INT-1" is designed to be purely 7-bit, so that it can be
used with any transport that conforms to STD 11, RFC 822 [RFC822],
without MIME, as is the current practice in Japan with "ISO-2022-JP"
[2022JP].
If "ISO-2022-INT-1" is used with MIME, the MIME charset name may be
given as follows:
Content-Type: text/plain; charset=iso-2022-int-1
Even if the charset parameter is omitted, multilingual applications
should, contrary to [MIME1], still assume that iso-2022-int-1 or its
latest available successor (see the section "Future Extension Plan"),
not US-ASCII, is used.
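As a sketch of the receiving side, the following Python function uses the standard email package to read the charset parameter and, following the recommendation above rather than the US-ASCII default of [MIME1], falls back to "iso-2022-int-1" when the parameter is absent. The function name and fallback constant are hypothetical, and the sketch only determines the charset name; it does not decode the body.

```python
from email import message_from_string

# Hypothetical default assumed when no charset parameter is present.
# (Python itself ships no codec named "iso-2022-int-1".)
FALLBACK_CHARSET = "iso-2022-int-1"


def effective_charset(raw_message: str) -> str:
    """Return the declared charset (lower-cased) or the fallback."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns failobj when Content-Type is missing
    # or carries no charset parameter.
    return msg.get_content_charset(failobj=FALLBACK_CHARSET)


print(effective_charset("Content-Type: text/plain; charset=ISO-2022-INT-1\n\nhi\n"))
print(effective_charset("Subject: no MIME headers\n\nhi\n"))
```

Both calls print iso-2022-int-1: the first because the parameter is declared (and lower-cased), the second because of the fallback.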
The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in non-MIME-compliant software.
"ISO-2022-INT-1" may also be used in mail headers. If bare STD 11
(RFC 822) without MIME is used, special characters might need to be
quoted as a "quoted-string" in structured headers, which might not be
supported in all common environments. In MIME headers, both the "B"
and "Q" encodings could be useful with "ISO-2022-INT-1" text.
Future Extension Plan
Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",
"ISO-2022-INT-3", and so on. MIME charset names beginning with
"ISO-2022-INT-" are reserved for them. The family of encodings has the
aggregated name "ISO-2022-INT-*".
The extensions will consist solely of adding extra character sets of
ISO 2022, though other extensions, such as bidirectionality support,
are possible. To avoid duplicate assignment of escape sequences,
registration in the formal ISO registry [ISOREG] will be required.
The current feature of an initial designation of KS C 5601 to G1 will
be removed in versions in the near future. Users of ISO-2022-INT-1
are recommended to designate KS C 5601 to G1 explicitly.
To minimize the number of character sets, those that are already
covered by larger character sets and are not widely used should not be
added. For example, the Katakana character set "JIS X 0201-Kana" is
omitted because the set is completely covered by "JIS X 0208-1978" and
is not used at all in the Internet community of Japan.
In any event, the following property of "ISO-2022-INT-1" will be
preserved:
Though ISO 2022 [ISO2022] and related standards permit long-term,
persistent states, "ISO-2022-INT-1" is designed so that such states
need not be preserved between lines. Applications such as pagers
and editors that randomly seek within a text file encoded with
"ISO-2022-INT-1" can assume that the state is the same as that at the
beginning of the text.
References
[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.
[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded
character sets -- Code extension techniques", International
Standard, Ref. No. ISO 2022-1986 (E).
[ISOREG] International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used
With Escape Sequences".
[MIME1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies", RFC 1521,
September 1993.
[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
September 1993.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, August 1982.
[RFC1036] Horton, M., and Adams, R., "Standard for Interchange of
USENET Messages", RFC 1036, AT&T Bell Laboratories, Center
for Seismic Studies, December 1987.
[2022JP] Murai, J., Crispin, M., and van der Poel, E., "Japanese
Character Encoding for Internet Messages", RFC 1468, June
1993.
[2022JP2] Ohta, M., and Handa, K., "ISO-2022-JP-2: Multilingual
Extension of ISO-2022-JP", RFC 1554, December 1993.
[2022KR] Choi, U., Chon, K., and Park, H., "Korean Character Encoding
for Internet Messages", RFC 1557, December 1993.
[KSC5601] Korea Industrial Standards Association, "Code for
Information Interchange (Hangul and Hanja)", Korean
Industrial Standard, 1987, Ref. No. KS C 5601-1987.
[MULE] Nishikimi, M., Handa, K., and Tomura, S., "Mule: MULtilingual
Enhancement to GNU Emacs", Proc. of INET'93, August 1993.
[LUNDE] Lunde, K., "Understanding Japanese Information Processing",
O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.
Acknowledgements
(to be supplied)
Security Considerations
Security issues are not discussed in this memo.
Authors' Addresses
(to be supplied)
From apccirn-sec Tue Jan 25 18:47:02 1994
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)
id SAA01350; Tue, 25 Jan 1994 18:46:51 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 25 Jan 94 18:37:20 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401250937.AA17209@necom830.cc.titech.ac.jp>
Subject: Re: ISO-2022-INT-1
To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Date: Tue, 25 Jan 94 18:37:19 JST
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9401250923.AA17095@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at Jan 25, 94 6:23 pm
X-Mailer: ELM [version 2.3 PL11]
>
> > I'll be happy if the responses will be returned before 1/25 (the
> > earlier the better, of course). I expect much earlier response
> > on your personal (not the communities') opinions.
>
> It's 1/25.
>
> According to several comments, I have revised the previous version
> of pre-internet-draft of ISO-2022-INT-1.
> Any comments?
I forgot to mention that I'll post a draft with further revision as an
Internet Draft early in February.
Masataka Ohta
From apccirn-sec Wed Jan 26 14:11:18 1994
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)
id OAA04486; Wed, 26 Jan 1994 14:10:45 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 26 Jan 94 14:01:32 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401260501.AA21732@necom830.cc.titech.ac.jp>
Subject: Re: ISO-2022-INT-1
To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Date: Wed, 26 Jan 94 14:01:30 JST
Cc: apccirn-i18n@nic.nm.kr
In-Reply-To: <9401250923.AA17095@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at Jan 25, 94 6:23 pm
X-Mailer: ELM [version 2.3 PL11]
The following is a slimmed-down version.
The changes from yesterday's draft are:
A single character set is designated only to G0 or G1
G2 and SS2 are not used
An error in a designation sequence in the formal syntax section is
corrected
Which one do you like better?
Masataka Ohta
------------------------------------------------------------------------
INTERNET DRAFT APCCIRN-I18N
draft-filename-01.txt February 1994
Internet Multilingual Text Encoding: ISO-2022-INT-*
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet-
Drafts as reference material or to cite them other than as a
``working draft'' or ``work in progress.''
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.
Abstract
Based on the experience with "ISO-2022-JP-2" (RFC 1554), a
multilingual text encoding scheme, "ISO-2022-INT-1", is designed as
an extension of "ISO-2022-JP" (RFC 1468) and "ISO-2022-KR" (RFC
1557).
The encoding is ASCII compatible and 7-bit, and thus can be mixed
with any ASCII compatible encoding. The encoding is designed to be
as stateless as is practical with ISO 2022. That is, no state
information needs to be preserved between lines.
"ISO-2022-INT-1" and its successors have an aggregated name: "ISO-
2022-INT-*".
Introduction
This memo describes a text encoding scheme, "ISO-2022-INT-1", which
is intended to be a text encoding scheme for the Internet, including,
but not limited to, electronic mail [RFC822] and network news
[RFC1036]. The encoding is also useful in multilingual text files.
The encoding is a multilingual extension of "ISO-2022-JP" [2022JP]
and "ISO-2022-KR" [2022KR]. The encoding is supported by MULE [MULE],
an Emacs-based multilingual text editor.
The name, "ISO-2022-INT-1", is intended to be used in the "charset"
parameter field of MIME headers (see [MIME1] and [MIME2]).
Description
Text in "ISO-2022-INT-1" starts with ASCII [ASCII] designated to G0
and invoked into GL, and KS C 5601 [KSC5601] designated to G1. It
switches to other character sets of ISO 2022 [ISO2022] through
limited combinations of designation/invocation sequences. All
characters are encoded with 7 bits only.
At the beginning of the text, an announcer sequence, "ESC 2/0 4/2",
and a designation/invocation sequence, "ESC 2/8 4/2 SI ESC 2/4 2/9 4/3
ESC 2/10 7/14 ESC 2/11 7/14", are assumed (though omitted). The same
designation/invocation sequence is also assumed (though unnecessary
and thus omitted) at the beginning of each line. Thus, characters of
94 character sets are designated to G0 or G1 and invoked into GL by SI
(shift in, '0/15') and SO (shift out, '0/14'), respectively.
Characters of 96 character sets are designated to G1 and invoked into
GL by SO. To make the encoding nearly unique, a character set is
designated only to either G0 or G1, not to both.
For example, the escape sequence "ESC 2/4 4/2" or "ESC $ B" indicates
that the bytes following the escape sequence are Japanese JIS X
0208-1983 characters, which are encoded in two bytes each. A
double-byte sequence enclosed by SO and SI indicates a KS C 5601
string unless another character set is designated to G1. The escape
sequence "ESC 2/13 4/1" or "ESC - A" indicates that ISO 8859-1 is
designated to G1. After the designation, the character code '4/1'
following SO is interpreted to represent the character "A with acute".
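The G1 example above can be sketched in Python for the two 96 character sets in the table. This is a minimal decoder under stated assumptions: only ASCII, the "ESC - A" and "ESC - F" designations, and SO/SI are understood; KS C 5601 (the initial G1 set) and all G0 designations are rejected; the function name and codec mapping are hypothetical. Between SO and SI, each code is mapped to the upper half of the designated set by setting the eighth bit, which is where ISO 8859-1 and ISO 8859-7 place their 96-character right-hand halves.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
# Final byte of "ESC - F" for the 96 character sets in the table,
# mapped to the corresponding Python codec names.
G1_96_CODECS = {ord("A"): "latin-1", ord("F"): "iso8859_7"}


def decode_g1_96(line: bytes) -> str:
    """Decode ASCII plus SO/SI segments drawn from a 96-set in G1."""
    out = []
    codec = None      # no 96-set designated to G1 yet (G1 holds KS C 5601)
    shifted = False   # True between SO and SI
    i = 0
    while i < len(line):
        b = line[i]
        if b == ESC and line[i + 1:i + 2] == b"-":
            final = line[i + 2] if i + 2 < len(line) else None
            if final not in G1_96_CODECS:
                raise ValueError(f"unsupported 96-set designation at offset {i}")
            codec = G1_96_CODECS[final]
            i += 3
            continue
        if b == ESC:
            raise ValueError(f"unsupported escape sequence at offset {i}")
        if b == SO:
            if codec is None:
                raise ValueError("SO with KS C 5601 in G1 is not handled here")
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted:
            if b < 0x20:
                raise ValueError("control character inside an SO segment")
            out.append(bytes([b | 0x80]).decode(codec))  # right-hand half
        else:
            out.append(chr(b))  # ASCII from G0
        i += 1
    if shifted:
        raise ValueError("line ends without SI")
    return "".join(out)


print(decode_g1_96(b"\x1b-A\x0eA\x0f"))  # Á
print(decode_g1_96(b"\x1b-F\x0eA\x0f"))  # Α (Greek capital alpha)
```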
The following table gives the escape sequences and the character sets
used in "ISO-2022-INT-1" messages. The reg# is the registration
number in ISO's registry [ISOREG].
94 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
6     ASCII               ESC 2/8 4/2      ESC ( B      G0
14    JIS X 0201-Roman    ESC 2/8 4/10     ESC ( J      G0
94*94 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
42    JIS X 0208-1978     ESC 2/4 4/0      ESC $ @      G0
58    GB 2312-80          ESC 2/4 4/1      ESC $ A      G0
87    JIS X 0208-1983     ESC 2/4 4/2      ESC $ B      G0
149   KS C 5601-1987      ESC 2/4 2/9 4/3  ESC $ ) C    G1
159   JIS X 0212-1990     ESC 2/4 2/8 4/4  ESC $ ( D    G0
171   CNS 11643-1986-1    ESC 2/4 2/8 4/7  ESC $ ( G    G0
172   CNS 11643-1986-2    ESC 2/4 2/8 4/8  ESC $ ( H    G0
96 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
100   ISO8859-1           ESC 2/13 4/1     ESC - A      G1
126   ISO8859-7(Greek)    ESC 2/13 4/6     ESC - F      G1
Handling of code points not specified in each standard is
implementation dependent. For further information about the
character sets and the escape sequences, see [ISO2022] and [ISOREG].
Some Asian standards are also described in chapters 3 and 4 of
[LUNDE].
If there is any G0 designation other than ASCII in the text, there
must be a switch back to ASCII before a space character '2/0' (but not
necessarily before the '2/0' code of a 96 character set, which usually
represents a non-breaking space) or before control characters such as
tab or CRLF. If there is any G1 designation other than KS C 5601
[KSC5601] in the text, there must be a switch back to KS C 5601 before
the end of the line. If there is any G1 invocation in the text, there
must be a switch back to G0 invocation before a space character or
control characters such as tab or CRLF. This means that the next line
starts in the ASCII character set that was switched to before the end
of the previous line.
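The switch-back rules above can be checked mechanically. The following Python sketch tracks G0, G1 and the GL invocation through one line (given without its CRLF), using the escape sequences from the table, and reports a violation if a space or control character appears outside ASCII or if the line does not end in the initial state. The names are hypothetical, and allowing 2/0 of an invoked 96 character set (a non-breaking space) is an interpretation of the parenthetical note, not a rule stated for G1 invocation in the text.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Escape sequences from the table: sequence -> (register, character set).
DESIGNATIONS = {
    b"\x1b(B": ("G0", "ASCII"),
    b"\x1b(J": ("G0", "JIS X 0201-Roman"),
    b"\x1b$@": ("G0", "JIS X 0208-1978"),
    b"\x1b$A": ("G0", "GB 2312-80"),
    b"\x1b$B": ("G0", "JIS X 0208-1983"),
    b"\x1b$)C": ("G1", "KS C 5601-1987"),
    b"\x1b$(D": ("G0", "JIS X 0212-1990"),
    b"\x1b$(G": ("G0", "CNS 11643-1986-1"),
    b"\x1b$(H": ("G0", "CNS 11643-1986-2"),
    b"\x1b-A": ("G1", "ISO8859-1"),
    b"\x1b-F": ("G1", "ISO8859-7"),
}
SETS_96 = {"ISO8859-1", "ISO8859-7"}
# State assumed at the beginning of every line.
INITIAL = {"G0": "ASCII", "G1": "KS C 5601-1987", "GL": "G0"}


def check_line(line: bytes) -> None:
    """Raise ValueError if a line (without CRLF) breaks the switch-back rules."""
    state = dict(INITIAL)
    i = 0
    while i < len(line):
        b = line[i]
        if b == ESC:
            for seq, (register, charset) in DESIGNATIONS.items():
                if line.startswith(seq, i):
                    state[register] = charset
                    i += len(seq)
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
            continue
        if b == SO:
            state["GL"] = "G1"
        elif b == SI:
            state["GL"] = "G0"
        elif b <= 0x20:  # space or control character
            in_ascii = state["GL"] == "G0" and state["G0"] == "ASCII"
            nbsp_96 = b == 0x20 and state["GL"] == "G1" and state["G1"] in SETS_96
            if not (in_ascii or nbsp_96):
                raise ValueError(f"space or control at offset {i} outside ASCII")
        i += 1
    if state != INITIAL:
        raise ValueError(f"line ends in state {state}")


check_line(b"\x1b$BF|K\\\x1b(B")  # JIS X 0208 segment closed by ESC ( B: passes
```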
Though ISO 2022 [ISO2022] and related standards permit long-term,
persistent states, "ISO-2022-INT-1" is designed so that such states
need not be preserved between lines. Applications such as pagers and
editors that randomly seek within a text file encoded with
"ISO-2022-INT-1" can assume that the state at the beginning of each
line is the same as that at the beginning of the text.
The text will end in ASCII designated to G0.
Left-to-right directionality is assumed if the text is displayed
horizontally.
Users of "ISO-2022-INT-1" must be aware that some common transports,
such as old B News, cannot relay the 7-bit value 7/15 (decimal 127),
which is used to encode, for example, "y with diaeresis" of ISO 8859-1.
Other restrictions are given in the Formal Syntax section below.
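Before handing text to such a transport, a sender might scan for the offending octet. This Python sketch, with a hypothetical helper name, returns the offset of every 7/15 octet in an encoded message; how to react (re-encoding, warning the user) is left open.

```python
DEL = 0x7F  # 7/15; "y with diaeresis" when it appears in an ISO 8859-1 segment


def del_offsets(text: bytes) -> list[int]:
    """Return the offsets of every 7/15 octet in an encoded message."""
    return [offset for offset, octet in enumerate(text) if octet == DEL]


# "ESC - A SO 7/15 SI ESC $ ) C": ISO 8859-1 "y with diaeresis" via G1,
# then KS C 5601 is designated back to G1 before the end of the line.
sample = b"\x1b-A\x0e\x7f\x0f\x1b$)C"
print(del_offsets(sample))  # [4]
```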
Formal Syntax
The notational conventions used here are identical to those used in
RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
text = *(line CRLF)
line = *(single-byte-char /
(*g0-segment reset-desig-seq) /
g1-segment /
g1-desig-seq )
; note: must end with KS C 5601
; designated to G1
g0-segment = single-byte-g0-segment /
double-byte-g0-segment
single-byte-g0-segment = single-byte-g0-seq *single-byte-char
double-byte-g0-segment = double-byte-g0-seq *(one-of-94 one-of-94)
g1-segment = single-byte-g1-96-segment /
double-byte-g1-segment
; note: an appropriate segment
; should be selected according
; to the current state of G1
; designation
single-byte-g1-96-segment = SO *one-of-96 SI
double-byte-g1-segment = SO *(one-of-94 one-of-94) SI
reset-desig-seq = ESC "(" "B"
single-byte-g0-seq = ESC "(" ("B" / "J")
double-byte-g0-seq = (ESC "$" ("@" / "A" / "B")) /
(ESC "$" "(" ("D" / "G" / "H"))
g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq
single-byte-g1-seq = (ESC "-" ("A" / "F"))
double-byte-g1-seq = ESC "$" ")" "C"
CRLF = CR LF
; ( Octal, Decimal.)
ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)
SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)
SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)
CR = <ASCII CR, carriage return>; ( 15, 13.)
LF = <ASCII LF, linefeed> ; ( 12, 10.)
one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)
one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)
7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)
single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT
including CRLF, and not including ESC, SI, SO>
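The grammar above translates fairly directly into a regular expression over octets. The Python sketch below is an illustration under stated assumptions, not a reference validator: the constant names mirror the grammar's rules but are otherwise hypothetical, a line is checked without its CRLF, the "must end KS C designated to G1" note is checked separately, and the state-dependent choice between g1 segment types is not enforced (any run of one-of-96 values between SO and SI is accepted).

```python
import re

# Byte classes from the grammar (a line is given without its CRLF).
SINGLE_BYTE_CHAR = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"  # 7BIT minus SO, SI, ESC
ONE_OF_94 = rb"[\x21-\x7e]"
SINGLE_BYTE_G0_SEGMENT = rb"\x1b\([BJ]" + SINGLE_BYTE_CHAR + rb"*"
DOUBLE_BYTE_G0_SEGMENT = (
    rb"(?:\x1b\$[@AB]|\x1b\$\([DGH])(?:" + ONE_OF_94 + ONE_OF_94 + rb")*"
)
RESET_DESIG_SEQ = rb"\x1b\(B"
# Simplification: does not check whether G1 holds a 94x94 or a 96 set.
G1_SEGMENT = rb"\x0e[\x20-\x7f]*\x0f"
G1_DESIG_SEQ = rb"\x1b-[AF]|\x1b\$\)C"

LINE = re.compile(
    rb"(?:" + SINGLE_BYTE_CHAR
    + rb"|(?:" + SINGLE_BYTE_G0_SEGMENT + rb"|" + DOUBLE_BYTE_G0_SEGMENT + rb")*"
    + RESET_DESIG_SEQ
    + rb"|" + G1_SEGMENT
    + rb"|" + G1_DESIG_SEQ + rb")*"
)
G1_DESIG = re.compile(G1_DESIG_SEQ)


def is_valid_line(line: bytes) -> bool:
    """Check one line against the grammar, including 'must end KS C in G1'."""
    if LINE.fullmatch(line) is None:
        return False
    # ESC never occurs inside segments, so findall sees only real designations.
    g1_designations = G1_DESIG.findall(line)
    return not g1_designations or g1_designations[-1] == b"\x1b$)C"


print(is_valid_line(b"\x1b$BF|K\\\x1b(B"))  # True
print(is_valid_line(b"\x1b-A\x0eA\x0f"))    # False: G1 not restored to KS C
```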
Mail System Considerations
"ISO-2022-INT-1" is designed to be purely 7-bit, so that it can be
used with any transport that conforms to STD 11, RFC 822 [RFC822],
without MIME, as is the current practice in Japan with "ISO-2022-JP"
[2022JP].
If "ISO-2022-INT-1" is used with MIME, the MIME charset name may be
given as follows:
Content-Type: text/plain; charset=iso-2022-int-1
Even if the charset parameter is omitted, multilingual applications
should, contrary to [MIME1], still assume that iso-2022-int-1 or its
latest available successor (see the section "Future Extension Plan"),
not US-ASCII, is used.
The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in non-MIME-compliant software.
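A quick way to confirm that no transfer encoding is needed is to check that every octet is below 0x80. The sketch below assumes Python's iso2022_jp codec as a stand-in for "ISO-2022-INT-1" (no codec of that name ships with Python); is_7bit is a hypothetical helper.

```python
def is_7bit(data: bytes) -> bool:
    """True if every octet fits in 7 bits (0x00-0x7F)."""
    return all(octet < 0x80 for octet in data)


# Stand-in: iso2022_jp covers the ISO-2022-JP subset that ISO-2022-INT-1
# extends, so it produces a representative 7-bit body.
body = "日本語のテキスト".encode("iso2022_jp")
print(is_7bit(body))                   # True
print(is_7bit("é".encode("latin-1")))  # False: 0xE9 needs the eighth bit
```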
"ISO-2022-INT-1" may also be used in mail headers. If bare STD 11
(RFC 822) without MIME is used, special characters might need to be
quoted as a "quoted-string" in structured headers, which might not be
supported in all common environments. In MIME headers, both the "B"
and "Q" encodings could be useful with "ISO-2022-INT-1" text.
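The "B" encoding can be sketched with Python's email.header module. Because the email package has no "iso-2022-int-1" charset, this example assumes "iso-2022-jp" in its place; Python's email package pairs that charset with base64 ("B") header encoding. It only demonstrates a round trip of a short encoded-word; folding of longer headers is left to the library.

```python
from email.header import Header, decode_header, make_header

# "iso-2022-jp" stands in for "iso-2022-int-1", which the email package
# does not know; its charset table selects base64 ("B") for headers.
encoded = Header("日本語", "iso-2022-jp").encode()
print(encoded)  # an encoded-word of the form =?iso-2022-jp?b?...?=

# Round trip: decode the encoded-word back to text.
print(str(make_header(decode_header(encoded))))  # 日本語
```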
Future Extension Plan
Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",
"ISO-2022-INT-3", and so on. MIME charset names beginning with
"ISO-2022-INT-" are reserved for them. The family of encodings has the
aggregated name "ISO-2022-INT-*".
The extensions will consist solely of adding extra character sets of
ISO 2022, though other extensions, such as bidirectionality support,
are possible. To avoid duplicate assignment of escape sequences,
registration in the formal ISO registry [ISOREG] will be required.
The current feature of an initial designation of KS C 5601 to G1 will
be removed in versions in the near future. Users of ISO-2022-INT-1
are recommended to designate KS C 5601 to G1 explicitly.
To minimize the number of character sets, those that are already
covered by larger character sets and are not widely used should not be
added. For example, the Katakana character set "JIS X 0201-Kana" is
omitted because the set is completely covered by "JIS X 0208-1978" and
is not used at all in the Internet community of Japan.
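The coverage claim can be illustrated, at the level of Unicode rather than the JIS standards themselves, with Python's unicodedata module: NFKC normalization folds each half-width katakana character of JIS X 0201 onto its full-width counterpart, which the iso2022_jp codec then encodes as JIS X 0208. This is an illustration under that assumption, not the criterion the draft applies; the sample string is arbitrary.

```python
import unicodedata

# Half-width katakana, as found in JIS X 0201-Kana.
halfwidth = "ｶﾀｶﾅ"

# NFKC maps each half-width form to its full-width (JIS X 0208) counterpart.
fullwidth = unicodedata.normalize("NFKC", halfwidth)
print(fullwidth)  # カタカナ

# The full-width form is encodable in the ISO-2022-JP subset (JIS X 0208).
print(fullwidth.encode("iso2022_jp"))
```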
In any event, the following property of "ISO-2022-INT-1" will be
preserved:
Though ISO 2022 [ISO2022] and related standards permit long-term,
persistent states, "ISO-2022-INT-1" is designed so that such states
need not be preserved between lines. Applications such as pagers
and editors that randomly seek within a text file encoded with
"ISO-2022-INT-1" can assume that the state is the same as that at the
beginning of the text.
References
[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.
[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded
character sets -- Code extension techniques", International
Standard, Ref. No. ISO 2022-1986 (E).
[ISOREG] International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used
With Escape Sequences".
[MIME1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies", RFC 1521,
September 1993.
[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
September 1993.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, August 1982.
[RFC1036] Horton, M., and Adams, R., "Standard for Interchange of
USENET Messages", RFC 1036, AT&T Bell Laboratories, Center
for Seismic Studies, December 1987.
[2022JP] Murai, J., Crispin, M., and van der Poel, E., "Japanese
Character Encoding for Internet Messages", RFC 1468, June
1993.
[2022JP2] Ohta, M., and Handa, K., "ISO-2022-JP-2: Multilingual
Extension of ISO-2022-JP", RFC 1554, December 1993.
[2022KR] Choi, U., Chon, K., and Park, H., "Korean Character Encoding
for Internet Messages", RFC 1557, December 1993.
[KSC5601] Korea Industrial Standards Association, "Code for
Information Interchange (Hangul and Hanja)", Korean
Industrial Standard, 1987, Ref. No. KS C 5601-1987.
[MULE] Nishikimi, M., Handa, K., and Tomura, S., "Mule: MULtilingual
Enhancement to GNU Emacs", Proc. of INET'93, August 1993.
[LUNDE] Lunde, K., "Understanding Japanese Information Processing",
O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.
Acknowledgements
(to be supplied)
Security Considerations
Security issues are not discussed in this memo.
Authors' Addresses
(to be supplied)
From apccirn-sec Tue Feb 1 13:12:09 1994
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)
id NAA06212; Tue, 1 Feb 1994 13:11:08 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 1 Feb 94 13:01:34 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9402010401.AA12115@necom830.cc.titech.ac.jp>
Subject: Re: ISO-2022-INT-1
To: apccirn-i18n@nic.nm.kr, jp-msg@iij.ad.jp
Date: Tue, 1 Feb 94 13:01:32 JST
In-Reply-To: <no.id>; from "mohta" at Dec 24, 93 5:55 pm
X-Mailer: ELM [version 2.3 PL11]
I'm going to post the finished draft (the slim one) this afternoon.
Any objections or corrections?
Masataka Ohta
------------------------------------------------------------------------
INTERNET DRAFT APCCIRN-I18N
draft-ohta-text-encoding-00.txt February 1994
Internet Multilingual Text Encoding: ISO-2022-INT-*
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet-
Drafts as reference material or to cite them other than as a
``working draft'' or ``work in progress.''
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.
Abstract
Based on experience with "ISO-2022-JP-2" (RFC 1554), the APCCIRN
internationalization group has designed a multilingual text encoding
scheme, "ISO-2022-INT-1", as an extension of "ISO-2022-JP" (RFC 1468)
and "ISO-2022-KR" (RFC 1557).
The encoding is ASCII compatible and 7-bit, and thus can be mixed
with any ASCII compatible encoding. The encoding is designed to be
as stateless as is practical with ISO 2022. That is, no state
information needs to be preserved between lines.
"ISO-2022-INT-1" and its successors have an aggregated name: "ISO-
2022-INT-*".
Introduction
This memo describes a text encoding scheme, "ISO-2022-INT-1", which
is intended to be a multilingual text encoding scheme for the Internet,
including, but not limited to, electronic mail [RFC822] and network
news [RFC1036]. The encoding is also useful in multilingual text
files. The encoding is a multilingual extension of "ISO-2022-JP"
[2022JP] and "ISO-2022-KR" [2022KR]. The encoding is supported by
MULE [MULE], an Emacs-based multilingual text editor.
APCCIRN-I18N Expires on Aug 4, 1994 [Page 1]
.
INTERNET DRAFT Internet Multilingual Text Encoding February 1994
The name, "ISO-2022-INT-1", is intended to be used in the "charset"
parameter field of MIME headers (see [MIME1] and [MIME2]).
Description
Text in "ISO-2022-INT-1" starts with ASCII [ASCII] designated to G0
and invoked into GL, and KS C 5601 [KSC5601] designated to G1. It
switches to other character sets of ISO 2022 [ISO2022] through
limited combinations of designation/invocation sequences. All
characters are encoded with 7 bits only.
At the beginning of the text, an announcer sequence, "ESC 2/0 4/2",
and a designation/invocation sequence, "ESC 2/8 4/2 SI ESC 2/4 2/9 4/3
ESC 2/10 7/14 ESC 2/11 7/14", are assumed (though omitted). The same
designation/invocation sequence is also assumed (though unnecessary
and thus omitted) at the beginning of each line. Thus, characters of
94 character sets are designated to G0 or G1 and invoked into GL by SI
(shift in, "0/15") and SO (shift out, "0/14"), respectively.
Characters of 96 character sets are designated to G1 and invoked into
GL by SO. To make the encoding nearly unique, a character set is
designated only to either G0 or G1, not to both.
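The column/row notation used throughout this memo names an octet as 16 * column + row, so "2/8" is 0x28, "(". The following Python sketch converts such notation into octets; the helper name is hypothetical, and only the control names ESC, SO and SI that appear in the text are recognized. Applied to the assumed initial designation/invocation sequence above, it yields the octets a decoder conceptually starts from on every line.

```python
# Control names used in the text; every other token is "column/row".
NAMED = {"ESC": 0x1B, "SO": 0x0E, "SI": 0x0F}


def from_notation(spec: str) -> bytes:
    """Convert ISO 2022 column/row notation such as "ESC 2/8 4/2" to bytes."""
    out = bytearray()
    for token in spec.split():
        if token in NAMED:
            out.append(NAMED[token])
            continue
        column, row = (int(part) for part in token.split("/"))
        if not (0 <= column <= 7 and 0 <= row <= 15):
            raise ValueError(f"{token!r} is not a 7-bit code position")
        out.append(column * 16 + row)  # e.g. 2/8 -> 0x28 "("
    return bytes(out)


initial = from_notation("ESC 2/8 4/2 SI ESC 2/4 2/9 4/3 ESC 2/10 7/14 ESC 2/11 7/14")
print(initial)  # b'\x1b(B\x0f\x1b$)C\x1b*~\x1b+~'
```

Read back in ASCII form, this is "ESC ( B", SI, "ESC $ ) C", "ESC * ~" and "ESC + ~".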
For example, the escape sequence "ESC 2/4 4/2" or "ESC $ B" indicates
that the bytes following the escape sequence are Japanese JIS X
0208-1983 characters, which are encoded in two bytes each. A
double-byte sequence enclosed by SO and SI indicates a KS C 5601
[KSC5601] string unless another character set is designated to G1.
The escape sequence "ESC 2/13 4/1" or "ESC - A" indicates that ISO
8859-1 is designated to G1. After the designation, the character code
"4/1" following SO is interpreted to represent the character "A with
acute", not ASCII "A".
The following table gives the escape sequences and the character sets
used in "ISO-2022-INT-1" messages. The reg# is the registration
number in ISO's registry [ISOREG].
94 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
6     ASCII               ESC 2/8 4/2      ESC ( B      G0
14    JIS X 0201-Roman    ESC 2/8 4/10     ESC ( J      G0
94*94 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
42    JIS X 0208-1978     ESC 2/4 4/0      ESC $ @      G0
58    GB 2312-80          ESC 2/4 4/1      ESC $ A      G0
87    JIS X 0208-1983     ESC 2/4 4/2      ESC $ B      G0
149   KS C 5601-1987      ESC 2/4 2/9 4/3  ESC $ ) C    G1
159   JIS X 0212-1990     ESC 2/4 2/8 4/4  ESC $ ( D    G0
171   CNS 11643-1986-1    ESC 2/4 2/8 4/7  ESC $ ( G    G0
172   CNS 11643-1986-2    ESC 2/4 2/8 4/8  ESC $ ( H    G0
96 character sets
reg#  character set       ESC sequence                  designated to
------------------------------------------------------------------
100   ISO8859-1           ESC 2/13 4/1     ESC - A      G1
126   ISO8859-7(Greek)    ESC 2/13 4/6     ESC - F      G1
Handling of code points not specified in each standard is
implementation dependent. For further information about the
character sets and the escape sequences, see [ISO2022] and [ISOREG].
Some Asian standards are also described in chapters 3 and 4 of
[LUNDE].
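As an illustration only (not part of this specification), the Python
sketch below decodes a single line of "ISO-2022-INT-1" (without its
CRLF) into Unicode using the escape sequences from the table above.
It assumes that a set invoked into GL can be decoded by setting the
high bit of each byte and passing the result to a Python codec; the
CODECS mapping is the sketch's own choice, and sets it has no mapping
for (JIS X 0201-Roman, JIS X 0208-1978, JIS X 0212-1990 and the
CNS 11643 planes) are rejected with ValueError rather than decoded.

```python
# Hypothetical decoder sketch for one ISO-2022-INT-1 line (CRLF removed).
# Bytes following ESC -> (register, character set), from the table above.
DESIGNATIONS = {
    b"(B": ("G0", "ASCII"),
    b"(J": ("G0", "JIS X 0201-Roman"),
    b"$@": ("G0", "JIS X 0208-1978"),
    b"$A": ("G0", "GB 2312-80"),
    b"$B": ("G0", "JIS X 0208-1983"),
    b"$(D": ("G0", "JIS X 0212-1990"),
    b"$(G": ("G0", "CNS 11643-1986-1"),
    b"$(H": ("G0", "CNS 11643-1986-2"),
    b"$)C": ("G1", "KS C 5601-1987"),
    b"-A": ("G1", "ISO 8859-1"),
    b"-F": ("G1", "ISO 8859-7"),
}

# Assumed codec choices; every non-ASCII set is decoded with the high bit set.
CODECS = {
    "ASCII": "ascii",
    "GB 2312-80": "gb2312",
    "JIS X 0208-1983": "euc_jp",
    "KS C 5601-1987": "euc_kr",
    "ISO 8859-1": "latin-1",
    "ISO 8859-7": "iso8859_7",
}


def decode_line(line: bytes) -> str:
    g0, g1, gl = "ASCII", "KS C 5601-1987", "G0"  # initial state of every line
    out = []
    run = bytearray()   # bytes collected for the character set in effect
    run_set = "ASCII"

    def flush() -> None:
        if not run:
            return
        if run_set not in CODECS:
            raise ValueError(f"no codec for {run_set} in this sketch")
        data = bytes(run) if run_set == "ASCII" else bytes(b | 0x80 for b in run)
        out.append(data.decode(CODECS[run_set]))
        run.clear()

    i = 0
    while i < len(line):
        byte = line[i]
        if byte == 0x1B:  # ESC: try a 3-byte, then a 2-byte designation
            for n in (3, 2):
                key = line[i + 1:i + 1 + n]
                if len(key) == n and key in DESIGNATIONS:
                    register, charset = DESIGNATIONS[key]
                    if register == "G0":
                        g0 = charset
                    else:
                        g1 = charset
                    i += 1 + n
                    break
            else:
                raise ValueError(f"unsupported escape sequence at byte {i}")
            continue
        if byte in (0x0E, 0x0F):  # SO invokes G1 into GL, SI invokes G0
            gl = "G1" if byte == 0x0E else "G0"
            i += 1
            continue
        active = g0 if gl == "G0" else g1
        if active != run_set:  # set changed: decode what was collected so far
            flush()
            run_set = active
        run.append(byte)
        i += 1
    flush()
    return "".join(out)
```

For example, decode_line(b"\x1b-A\x0eA\x0f\x1b$)C") returns "\u00c1"
("A with acute"), matching the ISO 8859-1 example in the Description.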
If the text contains any G0 designation other than ASCII, there must
be a switch back to ASCII before a space character "2/0" (but not
necessarily before the "2/0" code of a 96-character set, which usually
represents a non-breaking space) and before control characters such
as tab or CRLF. If the text contains any G1 designation other than
KS C 5601 [KSC5601], there must be a switch back to KS C 5601 before
the end of the line. If G1 is invoked, there must be a switch back to
G0 invocation (SI) before a space character and before control
characters such as tab or CRLF. As a result, every line ends, and the
next line therefore starts, with ASCII designated to G0 and invoked
into GL and KS C 5601 designated to G1.
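The rules above can be checked mechanically. The following Python
sketch is a hypothetical helper, not something defined by this memo:
it tracks the G0 and G1 designations and the SI/SO invocation across
one line (CRLF removed), and reports spaces or control characters met
while a non-ASCII set is in effect, as well as lines that do not end
in the initial state. Treating every byte below "2/0" other than ESC,
SI and SO as a control character is an assumption of the sketch.

```python
# Hypothetical checker for the line-end rules (one line, CRLF removed).
# Bytes following ESC -> (register, character set), from the table above.
DESIGNATIONS = {
    b"(B": ("G0", "ASCII"), b"(J": ("G0", "JIS X 0201-Roman"),
    b"$@": ("G0", "JIS X 0208-1978"), b"$A": ("G0", "GB 2312-80"),
    b"$B": ("G0", "JIS X 0208-1983"), b"$(D": ("G0", "JIS X 0212-1990"),
    b"$(G": ("G0", "CNS 11643-1986-1"), b"$(H": ("G0", "CNS 11643-1986-2"),
    b"$)C": ("G1", "KS C 5601-1987"),
    b"-A": ("G1", "ISO 8859-1"), b"-F": ("G1", "ISO 8859-7"),
}
NINETY_SIX = {"ISO 8859-1", "ISO 8859-7"}
INITIAL = ("ASCII", "KS C 5601-1987", "G0")  # (G0 set, G1 set, invoked into GL)


def check_line(line: bytes) -> list:
    g0, g1, gl = INITIAL
    problems = []
    i = 0
    while i < len(line):
        byte = line[i]
        if byte == 0x1B:
            for n in (3, 2):
                key = line[i + 1:i + 1 + n]
                if len(key) == n and key in DESIGNATIONS:
                    register, charset = DESIGNATIONS[key]
                    if register == "G0":
                        g0 = charset
                    else:
                        g1 = charset
                    i += 1 + n
                    break
            else:
                problems.append(f"byte {i}: unknown escape sequence")
                i += 1
            continue
        if byte == 0x0E:
            gl = "G1"
        elif byte == 0x0F:
            gl = "G0"
        elif byte <= 0x20:
            # "2/0" in an invoked 96-character set is a character, not a space.
            is_96_char = byte == 0x20 and gl == "G1" and g1 in NINETY_SIX
            if not is_96_char and (gl != "G0" or g0 != "ASCII"):
                problems.append(f"byte {i}: space/control without switching back to ASCII")
        i += 1
    if (g0, g1, gl) != INITIAL:
        problems.append("line does not end in the initial state")
    return problems
```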
Though ISO 2022 [ISO2022] and related standards permit long-term,
persistent states, "ISO-2022-INT-1" is designed so that no state need
be preserved between lines. Applications such as pagers and editors
that seek randomly within a text file encoded with "ISO-2022-INT-1"
can assume that the state at the start of any line is the same as at
the beginning of the text.
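Because no state crosses a line boundary, a pager that seeks to an
arbitrary byte offset only needs to find the enclosing line before
decoding it from the initial state. A minimal Python sketch, with the
hypothetical name line_at, assuming lines are terminated by CRLF (a
bare CR or LF may occur inside a line, as the Formal Syntax allows):

```python
def line_at(buf: bytes, pos: int) -> bytes:
    """Return the line (without CRLF) containing byte offset pos.

    The result can be decoded on its own, starting from the initial
    state, without scanning from the beginning of the file.
    """
    if not 0 <= pos < len(buf):
        raise IndexError("offset outside buffer")
    # The line starts right after the last CRLF that ends before pos.
    start = buf.rfind(b"\r\n", 0, pos)
    start = 0 if start < 0 else start + 2
    end = buf.find(b"\r\n", start)
    return buf[start:] if end < 0 else buf[start:end]
```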
The text will end in ASCII designated to G0.
Left-to-right directionality is assumed if the text is displayed
horizontally.
Users of "ISO-2022-INT-1" must be aware that some common transports,
such as old Bnews installations in Japan, cannot relay the 7-bit
value "7/15" (decimal 127), which is used, for example, to encode
"y with diaeresis" of ISO 8859-1.
Other restrictions are given in the Formal Syntax section below.
Formal Syntax
The notational conventions used here are identical to those used in
STD 11, RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
text = *(line CRLF)
line = *(single-byte-char /
(*g0-segment reset-desig-seq) /
g1-segment /
g1-desig-seq )
; note: must end with KS C 5601
; designated to G1
g0-segment = single-byte-g0-segment /
double-byte-g0-segment
single-byte-g0-segment = single-byte-g0-seq *single-byte-char
double-byte-g0-segment = double-byte-g0-seq *(one-of-94 one-of-94)
g1-segment = single-byte-g1-96-segment /
double-byte-g1-segment
; note: an appropriate segment
; should be selected according
; to the current state of G1
; designation
single-byte-g1-96-segment = SO *one-of-96 SI
double-byte-g1-segment = SO *(one-of-94 one-of-94) SI
reset-desig-seq = ESC "(" "B"
single-byte-g0-seq = ESC "(" ("B" / "J")
double-byte-g0-seq = (ESC "$" ("@" / "A" / "B")) /
(ESC "$" "(" ("D" / "G" / "H"))
g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq
single-byte-g1-seq = (ESC "-" ("A" / "F"))
double-byte-g1-seq = ESC "$" ")" "C"
CRLF = CR LF
; ( Octal, Decimal.)
ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)
SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)
SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)
CR = <ASCII CR, carriage return>; ( 15, 13.)
LF = <ASCII LF, linefeed> ; ( 12, 10.)
one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)
one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)
7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)
single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT
including CRLF, and not including ESC, SI, SO>
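The grammar can be approximated with a byte-level regular expression.
The Python sketch below is an illustration under stated assumptions,
not a normative parser: because the choice between the two g1-segment
forms depends on the current G1 designation, it accepts the
structural superset SO *one-of-96 SI, and it does not enforce the note
that a line must end with KS C 5601 designated to G1.

```python
import re

# Byte-level building blocks mirroring the Formal Syntax productions.
ESC, SI, SO = rb"\x1b", rb"\x0f", rb"\x0e"
ONE_OF_94 = rb"[\x21-\x7e]"
ONE_OF_96 = rb"[\x20-\x7f]"
# Any 7-bit byte except ESC, SI and SO; CR only when not followed by LF.
SINGLE_BYTE_CHAR = rb"(?:[\x00-\x0c\x10-\x1a\x1c-\x7f]|\r(?!\n))"

RESET_DESIG_SEQ = ESC + rb"\(B"
SINGLE_BYTE_G0_SEQ = ESC + rb"\([BJ]"
DOUBLE_BYTE_G0_SEQ = ESC + rb"\$(?:[@AB]|\([DGH])"
G0_SEGMENT = (rb"(?:" + SINGLE_BYTE_G0_SEQ + SINGLE_BYTE_CHAR + rb"*"
              + rb"|" + DOUBLE_BYTE_G0_SEQ + rb"(?:" + ONE_OF_94 + ONE_OF_94 + rb")*)")
# Superset of both g1-segment forms; the real choice depends on the G1 state.
G1_SEGMENT = SO + ONE_OF_96 + rb"*" + SI
G1_DESIG_SEQ = ESC + rb"(?:-[AF]|\$\)C)"

LINE = re.compile(
    rb"(?:" + SINGLE_BYTE_CHAR
    + rb"|(?:" + G0_SEGMENT + rb")*" + RESET_DESIG_SEQ
    + rb"|" + G1_SEGMENT
    + rb"|" + G1_DESIG_SEQ
    + rb")*"
)


def is_valid_text(data: bytes) -> bool:
    """text = *(line CRLF): every line, including the last, ends with CRLF."""
    if data and not data.endswith(b"\r\n"):
        return False
    return all(LINE.fullmatch(line) for line in data.split(b"\r\n")[:-1])
```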
Mail System Considerations
"ISO-2022-INT-1" is purely 7-bit, so that it can be used with any
transport that conforms to STD 11, RFC 822 [RFC822], even without
MIME, as is current practice in Japan with "ISO-2022-JP" [2022JP].
If "ISO-2022-INT-1" is used with MIME, MIME charset name may be given
as follows:
Content-Type: text/plain; charset=iso-2022-int-1
Even if the charset parameter is omitted, multilingual applications
should still assume that "ISO-2022-INT-1" or its latest available
successor (see the section "Future Extension Plan") is used, rather
than US-ASCII, the MIME default.
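With Python's standard email package, the fallback described above
amounts to passing a default to Message.get_content_charset(), which
lower-cases the charset parameter and returns its failobj argument
when the parameter is absent. In this sketch the fallback is fixed to
"iso-2022-int-1"; a receiver that knows a later ISO-2022-INT-* version
could substitute that name instead.

```python
from email.message import Message

# Assumed fallback for this sketch; a later ISO-2022-INT-* name could be used.
FALLBACK = "iso-2022-int-1"


def effective_charset(msg: Message) -> str:
    # Returns the lower-cased charset parameter, or FALLBACK when the
    # Content-Type header or its charset parameter is missing.
    return msg.get_content_charset(FALLBACK)


labelled = Message()
labelled["Content-Type"] = "text/plain; charset=ISO-2022-INT-1"
unlabelled = Message()
unlabelled["Content-Type"] = "text/plain"
ascii_only = Message()
ascii_only["Content-Type"] = "text/plain; charset=us-ascii"
```

An explicit label always wins; only a missing label triggers the
fallback.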
The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in non-MIME-compliant software.
"ISO-2022-INT-1" may also be used in mail headers. If bare STD 11,
RFC 822 without MIME is used, special characters in structured
headers may need to be quoted as a "quoted-string", which may not be
supported in all common environments. In MIME headers, both the "B"
and "Q" encodings [MIME2] may be useful with "ISO-2022-INT-1" text.
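Because the encoded bytes are already 7-bit, a "B" encoded-word
[MIME2] is simply their Base64 form wrapped in "=?charset?B?...?=".
The Python sketch below (the helper name b_encoded_word is
hypothetical) builds one by hand; a "Q" encoded-word would work
similarly but would have to escape ESC, SI and SO as "=1B", "=0F" and
"=0E".

```python
import base64
from email.header import decode_header


def b_encoded_word(raw: bytes, charset: str = "ISO-2022-INT-1") -> str:
    """Wrap 7-bit ISO-2022-INT-1 bytes in a MIME "B" encoded-word."""
    return "=?%s?B?%s?=" % (charset, base64.b64encode(raw).decode("ascii"))


# Hiragana "a" in JIS X 0208-1983, followed by a switch back to ASCII.
raw = b"\x1b$B\x24\x22\x1b(B"
word = b_encoded_word(raw)
```

decode_header() from the standard library recovers the original bytes
together with the (lower-cased) charset name.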
Future Extension Plan
Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",
"ISO-2022-INT-3", and so on. MIME charset names beginning with
"ISO-2022-INT-" are reserved for them. The family of encodings has
the aggregated name "ISO-2022-INT-*".
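MIME charset names are compared without regard to case, so a receiver
looking for the reserved "ISO-2022-INT-" prefix might use a helper
like the hypothetical int_version below. Restricting version numbers
to positive decimal integers without leading zeros is an assumption
of the sketch.

```python
import re

# Case-insensitive match for the reserved "ISO-2022-INT-<N>" names.
_INT_NAME = re.compile(r"iso-2022-int-([1-9][0-9]*)", re.IGNORECASE)


def int_version(charset: str):
    """Return N for "ISO-2022-INT-N", or None for any other charset name."""
    m = _INT_NAME.fullmatch(charset.strip())
    return int(m.group(1)) if m else None
```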
Extensions will consist primarily of additional ISO 2022 character
sets, though other extensions, such as support for bidirectionality,
are possible. To avoid duplicate assignment of escape sequences,
registration in the formal ISO registry [ISOREG] will in general be
required; this does not rule out future IANA registration of escape
sequences for private use.
The implicit initial designation of KS C 5601 to G1 will be removed
in a near-future version. Users of ISO-2022-INT-1 are encouraged to
designate KS C 5601 to G1 explicitly.
To minimize the number of character sets, sets that are already
covered by larger character sets and are not widely used should not
be added. For example, the Katakana character set "JIS X 0201-Kana"
is omitted because it is completely covered by "JIS X 0208-1978" and
is not used at all in the Japanese Internet community.
In any event, the following property of "ISO-2022-INT-1" will be
preserved:
Though ISO 2022 [ISO2022] and related standards permit long-term,
persistent states, "ISO-2022-INT-1" is designed so that no state need
be preserved between lines. Applications such as pagers and editors
that seek randomly within a text file encoded with "ISO-2022-INT-1"
can assume that the state at the start of any line is the same as at
the beginning of the text.
References
[2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese
Character Encoding for Internet Messages", RFC 1468, June
1993.
[2022JP2] Ohta, M., and Handa, K., "ISO-2022-JP-2: Multilingual
Extension of ISO-2022-JP", RFC 1554, December 1993.
[2022KR] Choi, U., Chon, K., and H. Park, "Korean Character Encoding
for Internet Messages", RFC 1557, December 1993.
[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.
[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded
character sets -- Code extension techniques", International
Standard, Ref. No. ISO 2022-1986 (E).
[ISOREG] International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used
With Escape Sequences".
[KSC5601] Korea Industrial Standards Association, "Code for
Information Interchange (Hangul and Hanja)," Korean
Industrial Standard, 1987, Ref. No. KS C 5601-1987.
[LUNDE] Lunde, K., "Understanding Japanese Information Processing",
O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.
[MIME1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies", RFC 1521,
September 1993.
[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
September 1993.
[MULE] Nishikimi, M., Handa, K., and Tomura, S., "Mule: MULtilingual
Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, August 1982.
[RFC1036] Horton, M., and Adams, R., "Standard for Interchange of
USENET Messages", RFC 1036, AT&T Bell Laboratories, Center
for Seismic Studies, December 1987.
Acknowledgements
This memo is the product of the APCCIRN (Asian Pacific CCIRN)
Internationalization group and was reviewed by various people in the
news group fj.kanji and on the mailing list jp-msg@iij.ad.jp. Many
people have contributed. In particular, Prof. Eiichi Wada of Tokyo
University and Ken Lunde of Adobe Systems, Inc. have helped us with
their profound knowledge of ISO 2022 and related standards. Uhhyung
Choi of the Korea Advanced Institute of Science and Technology has
helped make the encoding upward compatible with ISO-2022-KR.
Prof. Kilnam Chon of the Korea Advanced Institute of Science and
Technology and Prof. Jun Murai of Keio University have provided the
framework of international cooperation. The authors wish to thank
all the people who have helped to produce this memo.
Security Considerations
Security issues are not discussed in this memo.
Authors' Addresses
Masataka Ohta
Tokyo Institute of Technology
2-12-1, O-okayama, Meguro-ku,
Tokyo 152, JAPAN
Phone: +81-3-5499-7084
Fax: +81-3-3729-1940
EMail: mohta@cc.titech.ac.jp
Ken'ichi Handa
Electrotechnical Laboratory
Umezono 1-1-4, Tsukuba,
Ibaraki 305, JAPAN
Phone: +81-298-58-5916
Fax: +81-298-58-5918
EMail: handa@etl.go.jp
From apccirn-sec Tue Feb 1 22:55:23 1994
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)
id WAA08087; Tue, 1 Feb 1994 22:54:52 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 1 Feb 94 22:45:40 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9402011345.AA14454@necom830.cc.titech.ac.jp>
Subject: Instructions to RFC translators (resend)
To: apccirn-i18n@nic.nm.kr
Date: Tue, 1 Feb 94 22:45:38 JST
In-Reply-To: <no.id>; from "mohta" at Feb 1, 94 10:37 pm
X-Mailer: ELM [version 2.3 PL11]
Sorry if I posted garbage.
I think the following Internet Draft should be important for
internationalization.
Any comments?
Masataka Ohta
------------------------------------------------------------------------
INTERNET DRAFT M. Ohta
draft-ohta-translation-instr-00.txt Tokyo Institute of Technology
January 1994
Instructions to RFC Translators
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet-
Drafts as reference material or to cite them other than as a
``working draft'' or ``work in progress.''
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.
Abstract
A framework is given to coordinate the worldwide effort of
translating RFCs into various languages.
Translated RFCs will be encoded in 7-bit ISO 2022 and will be named
"TRFC NNNN-LLL-MM", where "NNNN" is the RFC number of the original
RFC, "LLL" is the ISO 639 language code, and "MM" is a sequence
number that distinguishes different translations.
Formatting rules similar to those for ASCII RFCs are also described.
M. Ohta Expires on August 4, 1994 [Page 1]
INTERNET DRAFT Instructions to RFC Translators January 1994
Index
1. Introduction
2. Editorial Policy
3. Format Rules
3a. Plain Text Format Rules
3b. PostScript Format Rules
4. Headers and Footers
4a. First Page
4b. Running Headers
4c. Running Footers
5. Status Section
6. Translation History Section
7. Contact
8. RFC Index
9. Copyright Considerations
10. Security Considerations
11. References
12. Author's Address
1. Introduction
This memo provides information about the translation of RFCs and
certain policies relating to the publication of translated RFCs
(TRFCs). It gives the minimal framework needed to coordinate the
worldwide translation efforts.
Memos translated from the existing RFCs or TRFCs may be submitted as
TRFCs by anyone to the RFC Editor.
TRFCs are distributed online by being stored as public access files.
The online files are copied by interested people and printed or
displayed at their sites on their own equipment. This means that the
format of the online files must meet the constraints of a wide
variety of printing and display equipment. (TRFCs may also be
returned via e-mail in response to an e-mail query, or TRFCs may be
found using information and database searching tools such as Gopher,
Wais, WWW, or Mosaic.)
TRFCs are published in plain text encoded with ISO-2022-INT-*
[2022INT]. ISO-2022-INT-* was chosen because 1) it is based on ISO
2022, an internationally available standard; 2) it is 7-bit and can
be transferred safely by SMTP and by FTP in ASCII mode; and 3) it is
itself multilingual, so no designation of, or negotiation to use,
another encoding system is necessary.
TRFCs in PostScript are encoded with ASCII.
In any event, TRFCs are secondary or alternative versions and the
original ASCII RFC is the primary version for reference purposes.
It is unlikely that the RFC Editor will hire, for each language in
the world, professional translators who also have engineering
knowledge of the Internet to check the quality of translations.
Moreover, French translations may be provided by France, Belgium,
Canada, New Caledonia, or even Japan, and it would be politically
difficult to choose the best one among them. Thus, all such
translations are treated equally regardless of their quality, and
serial numbers are assigned to them. The quality could be as bad as
that of machine translation, or it could be even better than that of
the original RFC, if the translation is written in the native
language of the original RFC's author. In any event, the primary
version is the untranslated ASCII one, and those who need
authoritative information should not depend on TRFCs.
Multiple versions are also needed to accommodate translations of
improved quality. That is, while an improved RFC receives a new RFC
number, the original and the improved translations must both share
the RFC number of the original ASCII RFC.
The TRFCs will be named "TRFC NNNN-LLL-MM", where "NNNN" is the RFC
number of the original RFC, "LLL" is the ISO 639 [ISO639] language
code (not, of course, the two-letter ISO 3166 country code), and "MM"
is the sequence number assigned by the RFC Editor to identify
different versions of translations. For example, the first Japanese
translation of RFC 1543 will be named "TRFC 1543-JA-1".
TRFCs will generally have file names of "trfcNNNN-LLL-MM.txt" (plain
text) or "trfcNNNN-LLL-MM.ps" (PostScript).
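The naming convention can be captured in a few lines of Python. This
is a sketch: writing the RFC number without zero padding and using a
lower-case language code in file names (following the lower-case
"rfcNNNN.txt" style of RFC files) are assumptions, not rules stated
in this memo.

```python
import re

_TRFC = re.compile(r"TRFC (\d+)-([A-Za-z]+)-(\d+)")


def trfc_name(rfc: int, lang: str, seq: int) -> str:
    # Upper-case language code, as in the example "TRFC 1543-JA-1".
    return f"TRFC {rfc}-{lang.upper()}-{seq}"


def trfc_file_name(rfc: int, lang: str, seq: int, fmt: str = "txt") -> str:
    if fmt not in ("txt", "ps"):
        raise ValueError("TRFCs are published as plain text or PostScript")
    # Assumption: lower-case language code in file names.
    return f"trfc{rfc}-{lang.lower()}-{seq}.{fmt}"


def parse_trfc_name(name: str):
    """Split "TRFC NNNN-LLL-MM" into (RFC number, language, sequence)."""
    m = _TRFC.fullmatch(name)
    if m is None:
        raise ValueError(f"not a TRFC name: {name!r}")
    return int(m.group(1)), m.group(2).upper(), int(m.group(3))
```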
2. Editorial Policy
TRFCs are reviewed by the RFC Editor and possibly by other reviewers
he selects.
Usually, the review is only on the formalities described in this memo
and no further check will be done as to the quality of translation.
The result of the review may be to suggest to the author some
improvements to the document before publication.
In some cases it may be determined that the submitted document is not
appropriate material to be published as a TRFC.
The RFC Editor may make minor changes to the document, especially in
the areas of style and format, but on some occasions also to the
text. Sometimes the RFC Editor will undertake to make more
significant changes, especially when the format rules (see below) are
not followed. More often, however, the memo will be returned to the
author for additional work.
Due to various time pressures on the RFC Editorial staff the time
elapsed between submission and publication can vary greatly. It is
always acceptable to query (ping) the RFC Editor about the status of
a TRFC during this time (but not more than once a week). The two
weeks preceding an IETF meeting are generally very busy, so TRFCs
submitted shortly before an IETF meeting are most likely to be
published after the meeting.
3. Format Rules
To meet the distribution constraints, the following rules are
established for the two allowed TRFC formats: plain text and
PostScript.
The RFC Editor attempts to ensure a consistent RFC style. It is much
easier to do this if the submission matches the style of the most
recent RFCs and TRFCs. Please do look at some recent RFCs and TRFCs
and prepare yours in the same style.
You must submit an editable online document to the RFC Editor. The
RFC Editor may require minor changes in format or style and will
insert the actual sequence number.
3a. Plain Text Format Rules
The character code is ISO-2022-INT-* [2022INT].
If printed on paper by common printers, standard page size,
excluding margins, is 7.2 by 10 inches.
Each page must be followed by a form feed on a line by itself.
Each line must be followed by carriage return and line feed.
No overstriking (or underlining) is allowed, unless the language
used needs special characters that can be represented only by
overstriking (the current draft of ISO-2022-INT-* contains no such
characters).
These height and width limits include any headers, footers, page
numbers, or left-side indentation.
Use single-spaced text within a paragraph and one blank line between
paragraphs.
TRFCs in plain text format must be submitted to the RFC Editor in
e-mail messages (or as online files) in the finished publication
format.
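Some of the plain text rules can be checked automatically before
submission. The Python sketch below is a hypothetical pre-submission
check: it treats backspace and bare CR as possible overstriking and
does not check page size, since the display width of double-byte
characters and escape sequences depends on the output device.

```python
def plain_text_problems(data: bytes) -> list:
    """Report violations of some plain text TRFC format rules."""
    problems = []
    if any(b > 0x7F for b in data):
        problems.append("non-7-bit byte found")
    if data and not data.endswith(b"\r\n"):
        problems.append("last line is not terminated by CRLF")
    for n, line in enumerate(data.split(b"\r\n"), start=1):
        if b"\n" in line:
            problems.append(f"line {n}: LF without CR")
        if b"\x0c" in line and line != b"\x0c":
            problems.append(f"line {n}: form feed not on a line by itself")
        if b"\x08" in line or b"\r" in line:
            problems.append(f"line {n}: possible overstriking (BS or bare CR)")
    return problems
```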
3b. PostScript Format Rules
Standard page size is 8 1/2 by 11 inches.
Margin of 1 inch on all sides (top, bottom, left, and right).
ASCII characters in main text should have a point size of no less
than 10 points with a line spacing of 12 points.
ASCII characters in footnotes and graph notations no smaller than
8 points with a line spacing of 9.6 points.
Three fonts are acceptable: Helvetica, Times Roman, and Courier,
plus their bold-face and italic versions. These are the three
standard fonts on most PostScript printers. Shape information for
any other font must be included explicitly within the PostScript
text.
Prepare diagrams and images based on lowest common denominator
PostScript. Consider common PostScript printer functionality and
memory requirements.
The following PostScript commands should not be used:
initgraphics, erasepage, copypage, grestoreall, initmatrix,
initclip, banddevice, framedevice, nulldevice and renderbands.
These PostScript rules are likely to be changed and expanded as
experience is gained.
TRFCs in PostScript Format may be submitted to the RFC Editor in
e-mail messages (or as online files). If you plan to submit a
document in PostScript please consult the RFC Editor first.
4. Headers and Footers
There are three kinds of headers and footers: the first-page heading,
the running headers, and the running footers.
All headers and footers (except for the translated title) must be
coded with ASCII.
4a. First Page
On the first page there is no running header. The top of the
first page has the following items:
Network Working Group
The traditional heading for the group that founded the RFC
series. This appears on the first line on the left hand side
of the heading.
Request for Comments: NNNN-LLL-MM
Identifies this as a request for comments and specifies the
number. Indicated on the second line on the left side. The
actual value of "MM" is filled in at the last moment before
publication by the RFC Editor.
Author
The author's name (first initial and last name only) indicated
on the first line on the right side of the heading.
Organization
The author's organization, indicated on the second line on the
right side.
Translator
The translator's name (first initial and last name only)
preceded by a phrase: "Translated by" indicated on the third
line on the right side of the heading.
Organization
The translator's organization, indicated on the fourth line on
the right side.
Date
This is the Month and Year of the original RFC Publication
followed by the parenthesized publication date of the
translated version. For example:
January 1994 (translated on February 1994)
Indicated on the fifth line on the right side.
Updates or Obsoletes
If the original RFC Updates or Obsoletes another RFC, this is
indicated on the third line on the left side of the heading.
Category
The category header of the TRFC is always Informational. This
is indicated on the third (if there is no Updates or Obsoletes
indication) or fourth line of the left side.
Original Category
The category of the original RFC, one of: Standards Track,
Informational, or Experimental. This is indicated on the
fourth (if there is no Updates or Obsoletes indication) or
fifth line of the left side.
Title
The title appears, centered, below the rest of the heading.
Translated Title
A translated title in non-ASCII characters may follow the
original title.
If there are multiple authors or translators and if the multiple
authors or translators are from multiple organizations the right
side heading may have additional lines to accommodate them and to
associate the authors and translators with the organizations
properly.
4b. Running Headers
The running header in one line (on page 2 and all subsequent
pages) has the TRFC name on the left (RFC NNNN-LLL-MM), the
(possibly a shortened form) ASCII title centered, and the original
date (Month Year) on the right.
4c. Running Footers
The running footer in one line (on all pages) has the author's
last name on the left and the page number on the right ([Page N]).
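Because all headers and footers are ASCII, their layout can be
computed from string lengths. The Python sketch below assumes a
72-column line (7.2 inches at 10 characters per inch, an assumption
rather than a rule of this memo) and places the fields of a running
header or footer; the function name three_field_line is hypothetical.

```python
def three_field_line(left: str, center: str, right: str, width: int = 72) -> str:
    """Lay out left, centred and right fields on one fixed-width ASCII line."""
    if len(left) + len(center) + len(right) + 2 > width:
        raise ValueError("fields do not fit on one line")
    start = (width - len(center)) // 2
    # Move the centre field if exact centring would collide with a side field.
    start = max(start, len(left) + 1)
    start = min(start, width - len(right) - len(center) - 1)
    return left.ljust(start) + center + right.rjust(width - start - len(center))


# Running header and footer for a translation of RFC 1543 (author: J. Postel).
header = three_field_line("RFC 1543-JA-1", "Instructions to RFC Authors", "October 1993")
footer = three_field_line("Postel", "", "[Page 2]")
```

A footer is simply the same layout with an empty centre field.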
5. Status Section
Each TRFC must include on its first page the "Status of this Memo"
section which contains a paragraph describing the type of the TRFC
first in English with ASCII. Then the status section must be
repeated in the language the RFC is translated into.
The content of this section will be one of the three following
statements.
Standards Track
"This memo provides information for the Internet community. This
memo does not specify an Internet standard of any kind. This memo
is a translation of RFC-NNNN. The quality of the translation is,
by no means, assured. Use at your own risk. The original
document: RFC-NNNN specifies an Internet standards track protocol
for the Internet community, and requests discussion and
suggestions for improvements. Please refer to the current edition
of the "Internet Official Protocol Standards" (STD 1) for the
standardization state and status of this protocol. Distribution
of this memo is unlimited. Modification of this memo to improve
the quality of translation and the distribution of the modified
result through the RFC Editor is also unlimited."
Experimental
"This memo provides information for the Internet community. This
memo does not specify an Internet standard of any kind. This memo
is a translation of RFC-NNNN. The quality of the translation is,
by no means, assured. Use at your own risk. The original memo:
RFC-NNNN defines an Experimental Protocol for the Internet
community. This memo does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited. Modification of this memo
to improve the quality of translation and the distribution of the
modified result through the RFC Editor is also unlimited."
Informational
"This memo provides information for the Internet community. This
memo does not specify an Internet standard of any kind. This memo
is a translation of RFC-NNNN. The quality of the translation is,
by no means, assured. Use at your own risk. The original memo:
RFC-NNNN provides information for the Internet community. This
memo does not specify an Internet standard of any kind.
Distribution of this memo is unlimited. Modification of this memo
to improve the quality of translation and the distribution of the
modified result through the RFC Editor is also unlimited."
6. Translation History Section
Each TRFC must have, at the very end, a section giving a brief
history of the translation and the translator's address, including
the name and postal address, the telephone number, (optionally) a FAX
number, and the Internet e-mail address.
The section must be written in English and coded with ASCII.
7. Contact
To contact the RFC Editor send an email message to
"RFC-Editor@ISI.EDU".
8. RFC Index
Several organizations maintain TRFC Index files, generally using the
file name "rfc-index-LLL.txt". The contents of such a file copied
from one site may not be identical to that copied from another site.
9. Copyright Considerations
This memo does not address how permission to translate can be
obtained from the copyright holders of the original RFCs, except that
translation of this memo, and redistribution of the translation, is
unlimited.
10. Security Considerations
This memo raises no security issues.
11. References
[2022INT]
(to be published as an Internet Draft with the file name
"draft-ohta-text-encoding-nn.txt"; RFC 1554 gives a rough sketch of
what it will be)
[ISO639]
International Organization for Standardization (ISO),
"Code for the representation of names of languages",
International Standard, Ref. No. ISO 639:1988 (E/F)
[RFCAUTH]
Postel, J., "Instructions to RFC Authors", RFC 1543,
October 1993.
12. Author's Address
Masataka Ohta
Tokyo Institute of Technology
2-12-1, O-okayama, Meguro-ku,
Tokyo 152, JAPAN
Phone: +81-3-5499-7084
Fax: +81-3-3729-1940
EMail: mohta@cc.titech.ac.jp
From apccirn-sec Tue Feb 1 22:47:04 1994
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)
id WAA08077; Tue, 1 Feb 1994 22:46:36 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 1 Feb 94 22:37:17 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9402011337.AA14379@necom830.cc.titech.ac.jp>
Subject: Instructions to RFC translators
To: apccirn-i18n@nic.nm.kr
Date: Tue, 1 Feb 94 22:37:15 JST
In-Reply-To: <no.id>; from "mohta" at Feb 1, 94 1:01 pm
X-Mailer: ELM [version 2.3 PL11]
>
> I'm going to post the finished draft (the slim one) this afternoon.
>
> Any objections or corrections?
>
> Masataka Ohta
> ------------------------------------------------------------------------
>
> INTERNET DRAFT APCCIRN-I18N
> draft-ohta-text-encoding-00.txt February 1994
>
>
> Internet Multilingual Text Encoding: ISO-2022-INT-*
>
> Status of this Memo
>
> This document is an Internet-Draft. Internet-Drafts are working
> documents of the Internet Engineering Task Force (IETF), its areas,
> and its working groups. Note that other groups may also distribute
> working documents as Internet-Drafts.
>
> Internet-Drafts are draft documents valid for a maximum of six
> months. Internet-Drafts may be updated, replaced, or obsoleted by
> other documents at any time. It is not appropriate to use Internet-
> Drafts as reference material or to cite them other than as a
> ``working draft'' or ``work in progress.''
>
> To learn the current status of any Internet-Draft, please check the
> 1id-abstracts.txt listing contained in the Internet-Drafts Shadow
> Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or
> munnari.oz.au.
>
> Abstract
>
> APCCIRN internationalization group has, based on the experience with
> "ISO-2022-JP-2" (RFC 1554), designed a multilingual text encoding
> scheme, "ISO-2022-INT-1", as an extension of "ISO-2022-JP" (RFC 1468)
> and "ISO-2022-KR" (RFC 1557).
>
> The encoding is ASCII-compatible and 7-bit and, thus, can be mixed
> with any ASCII-compatible encoding. The encoding is designed to be
> as stateless as is practical with ISO 2022. That is, no state
> information needs to be preserved between lines.
>
> "ISO-2022-INT-1" and its successors have an aggregated name: "ISO-
> 2022-INT-*".
>
> Introduction
>
> This memo describes a text encoding scheme: "ISO-2022-INT-1", which
> is intended to be a multilingual text encoding scheme of the Internet
> including, but not limited to, electronic mail [RFC822] and
> network news [RFC1036]. The encoding is also useful in multilingual
> text files. The encoding is a multilingual extension of "ISO-2022-
> JP" [2022JP] and "ISO-2022-KR" [2022KR]. The encoding is supported
> by an Emacs based multilingual text editor: MULE [MULE].
>
>
>
> APCCIRN-I18N Expires on Aug 4, 1994 [Page 1]
> INTERNET DRAFT Internet Multilingual Text Encoding February 1994
>
>
> The name, "ISO-2022-INT-1", is intended to be used in the "charset"
> parameter field of MIME headers (see [MIME1] and [MIME2]).
>
> Description
>
> The text with "ISO-2022-INT-1" starts in ASCII [ASCII] designated to
> G0 and invoked into GL, with KS C 5601 [KSC5601] designated to G1,
> and switches to other character sets of ISO 2022 [ISO2022] through a
> limited set of designation/invocation sequences. All characters are
> encoded in 7 bits only.
>
> At the beginning of the text, an announcer sequence "ESC 2/0 4/2"
> and a designation/invocation sequence "ESC 2/8 4/2 SI ESC 2/4 2/9
> 4/3 ESC 2/10 7/14 ESC 2/11 7/14" are assumed to be present, although
> they are omitted. The same designation/invocation sequence is also
> assumed, though likewise omitted as unnecessary, at the beginning of
> each line. Thus, characters of 94 character sets are designated to
> G0 or G1; G0 is invoked as GL by SI (shift in, "0/15") and G1 by SO
> (shift out, "0/14"). Characters of 96 character sets are designated
> to G1 and invoked as GL by SO. To keep the encoding nearly unique,
> each character set is designated only to G0 or only to G1, never to
> both.
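As a sketch of the column/row notation used above (a position "c/r" is the byte value 16 * c + r), the following Python converts the omitted announcer and initial designation/invocation sequences into the bytes a decoder would assume. The helper name `cr_to_bytes` is hypothetical; it only handles the tokens ESC, SI, SO and c/r pairs.

```python
def cr_to_bytes(notation: str) -> bytes:
    """Convert ISO 2022 column/row notation such as "ESC 2/8 4/2" to bytes.

    A position "c/r" is the byte 16 * c + r; ESC, SI and SO are spelled out.
    """
    named = {"ESC": 0x1B, "SI": 0x0F, "SO": 0x0E}
    out = bytearray()
    for token in notation.split():
        if token in named:
            out.append(named[token])
        else:
            column, row = (int(part) for part in token.split("/"))
            out.append(16 * column + row)
    return bytes(out)


# Assumed (but omitted) at the start of the text only.
ANNOUNCER = cr_to_bytes("ESC 2/0 4/2")
# Assumed (but omitted) at the start of the text and of every line.
INITIAL_DESIGNATION = cr_to_bytes(
    "ESC 2/8 4/2 SI ESC 2/4 2/9 4/3 ESC 2/10 7/14 ESC 2/11 7/14"
)

print(ANNOUNCER)            # b'\x1b B'
print(INITIAL_DESIGNATION)  # b'\x1b(B\x0f\x1b$)C\x1b*~\x1b+~'
```

Read back, INITIAL_DESIGNATION is ESC ( B (ASCII to G0), SI (invoke G0), ESC $ ) C (KS C 5601 to G1), and two further sequences addressing G2 and G3.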
>
> For example, the escape sequence "ESC 2/4 4/2" or "ESC $ B" indicates
> that the bytes following the escape sequence are Japanese JIS X
> 0208-1983 characters, which are encoded in two bytes each. A
> double-byte sequence enclosed by SO and SI indicates a KS C 5601
> [KSC5601] string unless another character set has been designated
> to G1. The escape sequence "ESC 2/13 4/1" or "ESC - A" indicates that
> ISO 8859-1 is designated to G1. After this designation, the code
> "4/1", when invoked by SO, represents the character "A with acute",
> not ASCII "A".
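A minimal Python sketch of how a decoder might interpret a GL byte while a 96-character set designated to G1 is invoked by SO: position c/r maps to the right half (high bit set) of the corresponding ISO 8859 part, so "4/1" under "ESC - A" becomes 0xC1, "A with acute". The table `G1_96_CODECS` and function `decode_96` are illustrative names, and Python's `latin-1` and `iso8859_7` codecs stand in for the registered sets.

```python
# Hypothetical table: final byte of "ESC - F" -> Python codec for that 96-set.
G1_96_CODECS = {
    b"A": "latin-1",     # ESC - A : ISO 8859-1
    b"F": "iso8859_7",   # ESC - F : ISO 8859-7 (Greek)
}


def decode_96(final: bytes, gl_byte: int) -> str:
    """Decode one GL byte read while G1 holds a 96-set and SO is in effect."""
    if not 0x20 <= gl_byte <= 0x7F:
        raise ValueError("96-set positions run from 2/0 to 7/15")
    # GL position c/r corresponds to byte 0x80 + 16*c + r of the ISO 8859 part.
    return bytes([gl_byte | 0x80]).decode(G1_96_CODECS[final])


print(decode_96(b"A", 0x41))  # A with acute (U+00C1)
print(decode_96(b"F", 0x41))  # Greek capital alpha (U+0391)
```

Note that "2/0" in this mode decodes to U+00A0, the non-breaking space mentioned below. Positions left undefined by ISO 8859-7 raise `UnicodeDecodeError` here, one possible choice for what the draft calls implementation-dependent handling.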
>
> The following table gives the escape sequences and the character sets
> used in "ISO-2022-INT-1" messages. The reg# is the registration
> number in ISO's registry [ISOREG].
>
> 94 character sets
> reg#  character set       ESC sequence                  designated to
> ---------------------------------------------------------------------
> 6     ASCII               ESC 2/8 4/2       ESC ( B     G0
> 14    JIS X 0201-Roman    ESC 2/8 4/10      ESC ( J     G0
>
> 94*94 character sets
> reg#  character set       ESC sequence                  designated to
> ---------------------------------------------------------------------
> 42    JIS X 0208-1978     ESC 2/4 4/0       ESC $ @     G0
> 58    GB 2312-80          ESC 2/4 4/1       ESC $ A     G0
> 87    JIS X 0208-1983     ESC 2/4 4/2       ESC $ B     G0
> 149   KS C 5601-1987      ESC 2/4 2/9 4/3   ESC $ ) C   G1
> 159   JIS X 0212-1990     ESC 2/4 2/8 4/4   ESC $ ( D   G0
> 171   CNS 11643-1986-1    ESC 2/4 2/8 4/7   ESC $ ( G   G0
> 172   CNS 11643-1986-2    ESC 2/4 2/8 4/8   ESC $ ( H   G0
>
> 96 character sets
> reg#  character set       ESC sequence                  designated to
> ---------------------------------------------------------------------
> 100   ISO 8859-1          ESC 2/13 4/1      ESC - A     G1
> 126   ISO 8859-7 (Greek)  ESC 2/13 4/6      ESC - F     G1
>
> Handling of code points not specified in each standard is
> implementation dependent. For further information about the
> character sets and the escape sequences, see [ISO2022] and [ISOREG].
> Some Asian standards are also described in chapters 3 and 4 of
> [LUNDE].
>
> If any character set other than ASCII is designated to G0 in the
> text, there must be a switch back to ASCII before a space character
> "2/0" (though not necessarily before the "2/0" code of a 96
> character set, which usually represents a non-breaking space) or
> before control characters such as tab or CRLF. If any character set
> other than KS C 5601 [KSC5601] is designated to G1 in the text, there
> must be a switch back to KS C 5601 before the end of the line. If G1
> is invoked in the text, there must be a switch back to G0 invocation
> before a space character or control characters such as tab or CRLF.
> As a result, each line ends in the same state in which the next line
> starts: ASCII in G0, KS C 5601 in G1, and G0 invoked.
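The line-end rules above can be checked mechanically. This Python sketch tracks G0, G1 and the shift state across one line (without its CRLF), starting from the assumed initial state, and reports each violation. It recognizes only the escape sequences in the table above; the names `check_line`, `G0_SETS` and `G1_SETS` are hypothetical, and double-byte data itself is not validated.

```python
# Escape-sequence tails (the bytes after ESC) from the table above.
G0_SETS = {
    b"(B": "ASCII", b"(J": "JIS X 0201-Roman",
    b"$@": "JIS X 0208-1978", b"$A": "GB 2312-80", b"$B": "JIS X 0208-1983",
    b"$(D": "JIS X 0212-1990", b"$(G": "CNS 11643-1", b"$(H": "CNS 11643-2",
}
G1_SETS = {b"$)C": "KS C 5601", b"-A": "ISO 8859-1", b"-F": "ISO 8859-7"}
G1_96_SETS = {"ISO 8859-1", "ISO 8859-7"}
ESC, SO, SI, SPACE = 0x1B, 0x0E, 0x0F, 0x20


def check_line(line: bytes) -> list[str]:
    """Report violations of the end-of-segment rules in one line (no CRLF)."""
    g0, g1, shifted = "ASCII", "KS C 5601", False  # state at every line start
    problems = []
    i = 0
    while i < len(line):
        b = line[i]
        if b == ESC:
            tail = next((t for t in (*G0_SETS, *G1_SETS)
                         if line.startswith(t, i + 1)), None)
            if tail is None:
                problems.append(f"offset {i}: unknown escape sequence")
                i += 1
                continue
            if tail in G0_SETS:
                g0 = G0_SETS[tail]
            else:
                g1 = G1_SETS[tail]
            i += 1 + len(tail)
            continue
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif b <= SPACE:
            # 2/0 inside an SO-invoked 96-set is a graphic character (NBSP).
            if not (b == SPACE and shifted and g1 in G1_96_SETS):
                if g0 != "ASCII":
                    problems.append(f"offset {i}: space/control with {g0} in G0")
                if shifted:
                    problems.append(f"offset {i}: space/control with G1 invoked")
        i += 1
    if g0 != "ASCII":
        problems.append(f"line ends with {g0} in G0")
    if g1 != "KS C 5601":
        problems.append(f"line ends with {g1} in G1")
    if shifted:
        problems.append("line ends with G1 invoked (missing SI)")
    return problems


print(check_line(b"\x1b$B$3\x1b(B ok"))  # [] : switched back before the space
print(check_line(b"\x0e0!"))              # missing SI at end of line
```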
>
> Though ISO 2022 [ISO2022] and related standards permit long-term,
> persistent states, "ISO-2022-INT-1" is designed so that such states
> need not be preserved between lines. Applications such as pagers and
> editors that randomly seek within a text file encoded with
> "ISO-2022-INT-1" can assume that the state at the start of any line
> is the same as that at the beginning of the text.
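Because every line starts in the same assumed state, a pager can decode any single line in isolation. A minimal Python sketch, assuming the line uses only ASCII and KS C 5601 (the two sets designated initially): it prepends the implicit "ESC $ ) C" designation and hands the line to Python's standard `iso2022_kr` codec, which handles SO and SI. `decode_line` is a hypothetical name, and lines that use other character sets from the table would need a fuller decoder.

```python
# Implicit G1 designation assumed at the start of every line: ESC 2/4 2/9 4/3.
KSC_G1_DESIGNATION = b"\x1b$)C"


def decode_line(text: bytes, n: int) -> str:
    """Decode line n (0-based) without tracking any earlier line's state.

    Sketch only: assumes the line contains just ASCII and KS C 5601.
    """
    line = text.split(b"\r\n")[n]
    return (KSC_G1_DESIGNATION + line).decode("iso2022_kr")


sample = b"hello\r\n\x0e0!\x0f\r\n"   # second line: KS C 5601 0x3021 in SO/SI
print(decode_line(sample, 1))          # Hangul syllable GA (U+AC00)
```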
>
> The text will end in ASCII designated to G0.
>
> Left-to-right directionality is assumed if the text is displayed
> horizontally.
>
> Users of "ISO-2022-INT-1" must be aware that some common transports,
> such as old B News software in Japan, cannot relay the 7-bit value
> "7/15" (decimal 127), which is used to encode, for example, "y with
> diaeresis" of ISO 8859-1.
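To find text that may not survive such transports, a sender could scan for the byte 7/15 before posting. A minimal Python sketch; `lines_with_del` and `Y_DIAERESIS_GL` are hypothetical names, and lines are assumed to be CRLF-separated as in the Formal Syntax.

```python
# "y with diaeresis" is 0xFF in ISO 8859-1; invoked into GL via SO it is 7/15.
Y_DIAERESIS_GL = "\u00ff".encode("latin-1")[0] & 0x7F


def lines_with_del(text: bytes) -> list[int]:
    """Return 1-based numbers of lines containing the byte 7/15 (0x7F)."""
    return [n for n, line in enumerate(text.split(b"\r\n"), start=1)
            if 0x7F in line]


sample = b"plain\r\n\x1b-A\x0e" + bytes([Y_DIAERESIS_GL]) + b"\x0f\x1b$)C\r\n"
print(lines_with_del(sample))  # [2]
```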
>
> Other restrictions are given in the Formal Syntax section below.
>
> Formal Syntax
>
> The notational conventions used here are identical to those used in
> STD 11, RFC 822 [RFC822].
>
> The * (asterisk) convention is as follows:
>
> l*m something
>
> meaning at least l and at most m somethings, with l and m taking
> default values of 0 and infinity, respectively.
>
> text = *(line CRLF)
>
> line = *(single-byte-char /
> (*g0-segment reset-desig-seq) /
> g1-segment /
> g1-desig-seq )
> ; note: must end with KS C
> ; designated to G1
>
> g0-segment = single-byte-g0-segment /
> double-byte-g0-segment
>
> single-byte-g0-segment = single-byte-g0-seq *single-byte-char
>
> double-byte-g0-segment = double-byte-g0-seq *(one-of-94 one-of-94)
>
> g1-segment = single-byte-g1-96-segment /
> double-byte-g1-segment
> ; note: an appropriate segment
> ; should be selected according
> ; to the current state of G1
> ; designation
>
> single-byte-g1-96-segment = SO *one-of-96 SI
>
> double-byte-g1-segment = SO *(one-of-94 one-of-94) SI
>
> reset-desig-seq = ESC "(" "B"
>
> single-byte-g0-seq = ESC "(" ("B" / "J")
>
> double-byte-g0-seq = (ESC "$" ("@" / "A" / "B")) /
> (ESC "$" "(" ("D" / "G" / "H"))
>
> g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq
>
> single-byte-g1-seq = (ESC "-" ("A" / "F"))
>
> double-byte-g1-seq = ESC "$" ")" "C"
>
> CRLF = CR LF
>
> ; ( Octal, Decimal.)
>
> ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)
>
> SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)
>
> SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)
>
> CR = <ASCII CR, carriage return>; ( 15, 13.)
>
> LF = <ASCII LF, linefeed> ; ( 12, 10.)
>
> one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)
>
> one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)
>
> 7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)
>
> single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT
> including CRLF, and not including ESC, SI, SO>
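The escape-sequence and control-character productions above can be transcribed almost directly into a Python regular expression. This sketch only tokenizes: it does not pair the bytes of double-byte characters or enforce line-level rules such as ending with KS C 5601 designated to G1. `TOKEN` and `tokenize` are hypothetical names.

```python
import re

# One alternative per production; single-byte-char excludes ESC, SI and SO.
TOKEN = re.compile(
    rb"(?P<g0_desig>\x1b\((?:B|J)|\x1b\$(?:[@AB]|\([DGH]))"   # single/double-byte-g0-seq
    rb"|(?P<g1_desig>\x1b-[AF]|\x1b\$\)C)"                    # single/double-byte-g1-seq
    rb"|(?P<shift>[\x0e\x0f])"                                # SO / SI
    rb"|(?P<crlf>\r\n)"                                       # CRLF
    rb"|(?P<char>[\x00-\x0d\x10-\x1a\x1c-\x7f])"              # single-byte-char
)


def tokenize(data: bytes) -> list[tuple[str, bytes]]:
    """Split 7-bit text into (token kind, bytes) pairs; raise on anything else."""
    tokens, pos = [], 0
    while pos < len(data):
        m = TOKEN.match(data, pos)
        if m is None:
            raise ValueError(f"offset {pos}: not valid ISO-2022-INT-1")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize(b"a\x1b$BFn\x1b(B\r\n"))
```

Because CRLF is tried before single-byte-char, a CR followed by LF becomes one `crlf` token, while a bare CR or LF remains an ordinary character, as the grammar requires.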
>
> Mail System Considerations
>
> "ISO-2022-INT-1" is designed to be purely 7-bit, so that it can be
> used without MIME over any transport that conforms to STD 11, RFC 822
> [RFC822], as is current practice in Japan with "ISO-2022-JP"
> [2022JP].
>
> If "ISO-2022-INT-1" is used with MIME, MIME charset name may be given
> as follows:
>
> Content-Type: text/plain; charset=iso-2022-int-1
>
> Even if the charset parameter is omitted, multilingual applications
> should still assume that "ISO-2022-INT-1" or its latest available
> successor (see the section "Future Extension Plan") is used, rather
> than the MIME default of US-ASCII.
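A sketch of this charset defaulting rule, using Python's standard `email` package: `get_content_charset()` returns the lower-cased charset parameter, or `None` when it is absent, and the function then falls back to "iso-2022-int-1" for a multilingual application. The function name and the `multilingual` flag are illustrative assumptions; a real application might substitute the latest available successor.

```python
import email


def effective_charset(raw_message: str, multilingual: bool = True) -> str:
    """Charset a reader should assume for a message's text body."""
    msg = email.message_from_string(raw_message)
    charset = msg.get_content_charset()  # lower-cased, or None if not given
    if charset is not None:
        return charset
    # Draft's recommendation for multilingual readers; MIME default otherwise.
    return "iso-2022-int-1" if multilingual else "us-ascii"


print(effective_charset("Subject: test\n\nhello"))  # iso-2022-int-1
```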
>
> The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not
> necessary to use a Content-Transfer-Encoding header. It should be
> noted that applying the Base64 or Quoted-Printable encoding will
> render the message unreadable in non-MIME-compliant software.
>
> "ISO-2022-INT-1" may also be used in mail headers. If bare STD 11,
> RFC 822 without MIME is used, special characters in structured
> headers might need to be quoted as a "quoted-string", which might not
> be supported in all common environments. In MIME headers, both the
> "B" and "Q" encodings can be useful with "ISO-2022-INT-1" text.
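As an illustration of the "B" encoding, this Python sketch wraps bytes that are already in "ISO-2022-INT-1" form into an encoded-word of the `=?charset?B?encoded-text?=` shape defined in [MIME2]. The helper name is hypothetical, and it does not split long input to respect the encoded-word length limit.

```python
import base64


def b_encoded_word(raw: bytes, charset: str = "ISO-2022-INT-1") -> str:
    """Wrap 7-bit ISO-2022-INT-1 bytes as a MIME "B" encoded-word."""
    return f"=?{charset}?B?{base64.b64encode(raw).decode('ascii')}?="


# SO, KS C 5601 0x3021 (Hangul GA), SI
print(b_encoded_word(b"\x0e0!\x0f"))  # =?ISO-2022-INT-1?B?DjAhDw==?=
```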
>
> Future Extension Plan
>
> Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",
> "ISO-2022-INT-3" and so on. MIME charset names beginning with "ISO-
> 2022-INT-" are reserved for them. The family of encoding has an
> aggregated name: "ISO-2022-INT-*".
>
> Extensions will consist solely of additional ISO 2022 character
> sets, though other extensions, such as support for bidirectionality,
> are possible. To avoid duplicate assignment of escape sequences,
> formal ISO registration [ISOREG] will generally be required; this
> does not rule out the future possibility of IANA registration of
> escape sequences for private use.
>
> The current feature of an initial designation of KS C 5601 to G1 will
> be removed in near-future versions. Users of "ISO-2022-INT-1" are
> recommended to designate KS C 5601 to G1 explicitly.
>
> To minimize the number of character sets, sets that are already
> covered by larger character sets and are not widely used should not
> be added. For example, the Katakana character set "JIS X 0201-Kana"
> is omitted because it is completely covered by "JIS X 0208-1978" and
> is not used at all in the Internet community of Japan.
>
> In any event, the property of "ISO-2022-INT-1" that:
>
> Though ISO 2022 [ISO2022] and related standards permit long-term,
> persistent states, "ISO-2022-INT-1" is designed so that such states
> need not be preserved between lines. Applications such as pagers
> and editors that randomly seek within a text file encoded with
> "ISO-2022-INT-1" can assume that the state at the start of any line
> is the same as that at the beginning of the text.
>
> will be preserved.
>
> References
>
> [2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese
> Character Encoding for Internet Messages", RFC 1468, June
> 1993.
>
> [2022JP2] Ohta, M., and K. Handa, "ISO-2022-JP-2: Multilingual
> Extension of ISO-2022-JP", RFC 1554, December 1993.
>
> [2022KR] Choi, U., Chon, K., and H. Park, "Korean Character Encoding
> for Internet Messages", RFC 1557, December 1993.
>
> [ASCII] American National Standards Institute, "Coded character set
> -- 7-bit American national standard code for information
> interchange", ANSI X3.4-1986.
>
> [ISO2022] International Organization for Standardization (ISO),
> "Information processing -- ISO 7-bit and 8-bit coded
> character sets -- Code extension techniques", International
> Standard, Ref. No. ISO 2022-1986 (E).
>
> [ISOREG] International Organization for Standardization (ISO),
> "International Register of Coded Character Sets To Be Used
> With Escape Sequences".
>
> [KSC5601] Korea Industrial Standards Association, "Code for
> Information Interchange (Hangul and Hanja)," Korean
> Industrial Standard, 1987, Ref. No. KS C 5601-1987.
>
> [LUNDE] Lunde, K., "Understanding Japanese Information Processing",
> O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.
>
> [MIME1] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet
> Mail Extensions) Part One: Mechanisms for Specifying and
> Describing the Format of Internet Message Bodies", RFC 1521,
> September 1993.
>
> [MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
> Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
> September 1993.
>
> [MULE] Nishikimi, M., Handa, K., and S. Tomura, "Mule: MULtilingual
> Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.
>
> [RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
> Messages", STD 11, RFC 822, August 1982.
>
> [RFC1036] Horton, M., and R. Adams, "Standard for Interchange of
> USENET Messages", RFC 1036, AT&T Bell Laboratories, Center
> for Seismic Studies, December 1987.
>
> Acknowledgements
>
> This memo is the product of the APCCIRN (Asian Pacific CCIRN)
> Internationalization group and was reviewed by various people in the
> news group fj.kanji and on the mailing list jp-msg@iij.ad.jp. Many
> people have contributed. In particular, Prof. Eiichi Wada of Tokyo
> University and Ken Lunde of Adobe Systems, Inc. have helped us with
> their profound knowledge of ISO 2022 and related standards. Uhhyung
> Choi of Korea Advanced Institute of Science and Technology has
> contributed to making the encoding upward compatible with ISO-2022-KR.
> Prof. Kilnam Chon of Korea Advanced Institute of Science and
> Technology and Prof. Jun Murai of Keio University have provided the
> framework for international cooperation. The authors wish to thank
> all the people who have helped to produce this memo.
>
> Security Considerations
>
> Security issues are not discussed in this memo.
>
> Authors' Addresses
>
> Masataka Ohta
> Tokyo Institute of Technology
> 2-12-1, O-okayama, Meguro-ku,
> Tokyo 152, JAPAN
>
> Phone: +81-3-5499-7084
> Fax: +81-3-3729-1940
> EMail: mohta@cc.titech.ac.jp
>
>
> Ken'ichi Handa
> Electrotechnical Laboratory
> Umezono 1-1-4, Tsukuba,
> Ibaraki 305, JAPAN
>
> Phone: +81-298-58-5916
> Fax: +81-298-58-5918
> EMail: handa@etl.go.jp
>
From apccirn-sec Sun Feb 27 21:50:24 1994
Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)
id VAA27934; Sun, 27 Feb 1994 21:50:14 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 27 Feb 94 21:39:57 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9402271240.AA19166@necom830.cc.titech.ac.jp>
Subject: IETF WG proposal
To: apccirn-i18n@nic.nm.kr
Date: Sun, 27 Feb 94 21:39:55 JST
In-Reply-To: <no.id>; from "mohta" at Feb 1, 94 10:45 pm
X-Mailer: ELM [version 2.3 PL11]
Dear members of APCCIRN I18N group;
After the successful publication of the Internet Draft on ISO-2022-INT-*,
I'm now trying to negotiate with the IESG on the creation of the following
IETF working group.
I think our group can host the WG.
Any comments?
Masataka Ohta
Name:
Internationalization (i18n)
Areas:
USV & APP
Description of the Working Group:
The purpose of the i18n working group is to promote the
internationalization of the Internet.
The main goal of the working group is to develop a single text
encoding scheme useful for all the plain text in the world.
The group may address other issues which require technical
consideration about internationalization.
The group does not address politics of international coordination.
The working group is jointly operated by IETF and APCCIRN.
Goals:
Submit "Internet Multilingual Text Encoding: ISO-2022-INT-*" to the
IESG for consideration as a standard track document.
Submit an Informational RFC on why ISO 10646/UNICODE is inappropriate
as the single text encoding method in the world.
Submit "Mid- to long-term Architecture on Internet Text Encoding" to
the IESG for consideration as a standard track document.
Submit an Informational RFC of "instructions for RFC translators".
Internet Drafts:
The following two related Internet Drafts were posted today and will
soon be available.
draft-ohta-text-encoding-00.txt written by APCCIRN-I18N
Internet Multilingual Text Encoding: ISO-2022-INT-*
draft-ohta-translation-instr-00.txt written by me
Instructions to RFC Translators
The following related Internet Draft will soon be posted.
draft-ohta-mime-charset-names-00.txt written by me
MIME charset names for ISO 10646
From apccirn-sec Wed Mar 30 19:00:46 1994
Received: from cosmos.kaist.ac.kr by krnic.net (8.6.4/8.6.4)
id TAA26895; Wed, 30 Mar 1994 19:00:45 +0900
Received: from localhost (chon@localhost) by cosmos.kaist.ac.kr (8.6.4/8.6.4) id TAA15245 for ap-i18n@krnic.net; Wed, 30 Mar 1994 19:07:03 +0900
Date: Wed, 30 Mar 1994 19:07:03 +0900
From: Kilnam Chon <chon@cosmos.kaist.ac.kr>
Message-Id: <199403301007.TAA15245@cosmos.kaist.ac.kr>
To: ap-i18n@krnic.net
Subject: issue for ap-18n group
This is an issue for the i18n group of APCCIRN. I would like to spend some time
on this and other matters at the next APCCIRN meeting on June 17-18.
kilnam chon
------------------------------------------------------------------------
IESG Secretary writes:
>From root Fri Mar 25 12:29:56 1994
>To: Internet Architecture Board <iab@isi.edu>
>cc: The Internet Engineering Steering Group <IESG@CNRI.Reston.VA.US>
>cc: IETF-Announce:;
>Sender: ietf-announce-request@IETF.CNRI.Reston.VA.US
>From: IESG Secretary <iesg-secretary@CNRI.Reston.VA.US>
>Subject: Character Sets and other issues of Internationalization
>Date: Thu, 24 Mar 94 19:10:13 -0500
>X-Orig-Sender: scoya@CNRI.Reston.VA.US
>Message-ID: <9403241910.aa22138@IETF.CNRI.Reston.VA.US>
>
>
>Work in either character set (or coding) development or
>"internationalization" has major long-term architectural and policy
>implications for the Internet. It is clear that the work is important;
>it is clear that others, including several ISO/IEC JTC1 committees, are
>working on parts of the issue. Much of the work and the success criteria
>for it are cultural and political, not engineering/technical.
>
>The IESG believes these issues need to be addressed by the IAB, and
>requests that they advise the IETF on architectural frameworks, and on
>what should be done within IETF and what should be done elsewhere.
>
>The IESG also requests the IAB to initiate liaisons with other groups
>(e.g. ISO/IEC JTC1 subgroups, especially SC2 and SC22, APCCIRN, RARE,
>CEN, French Ministry of Culture, etc.) as they believe would facilitate
>the work and reduce the odds of redundant or conflicting work and
>recommendations, and of concerned parties "shopping" for a standards
>body that can be persuaded to adopt approaches rejected elsewhere.
>
>Pending availability of this advice and recommendations, the IESG will
>refer any proposals to initiate standards-track character set work,
>other than requirements to narrowly profile existing and deployed
>standards for Internet use, to the IAB for your deliberations.
>
>
From apng-sec Tue Dec 13 23:53:27 1994
Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id XAA09935 for <apng-i18n@apng.org>; Tue, 13 Dec 1994 23:53:14 +0900
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 13 Dec 94 23:52:53 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9412131453.AA00250@necom830.cc.titech.ac.jp>
Subject: Apng-i18n charter
To: apng-i18n@apng.org
Date: Tue, 13 Dec 94 23:52:52 JST
X-Mailer: ELM [version 2.3 PL11]
Dear APNG-I18N members;
FYI, the following is the current charter of the group. Any comments,
questions or new proposals?
Masataka Ohta
=============================================================================
APNG Internationalization/Localization Working Group (apng-i18n)
Last updated on 1994.12.13
CHARTER
1. Coordinator(s):
M. Ohta <mohta@cc.titech.ac.jp>
TEL: +81-3-5734-3299
FAX: +81-3-5734-3415
2. Description of Working Group:
The purpose of the i18n working group is to promote the
internationalization of the Internet.
The main goal of the working group is to develop a single text
encoding scheme useful for all the plain text in the world,
where many Asia-Pacific-specific issues still remain.
The group may address other issues which require technical
consideration about internationalization.
The group does not handle the politics of policy determination for
international coordination, but may produce purely technical
guidelines for it.
3. Members:
Jimmy Hwang <jhwang@wiley.csusb.edu>,
M. Ohta <mohta@cc.titech.ac.jp>,
H.T. Koanatakool <htk@ipied.tu.ac.th>,
Trin Tantsetthi <trin@nwg.nectec.or.th>,
Jaekyung Song <jksong@cosmos.kaist.ac.kr>,
Woohyung Choi <whchoi@krnic.net>,
Kyuho Kim <kyuho@cosmos.kaist.ac.kr>,
APNG Secretariat <apng-sec@apng.org>,
Kilnam Chon <chon@cosmos.kaist.ac.kr>,
<handa@etl.go.jp>,
Abhaya Induruwa <abhaya@cse.mrt.ac.lk>,
<cheng@nwg.nectec.or.th>,
<shin@iij.ad.jp>,
<wschen@twnmoe10.edu.tw>,
Jun Murai <jun@wide.ad.jp>,
<nazo@sfc.wide.ad.jp>,
Jun Matsukata <jm@eng.isas.ac.jp>,
Shunichi Akazawa <akazawa@who.ch>,
Sunyoung Han <syhan@cosmos.kaist.ac.kr>,
<rong@watson.ibm.com>,
<ute@cc.noda.sut.ac.jp>,
<fuku@c1.kagu.sut.ac.jp>,
<lwbbs@shakti.ncst.ernet.in>,
Barry Greene <barry@singnet.com.sg>,
Lim Gek Meng <gmlim@singnet.com.sg>,
Ong Wee Cheong <ongwc@singnet.com.sg>,
Chang Wai Leong <cwl@singnet.com.sg>,
Lee Hyung-Seok <hyslee@coregate.kaist.ac.kr>,
Michelle Chiang <michelle@technet.sg>,
Masaki Hirabaru <hi@nic.ad.jp>,
Suguru Yamaguchi <suguru@is.aist-nara.ac.jp>,
Akko Oka <oka@slab.ntt.jp>,
Glenn Mansfield <glenn@aic.co.jp>,
Xiaoling Teng <ccteng@pkn.edu.cn>,
Susan S. Zhu <szhu@net.edu.cn>,
Haifeng Zhu <zhf@ns.net.edu.cn>,
P. T. Ho <hpt@cc.hku.hk>,
Shigeki Goto <goto@ntt-20.ntt.jp>,
Lawrence Law <cclaw@usthk.ust.hk>,
Hock-Koon Lim <lim@ctron.com>,
Shuichi Tashiro <tashiro@etl.go.jp>,
Qiming Li <liqm@bepc2.ihep.ac.cn>,
Raymond Poon <ccrpoon@cityu.edu.hk>,
Ming Lu <luming@tsinghua.edu.cn>
4. Mailing Lists:
General Discussion: apng-i18n@apng.org
To Subscribe: listserv@apng.org
Archive: apng.org:/apng/mail.archive/apng-i18n
5. Remark:
==============================================================================
From apng-sec Tue May 23 03:33:52 1995
Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id DAA23350 for <apng-i18n@cosmos.kaist.ac.kr>; Tue, 23 May 1995 03:33:48 +0900
Message-Id: <199505221833.DAA23350@cosmos.kaist.ac.kr>
Received: from ifi.unizh.ch by josef.ifi.unizh.ch
id <01499-0@josef.ifi.unizh.ch>; Mon, 22 May 1995 20:34:21 +0200
Subject: Re: UN: Unification Method
To: apng-i18n@cosmos.kaist.ac.kr
Date: Mon, 22 May 1995 20:34:20 +0200 (MET DST)
Cc: mduerst@ifi.unizh.ch, zhf@net.edu.cn, apng-cc@apng.org
In-Reply-To: <199505180837.RAA17044@necom830.cc.titech.ac.jp> from "Masataka Ohta" at May 18, 95 05:37:26 pm
X-Mailer: ELM [version 2.4 PL11]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8bit
Content-Length: 2982
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Sender: mduerst@ifi.unizh.ch
Masataka Ohta wrote (in comments to a posting of mine):
>> The Macintosh is a good example that
>> uses no escape sequences at all and is multilingual to a higher degree
>> than any other widely available system.
>
>Mac with or without UNICODE is merely as good as EUC.
Even a very simple application such as Hypercard is highly
multilingual.
I really wonder what the Mac can't do that escape sequences can.
Could you give examples?
>> And many applications and data formats that are not directly
>> related to high-quality printing will need no escape sequences
>> and no additional information as it is available via fonts and
>> scripts on the Mac.
>
>Try making Greek people use only the Latin alphabet, except in high-quality printing.
Greek and Latin are neatly separated in Unicode/ISO 10646,
so your example is not appropriate. The case in question, namely
e.g. reading simplified Chinese with a Unicode font that contains
the glyphs for traditional Chinese (in case both glyphs are so close
as to share the same code point), is better compared to reading
Latin in an Italic font vs. reading it in a Roman font.
>We already need the distinctions that Unicode ignores, even for
>low-quality bitmap displays. That is a fact of daily life. There
>is no room for discussion.
Low quality bitmap display introduces many distortions of characters,
esp. where they have many strokes. The additional distortions
introduced *in the worst case* by Unicode are not as big.
And there is in general no need to use the wrong font, whereas
the distortions due to low resolution bitmaps, on a low resolution
device, cannot be helped.
>> To have the same code for things that are considered the same
>> is a very important benefit of unification.
>
>The problem is that, even though sample character shapes in CNS, GB, JIS
>and KS C may have some correspondence, the code points cover different
>areas of allowable shape variation.
The standards don't say what shape variations they cover. Basically,
whatever shapes a font designer comes up with that are identifiable and
accepted by the public in the circumstances they are used are okay.
The Japanese standard gives explicit, but not exhaustive examples
of shapes that fall under the same code point. I do not know about
the Chinese standards, but maybe somebody from China could
give this information.
In general, in the case of the characters unified in Unicode/ISO 10646,
the allowable shape variations of a unified character clearly overlap
to a high degree, even if the "center of gravity" of the shape regions,
i.e. the preferred glyph shape according to average typographic
practice, may not be the same.
>> Unicode uses this principle wherever possible.
>
>Unicode is completely broken in this sense. Unicode is unusable in
>a multilingual environment.
Unicode is very useful in a multi-lingual environment, more than
any other character encoding. If you don't want to use it, that's
your problem, but not ours.
Regards, Martin.
From apng-sec Thu May 25 17:52:31 1995
Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id RAA16327 for <apng-i18n@cosmos.kaist.ac.kr>; Thu, 25 May 1995 17:51:20 +0900
Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Thu, 25 May 1995 17:46:32 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <199505250846.RAA07226@necom830.cc.titech.ac.jp>
Subject: Re: UN: Unification Method
To: mduerst@ifi.unizh.ch (Martin J Duerst)
Date: Thu, 25 May 95 17:46:30 JST
Cc: apng-i18n@cosmos.kaist.ac.kr, mduerst@ifi.unizh.ch, zhf@net.edu.cn,
apng-cc@apng.org
In-Reply-To: <199505221834.DAA23353@cosmos.kaist.ac.kr>; from "Martin J Duerst" at May 22, 95 8:34 pm
X-Mailer: ELM [version 2.3 PL11]
> >> The Macintosh is a good example that
> >> uses no escape sequences at all and is multilingual to a higher degree
> >> than any other widely available system.
> >
> >Mac with or without UNICODE is merely as good as EUC.
>
> Even a very simple application such as Hypercard is highly
> multilingual.
They are multilingual in the sense that they can be configured as
multiple monolingual instances, which is what EUC already
does.
> I really wonder what the Mac can't do that escape sequences can.
> Could you give examples?
Just compare full ISO 2022 and EUC.
> >> And many applications and data formats that are not directly
> >> related to high-quality printing will need no escape sequences
> >> and no additional information as it is available via fonts and
> >> scripts on the Mac.
> >
> >Try making Greek people use only the Latin alphabet, except in high-quality printing.
>
> Greek and Latin are neatly separated in Unicode/ISO 10646,
Yes, double standard.
> so your example is not appropriate.
Why don't you try to force Greek and Russian users to use ISO 8859/1 only?
> >We already need the distinctions that Unicode ignores, even for
> >low-quality bitmap displays. That is a fact of daily life. There
> >is no room for discussion.
>
> Low quality bitmap display introduces many distortions of characters,
> esp. where they have many strokes.
We have our own definition of what distortions are acceptable.
> The additional distortions
> introduced *in the worst case* by Unicode are not as big.
We already judged it unacceptable.
> >> To have the same code for things that are considered the same
> >> is a very important benefit of unification.
> >
> >The problem is that, even though sample character shapes in CNS, GB, JIS
> >and KS C may have some correspondence, the code points cover different
> >areas of allowable shape variation.
>
> The standards don't say what shape variations they cover.
So, you must supply that information, which is the problem.
> Basically,
> whatever shapes a font designer comes up with that are identifiable and
> accepted by the public in the circumstances they are used are okay.
Okay for a monocultural environment.
> In general, in the case of the characters unified in Unicode/ISO 10646,
> the allowable shape variations of a unified character clearly overlap
> to a high degree,
Urrr, I don't think you have any real-world expertise to judge it.
> Unicode is very useful in a multi-lingual environment, more than
> any other character encoding.
If you think so, use EUC.
Masataka Ohta
From apng-sec Fri May 26 00:12:49 1995
Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id AAA19271 for <apng-i18n@cosmos.kaist.ac.kr>; Fri, 26 May 1995 00:12:34 +0900
Message-Id: <199505251512.AAA19271@cosmos.kaist.ac.kr>
Received: from ifi.unizh.ch by josef.ifi.unizh.ch
id <00584-0@josef.ifi.unizh.ch>; Thu, 25 May 1995 17:12:51 +0200
Subject: Re: UN: Unification Method
To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Date: Thu, 25 May 1995 17:12:50 +0200 (MET DST)
Cc: apng-i18n@cosmos.kaist.ac.kr, apng-cc@apng.org
In-Reply-To: <199505250846.RAA07226@necom830.cc.titech.ac.jp> from "Masataka Ohta" at May 25, 95 05:46:30 pm
X-Mailer: ELM [version 2.4 PL11]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8bit
Content-Length: 4674
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Sender: mduerst@ifi.unizh.ch
Masataka Ohta wrote, in response to a contribution of mine:
>> >> The Macintosh is a good example that
>> >> uses no escape sequences at all and is multilingual to a higher degree
>> >> than any other widely available system.
>> >
>> >Mac with or without UNICODE is merely as good as EUC.
>>
>> Even a very simple application such as Hypercard is highly
>> multilingual.
>
>They are multilingual in the sense that they can be configured as
>multiple monolingual instances, which is what EUC already
>does.
>
>> I really wonder what the Mac can't do that escape sequences can.
>> Could you give examples?
>
>Just compare full ISO 2022 and EUC.
From what you are saying, I have to conclude that you are not
very familiar with the multilingual capabilities of the Mac.
In Hypercard, you can have Japanese, Chinese, Arabic, Hebrew,
Korean, and so on, in one and the same single field on a single
card, all with correct high-quality glyphs (True Type or Postscript).
And this just because Hypercard uses the basic text facilities of
the Mac OS, rather than trying to do better like some word processing
programs.
I would still like to hear what you think the Mac would do better
if it used Escape sequences. Please give actual examples. Just
referring to EUC doesn't help, as there is a big difference between
multiscript/multilingual Mac text processing and EUC based Unix
localization.
>> >Try making Greek people use only the Latin alphabet, except in high-quality printing.
>>
>> Greek and Latin are neatly separated in Unicode/ISO 10646,
>
>Yes, double standard.
There is no double standard. Claiming so shows that you are not
familiar with the principles used in Unicode/ISO10646 and with
your own Japanese character standard JIS X 0208.
According to the shape criteria of Unicode, Latin, Greek, and
Cyrillic 'A' could have been unified. But for backward compatibility,
Unicode excluded unification of characters that have separate code
points in well used standards, so as to allow round-trip conversion.
JIS X 0208 is one of the few standards that contains code points
for all three. Unifying them would have meant that it would
be impossible to convert a text from JIS, SJIS, or Japanese EUC
encoding to Unicode and back without loss of information.
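The round-trip argument can be checked with Python's standard `iso2022_jp` codec. The sketch below takes the JIS X 0208 codes for Latin, Greek and Cyrillic capital A and converts each to Unicode and back. Note that JIS X 0208's Latin "A" is the full-width form, which Unicode maps to U+FF21 rather than U+0041; the helper `round_trip` is an illustrative name.

```python
# JIS X 0208 capital A's, written as ISO-2022-JP byte strings (ESC $ B ... ESC ( B).
JIS_CAPITAL_A = {
    "Latin (full-width)": b"\x1b$B#A\x1b(B",  # row 3, cell 33
    "Greek": b"\x1b$B&!\x1b(B",               # row 6, cell 1
    "Cyrillic": b"\x1b$B'!\x1b(B",            # row 7, cell 1
}


def round_trip(jis: bytes) -> tuple[str, bool]:
    """Decode JIS text to Unicode, re-encode it, and report if bytes survive."""
    ch = jis.decode("iso2022_jp")
    return ch, ch.encode("iso2022_jp") == jis


for script, jis in JIS_CAPITAL_A.items():
    ch, ok = round_trip(jis)
    print(f"{script:20} U+{ord(ch):04X} round-trip ok: {ok}")
```

Each of the three JIS code points lands on a distinct Unicode code point, which is what makes the lossless conversion possible.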
>> >We already need the distinctions that Unicode ignores, even for
>> >low-quality bitmap displays. That is a fact of daily life. There
>> >is no room for discussion.
>>
>> Low quality bitmap display introduces many distortions of characters,
>> esp. where they have many strokes.
>
>We have our own definition of what distortions are acceptable.
>
>> The additional distortions
>> introduced *in the worst case* by Unicode are not as big.
>
>We already judged it unacceptable.
Is this "We" a pluralis maiestatis? Or have you done controlled
experiments? I would be interested to hear about them. The
only experiments I have heard about have been small scale,
but point out that the differences are ignored by most
subjects unless you give them very strong hints to help
them become aware of the differences.
>> >> To have the same code for things that are considered the same
>> >> is a very important benefit of unification.
>> >
>> >The problem is that, even though sample character shapes in CNS, GB, JIS
>> >and KS C may have some correspondence, the code points cover different
>> >area of allowable shape variation.
>>
>> The standards don't say what shape variations they cover.
>
>So, you must supply that information, which is the problem.
By saying that you have to supply some information, you are admitting
that the standards don't define it, and are contradicting your previous
statement.
>> Basically,
>> whatever shapes a font designer comes up with that are identifiable and
>> accepted by the public in the circumstances they are used are okay.
>
>Okay for monocultural environment.
It is quite possible that a good font designer could come up
with a new font that is accepted in all CJK areas and doesn't
need glyph distinctions. On the other hand, it would be very
difficult to get Japanese readers used to, e.g., a Long Song type
of font, even if it used Japanese glyph shapes.
>> In general, in the case of the characters unified in Unicode/ISO 10646,
>> the allowable shape variations of a unified character clearly overlap
>> to a high degree,
>
>Urrr, I don't think you have any real world expertise to judge it.
What real-world experience do you have? How many times have
you looked at font specimens from different sources, and found
that they disagree, for quite a few characters, on the details
you consider 'out of discussion'? I can give you examples, if
necessary.
Regards, Martin.
From apng-sec Fri May 26 15:16:20 1995
Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id PAA24196; Fri, 26 May 1995 15:16:10 +0900
Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)
id AA00866; Fri, 26 May 95 14:02:31 CST
From: "Zhu, Haifeng" <zhf@net.edu.cn>
Date: Fri, 26 May 95 01:48:34 CST
Message-Id: <612.zhf@net.edu.cn_POPMail/PC_3.2.2>
Reply-To: <zhf@net.edu.cn>
X-Popmail-Charset: English
To: mduerst@ifi.unizh.ch
Cc: mohta@necom830.cc.titech.ac.jp, apng-cc@apng.org, apng-i18n@apng.org
Subject: Re: UN: Unification Method
On Thu, 25 May 1995 16:29:56 +0200 (ME, Martin J Duerst wrote:
>Zhu, Haifeng long ago has indicated that we will discuss
>unification and related issues in apng-cc, and at that time
>as well as later when he in fact opened the discussion,
>there was no complaint. Also, it is clear that unification
>is related to the topic of this group.
>
>Regards, Martin.
Since this is, in a sense, also within the scope of i18n, I think
we can CC these mails to apng-i18n@apng.org too, if Mr. Ohta or the
i18n list agrees.
Best Regards.
-- Haifeng --
Zhu,Haifeng
Coordinator of APNG-CC (Asia-Pacific Networking Group)
Dept. of Computer Sci.&Tech., Tsinghua University
Institute of Networking, Tsinghua University
Beijing 100084, People's Republic of China
Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173
Email: zhf@net.edu.cn
From apng-sec Fri May 26 15:40:42 1995
Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id PAA24317; Fri, 26 May 1995 15:40:24 +0900
Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Fri, 26 May 1995 15:30:55 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <199505260631.PAA12421@necom830.cc.titech.ac.jp>
Subject: Re: UN: Unification Method
To: zhf@net.edu.cn
Date: Fri, 26 May 95 15:30:54 JST
Cc: mduerst@ifi.unizh.ch, apng-cc@apng.org, apng-i18n@apng.org
In-Reply-To: <612.zhf@net.edu.cn_POPMail/PC_3.2.2>; from "Zhu, Haifeng" at May 26, 95 1:48 am
X-Mailer: ELM [version 2.3 PL11]
> Since this is, in a sense, also within the scope of i18n, I think
> we can CC these mails to apng-i18n@apng.org too, if Mr. Ohta or the
> i18n list agrees.
No, use apng-i18n only. We should suspend the discussion for 2 or 3 days
so that all interested parties can also subscribe to the apng-i18n ML.
Masataka Ohta
From apng-sec Fri May 26 16:02:51 1995
Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id PAA24429; Fri, 26 May 1995 15:58:44 +0900
Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)
id AA00978; Fri, 26 May 95 14:57:06 CST
From: "Zhu, Haifeng" <zhf@net.edu.cn>
Date: Fri, 26 May 95 02:43:10 CST
Message-Id: <458.zhf@net.edu.cn_POPMail/PC_3.2.2>
Reply-To: <zhf@net.edu.cn>
X-Popmail-Charset: English
To: mohta@necom830.cc.titech.ac.jp
Cc: mduerst@ifi.unizh.ch, apng-cc@apng.org, apng-i18n@apng.org
Subject: Re: UN: Unification Method
On Fri, 26 May 95 15:30:54 JST, Masataka Ohta wrote:
>> Since this is, in a sense, also within the scope of i18n, I think
>> we can CC these mails to apng-i18n@apng.org too, if Mr. Ohta or the
>> i18n list agrees.
>
>No, use apng-i18n only. We should suspend the discussion for 2 or 3 days
>so that all interested parties can also subscribe to the apng-i18n ML.
Why, if we are concentrating on Chinese? If we use i18n only, I'm afraid
we would no longer be discussing a Chinese transfer method using unified
coding, which some experts insist should be used.
Note that unified coding is also a way of transferring Chinese text, so it
could be evaluated in this group.
-- Haifeng --
Zhu,Haifeng
Coordinator of APNG-CC (Asia-Pacific Networking Group)
Dept. of Computer Sci.&Tech., Tsinghua University
Institute of Networking, Tsinghua University
Beijing 100084, People's Republic of China
Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173
Email: zhf@net.edu.cn
From apng-sec Fri May 26 16:39:14 1995
Received: from toad.lake.cs.wwu.edu (toad.lake.cs.wwu.EDU [140.160.138.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id QAA24670 for <apng-i18n@apng.org>; Fri, 26 May 1995 16:39:08 +0900
Received: by toad.lake.cs.wwu.edu (5.0/SMI-SVR4)
id AA29243; Fri, 26 May 1995 00:36:14 -0700
Date: Fri, 26 May 1995 00:36:14 -0700
From: n8442161@toad.lake.cs.wwu.edu (Patrick Tuttle)
Message-Id: <9505260736.AA29243@toad.lake.cs.wwu.edu>
To: apng-i18n@apng.org
Subject: subscribe n8442161@toad.lake.cs.wwu.edu Patrick Tuttle
content-length: 55
subscribe n8442161@toad.lake.cs.wwu.edu Patrick Tuttle
From apng-sec Fri May 26 16:39:42 1995
Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id QAA24679; Fri, 26 May 1995 16:39:29 +0900
Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Fri, 26 May 1995 16:34:59 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <199505260735.QAA12879@necom830.cc.titech.ac.jp>
Subject: Re: UN: Unification Method
To: zhf@net.edu.cn
Date: Fri, 26 May 95 16:34:58 JST
Cc: mduerst@ifi.unizh.ch, apng-cc@apng.org, apng-i18n@apng.org
In-Reply-To: <458.zhf@net.edu.cn_POPMail/PC_3.2.2>; from "Zhu, Haifeng" at May 26, 95 2:43 am
X-Mailer: ELM [version 2.3 PL11]
> >> Since this is, in a sense, also within the scope of i18n, I think
> >> we can CC these mails to apng-i18n@apng.org too, if Mr. Ohta or the
> >> i18n list agrees.
> >
> >No, use apng-i18n only. We should suspend the discussion for 2 or 3 days
> >so that all interested parties can also subscribe to the apng-i18n ML.
>
> Why, if we are concentrating on Chinese?
Concentrate on Chinese? Then, use apng-cc only. Members of apng-i18n have
already been notified of the existence of apng-cc.
> Note that unified coding is also a way of transferring Chinese text, so it
> could be evaluated in this group.
Sure. But, according to you, if the scope is communication in Chinese,
GB 2312 is a universal, fixed length encoding.
So, what are the remaining points?
Masataka Ohta
From apng-sec Mon May 29 06:18:40 1995
Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id GAA10309; Mon, 29 May 1995 06:18:31 +0900
Message-Id: <199505282118.GAA10309@cosmos.kaist.ac.kr>
Received: from ifi.unizh.ch by josef.ifi.unizh.ch
id <00902-0@josef.ifi.unizh.ch>; Sun, 28 May 1995 23:19:02 +0200
Subject: Re: UN: Scope of discussion
To: apng-i18n@apng.org, apng-cc@apng.org
Date: Sun, 28 May 1995 23:19:01 +0200 (MET DST)
X-Mailer: ELM [version 2.4 PL11]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8bit
Content-Length: 2785
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Sender: mduerst@ifi.unizh.ch
Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
and Zhu, Haifeng (zhf@net.edu.cn) have made some
comments on what should be discussed on which
mailing list.
I think we all agree that apng-cc is dedicated to
transfer of Chinese text, and apng-i18n to more
general issues, and that this shouldn't be changed.
The main problem seems to be that discussing unification
for Chinese, as we have set out to do in apng-cc,
in many cases cannot easily be separated from
other aspects, such as multilingual issues (which
already arise between Mandarin and Cantonese),
other scripts such as Latin and Greek (which are
part of the existing Chinese standards, and are used),
general advantages and disadvantages of unification
and Unicode (because many of them directly or
indirectly apply to Chinese), or even such specific
issues as glyph shapes in Japanese (because both
Masataka Ohta and I are more familiar with Japanese
than with Chinese).
All these issues are related to our main topic, and
therefore they will pop up from time to time. Getting
the greater picture is often advisable when trying
to make decisions.
>> Yes, I think it needs to be discussed if concentrated on needs of Chinese
>> Internet communication, as the charter described "mixed/unified method".
>
>As long as it is unrelated to multilingual issues, that's OK.
>
>The problem is in Martin who unnecessarily confuses Chinese and non-Chinese
>issues.
I have just quickly re-read the mails in our thread. The result
was interesting. Many of the points that were later criticized
as inappropriate for apng-cc started out as unnecessary,
unsubstantiated, and/or factually incorrect side remarks from
the very person who most loudly criticizes these topics as
inappropriate once the arguments lie on the table.
[Just a bit of historic reference for those that have been on
apng-cc for a while: I remember a specific situation where
somebody opened a special mailing list at a point where it
turned out that he had run out of arguments. I wouldn't
like to get the same impression now (the only difference
being that the mailing list already exists).]
>Apng-cc has the specific purpose and is NOT the place of general
>discussion between Chinese people. And, Martin is not a Chinese.
Nice of Masataka Ohta to tell me (and the list).
Guess I now have to tell him that he isn't Chinese, either, but
Japanese. Guess I could also tell him that these facts are not
relevant to our mailing lists (but that I won't open a new mailing
list to discuss the issues; just to remove any doubts and put
the readers on an equal footing regarding both of us, I will add here
that I am Swiss).
I apologize in advance to all those on the lists who already
knew the above facts, or didn't care anyway.
Regards, Martin.
From apng-sec Mon May 29 14:16:24 1995
Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id OAA12699; Mon, 29 May 1995 14:15:01 +0900
Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)
id AA00570; Mon, 29 May 95 12:49:24 CST
From: "Zhu, Haifeng" <zhf@net.edu.cn>
Date: Mon, 29 May 95 00:34:56 CST
Message-Id: <476.zhf@net.edu.cn_POPMail/PC_3.2.2>
Reply-To: <zhf@net.edu.cn>
X-Popmail-Charset: English
To: mduerst@ifi.unizh.ch
Cc: apng-cc@apng.org, apng-i18n@apng.org
Subject: Re: UN: Scope of discussion
On Sun, 28 May 1995 23:19:01 +0200 (ME, Martin J Duerst wrote:
>Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
>and Zhu, Haifeng (zhf@net.edu.cn) have made some
>comments on what should be discussed on which
>mailing list.
>
>I think we all agree that apng-cc is dedicated to
>transfer of Chinese text, and apng-i18n to more
>general issues, and that this shouldn't be changed.
>
>The main problem seems to be that discussing unification
>for Chinese, as we have set out to do in apng-cc,
>in many cases cannot easily be separated from
>other aspects, such as multilingual issues (which
>already arise between Mandarin and Cantonese),
>other scripts such as Latin and Greek (which are
>part of the existing Chinese standards, and are used),
>general advantages and disadvantages of unification
>and Unicode (because many of them directly or
>indirectly apply to Chinese), or even such specific
>issues as glyph shapes in Japanese (because both
>Masataka Ohta and I are more familiar with Japanese
>than with Chinese).
>All these issues are related to our main topic, and
>therefore they will pop up from time to time. Getting
>the greater picture is often advisable when trying
>to make decisions.
Agreed; they can be referred to where they relate to Chinese. Unification,
especially Unicode/ISO 10646 as a Chinese transfer encoding, should be
discussed in apng-cc.
-- Haifeng --
Zhu,Haifeng
Coordinator of APNG-CC (Asia-Pacific Networking Group)
Dept. of Computer Sci.&Tech., Tsinghua University
Institute of Networking, Tsinghua University
Beijing 100084, People's Republic of China
Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173
Email: zhf@net.edu.cn
From apng-sec Sun Jun 11 22:21:17 1995
Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id WAA11567; Sun, 11 Jun 1995 22:21:08 +0900
Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Sun, 11 Jun 1995 22:18:01 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <199506111318.WAA00763@necom830.cc.titech.ac.jp>
Subject: Agenda for APNG-I18N meeting at Honolulu
To: apng-all@apng.org, apng-i18n@apng.org, apng-cc@apng.org
Date: Sun, 11 Jun 95 22:18:00 JST
X-Mailer: ELM [version 2.3 PL11]
Dear members of APNG;
Below is the current agenda on the upcoming apng-i18n meeting:
Date: 1 July 1995(9:00 - 12:00)
Location: Sheraton Waikiki Hotel
1. General issues
2. Font CDROM Project by Shuichi Tashiro
3. Report of the work of APNG-CC by Prof. Hu
4. Report of APNG-CC RFC-to-be draft of "Chinese Encoding in the
Internet" by Prof. Hu
3 and 4 might be able to be merged.
Any comments?
Masataka Ohta
From apng-sec Mon Jun 12 18:39:56 1995
Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id SAA17725; Mon, 12 Jun 1995 18:39:22 +0900
Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)
id AA01298; Mon, 12 Jun 95 17:43:25 CST
From: "Zhu, Haifeng" <zhf@net.edu.cn>
Date: Mon, 12 Jun 95 17:31:22 CST
Message-Id: <15.zhf@net.edu.cn_POPMail/PC_3.2.2>
Reply-To: <zhf@net.edu.cn>
X-Popmail-Charset: English
To: mohta@necom830.cc.titech.ac.jp
Cc: apng-i18n@apng.org, apng-cc@apng.org
Subject: Re: Agenda for APNG-I18N meeting at Honolulu
On Sun, 11 Jun 95 22:18:00 JST, Masataka Ohta wrote:
> Date: 1 July 1995(9:00 - 12:00)
> Location: Sheraton Waikiki Hotel
>
> 1. General issues
> 2. Font CDROM Project by Shuichi Tashiro
> 3. Report of the work of APNG-CC by Prof. Hu
> 4. Report of APNG-CC RFC-to-be draft of "Chinese Encoding in the
> Internet" by Prof. Hu
>
>3 and 4 might be able to be merged.
>
>Any comments?
Prof. Hu is not in Beijing now, and the report of APNG-CC is still being
written. So Prof. Hu told me that he'd like to recommend Prof. Li Xing to
report on APNG-CC's work and the RFC-to-be draft, and he'll preside over
the APNG-CC meeting.
Is that OK?
-- Haifeng --
Zhu,Haifeng
Coordinator of APNG-CC (Asia-Pacific Networking Group)
Dept. of Computer Sci.&Tech., Tsinghua University
Institute of Networking, Tsinghua University
Beijing 100084, People's Republic of China
Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173
Email: zhf@net.edu.cn
From apng-sec Thu Jul 13 22:04:25 1995
Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id WAA24424; Thu, 13 Jul 1995 22:00:25 +0900
Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Thu, 13 Jul 1995 21:55:43 +0900
Date: Thu, 13 Jul 1995 21:55:43 +0900
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <199507131255.VAA13313@necom830.cc.titech.ac.jp>
To: apng-all@apng.org, apng-cc@apng.org, apng-i18n@apng.org,
bal@umacmr2.umac.mo, ding@asiainfo.com, edith-wu@cuhk.hk,
hwpark@garam.kreonet.re.kr, j.boellaard@genie.com, jesmith@well.com,
kei@rd.nacsis.ac.jp, mcpong@hkusub.hku.hk, mingfung@cuhk.hk,
mohta@necom830.cc.titech.ac.jp, nschen@cc.nsysu.edu.tw,
oka@slab.ntt.jp, sean@hntp2.hinet.net, sstseng@cis.nctu.edu.tw,
tashiro@etl.go.jp, tsenglm@mbox.ee.ncu.edu.tw, xing@cernet.edu.cn
Subject: Draft Minutes of the APNG-I18N Meeting at Honolulu
Dear APNG members;
Please review the following draft minutes of the APNG-I18N meeting
at Honolulu.
Comments should be sent to
apng-i18n@apng.org
or
apng-cc@apng.org
Masataka Ohta
PS
This is a resent message. To those who received the previous mail:
sorry for the wrong addresses of the apng mailing lists.
------------------------------------------------------------------------
I18N WG Meeting Minutes (DRAFT)
Participants:
Shuichi Tashiro     ETL, JAPAN                         tashiro@etl.go.jp
Xing Li             CERNET                             xing@cernet.edu.cn
Man-Chi Pong        The Univ. of Hong Kong             mcpong@hkusub.hku.hk
Alex Lai            Univ. of Macau                     bal@umacmr2.umac.mo
Nian-Shing Chen     National Sun Yat-sen Univ. Taiwan  nschen@cc.nsysu.edu.tw
Shian-Shyong Tseng  TANet                              sstseng@cis.nctu.edu.tw
Edith Wu            The Chinese Univ. of Hong Kong     edith-wu@cuhk.hk
Atsuko Oka          NTT                                oka@slab.ntt.jp
Yusheng Ji          NACSIS, Japan                      kei@rd.nacsis.ac.jp
Jeff Smith          Bridge to Asia                     jesmith@well.com
Jerry Boellaard     COMTECH-Hawaii                     j.boellaard@genie.com
James Ding          Asia Info Services, Inc            ding@asiainfo.com
Kinming Fung        Chinese University of Hong Kong    mingfung@cuhk.hk
Masataka Ohta       Tokyo Inst. of Tech                mohta@necom830.cc.titech.ac.jp
HyoungWoo Park      SERI, KOREA                        hwpark@garam.kreonet.re.kr
Chen Shyang-yih     DCI                                sean@c2.hinet.net
Tseng Li-Ming       CC.MOE.TAIWAN                      tsenglm@ncuee.ncu.edu.tw
Chair: Masataka Ohta
Documents:
Chinese Character Encoding for Internet Message <DRAFT>
Some experts of APNG-CC
Agenda:
Solicit a Volunteer for Note-taking
Agenda Bashing
Presentation of Font CDROM Project (by Dr. Shuichi Tashiro)
APNG-CC Charter Review
APNG-CC Political Discussion (Final Decision)
APNG-CC Draft Review (by Prof. Li Xing)
New Work Items
APNG-CC Rescheduling
Election
Summary of the Discussion:
1. Font CD ROM project:
- CD-ROM Project
Concept: Fonts are necessary!
CD-ROM $200/disk
Original intention: To supply small companies (for TV games or ...)
Not for Internet
Important Thing:
What type of font should be used?
We can use it immediately.
Respecting original language culture
Cheap! IPR free if possible
Question:
How many fonts already are there?
Can we use copyrighted fonts?
- maybe, make new font sets.
Let's discuss this on the apng-i18n ML.
2. APNG-CC Charter Review
The following Charter of APNG-CC compiled by Zhu, Haifeng was
reviewed and approved.
Since there are more and more Chinese using the Internet, the Chinese
transfer encoding method should be developed. People in P.R.C., Taiwan,
Hong Kong and Singapore are using methods quite different from each other.
We hope to build a suitable mechanism and write an RFC-to-be Internet Draft
to solve this problem.
The work might include: study of available standards/non-standards,
feasibility study on how to mix/unify them including political/cultural
aspects, design of the encoding method, and writing an RFC-to-be Internet Draft.
The work should be done as much as possible through email.
It was stressed that the charter says:
to build a suitable mechanism
and NOT "multiple suitable mechanisms" NOR "the suitable mechanism".
3. APNG-CC Political Discussion (Final Decision)
The wording in the current draft was reviewed and approved by
all the participants from both sides of the Taiwan Strait.
4. APNG-CC Draft Review
Prof. Li Xing presented an ISO-2022-CN draft.
The following issues were discussed:
Designation
Formally registered ones only
Sub scheme switching (Escape Sequences)
No consensus was formed.
Conformance v.s. Interoperability
have a minimum conformance of
ASCII,
GB2312
CNS 11643 planes 1 and 2
It should be stated in the draft that text beyond the
minimum conformance is not guaranteed to be interoperable.
It was pointed out that the GB 2312 font is copyright protected.
So, we agreed that APNG-I18N should strongly recommend that the P.R.C.
make the GB 2312 font copyright free.
Treatment of other encoding mechanisms (HZ, EUC-GB, Big5, ISO 10646...)
It was agreed that APNG-CC draft should not recommend nor
discourage other encoding mechanisms. It may, instead,
give information references.
The current paragraph which discourages HZ MUST be removed.
Liaison to HZ developing groups
The following are the locations of the HZ developing group:
ftp://ftp.edu.tw/
ftp://cnd.org/
ftp://ftp.ifcss.org/
http://www.ifcss.org/
ML: soft-author@ifcss.org
soft-author-request@ifcss.org
It was requested that "apng.net" should have an aliasing pointer
to "soft-author@ifcss.org".
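The escape-sequence designation discussed under item 4 can be illustrated for the ASCII and GB 2312 part of the minimum conformance. This is a minimal Python sketch, assuming the ISO-2022-CN rules of RFC 1922 as summarized here: ESC $ ) A designates GB 2312 into G1, SO and SI switch into and out of it, and every line starts and ends in ASCII with the designation repeated on each line that needs it. The function names are hypothetical; the CNS 11643 planes are omitted because the Python standard library has no CNS 11643 codec, and the standard `gb2312` codec is used only to obtain EUC-CN byte pairs.

```python
# Minimal ISO-2022-CN encoder sketch: ASCII + GB 2312 only (no CNS 11643).

ESC_DESIGNATE_GB2312 = b"\x1b$)A"  # ESC $ ) A : designate GB 2312 into G1
SO = b"\x0e"  # Shift Out: invoke G1 (GB 2312)
SI = b"\x0f"  # Shift In: return to ASCII


def encode_line(line: str) -> bytes:
    """Encode one line (without its terminator) as ISO-2022-CN."""
    out = bytearray()
    designated = False  # has ESC $ ) A been emitted on this line yet?
    shifted = False     # are we currently shifted into G1?
    for ch in line:
        if ord(ch) < 0x80:
            if shifted:
                out += SI
                shifted = False
            out += ch.encode("ascii")
        else:
            # Raises UnicodeEncodeError for characters outside GB 2312,
            # i.e. outside this sketch's repertoire.
            gb = ch.encode("gb2312")
            if not designated:
                out += ESC_DESIGNATE_GB2312
                designated = True
            if not shifted:
                out += SO
                shifted = True
            # EUC-CN bytes lie in 0xA1-0xFE; ISO 2022 sends them with bit 8 cleared.
            out += bytes(b & 0x7F for b in gb)
    if shifted:
        out += SI  # every line must end in ASCII
    return bytes(out)


def encode_text(text: str) -> bytes:
    """Encode multi-line text, joining lines with CRLF as in Internet mail."""
    return b"\r\n".join(encode_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    print(encode_text("Hello\n\u4e2d\u6587 text"))
```

Because the designation is reset at each line end in this sketch, a reader can start decoding at any line, at the cost of a few extra bytes per line containing Chinese.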
5. New Work Items
It was agreed that it might be a good idea to provide a separate, new
document "Implementation guidelines for ISO-2022-CN", which covers
information on:
free font locator
conversion tools
editor(s)
x related tools
However, no volunteer was found.
Volunteers are still being sought on the mailing list.
6. APNG-CC Rescheduling
It will be good if a draft is finalized within a month. After two
weeks of review as an Internet Draft, it should be sent to the RFC
editor.
7. Election
While it would be a good idea to formally elect a chair of the APNG-I18N
WG, no one had any idea how to do so. The current tentative chair
will consult the newly elected chair of APNG on the appropriate
procedure.
From apng-sec Tue Jun 18 21:25:44 1996
Return-Path: demizu@space.csl.sony.co.jp
Received: from space.csl.sony.co.jp (root@space.csl.sony.co.jp [133.138.1.86]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id VAA14097 for <apng-i18n@apng.org>; Tue, 18 Jun 1996 21:25:44 +0900
Received: from space.csl.sony.co.jp by space.csl.sony.co.jp (8.7.3/2.8Wb)
id MAA28158; Tue, 18 Jun 1996 12:26:39 GMT
Message-Id: <199606181226.MAA28158@space.csl.sony.co.jp>
From: Noritoshi Demizu <demizu@csl.sony.co.jp>
To: apng-i18n@apng.org
Subject: APNG i18n WG Home page
X-Mailer: Mew version 1.05+ on Emacs 19.28.4, Mule 2.3
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Date: Tue, 18 Jun 1996 21:26:39 +0900
Sender: demizu@space.csl.sony.co.jp
Dear APNG i18n WG members,
APNG i18n WG home page has been compiled at
<URL:http://www.csl.sony.co.jp/person/demizu/apng-i18n/>.
Any comments are welcome.
To make this page more complete, could you send me the following
information which isn't on this page yet?
- WG meeting minutes which aren't on this page
- Sample texts for any charsets on the Charset page
- Any activities/products/pages related to i18n/l10n
(especially those in Asia-Pacific area)
Thank you very much.
Best Regards,
Noritoshi Demizu, Sony CSL
From apng-sec Sat Jun 22 23:33:45 1996
Return-Path: apng-sec@rs.krnic.net
Received: from rs.krnic.net (rs.krnic.net [202.30.64.23]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id XAA14810 for <apng-i18n@apng.org>; Sat, 22 Jun 1996 23:33:43 +0900
Received: from Mail.IDT.NET by rs.krnic.net (8.6.4/8.6.4)
id AAA26436; Sun, 23 Jun 1996 00:33:44 +1000
Received: from pm1-29.ppp.satelnet.org (pm1-29.ppp.satelnet.org [204.157.227.88]) by Mail.IDT.NET (8.7.4/8.7.3) with SMTP id EAA26392; Sat, 22 Jun 1996 04:39:11 -0400 (EDT)
Message-Id: <199606220839.EAA26392@Mail.IDT.NET>
Comments: Authenticated sender is <hardwear@mail.idt.net>
From: "Neil" <hardwear@mail.idt.net>
To: hardwear@idt.net, hardwear@idt.net, hardwear@idt.net, hardwear@idt.net,
hardwear@idt.net, hardwear@idt.net, hardwear@idt.net, hardwear@idt.net,
hardwear@idt.net, hardwear@idt.net
Date: Sat, 22 Jun 1996 04:39:01 +0000
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Jewelry for Computer Lovers!!!
Reply-to: hardwear@idt.net
Priority: normal
X-mailer: Pegasus Mail for Windows (v2.33)
Hello,
If you like jewelry and computers check out the WEB site
http://hardwear.com
You will not receive any more messages from us
Thank you
From apng-sec Sun Jun 23 02:49:25 1996
Return-Path: apng-sec@rs.krnic.net
Received: from rs.krnic.net (rs.krnic.net [202.30.64.23]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id CAA14849 for <apng-i18n@apng.org>; Sun, 23 Jun 1996 02:49:24 +0900
Received: from Mail.IDT.NET by rs.krnic.net (8.6.4/8.6.4)
id DAA26802; Sun, 23 Jun 1996 03:49:24 +1000
Received: from pm2-23.ppp.satelnet.org (pm2-23.ppp.satelnet.org [204.157.227.112]) by Mail.IDT.NET (8.7.4/8.7.3) with SMTP id JAA13412; Sat, 22 Jun 1996 09:01:37 -0400 (EDT)
Message-Id: <199606221301.JAA13412@Mail.IDT.NET>
Comments: Authenticated sender is <hardwear@mail.idt.net>
From: "Neil" <hardwear@mail.idt.net>
To: hardwear@idt.net, hardwear@idt.net, hardwear@idt.net, hardwear@idt.net,
hardwear@idt.net, hardwear@idt.net, hardwear@idt.net, hardwear@idt.net,
hardwear@idt.net, hardwear@idt.net
Date: Sat, 22 Jun 1996 08:28:28 +0000
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Jewelry for Computer Lovers!!!
Reply-to: hardwear@idt.net
Priority: normal
X-mailer: Pegasus Mail for Windows (v2.33)
Hello,
If you like jewelry and computers check out the WEB site
http://hardwear.com
You will not receive any more messages from us
Thank you
From apng-sec Wed Aug 7 01:43:15 1996
Return-Path: maeda@ulis.ac.jp
Received: from bach.ulis.ac.jp (bach.ulis.ac.jp [133.51.32.2]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with SMTP id BAA27011 for <apng-i18n@apng.org>; Wed, 7 Aug 1996 01:43:14 +0900
Received: from ulis.ac.jp (eboshi) by bach.ulis.ac.jp (4.2/6.4JAIN-ulis-bach2)
id AA21138; Wed, 7 Aug 96 01:45:05 JST
Message-Id: <9608061645.AA21138@bach.ulis.ac.jp>
To: apng-i18n@apng.org
Cc: maeda@ulis.ac.jp
Date: Wed, 07 Aug 1996 01:45:04 +0900
From: Akira MAEDA <maeda@ulis.ac.jp>
subscribe apng-i18n
From apng-sec Tue Nov 12 09:32:31 1996
Return-Path: mohta@necom830.hpcl.titech.ac.jp
Received: from necom830.hpcl.titech.ac.jp (necom830.hpcl.titech.ac.jp [131.112.32.132]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id JAA10129 for <apng-i18n@apng.org>; Tue, 12 Nov 1996 09:32:30 +0900
From: Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>
Message-Id: <199611120033.JAA05413@necom830.hpcl.titech.ac.jp>
Received: by necom830.hpcl.titech.ac.jp (8.6.11/TM2.1)
id JAA05413; Tue, 12 Nov 1996 09:33:33 +0900
Subject: APNG I18N WG Hong Kong meeting
To: apng-i18n@apng.org
Date: Tue, 12 Nov 96 9:33:32 JST
X-Mailer: ELM [version 2.3 PL11]
Dear APNG-I18N members;
Do you or your colleague have any topic about Internationalization and/or
Localization to be discussed/presented in the upcoming APNG meeting?
Proposals to initiate work on defining how specific local
characters should be encoded on the Internet, as RFC 1922 does, are welcome.
Please reply to this mailing list or privately to me.
If any of you are interested, I can give a presentation on the revision
of JIS X 0208 and on what the upcoming JIS X 0213, the third and fourth
level Kanji characters, will look like.
Masataka Ohta
From apng-sec Tue Nov 12 11:49:06 1996
Return-Path: tashiro@media.etl.go.jp
Received: from etlpost.etl.go.jp (etlpost.etl.go.jp [192.31.197.33]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id LAA10161 for <apng-i18n@apng.org>; Tue, 12 Nov 1996 11:49:03 +0900
Received: from etlpom.etl.go.jp by etlpost.etl.go.jp (8.6.9+2.4W/2.7W)
id LAA14056; Tue, 12 Nov 1996 11:51:03 +0900
Received: by etlpom.etl.go.jp (4.1/6.4J.6-ETLpom.MASTER)
id AA13062; Tue, 12 Nov 96 11:51:02 JST
Received: by media.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE)
id LAA27943; Tue, 12 Nov 1996 11:51:01 +0900
Message-Id: <199611120251.LAA27943@media.etl.go.jp>
From: tashiro@etl.go.jp (Shuichi TASHIRO)
To: apng-i18n@apng.org
Subject: Re: APNG I18N WG Hong Kong meeting
In-Reply-To: Your message of "Tue, 12 Nov 1996 09:33:32 JST"
References: <199611120033.JAA05413@necom830.hpcl.titech.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain;charset="ISO-2022-JP"
Date: Tue, 12 Nov 1996 11:50:57 +0900
Sender: tashiro@media.etl.go.jp
> Do you or your colleague have any topic about Internationalization and/or
> Localization to be discussed/presented in the upcoming APNG meeting?
We are planning to hold a two-day symposium on multilingual information
processing in Singapore sometime between April and June 1997.
The title of the symposium is tentatively "International Symposium on
the Standardization of Multilingual Information Technologies".
Around 100 people will be invited from countries in Asia.
MITI will cover the cost of the conference venue and the travel expenses
of speakers (and maybe some participants).
I would like to announce this symposium and discuss its details
(program, theme, speakers, etc.) at the APNG meeting.
--
Shuichi Tashiro
Electrotechnical Laboratory
From apng-sec Tue Nov 12 18:18:20 1996
Return-Path: nakayama
Received: (from nakayama@localhost) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) id SAA10225; Tue, 12 Nov 1996 18:18:20 +0900
Message-Id: <199611120918.SAA10225@ins.apng.org>
To: apng-i18n@apng.org
Subject: Check your e-mail address.
X-Mailer: Mew version 1.06 on Emacs 19.28.1, Mule 2.3
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Date: Tue, 12 Nov 1996 18:18:19 +0900
From: Masaya Nakayama <nakayama>
I removed the following entries for the reasons shown.
# Jimmy Hwang <jhwang@wiley.csusb.edu> User Unknown
# Woohyung Choi <whchoi@krnic.net> User Unknown
# <lwbbs@shakti.ncst.ernet.in> User Unknown
# Hock-Koon Lim <lim@ctron.com> User Unknown
# Ming Lu <luming@tsinghua.edu.cn> User Unknown
# zhf@captain.net.tsinghua.edu.cn User Unknown
# <fuku@c1.kagu.sut.ac.jp> Host UnKnown
# Xiaoling Teng <ccteng@pkn.edu.cn> Host UnKnown
When you change your e-mail address, please update your
entry yourself.
We are maintaining MLs with the majordomo system. If you are not
familiar with that system, please send a mail to "listserv@apng.org" or
"majordomo@apng.org" with a "help" line in its body.
Thanks for your cooperation.
--
Masaya Nakayama, APNG secretariat
From apng-sec Fri May 16 17:52:48 1997
Return-Path: mohta@necom830.hpcl.titech.ac.jp
Received: from necom830.hpcl.titech.ac.jp (necom830.hpcl.titech.ac.jp [131.112.32.132]) by ins.apng.org (8.8.5/3.4W-1.0) with ESMTP id RAA09886; Fri, 16 May 1997 17:52:48 +0900 (JST)
From: Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>
Message-Id: <199705160852.RAA17108@necom830.hpcl.titech.ac.jp>
Received: by necom830.hpcl.titech.ac.jp (8.6.11/TM2.1)
id RAA17108; Fri, 16 May 1997 17:52:32 +0900
Subject: Kuala Lumpur APNG
To: apng-i18n@apng.org, apng-cc@apng.org
Date: Fri, 16 May 97 17:52:31 JST
X-Mailer: ELM [version 2.3 PL11]
Dear members of APNG I18N WG;
The next APNG meeting will be held at Kuala Lumpur, Malaysia on
June 27 and 28 just after INET'97.
If you have any topic related to APNG I18N WG to be discussed there,
please let me know through e-mail to me or to apng-i18n@apng.org.
Masataka Ohta
Updated: 2012.8.19
Contact sec at InternetHistory.asia for further information.