apng-i18n

From chon@cosmos.kaist.ac.kr Wed Apr 28 10:12:49 1993

Return-Path: <chon@cosmos.kaist.ac.kr>

Received: from cosmos.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)

id AA08284; Wed, 28 Apr 93 10:12:49 KST

Errors-To: Postmaster@cosmos.kaist.ac.kr

Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)

id AA00552; Wed, 28 Apr 93 10:17:33 KST

Date: Wed, 28 Apr 93 10:17:33 KST

From: chon@cosmos.kaist.ac.kr (Kilnam Chon)

Message-Id: <9304280117.AA00552@cosmos.kaist.ac.kr>

Errors-To: Postmaster@cosmos.kaist.ac.kr

To: apccirn-i18n@nic.nm.kr

Subject: first mail

Will you acknowledge this mail to form the mailing list on internationalization

and localization?

The goal of this group is to make the networking friendly to non-English

speakers. The current networking does not support Asian languages properly,

and we need to do the following;

Internationalization to provide the framework

Localization to provide the local language/culture support

Kilnam Chon

PS: the mailing list will be (minimally) moderated by apccirn-sec initially

until the moderator/chair is elected.

From mohta@necom830.cc.titech.ac.jp Wed Apr 28 11:47:12 1993

Received: from kum.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)

id AA08935; Wed, 28 Apr 93 11:47:12 KST

Errors-To: Postmaster@necom830.cc.titech.ac.jp

Received: from necom830.cc.titech.ac.jp by kum.kaist.ac.kr (4.1/KUM-0.1)

id AA05450; Wed, 28 Apr 93 11:53:53 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 28 Apr 93 11:43:55 +0859

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9304280244.AA21562@necom830.cc.titech.ac.jp>

Subject: Re: first mail

To: chon@cosmos.kaist.ac.kr (Kilnam Chon)

Date: Wed, 28 Apr 93 11:43:53 JST

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9304280117.AA00552@cosmos.kaist.ac.kr>; from "Kilnam Chon" at Apr 28, 93 10:17 am

X-Mailer: ELM [version 2.3 PL11]

> Will you acknowledge this mail to form the mailing list on internationalization

> and localization?

Yes, I will.

> The goal of this group is to make the networking friendly to non-English

> speakers. The current networking does not support Asian languages properly,

> and we need to do the following;

>

> Internationalization to provide the framework

> Localization to provide the local language/culture support

I'm also interested in the possibility of

Internationalization to provide the local language/culture

support without localization

Masataka Ohta

From trin@nwg.nectec.or.th Wed Apr 28 16:25:53 1993

Return-Path: <trin@nwg.nectec.or.th>

Received: from munnari.oz.au by mani.kaist.ac.kr (4.1/SMI-4.1)

id AA10404; Wed, 28 Apr 93 16:25:53 KST

Errors-To: Postmaster@nwg.nectec.or.th

Received: from [192.150.251.31] by munnari.oz.au with SMTP (5.83--+1.3.1+0.50)

id AA10547; Wed, 28 Apr 1993 17:29:05 +1000 (from trin@nwg.nectec.or.th)

From: trin@nwg.nectec.or.th (Trin Tantsetthi)

Message-Id: <9304290554.AA26274@nwg.nectec.or.th>

To: apccirn-i18n@nic.nm.kr

Subject: Re: first mail

In-Reply-To: Your message of Wed, 28 Apr 93 10:17:33 T.

<9304280117.AA00552@cosmos.kaist.ac.kr>

Date: Wed, 28 Apr 93 12:54:08 -1700

Hello,

This is an acknowledgement per request by chon@cosmos.kaist.ac.kr

(Kilnam Chon).

I am the secretariat of the internationalization and international

standards coexistence working group of the Thai national standards body

(TISI/TC536/SC2/WG2). TISI is an O-member of ISO. Thai character set is

registered with ECMA as ISO-IR-166; the corresponding local version

is called the TIS 620-2533 standard.

Regards,

-Trin

From htk@ipied.tu.ac.th Wed Apr 28 23:31:18 1993

Return-Path: <htk@ipied.tu.ac.th>

Received: from kum.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)

id AA11120; Wed, 28 Apr 93 23:31:18 KST

Errors-To: Postmaster@ipied.tu.ac.th

Received: from chulkn.chula.ac.th by kum.kaist.ac.kr (4.1/KUM-0.1)

id AA17562; Wed, 28 Apr 93 23:37:57 KST

Received: by chulkn.chula.ac.th (Smail3.1.28.1 #12)

id m0noD7y-0003QmC; Wed, 28 Apr 93 21:28 BKK

Received: by ipied.tu.ac.th (4.1/SMI-3.2A+08)

id AA03321; Wed, 28 Apr 93 21:30:28+0700

Date: Wed, 28 Apr 1993 21:30:03 +0700 (GMT+0700)

From: Hugh Thaweesak Koanantakool <htk@ipied.tu.ac.th>

Subject: Re: first mail

To: Kilnam Chon <chon@cosmos.kaist.ac.kr>

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9304280117.AA00552@cosmos.kaist.ac.kr>

Message-Id: <Pine.3.07.9304282100.B3308-8100000@ipied.tu.ac.th>

Mime-Version: 1.0

Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 28 Apr 1993, Kilnam Chon wrote:

> Will you acknowledge this mail to form the mailing list on internationalization

> and localization?

Here it is!

Thaweesak.

From chon@cosmos.kaist.ac.kr Fri Apr 30 12:40:47 1993

Return-Path: <chon@cosmos.kaist.ac.kr>

Received: from cosmos.kaist.ac.kr by mani.kaist.ac.kr (4.1/SMI-4.1)

id AA05266; Fri, 30 Apr 93 12:40:47 KST

Errors-To: Postmaster@cosmos.kaist.ac.kr

Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)

id AA17172; Fri, 30 Apr 93 12:45:33 KST

Date: Fri, 30 Apr 93 12:45:33 KST

From: chon@cosmos.kaist.ac.kr (Kilnam Chon)

Message-Id: <9304300345.AA17172@cosmos.kaist.ac.kr>

Errors-To: Postmaster@cosmos.kaist.ac.kr

To: apccirn-i18n@nic.nm.kr

Subject: this group and JWCC session and first activity

Internationalization/localization (i18n/l10n) is one of the most important issues

for the Asian networking community. I would like to see orderly delivery of

networking software with appropriate local language/culture support to the Asian

community in a timely manner. The APCCIRN and this group are here to help and

guide the realization of such a capability.

As the first step, I would like to propose a special session on local language

support at the JWCC (Joint Workshop on Computer Communications) in Taipei on Dec.

12-14, immediately after the planned APCCIRN Meeting in the same city. The

deadline for papers is June 30. My proposal is a panel discussion with a

brief description of what is available now in Chinese, Japanese, Korean and

other languages, and what major issues we are facing now (such as

Unicode).

Can you recommend the panel/session chair with panelists/speakers, and comment

on the content?

The above activity would give us the current status in the region, and we can

start working on what to do next.

From mohta@necom830.cc.titech.ac.jp Thu May 13 21:47:15 1993

Received: from daiduk.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)

id AA04754; Thu, 13 May 93 21:47:15 KST

Errors-To: Postmaster@nic.nm.kr

Received: from necom830.cc.titech.ac.jp by daiduk.kaist.ac.kr (4.1/KAISTNet-Relay-3.2)

id AA02632; Thu, 13 May 93 21:41:50 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 13 May 93 21:14:27 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta>

Message-Id: <9305131214.AA03128@necom830.cc.titech.ac.jp>

Subject: JWCC and i18n

To: apccirn-i18n@nic.nm.kr

Date: Thu, 13 May 93 21:14:26 JST

X-Mailer: ELM [version 2.3 PL11]

As Kilnam suggested, as part of the activity of APCCIRN i18n group,

let's promote research on issues of Asian languages at the next

JWCC.

First of all, I would like to ask how many of you are planning to

participate in the next JWCC (1993 Dec. 12~14, Taipei, Taiwan).

The possible topics are:

1) how local languages are currently supported

2) special features of local languages

3) how local languages should be supported

4) how local language support should be internationalized

As for a formal procedure, we must send papers or a panel proposal to

the program committee before 7/1.

I think topics 1) and 2) are not so much research oriented but could be

an interesting presentation as a panel session (I could be wrong).

But, if we could submit a sufficient number of research papers on language

issues to JWCC, a paper session is possible in which topics 1) and 2)

could also be covered in the introduction parts of papers.

Any suggestions are welcome.

Masataka Ohta

From chon@cosmos.kaist.ac.kr Mon May 17 14:36:52 1993

Return-Path: <chon@cosmos.kaist.ac.kr>

Received: from kum.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)

id AA08887; Mon, 17 May 93 14:36:52 KST

Errors-To: Postmaster@cosmos.kaist.ac.kr

Received: from cosmos.kaist.ac.kr by kum.kaist.ac.kr (4.1/KUM-0.1)

id AA01594; Mon, 17 May 93 14:33:05 KST

Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)

id AA06952; Mon, 17 May 93 14:32:40 KST

From: chon@cosmos.kaist.ac.kr (Kilnam Chon)

Message-Id: <9305170532.AA06952@cosmos.kaist.ac.kr>

Errors-To: Postmaster@cosmos.kaist.ac.kr

Subject: Re: JWCC and i18n

To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)

Date: Mon, 17 May 93 14:32:39 KST

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9305131214.AA03128@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at May 13, 93 9:14 pm

X-Mailer: ELM [version 2.3 PL11]

the first thing we should do is to recruit apccirn-i18n members from various

countries. for example, we have only one from japan, and none from hong kong

and many other countries.

kilnam chon

PS: Unix International just delivered the following report on April 16.

Guidelines for the Development of Localization Packages, 80 pages.

From uhhyung Fri May 28 04:49:49 1993

Return-Path: <uhhyung>

Received: by nic.nm.kr (4.1/SMI-4.1)

id AA14061; Fri, 28 May 93 04:49:49 KST

From: uhhyung (Uhhyung Choi)

Message-Id: <9305271949.AA14061@nic.nm.kr>

Errors-To: Postmaster@nic.nm.kr

Subject: Status of the Korean Encoding for Internet Messages

To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)

Date: Fri, 28 May 1993 04:49:49 +0900 (KST)

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9305271345.AA10711@necom830.cc.titech.ac.jp> from "Masataka Ohta" at May 27, 93 10:45:00 pm

X-Mailer: ELM [version 2.4 PL21-h3]

Mime-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 7bit

Content-Length: 3093

Prof. Ohta,

I have no plan to attend JWCC in Taipei as for now. Are you planning

to have apccirn-i18n related meeting in upcoming JWCC?

Before any activities, we'll have to make the charter to define the

issues, goals and milestones for the WG.

The information you requested on the status of the Korean encoding

is as follows. I'd like to hear about the situation in Taiwan also.

For your background information, we have three nation-wide IP networks

in Korea. KREN(Korea Research and Educational Network), KREONet(Korea

Research and Educational Open Network) are those funded by government,

each by Ministry of Education and by Ministry of Science and Technology

respectively. (The names don't seem to carry any meaning to me, though.)

The other network, SDN, which began its operation in the early eighties

connecting domestic organizations with UUCP and TCP/IP, now has grown

to a membership based network which has also a 56Kbps link to

NASA Science Internet.

The encoding began to be used in SDN in late 1991 and spread to the

other two networks in early 1992. It is the only encoding used to carry

Korean characters as far as I know. As for how frequently it is used,

I get several tens of emails every day, about half to two-thirds of which

are in Korean.

The encoding itself has a somewhat different role from that in Japan.

Unlike Japanese practice, we don't recommend using the encoding as the

storage code. Actually, my own implementation of the encoding doesn't

allow encoded text to be stored as a file; it is used only as the transit medium.

We don't have any hardware that supports the encoding.

We don't use any encoding in USENET news. We've arranged for all the NNTP

gateways to handle EUC code correctly, so we use bare EUC with USENET news.

It is required that each new organization establishing a new connection to

the network provider can handle mails in the encoding.

Coincidentally, the encoding has partial compatibility with the encoding

used in the SunOS Korean Language Environment, but most people don't seem to

even know about it. Moreover, the KLE itself doesn't have easy documentation

for novice users, so it doesn't seem to help users get acquainted

with the Korean email system. Personally, I would prefer that the implementation

used in KLE not be widely used, for it leaves the encoded message in each user's

mailbox even though there is no tool to manipulate the encoding.

I plan to ask Sun to change the encoding and implementation

currently used in KLE after the Internet-Draft is published as an

Informational RFC.

I know several students studying abroad who can read and write in Korean,

but I don't have any statistics about it. It seems people overseas get to

know about the encoding from colleagues in Korea who use the encoding

day to day.

Several months ago, I discussed implementation issues with a person

from SONY but I don't know whether he was concerned with adopting the

encoding in their workstation's operating system.

If you would like to hear further information, please let me know.

--

Uhhyung Choi

Korea Network Information Center

uhhyung@nic.nm.kr

From mohta@necom830.cc.titech.ac.jp Thu Jun 17 10:33:01 1993

Received: from daiduk.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)

id AA13545; Thu, 17 Jun 93 10:33:01 KST

Errors-To: Postmaster@necom830.cc.titech.ac.jp

Received: from necom830.cc.titech.ac.jp by daiduk.kaist.ac.kr (4.1/KAISTNet-Relay-3.2)

id AA14380; Thu, 17 Jun 93 10:23:50 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 17 Jun 93 10:20:57 +0859

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9306170121.AA10755@necom830.cc.titech.ac.jp>

Subject: internet default character code

To: apccirn-i18n@nic.nm.kr

Date: Thu, 17 Jun 93 10:20:56 JST

X-Mailer: ELM [version 2.3 PL11]

Considering the current practice in Japan and Korea, I think it

is worthwhile to standardize that the default 7bit character encoding

method on the Internet be full 7 bit ISO 2022.

That is, if nothing else is specified on a 7 bit stream, the character code

used should be assumed to be ISO 2022 with the initial designation of

ASCII to G0 and none to G1/2/3.

As there already exists much non-MIME 7-bit traffic with Japanese

news/mail and Korean news, it is practical to make them legitimate.

Though MIME has limited capability to specify a character set, isn't

MIME too complex to use only to legislate currently used character

encoding? Moreover, it is usable only when there can be a header part.

Any opinions?

Masataka Ohta

From uhhyung Sat Jun 19 00:26:51 1993

Return-Path: <uhhyung>

Received: by nic.nm.kr (4.1/SMI-4.1)

id AA18355; Sat, 19 Jun 93 00:26:51 KST

From: uhhyung (Uhhyung Choi)

Message-Id: <9306181526.AA18355@nic.nm.kr>

Errors-To: Postmaster

Subject: Re: internet default character code

To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)

Date: Sat, 19 Jun 1993 00:26:50 +0900 (KST)

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9306170121.AA10755@necom830.cc.titech.ac.jp> from "Masataka Ohta" at Jun 17, 93 10:20:56 am

X-Mailer: ELM [version 2.4 PL21-h3]

Mime-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 7bit

Content-Length: 1225

As Masataka Ohta writes:

*

* Considering the current practice in Japan and Korea, I think it

* is worthwhile to standardize that the default 7bit character encoding

* method on the Internet be full 7 bit ISO 2022.

*

* That is, if nothing else is specified on a 7 bit stream, the character code

* used should be assumed to be ISO 2022 with the initial designation of

* ASCII to G0 and none to G1/2/3.

Yes. That is exactly what is being used in Japanese and Korean email

these days.

* As there already exist much non-MIME 7-bit traffic with Japanese

* news/mail and Korean news, it is practical to make them legitimate.

No, we don't have 7-bit news traffic that carries any kind of Hangul

characters encoded.

* Though MIME has limited capability to specify a character set, isn't

* MIME too complex to use only to legislate currently used character

* encoding? Moreover, it is usable only when there can be a header part.

Yes, MIME is a little bit complex, but I think we'd better stick with MIME

rather than introducing another method for extended character encoding.

Do you have any simple idea that will make current practice legitimate?

--

Uhhyung Choi

Korea Network Information Center

From mohta@necom830.cc.titech.ac.jp Thu Jun 24 16:56:31 1993

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)

id AA09636; Thu, 24 Jun 93 16:56:31 KST

Errors-To: Postmaster@necom830.cc.titech.ac.jp

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 24 Jun 93 16:48:00 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9306240748.AA11873@necom830.cc.titech.ac.jp>

Subject: Re: internet default character code

To: uhhyung@nic.nm.kr (Uhhyung Choi)

Date: Thu, 24 Jun 93 16:47:59 JST

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9306181526.AA18355@nic.nm.kr>; from "Uhhyung Choi" at Jun 19, 93 12:26 am

X-Mailer: ELM [version 2.3 PL11]

Sorry about the confusion on mails and news in Korea.

> * Though MIME has limited capability to specify a character set, isn't

> * MIME too complex to use only to legislate currently used character

> * encoding? Moreover, it is usable only when there can be a header part.

>

> Yes, MIME is a little bit complex, but I think we'd better stick with MIME

> rather than introducing another method for extended character encoding.

My proposal can interoperate with MIME and is applicable to non-mail

traffic.

> Do you have any simple idea that will make current practice legitimate?

Simple. Write an internet draft saying it is legitimate because it is the

current practice used by 100,000 or 1,000,000 people on the internet.

And, that is what I'm proposing.

Masataka Ohta

From mohta@necom830.cc.titech.ac.jp Fri Jul 16 19:07:19 1993

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)

id AA07696; Fri, 16 Jul 93 19:07:19 KST

Errors-To: Postmaster@necom830.cc.titech.ac.jp

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Fri, 16 Jul 93 18:59:24 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9307160959.AA26139@necom830.cc.titech.ac.jp>

Subject: IETF BOF

To: apccirn-i18n@nic.nm.kr

Date: Fri, 16 Jul 93 18:59:23 JST

X-Mailer: ELM [version 2.3 PL11]

At the last IETF, a BOF on character encoding was held.

The discussion will continue on the mailing list, and a WG will,

perhaps, be formed.

All of you who are interested in this issue should register

your mail address at:

ietf-charsets-request@innosoft.com.

MO

From mohta@necom830.cc.titech.ac.jp Wed Jul 21 18:17:01 1993

Received: from necom830.cc.titech.ac.jp ([131.112.4.4]) by nic.nm.kr (4.1/SMI-4.1)

id AA08624; Wed, 21 Jul 93 18:17:01 KST

Errors-To: Postmaster@necom830.cc.titech.ac.jp

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 21 Jul 93 16:50:35 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta>

Message-Id: <9307210750.AA14623@necom830.cc.titech.ac.jp>

Subject: JWCC paper

To: apccirn-i18n@nic.nm.kr

Date: Wed, 21 Jul 93 16:50:33 JST

X-Mailer: ELM [version 2.3 PL11]

Following is my paper submitted for the next JWCC. The paper

is somewhat revised reflecting the discussion in the last

IETF.

Any comments?

Masataka Ohta

PS

Does anyone on this list know the contact person for the KCS committee?

.TL

Character Encoding Method for Internationalized Plain Text Processing

.AU

Masataka Ohta

.AI

Computer Center, Tokyo Institute of Technology

2-12-1, O-okayama, Meguro-ku, Tokyo 152, JAPAN

Tel: +81-3-5499-7084, Fax: +81-3-3729-1940

.br

.ce 0

.AB

Encoding, decoding and comparison of text are the most basic operations

of plain text processing.

By inspecting various aspects of these operations under multilingual

environment, operational requirements for internationalized

character encoding methods become clear.

Finiteness properties such as finite state machine operation or

finite length resynchronization are also requirements for

the encoding methods.

As the existing or proposed encoding methods such as ISO 2022 or

ISO 10646 can not fulfill several basic requirements,

they are not useful for the internationalized plain text processing.

Thus, a new encoding system, ICODE/IUTF is proposed based on ISO 10646.

ICODE is a 21 bit code suitable for simple processing of plain, but,

possibly bidirectional, text.

IUTF is a compact information interchange form for ICODE and is

upper compatible to UTF2, the proposed information interchange

code for ISO 10646.

.AE

.LP

KEYWORDS: Text Processing, Character Encoding, Multilingual

.bp

.ls 2

.ds CF "

.NH

Introduction

.PP

Plain text processing is the most basic form of text processing.

Moreover, for various applications, plain text

processing is often enough.

One of the most successful plain text processing systems is

UNIX.

UNIX text files are assumed to have several lines

separated by newline characters.

Without assuming further structure, various tools to generate,

filter and consume plain text have been designed, such as

cat, grep, wc, ls, echo, sed, tee, sort, diff and so on.

Moreover, these simple but powerful tools are combined through the pipe

mechanisms to perform more complicated processing.

It should also be noted that the command language for the shell is

plain text and the above tools could be used also for meta

level processing.

.PP

On classic UNIX, ASCII was the only available character code with

which English was represented.

Plain text processing with ASCII has been quite simple because

with ASCII and English:

.IP

there are only 128 characters.

.IP

all characters can be input directly from the keyboard

.IP

case correspondence is regular and simple

.IP

all characters are represented by a single 8 bit byte

.IP

text is written left to right

.LP

.PP

Unfortunately, to construct an internationalized text processing

environment most of these favorable properties are lost.

That is,

.IP

there are more than 65536 characters even in a single language

such as Chinese.

.IP

some input mechanism is necessary to construct diacriticized

characters of Latin characters.

Even worse, for Japanese language input,

complex and interactive input mechanism is necessary to map the

typed pronunciation or shape hint to actual characters.

.IP

case correspondence is complex and differs from language to language.

For example, a capital form of 'y' with diaeresis is 'IJ' in

Dutch but 'Y' or 'Y' with diaeresis in French.

.IP

Even by a single 16 bit byte, not all characters can be encoded,

so that multibyte representation is practically inevitable.

.IP

In Arabic, text may be written left to right, right to left or in

mixed direction.

.LP

.PP

While there is an endless debate on what is a character,

what we actually need is

not character encoding but a convenient method of encoding of plain text.

For that purpose, it is enough to define characters

as some unit of text encoding.

Thus, the implications of above stated differences are inspected

in section 2 for three basic operations for plain

text processing: encoding, decoding and equality comparison.

.NH

Requirements for the Internationalized Plain Text Processing

.PP

Text is a visible medium.

Thus, translation between graphical representation and

the coded representation is essential for the plain

text processing.

So, the very basic operation of plain text processing

is encoding of graphically represented text to the coded representation

and decoding of the coded representation to the graphical representation.

Various kinds of plain text processing such as information interchange, concatenation,

counting and simple sorting become possible only with encoding

and decoding.

.PP

The second most important operation is equality comparison of text, which

enables search operations.

.NH 2

Universality

.PP

In this paper, a text encoding method is said to be universal for

some family of languages, if all the decoding information is self

contained in the encoding and no profiling nor negotiation is necessary to

correctly decode the text of the family of the languages.

Universality does not mean that all the languages in the world

could be encoded.

.PP

To make universal encoding/decoding possible, different characters

(whatever 'a character' means) should have different coded representations,

which does not mean that a single character can not have

more than one coded representation.

At the same time, it is also desirable that the encoded

representation is compact, which means that

a single character should have as few

representations as possible.

.PP

As far as encoding and decoding are concerned, it is not

necessary to assign multiple code points to a single

graphic form.

Thus, letters 'A' of English and 'A' of French both

in Gothic font can share

a single code point.

The problem is that a character can have several different graphic

representations.

If all the variants are allowed and are regarded as having the same semantics

as plain text, the distinction is not necessary.

For example, the font information is not encoded in plain text.

.PP

The distinction between uppercase and lowercase

characters is a qualitative, aesthetic and subjective matter.

It is perfectly legal to express English text with uppercase characters only.

On old

computers, all characters were represented in uppercase,

because, in the bad old days, someone thought the case difference was not significant.

But, on UNIX, the case distinction has been considered to be

essential.

In general these days, for the computer output of plain text, type-written

or LBP-printed quality is expected.

.PP

But, sometimes, the distinction is objectively necessary for the

universality.

That is, in some context in some language, some graphical

representations of a character are disallowed.

So, it is necessary to select appropriate shapes

allowed by the context.

If such selection can not be performed mechanically, different code

points must be assigned to different graphic representations at the

time of encoding.

For example, in German with case distinction, the first

character of every sentence and every noun is in capital form. While the first

character of a sentence could be, in general, mechanically identified,

it is not possible to mechanically identify a noun. Thus,

case distinction information must be encoded.

.NH 2

Causality

.PP

Because of the law of causality, decoding process can not depend on

a not-yet-happened event.

Thus, for an interactive processing, as immediate output is required,

a shape of a character can not depend on the possibly-not-yet-typed

next character.

.PP

For example, Arabic characters have different forms depending on

whether the character is at the end of a word or not.

Then, if the end-of-a-word information is not encoded in the

character code, a correct display of an Arabic character

is impossible until the next character arrives.

In an interactive environment, the next character might never be

typed by the user, so the waiting period is indeterminate.

So, for interactive processing, it is necessary to be able to produce

an image of a character without looking ahead to a next character that

possibly does not yet exist.
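.PP
The lookahead problem can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions: the class name
LookaheadShaper is hypothetical, Latin letters stand in for Arabic ones,
a space marks the end of a word, and only two forms (final and non-final)
are distinguished.

```python
class LookaheadShaper:
    """Chooses the form of a letter only after seeing the NEXT keystroke."""

    def __init__(self):
        self.pending = None  # letter typed but not yet displayable

    def feed(self, ch):
        """Accept one keystroke; return the (letter, form) pairs now displayable."""
        out = []
        if self.pending is not None:
            # The form of the previous letter depends on the current keystroke.
            form = "final" if ch == " " else "non-final"
            out.append((self.pending, form))
            self.pending = None
        if ch != " ":
            self.pending = ch  # cannot be displayed until more input arrives
        return out


shaper = LookaheadShaper()
outputs = [shaper.feed(c) for c in "ab "]
print(outputs)
```

Nothing can be displayed after the first keystroke, and the last letter of a
word stays pending until the user types something else, which is the
indeterminate wait described above.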

.NH 2

Finite state recognition

.PP

Causality does not prohibit the display of characters from being affected

by previous characters.

That is, the decoding process could be controlled by a stateful automaton.

Such state dependence is unavoidable when detecting the character boundaries

of multibyte characters.

But, as far as plain text processing is concerned, the state

transitions should be representable by a finite state automaton.

Otherwise many algorithms for plain text processing do not work.

Thus, if some text has a more complex structure represented by, say, a context

sensitive grammar, it is not plain text.
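.PP
As a minimal sketch of finite state boundary detection, consider a
hypothetical EUC-like code in which an octet below 0x80 is a single-octet
character and an octet of 0x80 or above starts a two-octet character.
The function name and the code layout are assumptions made for this
illustration only.

```python
from typing import List


def split_characters(data: bytes) -> List[bytes]:
    """Split an octet stream into characters using a finite amount of state."""
    chars = []
    lead = None  # state: None at a character boundary, else the pending lead octet
    for b in data:
        if lead is not None:
            chars.append(bytes([lead, b]))  # second octet completes the character
            lead = None
        elif b >= 0x80:
            lead = b  # first octet of a two-octet character
        else:
            chars.append(bytes([b]))  # single-octet character
    if lead is not None:
        raise ValueError("truncated two-octet character")
    return chars


print(split_characters(b"A\xb0\xa1B"))
```

The only state carried between octets is the pending lead octet, so the
number of states is bounded regardless of the length of the text.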

.NH 2

Finite resynchronizability

.PP

When displaying characters backward or when performing binary search on

sorted text, the state of the displaying automaton

is, in general, unknown.

Moreover, in an interactive environment, octets are often lost because

of communication errors.

User interruption might also cause synchronization errors.

Finite resynchronizability means that, by reading a fixed finite number of

bytes, the state of the automaton can be determined uniquely.

It should be noted that this requirement automatically implies the

finiteness of the state machine.
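.PP
The property can be illustrated with present-day UTF-8, which is used here
purely as an assumed example of a finitely resynchronizable code and is not
one of the encodings evaluated in this paper. From an arbitrary octet
position, the start of the enclosing character is found by looking back
at most three octets. The function name char_start is hypothetical.

```python
def char_start(data: bytes, pos: int) -> int:
    """Return the index of the first octet of the character covering data[pos]."""
    for back in range(4):  # a UTF-8 character is at most 4 octets long
        i = pos - back
        if i < 0:
            break
        if data[i] & 0xC0 != 0x80:  # anything but 10xxxxxx starts a character
            return i
    raise ValueError("no character boundary within 4 octets")


data = "a\u65e5b".encode("utf-8")  # 1-octet, 3-octet and 1-octet characters
print([char_start(data, p) for p in range(len(data))])
```

Because the backward scan is bounded by a constant, a binary search that
lands in the middle of a character can recover without reading the text
from its beginning.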

.NH 2

Equality

.PP

Equality of two texts should be defined unambiguously, of course.

.PP

As a character might have several different coded representations,

to search some text, it is sometimes convenient that all the possible

representations compare as equal or that there is a handy representation

for the set of all the related characters.

But it is not a strict requirement, as one can list all the

possible encoded forms, in theory, by hand.

For example, if there is a notation to specify case insensitive

comparison, it is sometimes useful.

But, one can also specify a search pattern containing the code points

for both cases.

Thus,

.DS

% grep -i abc

.DE

could be

.DS

% grep '[Aa][Bb][Cc]'

.DE
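.PP
The same rewriting can be done mechanically. The following Python sketch
(the helper name expand_case is hypothetical) expands a word into a pattern
that lists both case forms of each letter, assuming each letter has
exactly one single-character upper and lower form.

```python
import re


def expand_case(word: str) -> str:
    """Build a pattern that matches word in any mixture of cases."""
    parts = []
    for ch in word:
        lo, up = ch.lower(), ch.upper()
        if lo != up and len(lo) == 1 and len(up) == 1:
            parts.append(f"[{up}{lo}]")  # list both code points explicitly
        else:
            parts.append(re.escape(ch))  # caseless character: match literally
    return "".join(parts)


pattern = expand_case("abc")
print(pattern)
print(bool(re.fullmatch(pattern, "aBc")))
```

Characters whose case mapping is not one-to-one are matched literally here,
which is exactly where a language-dependent case correspondence would need
more care.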

.NH 2

Summary

.PP

To summarize, the requirements for the internationalized

character encoding methods for the minimal text processing are:

.IP

Universality

.IP

Causality

.IP

Finite stateness

.IP

Finite resynchronizability

.IP

Equality

.LP

.NH

Existing Encoding Methods

.PP

Considering the requirements in section 2,

the existing encoding methods for multilingual processing

are not adequate.

.NH 2

ISO 2022

.PP

ISO 2022 gives the framework to switch between different encoding systems

by escape sequences.

Each encoding system has one or multiple, but fixed, number of octets

to represent a different set of characters.

One of the major problems with ISO 2022 is that there is no unified

policy on encoding systems.

.PP

As for the requirements in section 2,

.IP Universality

Satisfied

.IP Causality

Some encoding systems do not satisfy causality

.IP "Finite stateness

Satisfied

.IP "Finite resynchronizability

Not satisfied. The standard has several long-term states

.IP Equality

Equality between different encoding systems is not defined

.LP

It should be noted that ISO 2022 is a large standard containing

such a large number of states that

some profiling is necessary to specify the initial state

and the allowable combinations of escape sequences.

Finite resynchronizability could also be satisfied by profiling

but the resulting encoding method is rather lengthy.
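.PP
The long-term state can be observed with one common profile of ISO 2022.
The sketch below assumes Python's built-in iso2022_jp codec (the ISO-2022-JP
profile); the choice of profile and the sample text are assumptions of this
illustration. Once the escape sequence that designates the two-octet set is
lost, every following octet is read as ASCII, however far the decoder reads
ahead.

```python
text = "\u65e5\u672c\u8a9e"  # three Han characters
encoded = text.encode("iso2022_jp")

# The stream begins with ESC $ B, which designates the two-octet set.
assert encoded[:3] == b"\x1b$B"

# A decoder that sees the whole stream recovers the text.
assert encoded.decode("iso2022_jp") == text

# A decoder that joins after the escape sequence has no way to recover
# the state: the remaining octets are valid ASCII and are read as such.
tail = encoded[3:]
misread = tail.decode("iso2022_jp")
print(repr(misread))
```

No fixed amount of lookback repairs this, because the designating escape
sequence may lie arbitrarily far back in the stream.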

.PP

In general, ISO 2022 is actually used widely to represent a limited number

of languages within which the encoding policy is

unified, but it is not so useful as a general framework

for internationalization.

.NH 2

ISO 10646

.PP

ISO 10646 was designed to be a universal coded character set (UCS).

It is actually universal in some sense. That is:

.IP 1)

it contains a large number of characters

.IP 2)

it intends to represent all the characters in the world

by a simple 16 bit or 32 bit integer.

.IP 3)

In the course of developing the standard, character mnemonics have been

developed so that equality of characters can be defined across different

encoding systems such as those in ISO 2022.

.LP

It has three implementation levels.

In level 1, all the characters are represented by a single

16 or 32 bit integer.

In level 3, to represent complex combinations of several graphic

elements, combining characters are introduced,

which effectively is a multibyte representation.

In level 2, a limited number of combining characters are available

to specify representations of a limited number of languages.

Still, the problem of ISO 10646 is that the standard is not so universal.

That is, it is sometimes required to have

prior negotiation or external profiling.

In some cases the standard is not useful unless

information on what language is encoded is supplied from outside the

standard.

For example,

corresponding Han characters in China, Japan and Korea are

assigned the common single code points (called Han unification)

in ISO 10646.

As the graphical forms of Han characters in China, Japan and Korea

have developed somewhat independently,

some Han characters are now so different that a form used in

one country is considered to be a wrong form in the

other country.

So, to construct the correct graphical shapes of some Han

characters the language information is necessary.

The language information is necessary

to construct the graphical shapes of almost all Han

characters if the required quality of font is that

actually used now in each nation for plain text processing.

.PP

As the way in which combining characters graphically interact

is unspecified, it can differ from language to language.

.PP

As for the requirements in section 2 with ISO 10646,

.IP Universality

Not satisfied.

With level 2 or 3,

the decoding rules on how combining characters affect

the shape of characters are not specified.

Han unification makes it impossible to restore correct forms

of some Han characters.

.IP Causality

Mostly satisfied with level 1.

It contains code points for Arabic characters without

the end-of-word information mentioned in section 2.

But, as code points with the end-of-word

information are also contained, they could be used.

Not satisfied with level 2 or 3 as combining characters

affect the shape of the previous character.

.IP "Finite stateness

Satisfied with level 1.

Seemingly not satisfied in level 2 or 3, as some combining

characters might require a pushdown

automaton to restore a graphic form.

.IP "Finite resynchronizability

Satisfied with level 1 if a 16 bit or 32 bit unit is treated as a byte.

Not satisfied with level 2 or 3, as, after a single base

character, there can be any

number of combining characters.

.IP Equality

Satisfied with level 1.

Not satisfied with level 2 or 3.

That is, though equality between single 16 bit or 32 bit

characters is specified, equality between sequences of

multiple characters is not specified. So, equality

between two texts is undefined.

.LP

It is obvious that the combining characters of level 2 and level 3

have made the entire standard rather useless.

For ISO 10646 level 1 only, the requirements in section 2 are evaluated as follows.

.IP Universality

Not satisfied

.IP Causality

Satisfied if unnecessary code points for Arabic are removed

.IP "Finite stateness

Satisfied

.IP "Finite resynchronizability

Satisfied

.IP Equality

Satisfied

.LP

.PP

Thus, ISO 10646 level 1 is good enough that it could be a base for

an internationalized character code.

The problem is that combining characters in level 2

are necessary to represent some languages.

While level 2 allows free combination of combining characters, which

is quite harmful,

combining characters might not be so harmful if their use is strictly profiled.

Some countries actually have ISO-2022-based encoding

systems with limited combinations of combining characters.

The problem here is that, as such profiling will differ

from language to language, there may not be a

universal way to handle all the

characters in the world.

Or, it is also possible to extend the set of characters in level 1 to contain

all the necessary combination results.

But, as such profiling of combining characters or the enumeration of

required precombined characters

is too language specific and beyond the author's knowledge,

the actual way to support level 2 characters is not discussed in this paper.

.NH

ICODE

.PP

ICODE (Internationalized CODE) is a 21 bit code defined by

adding several bits

to the coded representation of characters of ISO 10646 level 1

(except for some duplicated code points for Arabic to satisfy Causality

requirement).

Considering that, nowadays, even personal computers have 16MB

of memory or more, a 21 bit encoding space is quite practical even if some

array must be indexed by the character codes.

.PP

Though the ICODE is, currently, 21 bit, it will actually be used within

32 bit words on most existing machines, which does not matter at all

as ICODE is a processing code and won't be used for the information

interchange.

IUTF, described in section 5, is provided for the interchange purpose

on communication lines or in files.

.PP

While ISO 10646 allows a 32 bit representation of characters (UCS4),

it currently contains only code points which can be represented

in 16 bits.

So, the lower 16 bits of ICODE are identical to ISO 10646.

The added 4 bits are used to extend the set of characters or to

provide language separation information.

.PP

For Han characters,

the values of the four bits are assigned to the sources of

characters identified in section 26 of ISO 10646 as follows.

.IP 0

Unused. Reserved for compatibility with ISO 10646

.IP 1

Hanzi used by GB standards

.IP 2

Hanzi used by TCA-CNS standards

.IP 3

Kanji used by JIS standards

.IP 4

Hanja used by KS standards

.IP 5~7

Reserved for further languages

.IP 8~15

Used for extension to represent non-Han characters.

.LP

For characters which do not require language information,

the added four bits contain all zeros.

If more than 4 bits are necessary to represent a large number

of characters, extra bits could be added to extend ICODE to

22 bits, 32 bits or more on top

of the current MSB.

.PP

The MSB of ICODE is a direction bit used to control bi-directionality.

Support for bi-directionality is absolutely necessary to

support some languages such as Arabic.

But, as bi-directionality, in general, has a nested structure,

general treatment is impossible with a finite-state

mechanism.

That is, the mapping between the semantic order and

the display order of bi-directional text needs a pushdown

automaton.

So, for the plain text processing, in ICODE, the

display order is used.

The direction bit MSB of ICODE is used to reverse the natural

directionality of

a character.
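.PP
As an illustration only, the bit layout described above can be sketched in Python. The field positions (bits 0 to 15 for the ISO 10646 code, bits 16 to 19 for the added four bits, bit 20 for the direction bit) are one reading of the text, and the names make_icode, split_icode and the SOURCE_ constants are hypothetical; they are not part of any specification.

```python
# Assumed field positions within a 21 bit ICODE value.
UCS2_MASK = 0xFFFF        # lower 16 bits: identical to ISO 10646
TAG_SHIFT = 16            # next 4 bits: source of Han characters or extension
TAG_MASK = 0xF
DIRECTION_BIT = 1 << 20   # MSB: reverses the natural directionality

# Values of the four added bits for Han characters, as listed in the text.
SOURCE_GB = 1    # Hanzi used by GB standards
SOURCE_CNS = 2   # Hanzi used by TCA-CNS standards
SOURCE_JIS = 3   # Kanji used by JIS standards
SOURCE_KS = 4    # Han characters used by KS standards


def make_icode(ucs2, tag=0, reversed_direction=False):
    """Pack an ISO 10646 code, the four added bits and the direction bit."""
    if not 0 <= ucs2 <= UCS2_MASK:
        raise ValueError("ISO 10646 part must fit in 16 bits")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("added bits must fit in 4 bits")
    direction = DIRECTION_BIT if reversed_direction else 0
    return direction | (tag << TAG_SHIFT) | ucs2


def split_icode(icode):
    """Return (ISO 10646 part, added four bits, direction bit set?)."""
    return (icode & UCS2_MASK,
            (icode >> TAG_SHIFT) & TAG_MASK,
            bool(icode & DIRECTION_BIT))
```

With this layout, a character that needs no language information keeps its ISO 10646 value unchanged, because the added bits and the direction bit are all zero.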

.PP

That is, with ICODE, all the characters in a line must have the

same directionality and be encoded in the display order.

If, in a line of some directionality, characters of a different

directionality are needed, the direction bits of those characters are set and words

made of those characters are spelled backwards.

So, in an English context, English is encoded in the natural

order with the direction bit reset and Arabic is spelled backwards

with the direction bit set.

But, in an Arabic context, Arabic is encoded in the natural

order with the direction bit reset and English is spelled backwards

with the direction bit set.
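.PP
A minimal sketch of this rule follows. It assumes that the direction bit is bit 20 of a 21 bit ICODE value and that the natural directionality of each run of text is already known; encode_run is a hypothetical helper, not a defined interface.

```python
DIRECTION_BIT = 1 << 20  # assumed position of the MSB of a 21 bit ICODE


def encode_run(code_points, run_is_rtl, line_is_rtl):
    """Return the ICODE values of one run of text for a given line direction."""
    if run_is_rtl == line_is_rtl:
        # Same directionality as the line: natural order, direction bit reset.
        return list(code_points)
    # Different directionality: spelled backwards, direction bit set.
    return [cp | DIRECTION_BIT for cp in reversed(code_points)]


english = [ord(c) for c in "abc"]
arabic = [0x0633, 0x0644, 0x0627, 0x0645]  # four Arabic letters in reading order

arabic_in_english_line = encode_run(arabic, True, False)
english_in_arabic_line = encode_run(english, False, True)
```

Here arabic_in_english_line holds the four letters in reverse order, each with the direction bit set, while the same letters in an Arabic line would be stored unchanged.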

.PP

The direction bit is also useful to control the line directionality

of text having top to bottom character directionality.

.PP

For ICODE, the requirements in section 2 are evaluated as follows.

.IP Universality

Satisfied

.IP Causality

Satisfied

.IP "Finite stateness

Satisfied

.IP "Finite resynchronizability

Satisfied

.IP Equality

Satisfied

.LP

.PP

To maintain full compatibility with future extensions of

ISO 10646, characters in ICODE

also have a representation as UCS4 of ISO 10646.

That is, characters with ICODE values between 0 and 65535 have the same

UCS4 values

(in the BMP), while other characters of ICODE are represented with UCS4

values between

0x7f010000 and 0x7f1fffff (in the private use zone of UCS4) by adding

0x7f000000 to the ICODE values.

Implementors are perfectly free to choose either representation of

characters, ICODE or UCS4.

ICODE or UCS4, here, is for processing, not for interchange and

thus its representation is not visible from the outside of programs.

It should be noted that the two representations are

equivalent as fully ordered sets.
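.PP
The mapping between the two representations can be sketched as below. The constants come from the text; the function names icode_to_ucs4 and ucs4_to_icode are hypothetical and are used only for illustration.

```python
PRIVATE_OFFSET = 0x7F000000  # added to ICODE values outside the BMP
ICODE_MAX = 0x1FFFFF         # largest 21 bit ICODE value


def icode_to_ucs4(icode):
    """Map a 21 bit ICODE value to its UCS4 representation."""
    if not 0 <= icode <= ICODE_MAX:
        raise ValueError("not a 21 bit ICODE value")
    # Values 0 to 65535 are unchanged; the others move to the private use zone.
    return icode if icode <= 0xFFFF else icode + PRIVATE_OFFSET


def ucs4_to_icode(ucs4):
    """Inverse mapping; rejects UCS4 values that correspond to no ICODE value."""
    if 0 <= ucs4 <= 0xFFFF:
        return ucs4
    if 0x7F010000 <= ucs4 <= 0x7F1FFFFF:
        return ucs4 - PRIVATE_OFFSET
    raise ValueError("UCS4 value does not correspond to an ICODE value")
```

Because the offset is a constant added only above 65535, comparing two characters gives the same result in either representation, which is what equivalence as ordered sets requires.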

.NH

IUTF

.PP

IUTF (Internationalized UTF) is an interchange form for ICODE

compatible with UTF2 (UCS Transformation Format 2).

.PP

UTF2 is an ASCII compatible variable length multi octet

interchange form for ISO 10646 proposed by X/Open.

.PP

UTF2 is designed considering

.IP 1)

compatibility with the UNIX file system

.IP 2)

compatibility with existing programs

.IP 3)

easy conversion between UTF2 and ISO 10646

.IP 4)

that code length can be determined by the first octet

.IP 5)

that code length is short

.IP 6)

finite resynchronizability

.PP

In UTF2, an octet is classified as

.DS

C0:0~32,127

A :33~126

Tx:128~191

T1:192~223

T2:224~239

T3:240~247

T4:248~251

T5:252~253

Ty:254~255(unused)

.DE

.PP

Then, the following combinations of octets

.DS

Octet Sequence         code of ISO 10646

C0                     0~32,127

A                      33~126

T1 Tx                  128~2047

T2 Tx Tx               2048~2^16-1

T3 Tx Tx Tx            2^16~2^21-1

T4 Tx Tx Tx Tx         2^21~2^26-1

T5 Tx Tx Tx Tx Tx      2^26~2^31-1

.DE

are used to represent characters in ISO 10646.

Resynchronization of character boundaries is possible by scanning

at most 6 octets.
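.PP
The table above can be turned into a small encoder, sketched here under one assumption that the table itself does not state: the high-order bits of the code go into the low bits of the leading octet and each Tx octet carries six further bits, which is the packing consistent with the ranges shown. The names utf2_encode and next_boundary are hypothetical.

```python
# (first code, last code, base of leading octet, number of Tx octets)
_UTF2_FORMS = [
    (0x00000000, 0x0000007F, 0x00, 0),  # C0 and A
    (0x00000080, 0x000007FF, 0xC0, 1),  # T1 Tx
    (0x00000800, 0x0000FFFF, 0xE0, 2),  # T2 Tx Tx
    (0x00010000, 0x001FFFFF, 0xF0, 3),  # T3 Tx Tx Tx
    (0x00200000, 0x03FFFFFF, 0xF8, 4),  # T4 Tx Tx Tx Tx
    (0x04000000, 0x7FFFFFFF, 0xFC, 5),  # T5 Tx Tx Tx Tx Tx
]


def utf2_encode(code):
    """Encode one ISO 10646 code as a UTF2 octet sequence."""
    for first, last, lead, n_tx in _UTF2_FORMS:
        if first <= code <= last:
            octets = [lead | (code >> (6 * n_tx))]
            for i in range(n_tx - 1, -1, -1):
                octets.append(0x80 | ((code >> (6 * i)) & 0x3F))
            return bytes(octets)
    raise ValueError("code out of range for UTF2")


def next_boundary(data, pos):
    """Skip Tx octets (128~191) until the start of the next character."""
    while pos < len(data) and 0x80 <= data[pos] <= 0xBF:
        pos += 1
    return pos
```

Since a character has at most five Tx octets, next_boundary never skips more than five octets before it reaches a leading octet in well-formed data.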

.PP

Note that, with UTF2, all the characters of major European languages

can be represented

in two octets and all the existing characters of ISO 10646

can be represented in three octets.

.PP

So, IUTF is designed considering

.IP 0)

compatibility with UTF2

.IP 1)

compatibility with the UNIX file system

.IP 2)

compatibility with existing programs as interchange code

.IP 3)

fast conversion between IUTF and ISO 10646

.IP 4)

that code length can be determined without looking

ahead extra octets

.IP 5)

that code length is short

.IP 6)

finite resynchronizability

.LP

That is, IUTF is upward compatible with UTF2 both in its format

and its design policy.

Note that 2) is a rather meaningless condition, as

the processing code (ICODE, not IUTF, in this case) is used

in existing programs, which is also the processing model of multibyte/wide

characters in ANSI C and X/Open.

.PP

In IUTF, an octet is classified as

.DS

C0:0~32,127

A :33~126

A':33~46,48~126

C1:128~159

Tx:128~191

T1:192~223

T2:224~239(=S2+S3+S4+S6+S7)

S2:224~229

S3:230~235

S4:236~237

S6:238

S7:239

U1:240~255

.DE

Then, the following combinations of octets

.DS

Octet Sequence         code of ISO 10646

C0                     0~32,127

A                      33~126

T1 Tx                  128~2047

T2 Tx Tx               2048~65535

.DE

are used to represent characters in UTF2.

Thus, IUTF is compatible with UTF2.

Then, the following combinations of octets are

available to represent extra characters.

.DS

Octet Sequence             number of code points represented

T1 A'                      2976

T2 A'                      1488

U1 A'                      1488

U1 Tx                      1024

T1 T2                      512

T1 U1                      512

U1 T2                      256

S2 Tx A'                   35712

S3 Tx A' Tx                >2^21

S4 Tx A' Tx Tx             >2^25

S6 Tx A' Tx Tx Tx Tx       >2^36

S7 Tx A' Tx Tx Tx Tx Tx    >2^42

.DE

Thus, all the characters in 21 bit ICODE can be represented

in the four octet form, by a sequence beginning with S3.

Resynchronization of character boundaries is possible by scanning

at most 8 octets.

.PP

IUTF has an extra 8256 (= 2976 + 1488 + 1488 + 1024 + 512 + 512 + 256)

two octet representations and

35712 three octet representations, which

can be used for shorthand notations of characters such as

frequently used non-European characters.

The actual assignment is not yet determined.

Hash tables could be used for the fast translation between ICODE and IUTF

for such shorthand notations.
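.PP
The counts quoted above follow directly from the sizes of the octet classes, and can be checked with a few lines of Python. The class ranges are taken from the IUTF classification; the names CLASSES and code_points are hypothetical.

```python
from math import prod

# Octet classes of IUTF, as listed in the classification of octets.
CLASSES = {
    "A'": [*range(33, 47), *range(48, 127)],  # 33~46 and 48~126 (47 excluded)
    "Tx": range(128, 192),
    "T1": range(192, 224),
    "T2": range(224, 240),
    "U1": range(240, 256),
    "S2": range(224, 230),
    "S3": range(230, 236),
    "S4": range(236, 238),
    "S6": range(238, 239),
    "S7": range(239, 240),
}


def code_points(sequence):
    """Number of distinct octet strings matching a sequence such as "S2 Tx A'"."""
    return prod(len(CLASSES[name]) for name in sequence.split())


TWO_OCTET_FORMS = ["T1 A'", "T2 A'", "U1 A'", "U1 Tx", "T1 T2", "T1 U1", "U1 T2"]
two_octet_total = sum(code_points(s) for s in TWO_OCTET_FORMS)
three_octet_total = code_points("S2 Tx A'")
four_octet_total = code_points("S3 Tx A' Tx")
```

The four octet form yields 2285568 combinations, which exceeds 2^21 = 2097152 and is therefore enough for every 21 bit ICODE value.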

.NH

Conclusion

.PP

By using ICODE and IUTF, fully internationalized exchange

of text in the various languages of the world becomes possible in a

unified, universal way.

.PP

International cooperation is still necessary to

extend ICODE to support characters represented by

ISO 10646 level 2 or level 3 representations

and to assign shorthand notations of IUTF.

From chon@cosmos.kaist.ac.kr Fri Sep 10 13:51:15 1993

Return-Path: <chon@cosmos.kaist.ac.kr>

Received: from han.hana.nm.kr by nic.nm.kr (4.1/SMI-4.1)

id AA08262; Fri, 10 Sep 93 13:51:15 KST

Errors-To: Postmaster@cosmos.kaist.ac.kr

Received: from cosmos.kaist.ac.kr by han.hana.nm.kr (4.1/KUM-0.1)

id AA18408; Fri, 10 Sep 93 13:49:26 KST

Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)

id AA20669; Fri, 10 Sep 93 13:45:20 KST

Date: Fri, 10 Sep 93 13:45:20 KST

From: chon@cosmos.kaist.ac.kr (Kilnam Chon)

Message-Id: <9309100445.AA20669@cosmos.kaist.ac.kr>

Errors-To: Postmaster@cosmos.kaist.ac.kr

To: ap-i18n@nic.nm.kr

Subject: next meeting

i would like to see initial, good discussion on internationalization/

localization(i.e., local language support) at the next apccirn meeting in

taipei in 1993.12.10-11. since this is the first time to address on the

local language support at apccirn, i would like to see comprehensive

presentation on status reports of several leading countries such as


Japanese, Korean, Chinese, Thai

do you have good idea who to make the comprehensive presentation of each

language/country?

the above presentations may be followed by development of the issue list

for us to focus for the next years such as


unicode

internationalized(generic) network software packages

(others)

i am looking forward to seeing good discussions on the above matters.

kilnam chon

From @IBM3090.snu.ac.kr:WSCHEN@TWNMOE10.BITNET Tue Sep 21 15:13:45 1993

Return-Path: <@IBM3090.snu.ac.kr:WSCHEN@TWNMOE10.BITNET>

Received: from ercc.snu.ac.kr by nic.nm.kr (4.1/SMI-4.1)

id AA03505; Tue, 21 Sep 93 15:13:45 KST

Errors-To: Postmaster@IBM3090.snu.ac.kr

Received: from IBM3090.snu.ac.kr by ercc.snu.ac.kr (4.1/SMI-4.1)

id AA11787; Tue, 21 Sep 93 15:13:44 KST

Message-Id: <9309210613.AA11787@ercc.snu.ac.kr>

Received: from KRSNUCC1.BITNET by IBM3090.snu.ac.kr (IBM VM SMTP R1.2.1) with BSMTP id 4010; Tue, 21 Sep 93 15:09:57 EXP

Received: from TWNMOE10.edu.tw by KRSNUCC1.BITNET (Mailer R2.08) with BSMTP id

9684; Tue, 21 Sep 93 14:51:30 EXP

Received: by TWNMOE10 (Mailer R2.10 ptf000) id 0765;

Tue, 21 Sep 93 13:51:16 EST

Date: Tue, 21 Sep 93 13:45:04 EST

From: Wen-Sung Chen <WSCHEN%TWNMOE10@IBM3090.snu.ac.kr>

Subject: Re: next meeting

To: Kilnam Chon <chon@cosmos.kaist.ac.kr>, ap-i18n@nic.nm.kr

In-Reply-To: Your message of Fri, 10 Sep 93 13:45:20 KST

On Fri, 10 Sep 93 13:45:20 KST you said:

>i would like to see initial, good discussion on internationalization/

>localization(i.e., local language support) at the next apccirn meeting in

>taipei in 1993.12.10-11. since this is the first time to address on the

>local language support at apccirn, i would like to see comprehensive

>presentation on status reports of several leading countries such as

> Japanese, Korean, Chinese, Thai

>do you have good idea who to make the comprehensive presentation of each

>language/country?

>the above presentations may be followed by development of the issue list

>for us to focus for the next years such as

> unicode

> internationalized(generic) network software packages

> (others)

>i am looking forward to seeing good discussions on the above matters.

We would like to arrange a chinese localization presentation

in APCCIRN(Taipei). This presentation will be prepared by

expert of III, Taiwan.

Topic: Chinese Localization and SUCCESS project

1. What is SUCCESS project

2. The current chinese codes

3. The problem with different chinese codes

4. The problem with chinese input

5. The future ?

Any comments?

Wen-Sung Chen (wschen@twnmoe10.bitnet)

(wschen@twnmoe10.edu.tw)

Computer Center, Ministry of Education Phone #: 011-886-2-7377011

Taipei, Taiwan, R.O.C. Fax #: 011-886-2-7377043

From chon@cosmos.kaist.ac.kr Tue Sep 21 15:26:05 1993

Return-Path: <chon@cosmos.kaist.ac.kr>

Received: from han.hana.nm.kr by nic.nm.kr (4.1/SMI-4.1)

id AA03559; Tue, 21 Sep 93 15:26:05 KST

Errors-To: Postmaster@cosmos.kaist.ac.kr

Received: from cosmos.kaist.ac.kr by han.hana.nm.kr (4.1/KUM-0.1)

id AA21034; Tue, 21 Sep 93 15:24:24 KST

Received: by cosmos.kaist.ac.kr (4.1/SMI-4.1)

id AA09592; Tue, 21 Sep 93 15:19:55 KST

From: chon@cosmos.kaist.ac.kr (Kilnam Chon)

Message-Id: <9309210619.AA09592@cosmos.kaist.ac.kr>

Errors-To: Postmaster@cosmos.kaist.ac.kr

Subject: Re: next meeting (fwd)

To: ap-i18n@nic.nm.kr

Date: Tue, 21 Sep 93 15:19:54 KST

X-Mailer: ELM [version 2.3 PL11]

Wen-Sung Chen writes:

>From @IBM3090.snu.ac.kr:WSCHEN@TWNMOE10.BITNET Tue Sep 21 15:07:20 1993

>Errors-To: Postmaster@cosmos.kaist.ac.kr

>Message-Id: <9309210613.AA11787@ercc.snu.ac.kr>

>Date: Tue, 21 Sep 93 13:45:04 EST

>From: Wen-Sung Chen <WSCHEN%TWNMOE10@IBM3090.snu.ac.kr>

>Subject: Re: next meeting

>To: Kilnam Chon <chon@cosmos.kaist.ac.kr>, ap-i18n@nic.nm.kr

>In-Reply-To: Your message of Fri, 10 Sep 93 13:45:20 KST

>

>On Fri, 10 Sep 93 13:45:20 KST you said:

>>i would like to see initial, good discussion on internationalization/

>>localization(i.e., local language support) at the next apccirn meeting in

>>taipei in 1993.12.10-11. since this is the first time to address on the

>>local language support at apccirn, i would like to see comprehensive

>>presentation on status reports of several leading countries such as

>> Japanese, Korean, Chinese, Thai

>>do you have good idea who to make the comprehensive presentation of each

>>language/country?

>>the above presentations may be followed by development of the issue list

>>for us to focus for the next years such as

>> unicode

>> internationalized(generic) network software packages

>> (others)

>>i am looking forward to seeing good discussions on the above matters.

>

>We would like to arrange a chinese localization presentation

>in APCCIRN(Taipei). This presentation will be prepared by

>expert of III, Taiwan.

> Topic: Chinese Localization and SUCCESS project

> 1. What is SUCCESS project

> 2. The current chinese codes

> 3. The problem with different chinese codes

> 4. The problem with chinese input

> 5. The future ?

>

>Any comments?

>

>Wen-Sung Chen (wschen@twnmoe10.bitnet)

> (wschen@twnmoe10.edu.tw)

>Computer Center, Ministry of Education Phone #: 011-886-2-7377011

>Taipei, Taiwan, R.O.C. Fax #: 011-886-2-7377043

>

my idea is to have overview presentation of local language support in Chinese,

Japanese, Korean and other languages as appropriate followed by discussion on

possible cooperation/collaboration in this area. the above presentation would

be a good contribution on this matter.

kilnam chon

From uhhyung Wed Sep 22 16:24:29 1993

Return-Path: <uhhyung>

Received: by nic.nm.kr (4.1/SMI-4.1)

id AA10124; Wed, 22 Sep 93 16:24:29 KST

From: uhhyung (Uhhyung Choi)

Message-Id: <9309220724.AA10124@nic.nm.kr>

Errors-To: Postmaster

Subject: Re: next meeting (fwd)

To: chon@cosmos.kaist.ac.kr (Kilnam Chon)

Date: Wed, 22 Sep 1993 16:24:27 +0900 (KST)

Cc: ap-i18n@nic.nm.kr

In-Reply-To: <9309210619.AA09592@cosmos.kaist.ac.kr> from "Kilnam Chon" at Sep 21, 93 03:19:54 pm

X-Mailer: ELM [version 2.4 PL21-h3]

Mime-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 7bit

Content-Length: 644

As Kilnam Chon writes:

*

* my idea is to have overview presentation of local language support

* in Chinese, Japanese, Korean and other languages as appropriate followed

* by discussion on possible cooperation/collaboration in this area.

* the above presentation would be a good contribution on this matter.

*

* kilnam chon

I'm planning to make a presentation on Korean localization efforts and status

in the upcoming APCCIRN meeting in Taipei. It would be a lot more productive

session if every presentation could address cooperation/collaboration issues

such as Unicode etc.

--

Uhhyung Choi

Korea Network Information Center

From uhhyung Wed Sep 22 18:34:03 1993

Return-Path: <uhhyung>

Received: by nic.nm.kr (4.1/SMI-4.1)

id AA11270; Wed, 22 Sep 93 18:34:03 KST

From: uhhyung (Uhhyung Choi)

Message-Id: <9309220934.AA11270@nic.nm.kr>

Errors-To: Postmaster

Subject: Presentation at the next APCCIRN

To: mohta@cc.titech.ac.jp

Date: Wed, 22 Sep 1993 18:34:01 +0900 (KST)

Cc: apccirn-i18n@nic.nm.kr

X-Mailer: ELM [version 2.4 PL21-h3]

Mime-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 7bit

Content-Length: 2483

M. Ohta San,

I wonder if you can attend the APCCIRN meeting and present current status

of Japan. And could you please send me(or to the list) the latest draft of

your paper to be presented in upcoming JWCC in Taipei?

The issue list that, I think, each presenter should address and we can discuss at the

meeting includes (but is not limited to):

localization profile

currently supported localized network software

(possibility of joint effort for internationalized

network software.)

ongoing efforts and future plans

i.e. strategy for Unicode, ISO/IEC10646

The current list of presenters is as follows

?(arranged by Wen-Sung Chen) Taiwan

Uhhyung Choi Korea

Masataka Ohta(?) Japan

? Thai

Any comments?

--

Uhhyung Choi

Korea Network Information Center

P.S. I'm forwarding this mail on Chinese Localization for your information.

--------

As (Sam Shiu) writes:

>From shiu@cs.cuhk.hk Mon Sep 20 19:29:36 1993

>Errors-To: Postmaster@cosmos.kaist.ac.kr

>Message-Id: <9309200859.AA01092@hanzix4.cs.cuhk.hk.cs-sun>

>To: nangu@sm.sony.co.jp, jisyoon@cosmos.kaist.ac.kr,

> johnnie@dascohk.attmail.com

>Subject: Hanzix

>Date: Mon, 20 Sep 1993 16:59:01 +0800

>From: Sam Shiu <shiu@cs.cuhk.hk>

>

>

>Hi, how are you ? My name is Sam Shiu. I am the manager of Hanzix, a

>joint effort by CUHK(Chinese University of Hong Kong), CAS(Chinese Academy

>of Science) of Beijing and III(Institute of Information Industry) of

>Taiwan, dedicated to the development and promotion of a standardised

>Open System for Chinese Computing. Currently, we are working on serveral items

>which may interest you.

> - National Profile of locales & charmap for mainland China

> - National Profile for Taiwan

> - Standard interface to input methods

> - Interim-Hanizx, an operating system built on Unix which supports

> I18N and L10N for Chinese Computing based on ISO 10646.

> Some of the highlights include a file announcement mechanism and

> codeset conversion utilities

>

>We are planning to start a Hanzix work group involving industry and

>research organizations where we can work together on an Open System for

>Chinese Computing.

>I am composing a list of contacts who are interested in our work

>especially those from HK. Would you please let me know if you are

>interested ?

>Regards,

>

>Sam Shiu, email : shiu@cs.cuhk.hk

>Manager, Hanzix Tel : (825) 609-8436

> Fax : (825) 603-5024

From mohta@cc.titech.ac.jp Fri Sep 24 17:12:18 1993

Received: from rc.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)

id AA21720; Fri, 24 Sep 93 17:12:18 KST

Errors-To: Postmaster@nic.nm.kr

Received: from cc.titech.ac.jp (titcce.cc.titech.ac.jp) by rc.cc.titech.ac.jp (5.65+1.5W/r2TM)

id AA13437; Fri, 24 Sep 93 17:10:21 JST

Received: by cc.titech.ac.jp (5.61/cce-1.5/TM)

id AA11056; Fri, 24 Sep 93 17:04:11 +0900

From: Masataka Ohta <mohta@cc.titech.ac.jp>

Return-Path: <mohta@cc.titech.ac.jp>

Message-Id: <9309240804.AA11056@cc.titech.ac.jp>

Subject: Re: Presentation at the next APCCIRN

To: uhhyung@nic.nm.kr (Uhhyung Choi)

Date: Fri, 24 Sep 1993 17:04:07 +0900 (JST)

Cc: mohta@cc.titech.ac.jp, apccirn-i18n@nic.nm.kr

In-Reply-To: <9309220934.AA11270@nic.nm.kr> from "Uhhyung Choi" at Sep 22, 93 06:34:01 pm

X-Mailer: ELM [version 2.4 PL21]

Content-Type: text

Content-Length: 1051

> M. Ohta San,

Sorry for the delayed answer. I have been on vacation.

> I wonder if you can attend the APCCIRN meeting and present current status

> of Japan.

Sure.

> And could you please send me(or to the list) the latest draft of

> your paper to be presented in upcoming JWCC in Taipei?

I think I sent the latest one to the list one or two months ago.

Didn't you receive that? Since then, I have changed nothing yet.

> localization profile

My opinion is that while language dependent localization at application

level is necessary, localization of the character set is the major

obstacle to the universality.

> ?(arranged by Wen-Sung Chen) Taiwan

> Uhhyung Choi Korea

> Masataka Ohta(?) Japan

> ? Thai

>

> Any comments?

I'm quite interested in how people in Thai think about the ISO10646,

because the fully duplexed interactive processing of plain Thai text

is, it seems to me, impossible with ISO 10646.

> P.S. I'm forwarding this mail on Chinese Localization for your information.

Thanks. I'll contact him.

Masataka Ohta

From trin@nwg.nectec.or.th Sat Sep 25 01:52:57 1993

Return-Path: <trin@nwg.nectec.or.th>

Received: from munnari.oz.au by nic.nm.kr (4.1/SMI-4.1)

id AA23551; Sat, 25 Sep 93 01:52:57 KST

Errors-To: Postmaster@nic.nm.kr

Received: from nwg.nectec.or.th by munnari.oz.au with SMTP (5.83--+1.3.1+0.50)

id AA01795; Fri, 24 Sep 1993 23:42:11 +1000 (from trin@nwg.nectec.or.th)

From: trin@nwg.nectec.or.th (Trin Tantsetthi)

Message-Id: <9309241340.AA38240@nwg.nectec.or.th>

To: Masataka Ohta <mohta@cc.titech.ac.jp>

Cc: apccirn-i18n@nic.nm.kr

Subject: Re: Presentation at the next APCCIRN

In-Reply-To: Your message of Fri, 24 Sep 93 17:04:07 V.

<9309240804.AA11056@cc.titech.ac.jp>

Date: Fri, 24 Sep 93 20:40:27 +0700

I won't be able to attend the meeting in Taipei. While I'm not positive

that there will be a representative from Thailand, I hope the discussion

won't stop there.

Ohta-san wrote:

>I'm quite interested in how people in Thai think about the ISO10646,

>because the fully duplexed interactive processing of plain Thai text

>is, it seems to me, impossible with ISO 10646.

As far as the character set is concerned, it looks okay. Thailand objected to

5 "matras" proposed in Unicode 1.0 (U+0E70 thru U+0E74) and ISO10646

dropped these code points. A big issue which has not been resolved is

encoding.

Thai employs combining marks. A cell (graphic character which is bounded

by a rectangular real estate of the output device) may have multiple

"characters" (which is defined as atomic entity in the script). In general,

a cell contains one base character (rendered on the base line) and optional

combining marks (rendered above or below the base character). Since there

might be multiple combining marks, leaving encoding order of them with

a high degree of freedom (i.e. implementation specific) can be dangerous.

For instance, the word "recover" can be encoded as <U+0E01><U+0E39><U+0E49>

or <U+0E01><U+0E49><U+0E39> according to ISO10646 (sect 23.1) and Unicode

1.0 (pages 627-628).

If one performs data entry using one order and another person performs

record search using query key entered with the second order, database

engine might just report "Record not found".

Looking from another angle, this might be classified as input method

issue. IMO, ISO10646 is a done deal. The chance to impose ISO10646 to

include so many Thai-specific information (on encoding) is minimal.

Standard Thai encoding will be announced as a national standard. It is

still in the pipeline of formality. In a few weeks from now, a new draft

RFC on Thai encoding will be posted to the ietf-charsets mailing list.

This will be an informational RFC, like iso2022-jp. A mapping table for

Thai has been sent to author of the upcoming RFC1345bis. An application

area director of the IETF also suggested that Thailand registers Thai

as part of the ISO 8859 family. This is still under consideration.

Thai keysym has been proposed to the X Consortium.

As far as i18n is concerned, I have a feeling that character set experts

put, perhaps, too much emphasis on code point and encoding. A big missing

piece in order to complete the i18n vision has not been discussed in

the level of detail I wish. That piece is i18n common runtime library.

IMO, XPG4 still has a long way to go to achieve the true i18n goal. It does

not seem to handle combining marks and "Indic" well.

Wouldn't it be nice if APCCIRN-I18N could come up with a proposal of

run-time service/API, either new or as an extension of existing API, that

could cover most if not all languages. In my view, Asia/Pacific Rim

has the most diversity in terms of script/language requirements. Our

requirements are so different. If we can't settle i18n requirements

among ourselves, let's trash the hope that true i18n environment would

become a reality.

I guess that's all for progress from Thailand. Comments warmly welcome.

Regards,

Trin

From mohta@necom830.cc.titech.ac.jp Sat Sep 25 23:36:39 1993

Received: from daiduk.kaist.ac.kr by nic.nm.kr (4.1/SMI-4.1)

id AA26183; Sat, 25 Sep 93 23:36:39 KST

Errors-To: Postmaster@nic.nm.kr

Received: from necom830.cc.titech.ac.jp by daiduk.kaist.ac.kr (4.1/KAISTNet-Relay-3.2)

id AA24738; Sat, 25 Sep 93 23:40:45 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 25 Sep 93 23:28:35 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9309251428.AA00882@necom830.cc.titech.ac.jp>

Subject: Re: Presentation at the next APCCIRN

To: trin@nwg.nectec.or.th (Trin Tantsetthi)

Date: Sat, 25 Sep 93 23:28:34 JST

Cc: mohta@cc.titech.ac.jp, apccirn-i18n@nic.nm.kr

In-Reply-To: <9309241340.AA38240@nwg.nectec.or.th>; from "Trin Tantsetthi" at Sep 24, 93 8:40 pm

X-Mailer: ELM [version 2.3 PL11]

> I won't be able to attend the meeting in Taipei. While I'm not positive

> that there will be a representative from Thailand, I hope the discussion

> won't stop there.

We can continue the discussion with mail, of course.

> Ohta-san wrote:

> >I'm quite interested in how people in Thai think about the ISO10646,

> >because the fully duplexed interactive processing of plain Thai text

> >is, it seems to me, impossible with ISO 10646.

>

> As far as the character set is concerned, it looks okay. Thailand objected to

> 5 "matras" proposed in Unicode 1.0 (U+0E70 thru U+0E74) and ISO10646

> dropped these code points. A big issue which has not been resolved is

> encoding.

Agreed.

> Thai employs combining marks. A cell (graphic character which is bounded

> by a rectangular real estate of the output device) may have multiple

> "characters" (which is defined as atomic entity in the script). In general,

> a cell contains one base character (rendered on the base line) and optional

> combining marks (rendered above or below the base character). Since there

> might be multiple combining marks, leaving encoding order of them with

> a high degree of freedom (i.e. implementation specific) can be dangerous.

As long as we use batch or half duplexed environment, that is the only

problem.

The problem with 10646 for Thai is in fully duplexed interactive processing.

> For instance, the word "recover" can be encoded as <U+0E01><U+0E39><U+0E49>

> or <U+0E01><U+0E49><U+0E39> according to ISO10646 (sect 23.1) and Unicode

> 1.0 (pages 627-628).

What happens if <U+0E01> is received? Should it be displayed immediately?

The problem is identified in my JWCC paper as the causality problem.

> If one performs data entry using one order and another person performs

> record search using query key entered with the second order, database

> engine might just report "Record not found".

>

> Looking from another angle, this might be classified as input method

> issue. IMO, ISO10646 is a done deal. The chance to impose ISO10646 to

> include so many Thai-specific information (on encoding) is minimal.

IMHO, anything which requires so much language specific information

is not universal.

So, I think we should develop something new by ourselves based on 10646.

> Standard Thai encoding will be announced as a national standard. It is

> still in the pipeline of formality. In a few weeks from now, a new draft

> RFC on Thai encoding will be posted to the ietf-charsets mailing list.

Though it will share the same causality problem, it does not matter

for MIME, because mail processing is done as batch.

> As far as i18n is concerned, I have a feeling that character set experts

> put, perhaps, too much emphasis on code point and encoding.

I disagree. Code points can be anything, but, encoding is important.

It's you who said:

> A big issue which has not been resolved is

> encoding.

> A big missing

> piece in order to complete the i18n vision has not been discussed in

> the level of detail I wish. That piece is i18n common runtime library.

> IMO, XPG4 still has a long way to go to achieve the true i18n goal. It does

> not seem to handle combining marks and "Indic" well.

>

> Wouldn't it be nice if APCCIRN-I18N could come up with a proposal of

> run-time service/API, either new or as an extension of existing API, that

> could cover most if not all languages. In my view, Asia/Pacific Rim

> has the most diversity in terms of script/language requirements. Our

> requirements are so different. If we can't settle i18n requirements

> among ourselves, let's trash the hope that true i18n environment would

> become a reality.

I don't think we can expect such a library to contain much Thai-specific

specification. So we need a really universal encoding which does not contain

many language-specific features.

Also, to be able to figure out a reasonable common runtime library, the

encoding should have several aesthetic properties, such as those

described in my JWCC paper. If you also missed the paper, I will remail

it here.

To my knowledge, Thai and Indic can be processed just as easily as

European languages by encoding them with several thousands of precombined

characters.

To process ancient Hangul characters in the same fashion, about half a

million (0.5 mega) precombined characters are necessary.

The encoding space will be 21 or 22 bits.

But, does that matter? The font of most characters can be synthesized at

run time, of course.

Then, we will be able to have unified library routines to process,

to my knowledge, all the characters in the world.
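
The "half mega" figure for ancient Hangul can be checked with a little arithmetic. The sketch below assumes the jamo inventory of the Unicode 1.1 Hangul Jamo block (90 initials, 66 medials, 82 finals); the message does not say which inventory the estimate is based on, so these counts are an assumption made only for illustration.

```python
# Assumed jamo inventory (Unicode 1.1 Hangul Jamo block); the thread itself
# does not state which inventory the "half mega" estimate uses.
INITIALS = 90   # leading consonants, U+1100..U+1159
MEDIALS = 66    # vowels, U+1161..U+11A2
FINALS = 82     # trailing consonants, U+11A8..U+11F9

# A syllable has one initial, one medial, and an optional final.
syllables = INITIALS * MEDIALS * (FINALS + 1)

# Number of bits needed to give every syllable its own code point.
bits_needed = (syllables - 1).bit_length()

print(syllables)
print(bits_needed)
print(syllables < 2 ** 21)
```

Under that assumption the precombined syllables alone need fewer bits than the 21 or 22 quoted for the whole encoding space, which leaves room for the other scripts.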

Masataka Ohta

From uhhyung Mon Oct 4 12:40:49 1993

Return-Path: <uhhyung>

Received: by nic.nm.kr (4.1/SMI-4.1)

id AA08041; Mon, 4 Oct 93 12:40:49 KST

From: uhhyung (Uhhyung Choi)

Message-Id: <9310040340.AA08041@nic.nm.kr>

Subject: Comments on your paper

To: mohta@cc.titech.ac.jp

Date: Mon, 4 Oct 1993 12:40:48 +0900 (KST)

Cc: apccirn-i18n@nic.nm.kr

X-Mailer: ELM [version 2.4 PL21-h3]

Mime-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 7bit

Content-Length: 1269

Masataka,

Here goes my 2-cent-worth comment on your paper. First of all, there

seems to be a typo in describing the use of four additional bits for

Han characters. The right term for Han characters used by KS standards

is "Hanja".

I understand UCS2 fails to meet the criteria you mentioned in the

paper, but ICODE would not be acceptable unless you have provisions

for supporting at least strict UCS2 level 2 and enough justification

for the proposed method.

1. Is it really necessary for the bidirectional display mode bit to be included

in the charset definition? Can't it be treated as a regional matter?

2. As for the rendering problem of Han characters, how about designing

a renderer so that it can display the equivalent shape of the character

from current locale information?

3. What do you think of the comments from the UCS BOF that your solution

is not in the general stream of the development of the standard character

set codes and their applications in computing systems?

I think we should try to feed back proposed solutions and enhancements on

deployment issues of 10646, and profiling is an inevitable solution to

the presumed weaknesses of UCS. Possibly a common Asia-Pacific profile?

--

Uhhyung Choi

Korea Network Information Center

From mohta@necom830.cc.titech.ac.jp Mon Oct 4 18:32:21 1993

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)

id AA12511; Mon, 4 Oct 93 18:32:21 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 4 Oct 93 18:24:13 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9310040924.AA04641@necom830.cc.titech.ac.jp>

Subject: Re: Comments on your paper

To: uhhyung@nic.nm.kr (Uhhyung Choi)

Date: Mon, 4 Oct 93 18:24:11 JST

Cc: mohta@cc.titech.ac.jp, apccirn-i18n@nic.nm.kr

In-Reply-To: <9310040340.AA08041@nic.nm.kr>; from "Uhhyung Choi" at Oct 4, 93 12:40 pm

X-Mailer: ELM [version 2.3 PL11]

> Here goes my 2-cent-worth comment on your paper.

Thank you, very much.

> First of all, there

> seems to be a typo in describing the use of four additional bits for

> Han characters. The right term for Han characters used by KS standards

> is "Hanja".

Oops... Sorry.

> I understand UCS2 fails to meet the criteria you mentioned in the

> paper, but ICODE would not be acceptable unless you have provisions

> for supporting at least strict UCS2 level 2 and enough justification

> for the proposed method.

Provision for UCS2 level 2 as it stands is, as I have proved, unacceptable as an

internationalized plain text encoding method for interactive use. OK?

Still, provision for encoding text represented with UCS2 level 2

is possible, as ICODE has much extra encoding space, even in its 21 bit

form.

For example, encoding all the possible combinations of ancient Hangul

requires only 0.5 mega code points.

Encoding of Thai and Devanagari characters as precombined characters

requires only several thousand code points, I think. The resulting

encoding will be much shorter than the one in ISO level 2.

As for the justification, I'd be glad if someone could show any other

requirement which an internationalized plain text encoding method for

interactive use should satisfy.

It should be noted that IUTF is upward compatible with UTF-2 and can

provide much shorter representation for frequently used Asian characters.

> 1. Is it really necessary for the bidirectional display mode bit to be included

> in the charset definition?

What does "charset definition" mean? I don't know but I don't mind.

The bit is necessary to make text encoding finitely resynchronizable.

I don't mind at all whether you might call the resulting encoding

method "charset definition" or not.

> Can't it be treated as a regional matter?

WHAT!!!!????? Do you think there can be "a regional matter" in an

internationalized plain text encoding?

People who use Arabic use bidirectionality in their plain text.

If there can be regional matters, we don't need any common encoding method. Anyone

can use their domestic encoding such as existing ISO 2022 with

implicit announcers and call it "international" because the difference

is "a regional matter".

That's why I defined "universality".

Suppose two Arabic users, A in Korea and B in France, try to communicate

with each other. What encoding should they use? What is the proper "region"

to be used as "a regional matter"? Can we expect Arabic users, most of

whom are experts in neither computers nor linguistics, to know all the possible

encodings of Arabic? What happens if a third person, C in Brazil, who cannot

read Arabic at all, tries to relay the message, adding a short English comment

at the top of the message?

Well, actually, ICODE makes bidirectionality somewhat regional; that

is, if an implementor wants to drop support for it, he can. The

direction bit is necessary only when support for bidirectional

text is necessary. But dropping support for it only to make the encoding 20

bits is, I think, quite meaningless.

> 2. As for the rendering problem of Han characters, how about designing

> a renderer so that it can display the equivalent shape of the character

> from current locale information?

Locale dependence makes the encoding not universal as an internationalized

encoding.

ISO 10646 allows us to have text which contain English, German and French

at the same time, which was impossible with ISO 8859.

Isn't a minimal requirement of an internationalized encoding to allow us

to have text which contain Chinese, Japanese, Korean and any other

languages at the same time?

Don't you want fairness to international things?

> 3. What do you think of the comments from the UCS BOF that your solution

> is not in the general stream of the development of the standard character

> set codes and their applications in computing systems?

The comment was in charsets ML, not from UCS BOF.

In the ML, no one has shown the definition of "general stream of the

development".

Though I'm not sure what the definition is, I have also shown that there

can be no single encoding method which could be thought to be in the

"general stream of the development".

Moreover, I have shown, in my paper, that both ISO 2022 and ISO 10646 are

inappropriate as an internationalized plain text encoding method for

interactive use.

So, could you tell me what, do you think, the "general stream of

the development" means, at least?

> I think we should try to feed back proposed solutions and enhancements on

> deployment issues of 10646, and profiling is an inevitable solution to

> the presumed weaknesses of UCS. Possibly a common Asia-Pacific profile?

No. It is as bad as ISO 2022, then.

Instead, there should be a single international profile, which should be

called the universal encoding.

Then, we will be free from specifying profiles.

Masataka Ohta

From mohta@necom830.cc.titech.ac.jp Mon Dec 20 09:27:52 1993

Received: from necom830.cc.titech.ac.jp ([131.112.4.4]) by nic.nm.kr (4.1/SMI-4.1)

id AA01613; Mon, 20 Dec 93 09:27:52 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 20 Dec 93 09:18:54 +0859

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9312200019.AA21974@necom830.cc.titech.ac.jp>

Subject: Interoperable Localizaion/Internationalization

To: ietf-822@dimacs.rutgers.edu, ietf-charsets@innosoft.com,

apccirn-i18n@nic.nm.kr

Date: Mon, 20 Dec 93 9:18:51 JST

X-Mailer: ELM [version 2.3 PL11]

Attached is a memo of ISO-2022-JP-2 encoding sent to the RFC editor

just recently.

At the APCCIRN (Asia Pacific CCIRN) meeting of early December in Taiwan,

it was decided to merge

ISO-2022-JP-2 in Japan

ISO-2022-KR in Korea

CNS in Taiwan

to develop

ISO-2022-INT-1

as a standard track text encoding method of the Internet for which

I am acting as a coordinator.

It is an attempt to merge various interoperable localizations.

It is also intended to further develop:

ISO-2022-INT-2

ISO-2022-INT-3

etc. in a timely fashion.

Any comments?

Masataka Ohta

PS

Please reply to appropriate mailing lists only.

------------------------------------------------------------------------

Network Working Group M. Ohta

Request for Comments: nnnn Tokyo Institute of Technology

Category: Informational K. Handa

ETL

28 November 1993

ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP

Status of this Memo

This memo provides information for the Internet community. This memo

does not specify an Internet standard of any kind. Distribution and

translation of this memo is unlimited.

Introduction

This memo describes a text encoding scheme: "ISO-2022-JP-2", which is

used experimentally for electronic mail [RFC822] and network news

[RFC1036] messages in several Japanese networks. The encoding is a

multilingual extension of "ISO-2022-JP", the existing encoding for

Japanese [2022JP]. The encoding is supported by an Emacs based

multilingual text editor: MULE [MULE].

The name, "ISO-2022-JP-2", is intended to be used in the "charset"

parameter field of MIME headers (see [MIME1] and [MIME2]).

Description

The text with "ISO-2022-JP-2" starts in ASCII [ASCII], and switches

to other character sets of ISO 2022 [ISO2022] through limited

combinations of escape sequences. All the characters are encoded

with 7 bits only.

At the beginning of text, the existence of an announcer sequence:

"ESC 2/0 4/1 ESC 2/0 4/6 ESC 2/0 5/10" is (though omitted) assumed.

Thus, characters of 94 character sets are designated to G0 and

invoked as GL. C1 control characters are represented with 7 bits.

Characters of 96 character sets are designated to G2 and invoked with

SS2 (single shift two, "ESC 4/14" or "ESC N").

For example, the escape sequence "ESC 2/4 2/8 4/3" or "ESC $ ( C"

indicates that the bytes following the escape sequence are Korean KSC

characters, which are encoded in two bytes each. The escape sequence

"ESC 2/14 4/1" or "ESC . A" indicates that ISO 8859-1 is designated

to G2. After the designation, the single shifted sequence "ESC 4/14

4/1" or "ESC N A" is interpreted to represent a character "A with

acute".
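
The column/row notation used above ("2/4" means column 2, row 4 of the code table, i.e. byte value 2*16+4) can be turned into bytes mechanically. The following is a minimal sketch; the helper name from_colrow is hypothetical and is not part of the memo.

```python
def from_colrow(notation: str) -> bytes:
    """Convert ISO 2022 column/row notation such as "ESC 2/4 2/8 4/3" to bytes."""
    out = bytearray()
    for token in notation.split():
        if token == "ESC":
            out.append(0x1B)
        else:
            column, row = token.split("/")
            # Each position in the 7-bit code table is column * 16 + row.
            out.append(int(column) * 16 + int(row))
    return bytes(out)

print(from_colrow("ESC 2/4 2/8 4/3"))   # designate KSC5601 to G0: ESC $ ( C
print(from_colrow("ESC 2/14 4/1"))      # designate ISO 8859-1 to G2: ESC . A
print(from_colrow("ESC 4/14 4/1"))      # single shift two, then "A": ESC N A
```

The same helper reproduces every entry in the escape sequence column of the memo's table, which is a quick way to check the two notations against each other.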

Ohta & Handa [Page 1]

RFC nnnn ISO-2022-JP-2 28 November 1993

The following table gives the escape sequences and the character sets

used in "ISO-2022-JP-2" messages. The reg# is the registration number

in ISO's registry [ISOREG].

94 character sets
reg#  character set      ESC sequence                designated to
------------------------------------------------------------------
6     ASCII              ESC 2/8 4/2      ESC ( B    G0
42    JIS X 0208-1978    ESC 2/4 4/0      ESC $ @    G0
87    JIS X 0208-1983    ESC 2/4 4/2      ESC $ B    G0
14    JIS X 0201-Roman   ESC 2/8 4/10     ESC ( J    G0
58    GB2312-1980        ESC 2/4 4/1      ESC $ A    G0
149   KSC5601-1987       ESC 2/4 2/8 4/3  ESC $ ( C  G0
159   JIS X 0212-1990    ESC 2/4 2/8 4/4  ESC $ ( D  G0

96 character sets
reg#  character set      ESC sequence                designated to
------------------------------------------------------------------
100   ISO8859-1          ESC 2/14 4/1     ESC . A    G2
126   ISO8859-7(Greek)   ESC 2/14 4/6     ESC . F    G2

For further information about the character sets and the escape

sequences, see [ISO2022] and [ISOREG].
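
The G0 half of the table can be used directly to split a byte string into runs by character set. This is only a sketch under stated assumptions: the names G0_SETS and split_g0_runs are hypothetical, G2 designations and single shifts are left untouched inside the run they occur in, and no character decoding is attempted.

```python
ESC = 0x1B

# G0 designations from the table: bytes following ESC -> character set name.
G0_SETS = {
    b"(B": "ASCII",
    b"(J": "JIS X 0201-Roman",
    b"$@": "JIS X 0208-1978",
    b"$B": "JIS X 0208-1983",
    b"$A": "GB2312-1980",
    b"$(C": "KSC5601-1987",
    b"$(D": "JIS X 0212-1990",
}

def split_g0_runs(data: bytes):
    """Split data into (character set name, raw bytes) runs by G0 designation."""
    runs, current, start, i = [], "ASCII", 0, 0
    while i < len(data):
        if data[i] == ESC:
            for seq, name in G0_SETS.items():
                if data.startswith(seq, i + 1):
                    if i > start:
                        runs.append((current, data[start:i]))
                    current = name
                    i += 1 + len(seq)
                    start = i
                    break
            else:
                i += 1  # not a G0 designation (for example ESC . A or ESC N)
        else:
            i += 1
    if start < len(data):
        runs.append((current, data[start:]))
    return runs

print(split_g0_runs(b"abc\x1b$B0!\x1b(Bxyz"))
```

Each run could then be handed to a decoder for its own character set; text always starts in ASCII, so no designation is needed before the first run.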

If there is any G0 designation in text, there must be a switch to

ASCII or to JIS X 0201-Roman before a space character (but not

necessarily before "ESC 4/14 2/0" or "ESC N ' '") or control

characters such as tab or CRLF. This means that the next line starts

in the character set that was switched to before the end of the

previous line. Though the designation to JIS X 0201-Roman is allowed

for backward compatibility to "ISO-2022-JP", its use is discouraged.

Applications such as pagers and editors which randomly seek within a

text file encoded with "ISO-2022-JP-2" may assume that all the lines

begin with ASCII, not with JIS X 0201-Roman.

At the beginning of a line, information on G2 designation of the

previous line is cleared. New designation must be given before a

character in 96 character sets is used in the line.

The text must end in ASCII designated to G0.
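
The per-line rules above (every line starts with a single-byte set in G0, the G2 designation is forgotten at each line start, and a line must return to a single-byte set before it ends) lend themselves to a small checker. The sketch below is an illustration only: check_line and its messages are hypothetical names, and it checks just these three rules, not the whole formal syntax.

```python
ESC = b"\x1b"
SINGLE_BYTE = {b"(B", b"(J"}
DOUBLE_BYTE = {b"$@", b"$A", b"$B", b"$(C", b"$(D"}
G2_DESIGNATIONS = {b".A", b".F"}

def check_line(line: bytes) -> list:
    """Return a list of rule violations for one line (given without its CRLF)."""
    problems = []
    double_byte = False    # every line starts with a single-byte set in G0
    g2_designated = False  # G2 designation does not carry over between lines
    i = 0
    while i < len(line):
        if line[i:i + 1] != ESC:
            i += 1
            continue
        rest = line[i + 1:i + 4]
        if rest[:1] == b"N":
            if not g2_designated:
                problems.append("SS2 used before any G2 designation in this line")
            i += 3  # ESC, N, and the single-shifted character
        elif rest[:2] in G2_DESIGNATIONS:
            g2_designated = True
            i += 3
        elif rest[:2] in SINGLE_BYTE:
            double_byte = False
            i += 3
        elif rest[:3] in DOUBLE_BYTE:
            double_byte = True
            i += 4
        elif rest[:2] in DOUBLE_BYTE:
            double_byte = True
            i += 3
        else:
            problems.append("unrecognized escape sequence")
            i += 1
    if double_byte:
        problems.append("line does not return to a single-byte set before its end")
    return problems

print(check_line(b"\x1b$B0!\x1b(B"))
print(check_line(b"\x1b$B0!"))
```

Because the checker needs no information from earlier lines, it can be run on any line of a file in isolation, which is the property the memo relies on for pagers and editors.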

As the "ISO-2022-JP", and thus, "ISO-2022-JP-2", is designed to

represent English and modern Japanese, left-to-right directionality

is assumed if the text is displayed horizontally.

Users of "ISO-2022-JP-2" must be aware that some common transport

such as old Bnews can not relay a 7-bit value 7/15 (decimal 127),

which is used to encode, say, "y with diaeresis" of ISO 8859-1.

Other restrictions are given in the Formal Syntax section below.

Formal Syntax

The notational conventions used here are identical to those used in

RFC 822 [RFC822].

The * (asterisk) convention is as follows:

l*m something

meaning at least l and at most m somethings, with l and m taking

default values of 0 and infinity, respectively.

message             = headers 1*(CRLF text)
                                    ; see also [MIME1] "body-part"
                                    ; note: must end in ASCII

text                = *(single-byte-char /
                        g2-desig-seq /
                        single-shift-char)
                      [*segment
                       reset-seq
                       *(single-byte-char /
                         g2-desig-seq /
                         single-shift-char ) ]
                                    ; note: g2-desig-seq must
                                    ; precede single-shift-char

headers             = <see [RFC822] "fields" and [MIME1] "body-part">

segment             = single-byte-segment / double-byte-segment

single-byte-segment = single-byte-seq
                      *(single-byte-char /
                        g2-desig-seq /
                        single-shift-char )

double-byte-segment = double-byte-seq
                      *((one-of-94 one-of-94) /
                        g2-desig-seq /
                        single-shift-char )

reset-seq           = ESC "(" ( "B" / "J" )

single-byte-seq     = ESC "(" ( "B" / "J" )

double-byte-seq     = (ESC "$" ( "@" / "A" / "B" )) /

                      (ESC "$" "(" ( "C" / "D" ))

g2-desig-seq        = ESC "." ( "A" / "F" )

single-shift-seq    = ESC "N"

single-shift-char   = single-shift-seq one-of-96

CRLF                = CR LF

                                                     ; ( Octal, Decimal.)
ESC                 = <ISO 2022 ESC, escape>         ; (    33,      27.)
SI                  = <ISO 2022 SI, shift-in>        ; (    17,      15.)
SO                  = <ISO 2022 SO, shift-out>       ; (    16,      14.)
CR                  = <ASCII CR, carriage return>    ; (    15,      13.)
LF                  = <ASCII LF, linefeed>           ; (    12,      10.)
one-of-94           = <any one of 94 values>         ; (41-176, 33.-126.)
one-of-96           = <any one of 96 values>         ; (40-177, 32.-127.)
7BIT                = <any 7-bit value>              ; ( 0-177,  0.-127.)
single-byte-char    = <any 7BIT, including bare CR & bare LF, but NOT
                       including CRLF, and not including ESC, SI, SO>

MIME Considerations

The name given to the character encoding is "ISO-2022-JP-2". This

name is intended to be used in MIME messages as follows:

Content-Type: text/plain; charset=iso-2022-jp-2

The "ISO-2022-JP-2" encoding is already in 7-bit form, so it is not

necessary to use a Content-Transfer-Encoding header. It should be

noted that applying the Base64 or Quoted-Printable encoding will

render the message unreadable in non-MIME-compliant software.

"ISO-2022-JP-2" may also be used in MIME headers. Both "B" and "Q"

encoding could be useful with "ISO-2022-JP-2" text.
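
As an illustration of the header usage mentioned above, the sketch below builds an RFC 1522 "B" encoded-word by hand from the memo's own "A with acute" example. It uses only Python's standard base64 module; the variable names are hypothetical, and the Subject line is an invented example, not something taken from the memo.

```python
import base64

# "A with acute" as in the memo's example: designate ISO 8859-1 to G2
# (ESC . A), then single-shift the byte "A" (ESC N A).
a_acute = b"\x1b.A" + b"\x1bNA"

# RFC 1522 "B" encoding: =?charset?B?base64-of-the-encoded-text?=
encoded_word = "=?ISO-2022-JP-2?B?" + base64.b64encode(a_acute).decode("ascii") + "?="

headers = [
    "Subject: " + encoded_word,
    "Content-Type: text/plain; charset=iso-2022-jp-2",
]
print("\n".join(headers))
```

The body itself needs no such wrapping, since the encoding is already 7-bit; only header fields, which RFC 822 restricts more tightly, need the encoded-word form.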

References

[ASCII] American National Standards Institute, "Coded character set

-- 7-bit American national standard code for information

interchange", ANSI X3.4-1986.

[ISO2022] International Organization for Standardization (ISO),

"Information processing -- ISO 7-bit and 8-bit coded character sets

-- Code extension techniques", International Standard, Ref. No. ISO

2022-1986 (E).

[ISOREG] International Organization for Standardization (ISO),

"International Register of Coded Character Sets To Be Used With

Escape Sequences".

[MIME1] N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail

Extensions) Part One: Mechanisms for Specifying and Describing the

Format of Internet Message Bodies", RFC 1521, September 1993.

[MIME2] K. Moore, "MIME (Multipurpose Internet Mail Extensions) Part

Two: Message Header Extensions for Non-ASCII Text", RFC 1522,

September 1993.

[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text

Messages", STD 11, RFC 822, UDEL, August 1982.

[RFC1036] Horton M., and R. Adams, "Standard for Interchange of

USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for

Seismic Studies, December 1987.

[2022JP] J. Murai, M. Crispin, E. van der Poel, "Japanese Character

Encoding for Internet Messages", RFC 1468, June 1993.

[MULE] M. Nishikimi, K. Handa, S. Tomura, "Mule: MULtilingual

Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.

Acknowledgements

This memo is the result of discussion between various people in a

news group: fj.kanji and is reviewed by a mailing list:

jp-msg@iij.ad.jp. The Authors wish to thank in particular Prof. Eiichi

Wada for his suggestions based on profound knowledge in ISO 2022 and

related standards.

Security Considerations

Security issues are not discussed in this memo.

Authors' Addresses

Masataka Ohta

Tokyo Institute of Technology

2-12-1, O-okayama, Meguro-ku,

Tokyo 152, JAPAN

Phone: +81-3-5499-7084

Fax: +81-3-3729-1940

EMail: mohta@cc.titech.ac.jp

Ken'ichi Handa

Electrotechnical Laboratory

Umezono 1-1-4, Tsukuba,

Ibaraki 305, JAPAN

Phone: +81-298-58-5916

Fax: +81-298-58-5918

EMail: handa@etl.go.jp

From mohta@necom830.cc.titech.ac.jp Tue Dec 21 15:44:31 1993

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)

id AA04880; Tue, 21 Dec 93 15:44:31 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 21 Dec 93 15:34:23 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9312210634.AA00144@necom830.cc.titech.ac.jp>

Subject: Re: Proposals for 10646/Unicode in MIME

To: rhys@cs.uq.oz.au, ietf-charsets@innosoft.com, apccirn-i18n@nic.nm.kr

Date: Tue, 21 Dec 93 15:34:21 JST

Cc: dcrocker@mordor.stanford.edu, David_Goldsmith@taligent.com,

ietf-822@dimacs.rutgers.edu, unicored@unicode.org

Reply-To: ietf-charsets@innosoft.com, unicored@unicode.org,

apccirn-i18n@nic.nm.kr

In-Reply-To: <9312202223.AA28439@client>; from "rhys@cs.uq.oz.au" at Dec 21, 93 8:23 am

X-Mailer: ELM [version 2.3 PL11]

Note: As the issue is on text encoding in general, Reply-To: is not

directed to ietf-822.

> I note here that Masataka's proposal for ISO-2022-JP-2 demonstrates what

> we've been arguing all along: it is not enough to just have a character

> encoding.

Recently I have avoided using the word "character" as much as possible and

use the phrase "text encoding", because the concept of "character"

beyond ASCII can not be well defined. Various units of text encoding

are necessary for different purposes.

Thus, I think names such as MIME charset and ietf-charsets ML are

no good.

> There also needs to be some form of markup to distinguish

> different usages of the same character encoding. ISO-2022-JP-2 uses

> escape sequences to do markup, whereas a UNICODE version of text/enriched

> would use <...> tags.

ISO-2022-JP-2 does not do any markup. It is for plain text.

It is finite state. It has no nesting.

I don't think anything with nested structure is plain text.

It is and its successors will be as stateless as practically possible

with ISO 2022.

That is, at the beginning of a line, the state can be assumed to be unique.

> The main difference I can see is that ISO-2022-JP-2

> requires the use of markup, even when the whole message is in the same

> language, but UNICODE can get away without markup for 99% of messages,

It is a meaningless difference.

Whether it is 1% or 100%, you need the same amount of codings, fonts,

settings of config.sys and such, anyway.

> letting local conventions set the default language.

That is a very important difference.

Unlike UNICODE, ISO-2022-JP-2 is intended to be used in an internationalized

environment. It needs no local conventions. BTW, MIME charsets also cannot

depend on local conventions.

> I still fail to see why Masataka objects to UNICODE since his own proposal has

> to jump through the same markup hoops. The only advantage of ISO-2022-JP-2

> that I can see is that it will work on existing terminals without special

> software in some communities.

Then, you can see nothing.

ISO-2022-JP-2 is the product of long and extensive

localization/internationalization experience in the Japanese computer community

with ISO-2022-JP, EUC, SJIS and such.

First of all, ISO-2022-JP-2 can interoperate with ASCII.

Next, it is 7 bit.

Thus, it can interoperate with any ASCII compatible text encoding such

as EUC (both UJIS and EUC-KR) and SJIS.

More importantly, it can interoperate with the future ultimate ASCII

compatible 8 bit encoding. Of course, UNICODE is NOT the future.

We do know that having two or more uninteroperable encodings such

as EUC and SJIS or ASCII and 16bit-UNICODE is a real pain.

> A specious argument at best, since the rest

> of the world does need special software to view ISO-2022-JP-2 anyway.

ISO-2022-JP-2 is, and ISO-2022-INT-1 will be, designed to aid those

who immediately need localization.

I don't think it is a long-term solution.

Both ISO 2022 and ISO 10646/UNICODE have a unified syntax for mixing

the world's multilingual characters. ISO 2022 is much better for

us, because it lets us separate C/J/K characters.

On the other hand, both ISO 2022 and ISO 10646/UNICODE lack a unified

semantics for mixing the world's multilingual characters. ISO 10646/UNICODE

inherits the policy of ISO 2022 of treating characters in different languages

differently. Thus, it is impossible to write a unified text processing

library or application of meaningfully rich functionality.

Thus, for the time being, our solution must be 7 bit ISO 2022.

As a long term solution, I have designed ICODE/IUTF, which has, besides

ASCII compatibility, several useful semantical properties for, as far

as I know, all the characters in the world. With a large enough encoding

space (though not impractically large), the real, semantical, unification

is possible.

> UNICODE has the advantage that if a message gets corrupted and the markup

> is lost, there is still a reasonable character that can be displayed, which

> is close enough not to cause the sky to fall in on the reader. Such corruption

> could easily happen when a message is quoted. What happens with ISO-2022-JP-2?

Misquoting is an issue which MUST be solved by fixing faulty MTAs and other

faulty transports. Providing workarounds will only delay

the real solution.

Instead, the real state corruption problem arises in an interactive

environment where individual programs output their own text streams

simultaneously.

With ISO-2022-JP-2, unlike text/enriched, the state is resumed at the

beginning of the next line.
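
The practical value of that property is that a program can recover the shift state at any byte offset by looking only at the current line. The following is a minimal sketch of that idea; line_start and g0_is_double_byte are hypothetical names, and only the G0 state (single-byte or double-byte) is tracked.

```python
def line_start(data: bytes, offset: int) -> int:
    """Return the index of the first byte of the line containing offset."""
    return data.rfind(b"\n", 0, offset) + 1

def g0_is_double_byte(data: bytes, offset: int) -> bool:
    """Report whether a double-byte set is in G0 at offset, scanning one line only."""
    double_byte = False  # every line starts in ASCII
    i = line_start(data, offset)
    while i < offset:
        if data[i] == 0x1B:
            following = data[i + 1:i + 2]
            if following == b"$":      # ESC $ ... designates a double-byte set
                double_byte = True
            elif following == b"(":    # ESC ( B or ESC ( J returns to single-byte
                double_byte = False
        i += 1
    return double_byte

data = b"ab\x1b$B0!\x1b(B\r\ncd\x1b$B0!0!\x1b(B\r\n"
print(g0_is_double_byte(data, 19))
print(g0_is_double_byte(data, 13))
```

The scan never goes further back than the previous line break, so its cost is bounded by the line length rather than by the size of the file.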

> People have tried time and again to add markup to UNICODE to satisfy Masataka

> (e.g. language tags), but it just doesn't seem to satisfy him. *sigh*

Strange.

I have *ABSOLUTELY* *NO* interest in text/enriched from the beginning.

I and most of the people in the world want to process our natural

languages as plain text in internationalized environment.

We already have a lot of experience using our languages as plain text.

You can't force us to give up plain text.

Masataka Ohta

PS

For more information on ICODE, why ISO 10646/UNICODE is no good and how

it can be improved, see:

"Character Encoding Method for Internationalized Plain

Text Processing", Proceedings of 8th International Joint

Workshop on Computer Communications, Masataka OHTA,

Dec. 1993.

An electronic copy is available from me.

From rong@watson.ibm.com Wed Dec 22 08:31:48 1993

Return-Path: <rong@watson.ibm.com>

Received: from watson.ibm.com by nic.nm.kr (4.1/SMI-4.1)

id AA07503; Wed, 22 Dec 93 08:31:48 KST

Received: from WATSON by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 3129;

Tue, 21 Dec 93 18:26:16 EST

Received: from YKTVMH by watson.vnet.ibm.com with "VAGENT.V1.0"

id 6180; Tue, 21 Dec 1993 18:26:08 EST

Received: from hawpub.watson.ibm.com by yktvmh.watson.ibm.com (IBM VM SMTP V2R3)

with TCP; Tue, 21 Dec 93 18:26:07 EST

Received: by hawpub.watson.ibm.com (AIX 3.2/UCB 5.64/930311)

id AA27752; Tue, 21 Dec 1993 18:26:15 -0500

Date: Tue, 21 Dec 1993 18:26:15 -0500

From: rong@watson.ibm.com (Rong Chang)

Message-Id: <9312212326.AA27752@hawpub.watson.ibm.com>

To: apccirn-i18n@nic.nm.kr

Subject: subscribe

Please add me to the mailing list. I'm sorry for posting this request

to the mailing list because "apccirn-i18n-request@nic.nm.kr" was not

available.

-rong

From mohta@necom830.cc.titech.ac.jp Thu Dec 23 01:30:28 1993

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)

id AA02216; Thu, 23 Dec 93 01:30:28 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 23 Dec 93 01:21:41 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9312221621.AA07900@necom830.cc.titech.ac.jp>

Subject: Re: subscribe

To: rong@watson.ibm.com (Rong Chang)

Date: Thu, 23 Dec 93 1:21:40 JST

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9312212326.AA27752@hawpub.watson.ibm.com>; from "Rong Chang" at Dec 21, 93 6:26 pm

X-Mailer: ELM [version 2.3 PL11]

> Please add me to the mailing list.

Welcome.

Could you introduce yourself to the mailing list?

Who are you and what are you doing in i18n area?

> I'm sorry for posting this request

> to the mailing list because "apccirn-i18n-request@nic.nm.kr" was not

> available.

I was aware of that.

But, formally speaking, apccirn related MLs are not open to the public.

Practically speaking, I, personally, would like to add everyone who

is interested in our activities and can give us technical contributions.

Masataka Ohta

From jinho@iti.gov.sg Fri Dec 24 13:16:22 1993

Return-Path: <jinho@iti.gov.sg>

Received: from iti.gov.sg by nic.nm.kr (4.1/SMI-4.1)

id AA08975; Fri, 24 Dec 93 13:16:22 KST

Received: by iti.gov.sg (4.1/SMI-4.1)

id AA14773; Fri, 24 Dec 93 12:09:02 SST

From: jinho@iti.gov.sg (Tan Jin Ho)

Message-Id: <9312240409.AA14773@iti.gov.sg>

Subject: Re: Proposals for 10646/Unicode in MIME

To: ietf-charsets@innosoft.com, unicored@unicode.org, apccirn-i18n@nic.nm.kr

Date: Fri, 24 Dec 93 12:09:00 WST

In-Reply-To: <9312210634.AA00144@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at Dec 21, 93 3:34 pm

X-Mailer: ELM [version 2.3 PL11]

Hi,

I would like to have a soft copy of the following report.

"Character Encoding Method for Internationalized Plain

Text Processing", Proceedings of 8th International Joint

Workshop on Computer Communications, Masataka OHTA,

Dec. 1993.

Could you email it to me @ jinho@ncb.gov.sg. Thank you.

Regards,

Jin-Ho

From mohta@necom830.cc.titech.ac.jp Sat Dec 25 23:12:37 1993

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (4.1/SMI-4.1)

id AA11484; Sat, 25 Dec 93 23:12:37 KST

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 25 Dec 93 23:04:02 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Return-Path: <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9312251404.AA16696@necom830.cc.titech.ac.jp>

Subject: Re: subscribe

To: rong@watson.ibm.com (Rong Chang)

Date: Sat, 25 Dec 93 23:04:00 JST

Cc: rong@watson.ibm.com, apccirn-i18n@nic.nm.kr

In-Reply-To: <9312221846.AA40392@hawpub.watson.ibm.com>; from "Rong Chang" at Dec 22, 93 1:46 pm

X-Mailer: ELM [version 2.3 PL11]

> I was born in Taiwan, and have been interested in multilingual,

> multimedia mail systems for several years.

Interesting. Can your system handle a single message containing

arbitrary mixed multiple script languages? Or can it only handle multiple

messages each containing a single (or, maybe, double) script language?

> "I18n" is new to me. It would be nice if someone could send me an FAQ

> list regarding i18n.

As I18n is new to everyone, :-) no FAQ list is available.

The current hot topic in I18n is internationalized text encoding,

which is necessarily related to multilingual issues.

Masataka Ohta

From apccirn-sec Tue Jan 25 18:33:33 1994

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)

id SAA01316; Tue, 25 Jan 1994 18:32:45 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 25 Jan 94 18:23:25 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9401250923.AA17095@necom830.cc.titech.ac.jp>

Subject: Re: ISO-2022-INT-1

To: apccirn-i18n@nic.nm.kr

Date: Tue, 25 Jan 94 18:23:23 JST

In-Reply-To: <no.id>; from "mohta" at Dec 24, 93 5:55 pm

X-Mailer: ELM [version 2.3 PL11]

> I'll be happy if the responses will be returned before 1/25 (the

> earlier the better, of course). I expect much earlier response

> on your personal (not the communities) opinions.

It's 1/25.

In response to several comments, I have revised the previous version

of the pre-internet-draft of ISO-2022-INT-1.

Major changes are:

It is now fully compatible with ISO-2022-KR (G1 invocation is

allowed)

Greek characters can be efficiently encoded with G1

It is described to be not only for messages but for everything

Introduction of aggregated name of ISO-2022-INT-*

Bidirectionality is not yet supported.

Any comments?

Masataka Ohta

------------------------------------------------------------------------

INTERNET DRAFT APCCIRN-I18N

draft-filename-01.txt February 1994

Internet Multilingual Text Encoding: ISO-2022-INT-*

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working

documents of the Internet Engineering Task Force (IETF), its areas,

and its working groups. Note that other groups may also distribute

working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six

months. Internet-Drafts may be updated, replaced, or obsoleted by

other documents at any time. It is not appropriate to use Internet-

Drafts as reference material or to cite them other than as a

``working draft'' or ``work in progress.''

To learn the current status of any Internet-Draft, please check the

1id-abstracts.txt listing contained in the Internet-Drafts Shadow

Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or

munnari.oz.au.

Abstract

Based on the experience with "ISO-2022-JP-2" (RFC 1554), a

multilingual text encoding scheme, "ISO-2022-INT-1", is designed as

an extension of "ISO-2022-JP" (RFC 1468) and "ISO-2022-KR" (RFC

1557).

The encoding is ASCII compatible and 7-bit and, thus, can be mixed

with any ASCII compatible encoding. The encoding is designed to be

as stateless as practically possible with ISO 2022. That is, no state

information needs to be preserved between lines.

"ISO-2022-INT-1" and its successors have an aggregated name: "ISO-

2022-INT-*".

Introduction

This memo describes a text encoding scheme: "ISO-2022-INT-1", which

is intended to be a text encoding scheme for the Internet, including,

but not limited to, electronic mail [RFC822] and network news

[RFC1036]. The encoding is also useful in multilingual text files.

The encoding is a multilingual extension of "ISO-2022-JP" [2022JP]

and "ISO-2022-KR" [2022KR]. The encoding is supported by an Emacs

based multilingual text editor: MULE [MULE].

APCCIRN-I18N Expires on Aug 1, 1994 [Page 1]

.

INTERNET DRAFT Internet Multilingual Text Encoding February 1994

The name, "ISO-2022-INT-1", is intended to be used in the "charset"

parameter field of MIME headers (see [MIME1] and [MIME2]).

Description

The text with "ISO-2022-INT-1" starts in ASCII [ASCII] designated to

G0 invoked as GL and KS C 5601 [KSC5601] designated to G1, and

switches to other character sets of ISO 2022 [ISO2022] through

limited combinations of designation/invocation sequences. All the

characters are encoded with 7 bits only.

At the beginning of text, the existence of an announcer sequence:

"ESC 2/0 4/6 ESC 2/0 5/0 ESC 2/0 5/2 ESC 2/0 5/10" and a

designation/invocation sequence: "ESC 2/8 4/2 SI ESC 2/4 2/9 4/3 ESC

2/10 7/14 ESC 2/11 7/14" are (though omitted) assumed. The same

designation/invocation sequence is also assumed (though unnecessary

and, thus, omitted) at the beginning of each line. Thus, C1 control

characters are represented with 7 bits. Characters of 94 character

sets are designated to G0 or G1 and invoked as GL by SI (shift in,

'0/15') and SO (shift out, '0/14') each. Characters of 96 character

sets are designated to G1 and invoked as GL by SO or they may be

designated to G2 and invoked with SS2 (single shift two, "ESC 4/14"

or "ESC N").
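
The column/row notation used above names a 7-bit code by its position in the code table, so "x/y" stands for the byte 16*x + y. A minimal Python sketch of that conversion follows; the helper name notation_to_bytes is an assumption made for illustration and is not part of the draft.

```python
# Hypothetical helper (not defined by the draft): convert ISO 2022
# column/row notation such as "ESC 2/8 4/2 SI" into raw bytes.
NAMED = {"ESC": 0x1B, "SO": 0x0E, "SI": 0x0F}

def notation_to_bytes(notation: str) -> bytes:
    out = bytearray()
    for token in notation.split():
        if token in NAMED:
            out.append(NAMED[token])
        else:
            column, row = token.split("/")
            out.append(16 * int(column) + int(row))
    return bytes(out)

# ASCII to G0, invoke G0, then KS C 5601 to G1 (start of the assumed sequence).
initial = notation_to_bytes("ESC 2/8 4/2 SI ESC 2/4 2/9 4/3")
print(initial)  # b'\x1b(B\x0f\x1b$)C'
```

The same helper maps "ESC 4/14" to the two bytes of "ESC N".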

For example, the escape sequence "ESC 2/4 2/8 4/3" or "ESC $ ( C"

indicates that the bytes following the escape sequence are Korean KSC

characters, which are encoded in two bytes each. A double byte

sequence enclosed by SO and SI also indicates a KSC string unless

other character sets are designated to G1. The escape sequence "ESC

2/14 4/1" or "ESC . A" indicates that ISO 8859-1 is designated to G2.

After the designation, the single shifted sequence "ESC 4/14 4/1" or

"ESC N A" is interpreted to represent a character "A with acute".

The following table gives the escape sequences and the character sets

used in "ISO-2022-INT-1" messages. The reg# is the registration

number in ISO's registry [ISOREG].

94 character sets

reg#  character set      ESC sequence                  designated to
--------------------------------------------------------------------
6     ASCII              ESC 2/8 4/2       ESC ( B     G0
                         ESC 2/9 4/2       ESC ) B     G1
14    JIS X 0201-Roman   ESC 2/8 4/10      ESC ( J     G0
                         ESC 2/9 4/10      ESC ) J     G1

94*94 character sets

reg#  character set      ESC sequence                  designated to
--------------------------------------------------------------------
42    JIS X 0208-1978    ESC 2/4 4/0       ESC $ @     G0
                         ESC 2/4 2/9 4/0   ESC $ ) @   G1
58    GB 2312-80         ESC 2/4 4/1       ESC $ A     G0
                         ESC 2/4 2/9 4/1   ESC $ ) A   G1
87    JIS X 0208-1983    ESC 2/4 4/2       ESC $ B     G0
                         ESC 2/4 2/9 4/2   ESC $ ) B   G1
149   KS C 5601-1987     ESC 2/4 2/8 4/3   ESC $ ( C   G0
                         ESC 2/4 2/9 4/3   ESC $ ) C   G1
159   JIS X 0212-1990    ESC 2/4 2/8 4/4   ESC $ ( D   G0
                         ESC 2/4 2/9 4/4   ESC $ ) D   G1
171   CNS 11643-1986-1   ESC 2/4 2/8 4/7   ESC $ ( G   G0
                         ESC 2/4 2/9 4/7   ESC $ ) G   G1
172   CNS 11643-1986-2   ESC 2/4 2/8 4/8   ESC $ ( H   G0
                         ESC 2/4 2/9 4/8   ESC $ ) H   G1

96 character sets

reg#  character set      ESC sequence                  designated to
--------------------------------------------------------------------
100   ISO8859-1          ESC 2/13 4/1      ESC - A     G1
                         ESC 2/14 4/1      ESC . A     G2
126   ISO8859-7(Greek)   ESC 2/13 4/6      ESC - F     G1
                         ESC 2/14 4/6      ESC . F     G2

Handling of code points not specified in each standard is

implementation dependent. For further information about the

character sets and the escape sequences, see [ISO2022] and [ISOREG].

Some Asian standards are also described in chapters 3 and 4 of

[LUNDE].

If there is any G0 designation other than ASCII in text, there must

be a switch back to ASCII before a space character '2/0' (but not

necessarily before '2/0' code of 96 character set such as "ESC 4/14

2/0" or "ESC N ' '") or control characters such as tab or CRLF. If

there is any G1 designation other than KS C [KSC5601] in text, there

must be a switch back to KS C before the end of line. If there is

any G1 invocation in text, there must be a switch back to G0

invocation before a space character (but not necessarily before "ESC

4/14 2/0" or "ESC N ' '") or control characters such as tab or CRLF.

This means that the next line starts in the ASCII character set that

was switched to before the end of the previous line.
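
The rule that a line must return to ASCII in G0 can be checked mechanically. The following Python sketch looks only at the last G0 designation on a line; it does not check the stricter requirement that the switch happen before each space or control character, and the function name is a hypothetical one chosen for illustration.

```python
import re

# G0 designations taken from the tables above: ESC ( B/J, ESC $ @/A/B,
# and ESC $ ( C/D/G/H. Double-byte characters never contain ESC, so
# scanning for ESC-led sequences is safe.
G0_DESIGNATION = re.compile(rb"\x1b(?:\([BJ]|\$[@AB]|\$\([CDGH])")
ASCII_TO_G0 = b"\x1b(B"

def line_ends_in_ascii(line: bytes) -> bool:
    """True if the last G0 designation on the line (if any) selects ASCII."""
    last = None
    for match in G0_DESIGNATION.finditer(line):
        last = match.group(0)
    return last is None or last == ASCII_TO_G0

print(line_ends_in_ascii(b"\x1b$B$3$s\x1b(B ok"))  # True
print(line_ends_in_ascii(b"\x1b$B$3$s"))           # False
```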

Though ISO 2022 [ISO2022] and related standards permit long term,

persistent states, "ISO-2022-INT-1" is designed not to need such

states to be preserved between lines. Applications such as pagers and

editors which randomly seek within a text file encoded with "ISO-

2022-INT-1" can assume that the state is the same as that of the

beginning of the text.

Thus, in each line containing 96 character sets, G2 designation must

be given before 96 character set is used.
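
As an illustration of the G2 rule, here is a minimal Python sketch that decodes a line in which ISO 8859-1 is designated to G2 with "ESC . A" and then used through "ESC N" single shifts. It assumes well-formed input and handles no other character sets; the function name is hypothetical.

```python
ESC = 0x1B

def decode_ss2_latin1(data: bytes) -> str:
    """Decode ASCII text with G2 = ISO 8859-1 and 'ESC N x' single shifts.

    Sketch only: assumes every ESC N is followed by one more byte.
    """
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC and data[i + 1:i + 3] == b".A":
            i += 3  # G2 designation (ESC . A) produces no character
        elif data[i] == ESC and data[i + 1:i + 2] == b"N":
            # The shifted 7-bit code selects the right-hand half of 8859-1.
            out.append(bytes([data[i + 2] | 0x80]).decode("latin-1"))
            i += 3
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)

print(decode_ss2_latin1(b"\x1b.A" + b"\x1bNA" + b"bc"))  # Ábc
```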

The text will end in ASCII designated to G0.

Left-to-right directionality is assumed if the text is displayed

horizontally.

Users of "ISO-2022-INT-1" must be aware that some common transports,

such as old Bnews, cannot relay a 7-bit value 7/15 (decimal 127),

which is used to encode, say, "y with diaeresis" of ISO 8859-1.
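
The arithmetic behind this warning can be sketched in Python: "y with diaeresis" is 0xFF in ISO 8859-1, so its 7-bit form is 7/15, the DEL code. The helper uses_del is a hypothetical name for a simple pre-flight check.

```python
# "y with diaeresis" is 0xFF in ISO 8859-1; dropping the high bit
# leaves 0x7F, i.e. 7/15 (DEL).
code = "\u00ff".encode("latin-1")[0]
seven_bit = code & 0x7F
print(hex(code), hex(seven_bit))  # 0xff 0x7f

def uses_del(data: bytes) -> bool:
    """True if the encoded text contains the value 7/15 (decimal 127)."""
    return 0x7F in data
```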

Other restrictions are given in the Formal Syntax section below.

Formal Syntax

The notational conventions used here are identical to those used in

RFC 822 [RFC822].

The * (asterisk) convention is as follows:

l*m something

meaning at least l and at most m somethings, with l and m taking

default values of 0 and infinity, respectively.

text = *(line CRLF)

line = *(single-byte-char /

single-shift-char /

(*g0-segment reset-desig-seq) /

g1-segment /

g1-desig-seq /

g2-desig-seq )

; note: within a line,

; g2-desig-seq must precede

; single-shift-char

; note2: must end KS C

; designated to G1

g0-segment = single-byte-g0-segment /

double-byte-g0-segment

single-byte-g0-segment = single-byte-g0-seq

*(single-byte-char / single-shift-char)

double-byte-g0-segment = double-byte-g0-seq

*((one-of-94 one-of-94) / single-shift-char)

g1-segment = single-byte-g1-94-segment /

single-byte-g1-96-segment /

double-byte-g1-segment

; note: an appropriate segment

; should be selected according

; to the current state of G1

; designation

single-byte-g1-94-segment = SO *(one-of-94 / single-shift-char) SI

single-byte-g1-96-segment = SO *(one-of-96 / single-shift-char) SI

double-byte-g1-segment = SO

*((one-of-94 one-of-94) /

single-shift-char )

SI

reset-desig-seq = ESC "(" "B"

single-byte-g0-seq = ESC "(" ( "B" / "J" )

g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq

single-byte-g1-seq = (ESC ")" ( "B" / "J" )) /

(ESC "-" ( "A" / "F" ))

double-byte-g1-seq = ESC "$" "(" ( "@" / "A" / "B" /

"C" / "D" / "G" / "H" )

g2-desig-seq = ESC "." ( "A" / "F" )

single-shift-seq = ESC "N"

single-shift-char = single-shift-seq one-of-96

CRLF = CR LF

; ( Octal, Decimal.)

ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)

SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)

SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)

CR = <ASCII CR, carriage return>; ( 15, 13.)

LF = <ASCII LF, linefeed> ; ( 12, 10.)

one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)

one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)

7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)

single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT

including CRLF, and not including ESC, SI, SO>
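
The terminal symbols of the syntax translate directly into byte-range tests. A small Python sketch, with function names chosen here for illustration only:

```python
ESC, SI, SO = 0x1B, 0x0F, 0x0E

def is_one_of_94(b: int) -> bool:
    return 0x21 <= b <= 0x7E  # octal 41-176, decimal 33-126

def is_one_of_96(b: int) -> bool:
    return 0x20 <= b <= 0x7F  # octal 40-177, decimal 32-127

def is_single_byte_char(b: int) -> bool:
    # Any 7-bit value except ESC, SI and SO. The CRLF exclusion applies
    # to the two-byte pair and is not checked per byte here.
    return 0 <= b <= 0x7F and b not in (ESC, SI, SO)

print(sum(is_one_of_94(b) for b in range(256)))  # 94
print(sum(is_one_of_96(b) for b in range(256)))  # 96
```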

Mail System Considerations

"ISO-2022-INT-1" is designed to be purely 7-bit, so that it can be

used with any transport which conforms to STD 11, RFC822 [RFC822]

without MIME, as is the current practice in Japan with "ISO-2022-JP"

[2022JP].

If "ISO-2022-INT-1" is used with MIME, the MIME charset name may be given

as follows:

Content-Type: text/plain; charset=iso-2022-int-1

Even if charset parameters are omitted, multilingual applications

should, in spite of [MIME1], still assume that iso-2022-int-1 or its

latest available successor (see the section "Future Extension Plan"),

not US-ASCII, is used.
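
The charset handling described above could look roughly like the following Python sketch, which uses the standard email package to read the charset parameter and falls back to iso-2022-int-1 when none is given. The function name and the fallback constant are assumptions for illustration; note that the email package only parses the header and has no codec for this charset.

```python
from email import message_from_string

# Fallback the draft asks multilingual applications to assume.
DEFAULT_CHARSET = "iso-2022-int-1"

def effective_charset(raw_message: str) -> str:
    """Return the declared charset (lowercased), or the assumed default."""
    msg = message_from_string(raw_message)
    return msg.get_content_charset() or DEFAULT_CHARSET

with_header = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=ISO-2022-INT-1\n"
    "\n"
    "body\n"
)
print(effective_charset(with_header))               # iso-2022-int-1
print(effective_charset("Subject: hi\n\nbody\n"))  # iso-2022-int-1
```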

The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not

necessary to use a Content-Transfer-Encoding header. It should be

noted that applying the Base64 or Quoted-Printable encoding will

render the message unreadable in non-MIME-compliant software.

"ISO-2022-INT-1" may also be used in mail headers. If bare STD11,

RFC822 without MIME is used, appropriate quoting of special

characters as "quoted string" might be necessary with structured

headers, which might not be supported in all the common environments.

In MIME headers, both "B" and "Q" encoding could be useful with

"ISO-2022-INT-1" text.

Future Extension Plan

Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",

"ISO-2022-INT-3" and so on. MIME charset names beginning with "ISO-

2022-INT-" are reserved for them. The family of encodings has an

aggregated name: "ISO-2022-INT-*".
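
A receiver might recognize members of the family by name. The Python sketch below assumes that the suffix is always a decimal number, as in the names given above; the function names are hypothetical.

```python
import re

# Assumption: family members are named "ISO-2022-INT-" plus a decimal number.
FAMILY = re.compile(r"iso-2022-int-(\d+)", re.IGNORECASE)

def int_version(charset: str):
    """Return N for "ISO-2022-INT-N", or None for any other charset name."""
    m = FAMILY.fullmatch(charset.strip())
    return int(m.group(1)) if m else None

def latest_supported(names):
    """Pick the highest-numbered family member from a list of charset names."""
    members = [n for n in names if int_version(n) is not None]
    return max(members, key=int_version, default=None)

print(latest_supported(["iso-2022-int-1", "ISO-2022-INT-3", "us-ascii"]))
```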

Extensions will be made solely by adding extra character sets of ISO

2022, though other extensions such as for bidirectionality support

are possible. To avoid duplicated assignment of escape sequences,

formal registration in the ISO registry [ISOREG] will be required.

The current feature of an initial designation of KS C 5601 to G1 will

be removed in a version in the near future. Users of ISO-2022-INT-1

are recommended to explicitly designate KS C 5601 to G1.

To minimize the number of character sets, those which are already

covered by the larger character sets and not so widely used should

not be added. For example, Katakana character set of "JIS X 0201-

Kana" is omitted because the set is completely covered by "JIS X

0208-1978" and not used at all in the Internet community of Japan.

In any event, the property of "ISO-2022-INT-1" that:

Though ISO 2022 [ISO2022] and related standards permit long term,

persistent states, "ISO-2022-INT-1" is designed not to need such

states to be preserved between lines. Applications such as pagers

and editors which randomly seek within a text file encoded with

"ISO-2022-INT-1" can assume that the state is the same as that of the

beginning of the text.

will be preserved.

References

[ASCII] American National Standards Institute, "Coded character set

-- 7-bit American national standard code for information

interchange", ANSI X3.4-1986.

[ISO2022] International Organization for Standardization (ISO),

"Information processing -- ISO 7-bit and 8-bit coded

character sets -- Code extension techniques", International

Standard, Ref. No. ISO 2022-1986 (E).

[ISOREG] International Organization for Standardization (ISO),

"International Register of Coded Character Sets To Be Used

With Escape Sequences".

[MIME1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet

Mail Extensions) Part One: Mechanisms for Specifying and

Describing the Format of Internet Message Bodies", RFC 1521,

September 1993.

[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part

Two: Message Header Extensions for Non-ASCII Text", RFC 1522,

September 1993.

[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text

Messages", STD 11, RFC 822, August 1982.

[RFC1036] Horton M., and Adams, R., "Standard for Interchange of

USENET Messages", RFC 1036, AT&T Bell Laboratories, Center

for Seismic Studies, December 1987.

[2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese

Character Encoding for Internet Messages", RFC 1468, June

1993.

[2022JP2] Ohta, M., and Handa, K. "ISO-2022-JP-2: Multilingual

Extension of ISO-2022-JP", RFC 1554, December 1993.

[2022KR] Choi, U., Chon, K., and Park, H., "Korean Character Encoding

for Internet Messages", RFC 1557, December 1993.

[KSC5601] Korea Industrial Standards Association, "Code for

Information Interchange (Hangul and Hanja)," Korean

Industrial Standard, 1987, Ref. No. KS C 5601-1987.

[MULE] Nishikimi, M., Handa, K., and Tomura, S., "Mule: MULtilingual

Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.

[LUNDE] Lunde, K., "Understanding Japanese Information Processing",

O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.

Acknowledgements

(to be supplied)

Security Considerations

Security issues are not discussed in this memo.

Authors' Addresses

(to be supplied)

From apccirn-sec Tue Jan 25 18:47:02 1994

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)

id SAA01350; Tue, 25 Jan 1994 18:46:51 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 25 Jan 94 18:37:20 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9401250937.AA17209@necom830.cc.titech.ac.jp>

Subject: Re: ISO-2022-INT-1

To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)

Date: Tue, 25 Jan 94 18:37:19 JST

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9401250923.AA17095@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at Jan 25, 94 6:23 pm

X-Mailer: ELM [version 2.3 PL11]

>

> > I'll be happy if the responses will be returned before 1/25 (the

> > earlier the better, of course). I expect much earlier response

> > on your personal (not the communities) opinions.

>

> It's 1/25.

>

> According to several comments, I have revised the previous version

> of pre-internet-draft of ISO-2022-INT-1.

> Any comments?

I forgot to mention that I'll post a draft with further revision as an

Internet Draft early in February.

Masataka Ohta

From apccirn-sec Wed Jan 26 14:11:18 1994

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)

id OAA04486; Wed, 26 Jan 1994 14:10:45 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 26 Jan 94 14:01:32 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9401260501.AA21732@necom830.cc.titech.ac.jp>

Subject: Re: ISO-2022-INT-1

To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)

Date: Wed, 26 Jan 94 14:01:30 JST

Cc: apccirn-i18n@nic.nm.kr

In-Reply-To: <9401250923.AA17095@necom830.cc.titech.ac.jp>; from "Masataka Ohta" at Jan 25, 94 6:23 pm

X-Mailer: ELM [version 2.3 PL11]

The following is a slimmed down version.

The changes to yesterday's draft are:

A single character set is designated only to G0 or G1

G2 and SS2 are not used

The error in the designation sequence in the formal syntax section is

corrected

Which one do you like better?

Masataka Ohta

------------------------------------------------------------------------

INTERNET DRAFT APCCIRN-I18N

draft-filename-01.txt February 1994

Internet Multilingual Text Encoding: ISO-2022-INT-*

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working

documents of the Internet Engineering Task Force (IETF), its areas,

and its working groups. Note that other groups may also distribute

working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six

months. Internet-Drafts may be updated, replaced, or obsoleted by

other documents at any time. It is not appropriate to use Internet-

Drafts as reference material or to cite them other than as a

``working draft'' or ``work in progress.''

To learn the current status of any Internet-Draft, please check the

1id-abstracts.txt listing contained in the Internet-Drafts Shadow

Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or

munnari.oz.au.

Abstract

Based on the experience with "ISO-2022-JP-2" (RFC 1554), a

multilingual text encoding scheme, "ISO-2022-INT-1", is designed as

an extension of "ISO-2022-JP" (RFC 1468) and "ISO-2022-KR" (RFC

1557).

The encoding is ASCII compatible and 7-bit and, thus, can be mixed

with any ASCII compatible encoding. The encoding is designed to be

as stateless as practically possible with ISO 2022. That is, no state

information needs to be preserved between lines.

"ISO-2022-INT-1" and its successors have an aggregated name: "ISO-

2022-INT-*".

Introduction

This memo describes a text encoding scheme: "ISO-2022-INT-1", which

is intended to be a text encoding scheme for the Internet, including,

but not limited to, electronic mail [RFC822] and network news

[RFC1036]. The encoding is also useful in multilingual text files.

The encoding is a multilingual extension of "ISO-2022-JP" [2022JP]

and "ISO-2022-KR" [2022KR]. The encoding is supported by an Emacs

based multilingual text editor: MULE [MULE].

The name, "ISO-2022-INT-1", is intended to be used in the "charset"

parameter field of MIME headers (see [MIME1] and [MIME2]).

Description

The text with "ISO-2022-INT-1" starts in ASCII [ASCII] designated to

G0 invoked as GL and KS C 5601 [KSC5601] designated to G1, and

switches to other character sets of ISO 2022 [ISO2022] through

limited combinations of designation/invocation sequences. All the

characters are encoded with 7 bits only.

At the beginning of text, the existence of an announcer sequence:

"ESC 2/0 4/2" and a designation/invocation sequence: "ESC 2/8 4/2 SI

ESC 2/4 2/9 4/3 ESC 2/10 7/14 ESC 2/11 7/14" are (though omitted)

assumed. The same designation/invocation sequence is also assumed

(though unnecessary and, thus, omitted) at the beginning of each

line. Thus, characters of 94 character sets are designated to G0 or

G1 and invoked as GL by SI (shift in, '0/15') and SO (shift out,

'0/14') each. Characters of 96 character sets are designated to G1

and invoked as GL by SO. To make the encoding almost unique, a

character set is designated only to either G0 or G1 and not both.

For example, the escape sequence "ESC 2/4 4/2" or "ESC $ B" indicates

that the bytes following the escape sequence are Japanese JIS X

0208-1983 characters, which are encoded in two bytes each. A double

byte sequence enclosed by SO and SI indicates a KSC string unless

other character sets are designated to G1. The escape sequence "ESC

2/13 4/1" or "ESC - A" indicates that ISO 8859-1 is designated to G1.

After the designation, a character code '4/1' is interpreted to

represent a character "A with acute".
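
Both examples can be tried in Python. The first part relies on Python's built-in iso2022_jp codec, which implements ISO-2022-JP rather than ISO-2022-INT-1 but uses the same "ESC $ B" designation for JIS X 0208-1983. The second part is a hypothetical helper that decodes an SO ... SI segment on the assumption that ISO 8859-1 has been designated to G1.

```python
# Two kanji from JIS X 0208, encoded with Python's ISO-2022-JP codec.
encoded = "\u65e5\u672c".encode("iso2022_jp")
print(encoded[:3], encoded[-3:])  # b'\x1b$B' b'\x1b(B'

def decode_g1_latin1(segment: bytes) -> str:
    """Decode an 'SO ... SI' segment, assuming ISO 8859-1 is in G1."""
    if segment[:1] != b"\x0e" or segment[-1:] != b"\x0f":
        raise ValueError("segment must be enclosed by SO and SI")
    # Each 7-bit code selects the right-hand half of ISO 8859-1.
    return bytes(b | 0x80 for b in segment[1:-1]).decode("latin-1")

print(decode_g1_latin1(b"\x0eA\x0f"))  # Á
```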

The following table gives the escape sequences and the character sets

used in "ISO-2022-INT-1" messages. The reg# is the registration

number in ISO's registry [ISOREG].

94 character sets

reg#  character set      ESC sequence                  designated to
--------------------------------------------------------------------
6     ASCII              ESC 2/8 4/2       ESC ( B     G0
14    JIS X 0201-Roman   ESC 2/8 4/10      ESC ( J     G0

94*94 character sets

reg#  character set      ESC sequence                  designated to
--------------------------------------------------------------------
42    JIS X 0208-1978    ESC 2/4 4/0       ESC $ @     G0
58    GB 2312-80         ESC 2/4 4/1       ESC $ A     G0
87    JIS X 0208-1983    ESC 2/4 4/2       ESC $ B     G0
149   KS C 5601-1987     ESC 2/4 2/9 4/3   ESC $ ) C   G1
159   JIS X 0212-1990    ESC 2/4 2/8 4/4   ESC $ ( D   G0
171   CNS 11643-1986-1   ESC 2/4 2/8 4/7   ESC $ ( G   G0
172   CNS 11643-1986-2   ESC 2/4 2/8 4/8   ESC $ ( H   G0

96 character sets

reg#  character set      ESC sequence                  designated to
--------------------------------------------------------------------
100   ISO8859-1          ESC 2/13 4/1      ESC - A     G1
126   ISO8859-7(Greek)   ESC 2/13 4/6      ESC - F     G1

Handling of code points not specified in each standard is

implementation dependent. For further information about the

character sets and the escape sequences, see [ISO2022] and [ISOREG].

Some Asian standards are also described in chapters 3 and 4 of

[LUNDE].

If there is any G0 designation other than ASCII in text, there must

be a switch back to ASCII before a space character '2/0' (but not

necessarily before the '2/0' code of a 96 character set, which usually

represents a non-breaking space) or control characters such as tab or

CRLF. If there is any G1 designation other than KS C [KSC5601] in

text, there must be a switch back to KS C before the end of line. If

there is any G1 invocation in text, there must be a switch back to G0

invocation before a space character or control characters such as tab

or CRLF. This means that the next line starts in the ASCII character

set that was switched to before the end of the previous line.

Though ISO 2022 [ISO2022] and related standards permit long term,

persistent states, "ISO-2022-INT-1" is designed not to need such

states to be preserved between lines. Applications such as pagers and

editors which randomly seek within a text file encoded with "ISO-

2022-INT-1" can assume that the state is the same as that of the

beginning of the text.

The text will end in ASCII designated to G0.

Left-to-right directionality is assumed if the text is displayed

horizontally.

Users of "ISO-2022-INT-1" must be aware that some common transports,

such as old Bnews, cannot relay a 7-bit value 7/15 (decimal 127),

which is used to encode, say, "y with diaeresis" of ISO 8859-1.

Other restrictions are given in the Formal Syntax section below.

Formal Syntax

The notational conventions used here are identical to those used in

RFC 822 [RFC822].

The * (asterisk) convention is as follows:

l*m something

meaning at least l and at most m somethings, with l and m taking

default values of 0 and infinity, respectively.

text = *(line CRLF)

line = *(single-byte-char /

(*g0-segment reset-desig-seq) /

g1-segment /

g1-desig-seq )

; note: must end KS C

; designated to G1

g0-segment = single-byte-g0-segment /

double-byte-g0-segment

single-byte-g0-segment = single-byte-g0-seq *single-byte-char

double-byte-g0-segment = double-byte-g0-seq *(one-of-94 one-of-94)

g1-segment = single-byte-g1-96-segment /

double-byte-g1-segment

; note: an appropriate segment

; should be selected according

; to the current state of G1

; designation

single-byte-g1-96-segment = SO *one-of-96 SI

double-byte-g1-segment = SO *(one-of-94 one-of-94) SI

reset-desig-seq = ESC "(" "B"

single-byte-g0-seq = ESC "(" ("B" / "J")

double-byte-g0-seq = (ESC "$" ("@" / "A" / "B")) /

(ESC "$" "(" ("D" / "G" / "H"))

g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq

single-byte-g1-seq = (ESC "-" ("A" / "F"))

double-byte-g1-seq = ESC "$" ")" "C"

CRLF = CR LF

; ( Octal, Decimal.)

ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)

SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)

SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)

CR = <ASCII CR, carriage return>; ( 15, 13.)

LF = <ASCII LF, linefeed> ; ( 12, 10.)

one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)

one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)

7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)

single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT

including CRLF, and not including ESC, SI, SO>
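
The slimmed-down grammar is regular, so a line can be checked with one regular expression. The Python sketch below is a rough translation and not a reference implementation: it ignores the note that the G1 segment type depends on the current G1 designation, and it does not enforce the requirement that a line end with KS C 5601 designated to G1. All names are chosen here for illustration.

```python
import re

SBC = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"  # single-byte-char (no ESC, SI, SO)
PAIR = rb"[\x21-\x7e]{2}"                # one-of-94 one-of-94
G0_SEG = (rb"(?:\x1b\([BJ]" + SBC + rb"*"
          rb"|\x1b\$(?:[@AB]|\([DGH])(?:" + PAIR + rb")*)")
RESET = rb"\x1b\(B"                      # reset-desig-seq
G1_SEG = rb"\x0e(?:[\x20-\x7f]*|(?:" + PAIR + rb")*)\x0f"
G1_DESIG = rb"\x1b(?:-[AF]|\$\)C)"       # g1-desig-seq
LINE = re.compile(
    rb"(?:" + SBC + rb"|(?:" + G0_SEG + rb")*" + RESET
    + rb"|" + G1_SEG + rb"|" + G1_DESIG + rb")*"
)

def is_valid_line(line: bytes) -> bool:
    """Check one line (without its terminating CRLF) against the grammar."""
    return b"\r\n" not in line and LINE.fullmatch(line) is not None

print(is_valid_line(b"\x1b$B$3$s\x1b(B"))  # True: G0 segment, then reset
print(is_valid_line(b"\x1b$B$3$s"))        # False: no switch back to ASCII
```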

Mail System Considerations

"ISO-2022-INT-1" is designed to be purely 7-bit, so that it can be

used with any transport which conforms to STD 11, RFC822 [RFC822]

without MIME, as is the current practice in Japan with "ISO-2022-JP"

[2022JP].

If "ISO-2022-INT-1" is used with MIME, the MIME charset name may be given

as follows:

Content-Type: text/plain; charset=iso-2022-int-1

Even if charset parameters are omitted, multilingual applications

should, in spite of [MIME1], still assume that iso-2022-int-1 or its

latest available successor (see the section "Future Extension Plan"),

not US-ASCII, is used.

The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not

necessary to use a Content-Transfer-Encoding header. It should be

noted that applying the Base64 or Quoted-Printable encoding will

render the message unreadable in non-MIME-compliant software.
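
A short Python sketch of this point: the encoded body already consists of 7-bit values, so nothing is gained by a further transfer encoding, while Base64 hides even the plain ASCII parts from non-MIME readers. The sample body and function name are illustrative assumptions.

```python
import base64

def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, so no transfer encoding is needed."""
    return all(b < 0x80 for b in data)

# ASCII text followed by a JIS X 0208 segment and a switch back to ASCII.
body = b"Hello \x1b$B$3$s\x1b(B"
print(is_seven_bit(body))      # True
print(base64.b64encode(body))  # opaque to readers without MIME support
```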

"ISO-2022-INT-1" may also be used in mail headers. If bare STD11,

RFC822 without MIME is used, appropriate quoting of special

characters as "quoted string" might be necessary with structured

headers, which might not be supported in all the common environments.

In MIME headers, both "B" and "Q" encoding could be useful with

"ISO-2022-INT-1" text.

Future Extension Plan

Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",

"ISO-2022-INT-3" and so on. MIME charset names beginning with "ISO-

2022-INT-" are reserved for them. The family of encodings has an

aggregated name: "ISO-2022-INT-*".

Extensions will be made solely by adding extra character sets of ISO

2022, though other extensions such as for bidirectionality support

are possible. To avoid duplicated assignment of escape sequences,

formal registration in the ISO registry [ISOREG] will be required.

The current feature of an initial designation of KS C 5601 to G1 will

be removed in a version in the near future. Users of ISO-2022-INT-1

are recommended to explicitly designate KS C 5601 to G1.

To minimize the number of character sets, those which are already

covered by the larger character sets and not so widely used should

not be added. For example, Katakana character set of "JIS X 0201-

Kana" is omitted because the set is completely covered by "JIS X

0208-1978" and not used at all in the Internet community of Japan.

In any event, the property of "ISO-2022-INT-1" that:

Though ISO 2022 [ISO2022] and related standards permit long term,

persistent states, "ISO-2022-INT-1" is designed not to need such

states to be preserved between lines. Applications such as pagers

and editors which randomly seek within a text file encoded with

"ISO-2022-INT-1" can assume that the state is the same as that of the

beginning of the text.

will be preserved.

References

[ASCII] American National Standards Institute, "Coded character set

-- 7-bit American national standard code for information

interchange", ANSI X3.4-1986.

[ISO2022] International Organization for Standardization (ISO),

"Information processing -- ISO 7-bit and 8-bit coded

character sets -- Code extension techniques", International

Standard, Ref. No. ISO 2022-1986 (E).

[ISOREG] International Organization for Standardization (ISO),

"International Register of Coded Character Sets To Be Used

With Escape Sequences".

[MIME1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet

Mail Extensions) Part One: Mechanisms for Specifying and

Describing the Format of Internet Message Bodies", RFC 1521,

September 1993.

[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part

Two: Message Header Extensions for Non-ASCII Text", RFC 1522,

September 1993.

[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text

Messages", STD 11, RFC 822, August 1982.

[RFC1036] Horton M., and Adams, R., "Standard for Interchange of

USENET Messages", RFC 1036, AT&T Bell Laboratories, Center

for Seismic Studies, December 1987.

[2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese

Character Encoding for Internet Messages", RFC 1468, June

1993.

[2022JP2] Ohta, M., and Handa, K. "ISO-2022-JP-2: Multilingual

Extension of ISO-2022-JP", RFC 1554, December 1993.

[2022KR] Choi, U., Chon, K., and Park, H., "Korean Character Encoding

for Internet Messages", RFC 1557, December 1993.

[KSC5601] Korea Industrial Standards Association, "Code for

Information Interchange (Hangul and Hanja)," Korean

Industrial Standard, 1987, Ref. No. KS C 5601-1987.

[MULE] Nishikimi, M., Handa, K., and Tomura, S., "Mule: MULtilingual

Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.

[LUNDE] Lunde, K., "Understanding Japanese Information Processing",

O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.

Acknowledgements

(to be supplied)

Security Considerations

Security issues are not discussed in this memo.

Authors' Addresses

(to be supplied)

From apccirn-sec Tue Feb 1 13:12:09 1994

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)

id NAA06212; Tue, 1 Feb 1994 13:11:08 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 1 Feb 94 13:01:34 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9402010401.AA12115@necom830.cc.titech.ac.jp>

Subject: Re: ISO-2022-INT-1

To: apccirn-i18n@nic.nm.kr, jp-msg@iij.ad.jp

Date: Tue, 1 Feb 94 13:01:32 JST

In-Reply-To: <no.id>; from "mohta" at Dec 24, 93 5:55 pm

X-Mailer: ELM [version 2.3 PL11]

I'm going to post the finished draft (the slim one) this afternoon.

Any objections or corrections?

Masataka Ohta

------------------------------------------------------------------------

INTERNET DRAFT APCCIRN-I18N

draft-ohta-text-encoding-00.txt February 1994

Internet Multilingual Text Encoding: ISO-2022-INT-*

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working

documents of the Internet Engineering Task Force (IETF), its areas,

and its working groups. Note that other groups may also distribute

working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six

months. Internet-Drafts may be updated, replaced, or obsoleted by

other documents at any time. It is not appropriate to use Internet-

Drafts as reference material or to cite them other than as a

``working draft'' or ``work in progress.''

To learn the current status of any Internet-Draft, please check the

1id-abstracts.txt listing contained in the Internet-Drafts Shadow

Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or

munnari.oz.au.

Abstract

The APCCIRN internationalization group has, based on the experience with

"ISO-2022-JP-2" (RFC 1554), designed a multilingual text encoding

scheme, "ISO-2022-INT-1", as an extension of "ISO-2022-JP" (RFC 1468)

and "ISO-2022-KR" (RFC 1557).

The encoding is ASCII compatible and 7-bit and, thus, can be mixed

with any ASCII compatible encoding. The encoding is designed to be

as stateless as practically possible with ISO 2022. That is, no state

information needs to be preserved between lines.

"ISO-2022-INT-1" and its successors have an aggregated name: "ISO-

2022-INT-*".

Introduction

This memo describes a text encoding scheme: "ISO-2022-INT-1", which

is intended to be a multilingual text encoding scheme of the Internet

including, but not limited to, for electronic mail [RFC822] and

network news [RFC1036]. The encoding is also useful in multilingual

text files. The encoding is a multilingual extension of "ISO-2022-

JP" [2022JP] and "ISO-2022-KR" [2022KR]. The encoding is supported

by an Emacs based multilingual text editor: MULE [MULE].

APCCIRN-I18N Expires on Aug 4, 1994 [Page 1]

.

INTERNET DRAFT Internet Multilingual Text Encoding February 1994

The name, "ISO-2022-INT-1", is intended to be used in the "charset"

parameter field of MIME headers (see [MIME1] and [MIME2]).

Description

The text with "ISO-2022-INT-1" starts in ASCII [ASCII] designated to

G0 invoked as GL and KS C 5601 [KSC5601] designated to G1, and

switches to other character sets of ISO 2022 [ISO2022] through

limited combinations of designation/invocation sequences. All the

characters are encoded with 7 bits only.

At the beginning of text, the existence of an announcer sequence:

"ESC 2/0 4/2" and a designation/invocation sequence: "ESC 2/8 4/2 SI

ESC 2/4 2/9 4/3 ESC 2/10 7/14 ESC 2/11 7/14" are (though omitted)

assumed. The same designation/invocation sequence is also assumed

(though unnecessary and, thus, omitted) at the beginning of each

line. Thus, characters of 94 character sets are designated to G0 or

G1 and invoked as GL by SI (shift in, "0/15") and SO (shift out,

"0/14") each. Characters of 96 character sets are designated to G1

and invoked as GL by SO. To make the encoding almost unique, a

character set is designated only to either G0 or G1 and not to both.

For example, the escape sequence "ESC 2/4 4/2" or "ESC $ B" indicates

that the bytes following the escape sequence are Japanese JIS X

0208-1983 characters, which are encoded in two bytes each. A double

byte sequence enclosed by SO and SI indicates a KS C 5601 [KSC5601]

string unless other character sets are designated to G1. The escape

sequence "ESC 2/13 4/1" or "ESC - A" indicates that ISO 8859-1 is

designated to G1. After the designation, a character code "4/1" is

interpreted to represent a character "A with acute", not ASCII "A".
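
The column/row notation used above (for example "2/4") maps to a 7-bit
value as column times 16 plus row. The following Python fragment is a
minimal sketch of that conversion. The names parse_code, NAMED and
sequence_to_bytes are illustrative only and are not defined by this memo.

```python
# Sketch: convert ISO 2022 column/row notation to bytes.
# All names here are hypothetical helpers, not part of the memo.

NAMED = {"ESC": 0x1B, "SO": 0x0E, "SI": 0x0F}


def parse_code(notation):
    """Convert a "column/row" token such as "2/4" to its 7-bit value."""
    column, row = notation.split("/")
    return int(column) * 16 + int(row)


def sequence_to_bytes(sequence):
    """Convert a space separated sequence such as "ESC 2/4 4/2" to bytes."""
    return bytes(
        NAMED[token] if token in NAMED else parse_code(token)
        for token in sequence.split()
    )


print(sequence_to_bytes("ESC 2/4 4/2"))   # b'\x1b$B'
print(sequence_to_bytes("ESC 2/13 4/1"))  # b'\x1b-A'
```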

The following table gives the escape sequences and the character sets

used in "ISO-2022-INT-1" messages. The reg# is the registration

number in ISO's registry [ISOREG].

94 character sets

reg#  character set      ESC sequence                designated to

------------------------------------------------------------------

6     ASCII              ESC 2/8 4/2      ESC ( B    G0

14    JIS X 0201-Roman   ESC 2/8 4/10     ESC ( J    G0

94*94 character sets

reg#  character set      ESC sequence                designated to

------------------------------------------------------------------

42    JIS X 0208-1978    ESC 2/4 4/0      ESC $ @    G0

58    GB 2312-80         ESC 2/4 4/1      ESC $ A    G0

87    JIS X 0208-1983    ESC 2/4 4/2      ESC $ B    G0

149   KS C 5601-1987     ESC 2/4 2/9 4/3  ESC $ ) C  G1

159   JIS X 0212-1990    ESC 2/4 2/8 4/4  ESC $ ( D  G0

171   CNS 11643-1986-1   ESC 2/4 2/8 4/7  ESC $ ( G  G0

172   CNS 11643-1986-2   ESC 2/4 2/8 4/8  ESC $ ( H  G0

96 character sets

reg#  character set      ESC sequence                designated to

------------------------------------------------------------------

100   ISO8859-1          ESC 2/13 4/1     ESC - A    G1

126   ISO8859-7(Greek)   ESC 2/13 4/6     ESC - F    G1

Handling of code points not specified in each standard is

implementation dependent. For further information about the

character sets and the escape sequences, see [ISO2022] and [ISOREG].

Some Asian standards are also described in chapter 3 and 4 of

[LUNDE].

If there is any G0 designation other than ASCII in text, there must

be a switch back to ASCII before a space character "2/0" (but not

necessarily before "2/0" code of 96 character set, which usually

represents a non-breaking space) or control characters such as tab or

CRLF. If there is any G1 designation other than KS C [KSC5601] in

text, there must be a switch back to KS C before the end of line. If

there is any G1 invocation in text, there must be a switch back to G0

invocation before a space character or control characters such as tab

or CRLF. This means that the next line starts in the ASCII character

set that was switched to before the end of the previous line.

Though ISO 2022 [ISO2022] and related standards permit long term,

persistent states, "ISO-2022-INT-1" is designed not to need such

states to be preserved between lines. Applications such as pagers and

editors which randomly seek within a text file encoded with "ISO-

2022-INT-1" can assume that the state is the same as that of the

beginning of the text.
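
Because every line starts in the same state, a reader can process one
line at a time without looking at earlier lines. The following Python
fragment is a rough sketch of such a per-line reader, assuming the input
is a single line with its CRLF already removed. The names DESIGNATIONS
and split_line are illustrative, error handling is minimal, and code
points are not converted to any other encoding.

```python
# Sketch: split one "ISO-2022-INT-1" line into runs per character set.
# Hypothetical helper; assumes the line has no trailing CRLF.

ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Bytes following ESC -> (target, character set, bytes per character)
DESIGNATIONS = {
    b"(B": ("G0", "ASCII", 1),
    b"(J": ("G0", "JIS X 0201-Roman", 1),
    b"$@": ("G0", "JIS X 0208-1978", 2),
    b"$A": ("G0", "GB 2312-80", 2),
    b"$B": ("G0", "JIS X 0208-1983", 2),
    b"$(D": ("G0", "JIS X 0212-1990", 2),
    b"$(G": ("G0", "CNS 11643-1986-1", 2),
    b"$(H": ("G0", "CNS 11643-1986-2", 2),
    b"$)C": ("G1", "KS C 5601-1987", 2),
    b"-A": ("G1", "ISO8859-1", 1),
    b"-F": ("G1", "ISO8859-7(Greek)", 1),
}


def split_line(line):
    """Return a list of (character set name, raw bytes) runs for one line."""
    # The initial state is the same for every line.
    state = {"G0": ("ASCII", 1), "G1": ("KS C 5601-1987", 2)}
    active = "G0"
    runs = []
    i = 0
    while i < len(line):
        byte = line[i]
        if byte == ESC:
            for final, (target, name, width) in DESIGNATIONS.items():
                if line.startswith(final, i + 1):
                    state[target] = (name, width)
                    i += 1 + len(final)
                    break
            else:
                raise ValueError("unknown escape sequence at offset %d" % i)
        elif byte == SO:
            active = "G1"
            i += 1
        elif byte == SI:
            active = "G0"
            i += 1
        else:
            name, width = state[active]
            chunk = line[i:i + width]
            if runs and runs[-1][0] == name:
                runs[-1] = (name, runs[-1][1] + chunk)
            else:
                runs.append((name, chunk))
            i += width
    return runs
```

For example, split_line(b"abc \x1b$B$3$s\x1b(B") yields an ASCII run
followed by a JIS X 0208-1983 run.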

The text will end in ASCII designated to G0.

Left-to-right directionality is assumed if the text is displayed

horizontally.

Users of "ISO-2022-INT-1" must be aware that some common transports,

such as old Bnews in Japan, cannot relay a 7-bit value "7/15"

(decimal 127), which is used to encode, say, "y with diaeresis" of

ISO 8859-1.

Other restrictions are given in the Formal Syntax section below.

Formal Syntax

The notational conventions used here are identical to those used in

STD11, RFC 822 [RFC822].

The * (asterisk) convention is as follows:

l*m something

meaning at least l and at most m somethings, with l and m taking

default values of 0 and infinity, respectively.

text = *(line CRLF)

line = *(single-byte-char /

(*g0-segment reset-desig-seq) /

g1-segment /

g1-desig-seq )

; note: must end KS C

; designated to G1

g0-segment = single-byte-g0-segment /

double-byte-g0-segment

single-byte-g0-segment = single-byte-g0-seq *single-byte-char

double-byte-g0-segment = double-byte-g0-seq *(one-of-94 one-of-94)

g1-segment = single-byte-g1-96-segment /

double-byte-g1-segment

; note: an appropriate segment

; should be selected according

; to the current state of G1

; designation

single-byte-g1-96-segment = SO *one-of-96 SI

double-byte-g1-segment = SO *(one-of-94 one-of-94) SI

reset-desig-seq = ESC "(" "B"

single-byte-g0-seq = ESC "(" ("B" / "J")

double-byte-g0-seq = (ESC "$" ("@" / "A" / "B")) /

(ESC "$" "(" ("D" / "G" / "H"))

g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq

single-byte-g1-seq = (ESC "-" ("A" / "F"))

double-byte-g1-seq = ESC "$" ")" "C"

CRLF = CR LF

; ( Octal, Decimal.)

ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)

SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)

SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)

CR = <ASCII CR, carriage return>; ( 15, 13.)

LF = <ASCII LF, linefeed> ; ( 12, 10.)

one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)

one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)

7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)

single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT

including CRLF, and not including ESC, SI, SO>
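
As an illustration only, the "line" production above can be approximated
by a regular expression. The Python sketch below assumes that the text
has already been split at CRLF, and it does not track which character
set is currently designated to G1, so it accepts either kind of G1
segment and does not enforce the note that a line must end with KS C
designated to G1. The names LINE and conforms are hypothetical.

```python
# Sketch: approximate check of one line against the "line" production.
# Assumption: the caller has already removed the trailing CRLF.
import re

SINGLE = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"   # 7BIT except SO, SI and ESC
ONE_OF_94 = rb"[\x21-\x7e]"
ONE_OF_96 = rb"[\x20-\x7f]"
PAIR = ONE_OF_94 + ONE_OF_94

RESET = rb"\x1b\(B"                           # reset-desig-seq
G0_SINGLE_SEG = rb"\x1b\([BJ]" + SINGLE + rb"*"
G0_DOUBLE_SEG = rb"\x1b\$(?:[@AB]|\([DGH])(?:" + PAIR + rb")*"
G0_SEGMENT = rb"(?:" + G0_SINGLE_SEG + rb"|" + G0_DOUBLE_SEG + rb")"
# A run of one-of-96 values also covers pairs of one-of-94 values.
G1_SEGMENT = rb"\x0e" + ONE_OF_96 + rb"*\x0f"
G1_DESIG = rb"\x1b(?:-[AF]|\$\)C)"

LINE = re.compile(
    rb"(?:" + SINGLE
    + rb"|" + G0_SEGMENT + rb"*" + RESET
    + rb"|" + G1_SEGMENT
    + rb"|" + G1_DESIG
    + rb")*"
)


def conforms(line):
    """Return True if the line matches the approximated syntax."""
    return LINE.fullmatch(line) is not None
```

With this sketch, a line that designates JIS X 0208-1983 to G0 and never
returns to ASCII, such as b"\x1b$B$3$s", is rejected, while the same line
followed by ESC ( B is accepted.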

Mail System Considerations

"ISO-2022-INT-1" is designed to be purely 7-bit, so that it can be

used with any transport which conforms to STD 11, RFC822 [RFC822]

without MIME, as is the current practice in Japan with "ISO-

2022-JP" [2022JP].

If "ISO-2022-INT-1" is used with MIME, MIME charset name may be given

as follows:

Content-Type: text/plain; charset=iso-2022-int-1

Even if charset parameters are omitted, multilingual applications

should still assume that "ISO-2022-INT-1" or its latest available

successor (see the section "Future Extension Plan"), not the MIME

default of US-ASCII, is used.

The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not

necessary to use a Content-Transfer-Encoding header. It should be

noted that applying the Base64 or Quoted-Printable encoding will

render the message unreadable in non-MIME-compliant software.
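
As a hedged illustration, a message labelled this way could be built with
the Python standard library as sketched below. Python has no codec named
"iso-2022-int-1", so the body is treated here as an already encoded 7-bit
string, and the two-byte codes "$3$s" are arbitrary filler chosen for the
example. No Content-Transfer-Encoding header is set, in line with the
paragraph above.

```python
# Sketch: label an already encoded 7-bit body with the proposed charset.
from email.message import Message

ESC = "\x1b"
# ASCII text, a JIS X 0208-1983 segment, then a switch back to ASCII
# before the end of the line.
body = "Hello " + ESC + "$B" + "$3$s" + ESC + "(B" + "\r\n"

message = Message()
message["MIME-Version"] = "1.0"
message["Content-Type"] = "text/plain; charset=iso-2022-int-1"
message.set_payload(body)

print(message.get_content_charset())
```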

"ISO-2022-INT-1" may also be used in mail headers. If bare STD11,

RFC822 without MIME is used, appropriate quoting of special

characters as "quoted string" might be necessary with structured

headers, which might not be supported in all common environments.

In MIME headers, both "B" and "Q" encodings could be useful with

"ISO-2022-INT-1" text.

Future Extension Plan

Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",

"ISO-2022-INT-3" and so on. MIME charset names beginning with "ISO-

2022-INT-" are reserved for them. The family of encodings has an

aggregated name: "ISO-2022-INT-*".

The extensions will be solely by adding extra character sets of ISO

2022, though other extensions such as for bidirectionality support

are possible. To avoid duplicated assignment of escape sequences,

formal ISO registry [ISOREG] will, in general, be required, which

does not deny the future possibility of IANA registration of escape

sequences for private use purposes.

The current feature of an initial designation of KS C 5601 to G1 will

be removed in a near-future version. Users of ISO-2022-INT-1

are recommended to explicitly designate KS C 5601 to G1.

To minimize the number of character sets, those which are already

covered by the larger character sets and not so widely used should

not be added. For example, Katakana character set of "JIS X 0201-

Kana" is omitted because the set is completely covered by "JIS X

0208-1978" and not used at all in the Internet community of Japan.

In any event, the property of "ISO-2022-INT-1" that:

Though ISO 2022 [ISO2022] and related standards permit long term,

persistent states, "ISO-2022-INT-1" is designed not to need such

states to be preserved between lines. Applications such as pagers

and editors which randomly seek within a text file encoded with

"ISO-2022-INT-1" can assume that the state is the same as that of the

beginning of the text.

will be preserved.

References

[2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese

Character Encoding for Internet Messages", RFC 1468, June

1993.

[2022JP2] Ohta, M., and Handa, K. "ISO-2022-JP-2: Multilingual

Extension of ISO-2022-JP", RFC 1554, December 1993.

[2022KR] Choi, U., Chon, K., and Park, H., "Korean Character Encoding

for Internet Messages", RFC 1557, December 1993.

[ASCII] American National Standards Institute, "Coded character set

-- 7-bit American national standard code for information

interchange", ANSI X3.4-1986.

[ISO2022] International Organization for Standardization (ISO),

"Information processing -- ISO 7-bit and 8-bit coded

character sets -- Code extension techniques", International

Standard, Ref. No. ISO 2022-1986 (E).

[ISOREG] International Organization for Standardization (ISO),

"International Register of Coded Character Sets To Be Used

With Escape Sequences".

[KSC5601] Korea Industrial Standards Association, "Code for

Information Interchange (Hangul and Hanja)," Korean

Industrial Standard, 1987, Ref. No. KS C 5601-1987.

[LUNDE] Lunde, K., "Understanding Japanese Information Processing,",

O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.

[MIME1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet

Mail Extensions) Part One: Mechanisms for Specifying and

Describing the Format of Internet Message Bodies", RFC 1521,

September 1993.

[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part

Two: Message Header Extensions for Non-ASCII Text", RFC 1522,

September 1993.

[MULE] Nishikimi, M., Handa, K., and Tomura, S., "Mule: MULtilingual

Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.

[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text

Messages", STD 11, RFC 822, August 1982.

[RFC1036] Horton M., and Adams, R., "Standard for Interchange of

USENET Messages", RFC 1036, AT&T Bell Laboratories, Center

for Seismic Studies, December 1987.

Acknowledgements

This memo is the product of APCCIRN (Asian Pacific CCIRN)

Internationalization group and reviewed by various people in a news

group: fj.kanji and by a mailing list: jp-msg@iij.ad.jp. Many people

have contributed. In particular, Prof. Eiichi Wada of Tokyo

University and Ken Lunde of Adobe Systems, Inc. have helped us based

on their profound knowledge of ISO 2022 and related standards. Uhhyung

Choi of Korea Advanced Institute of Science and Technology has

helped make the encoding upward compatible with ISO-2022-KR.

Prof. Kilnam Chon of Korea Advanced Institute of Science and

Technology and Prof. Jun Murai of Keio University have provided the

framework of international cooperation. The Authors wish to thank

all the people who have helped to provide the memo.

Security Considerations

Security issues are not discussed in this memo.

Authors' Addresses

Masataka Ohta

Tokyo Institute of Technology

2-12-1, O-okayama, Meguro-ku,

Tokyo 152, JAPAN

Phone: +81-3-5499-7084

Fax: +81-3-3729-1940

EMail: mohta@cc.titech.ac.jp

Ken'ichi Handa

Electrotechnical Laboratory

Umezono 1-1-4, Tsukuba,

Ibaraki 305, JAPAN

Phone: +81-298-58-5916

Fax: +81-298-58-5918

EMail: handa@etl.go.jp

APCCIRN-I18N Expires on Aug 4, 1994 [Page 8]

.

From apccirn-sec Tue Feb 1 22:55:23 1994

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)

id WAA08087; Tue, 1 Feb 1994 22:54:52 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 1 Feb 94 22:45:40 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9402011345.AA14454@necom830.cc.titech.ac.jp>

Subject: Instructions to RFC translators (resend)

To: apccirn-i18n@nic.nm.kr

Date: Tue, 1 Feb 94 22:45:38 JST

In-Reply-To: <no.id>; from "mohta" at Feb 1, 94 10:37 pm

X-Mailer: ELM [version 2.3 PL11]

Sorry if I have posted garbage.

I think the following Internet Draft should be important for

internationalization.

Any comments?

Masataka Ohta

------------------------------------------------------------------------

INTERNET DRAFT M. Ohta

draft-ohta-translation-instr-00.txt Tokyo Institute of Technology

January 1994

Instructions to RFC Translators

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working

documents of the Internet Engineering Task Force (IETF), its areas,

and its working groups. Note that other groups may also distribute

working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six

months. Internet-Drafts may be updated, replaced, or obsoleted by

other documents at any time. It is not appropriate to use Internet-

Drafts as reference material or to cite them other than as a

``working draft'' or ``work in progress.''

To learn the current status of any Internet-Draft, please check the

1id-abstracts.txt listing contained in the Internet-Drafts Shadow

Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or

munnari.oz.au.

Abstract

A framework is given to coordinate the worldwide effort of RFC

translation into various languages.

Translated RFCs will be encoded in 7bit ISO 2022 and will have a name

"TRFC NNNN-LLL-MM" where "NNNN" is the RFC number of the original

RFC, "LLL" is the language code of ISO 639 and "MM" is the sequence

number to identify different translations.

Formatting rules similar to ASCII RFCs are also described.

M. Ohta Expires on August 4, 1994 [Page 1]

.

INTERNET DRAFT Instructions to RFC Translators January 1994

Index

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3

2. Editorial Policy . . . . . . . . . . . . . . . . . . . . . . 4

3. Format Rules . . . . . . . . . . . . . . . . . . . . . . . . 4

3a. Plain Text Format Rules . . . . . . . . . . . . . . . . . . 5

3b. PostScript Format Rules . . . . . . . . . . . . . . . . . . 5

4. Headers and Footers . . . . . . . . . . . . . . . . . . . . 6

4a. First Page . . . . . . . . . . . . . . . . . . . . . . . . 6

4b. Running Headers . . . . . . . . . . . . . . . . . . . . . . 8

4c. Running Footers . . . . . . . . . . . . . . . . . . . . . . 8

5. Status Section . . . . . . . . . . . . . . . . . . . . . . . 8

6. Translation History Section . . . . . . . . . . . . . . . . 9

7. Contact . . . . . . . . . . . . . . . . . . . . . . . . . . 9

8. RFC Index . . . . . . . . . . . . . . . . . . . . . . . . . 9

9. Copyright Considerations . . . . . . . . . . . . . . . . . . 10

10. Security Considerations . . . . . . . . . . . . . . . . . . 10

11. References . . . . . . . . . . . . . . . . . . . . . . . . . 10

12. Author's Address . . . . . . . . . . . . . . . . . . . . . . 10

1. Introduction

This memo provides information about the translation of RFCs, and

certain policies relating to the publication of the translated RFCs

(TRFCs). This memo gives the minimal framework to coordinate the

world wide translation efforts.

Memos translated from the existing RFCs or TRFCs may be submitted as

TRFCs by anyone to the RFC Editor.

TRFCs are distributed online by being stored as public access files.

The online files are copied by the interested people and printed or

displayed at their site on their equipment. This means that the

format of the online files must meet the constraints of a wide

variety of printing and display equipment. (TRFCs may also be

returned via e-mail in response to an e-mail query, or TRFCs may be

found using information and database searching tools such as Gopher,

Wais, WWW, or Mosaic.)

TRFCs are published in plain text encoded with ISO-2022-INT-*

[2022INT]. ISO-2022-INT-* is chosen because 1) it is based on ISO

2022, an internationally widely available standard, 2) it is 7 bit

and can safely be transferred by SMTP and FTP ASCII mode and 3) it

is, in itself, multilingual and, thus, no designation or negotiation

to use other encoding system is necessary.

TRFCs in PostScript are encoded with ASCII.

In any event, TRFCs are secondary or alternative versions and the

original ASCII RFC is the primary version for reference purposes.

It is unlikely that, for each language in the world, the RFC Editor

hires professional translators who also have engineering knowledge of

the Internet to check the quality of translation. Moreover, French

translations may be provided by France, Belgium, Canada, New

Caledonia or even Japan, from which, it is politically difficult to

choose the best one. Thus, all such translations are treated equally

regardless of the quality of the translation and serial numbers are

assigned to them. The quality could be as bad as that of machine

translation. Or it may have even better quality than the original

RFC, if it is written in the native language of the author of the

original RFC. In any event, the primary version is the untranslated

ASCII one and those who need the authoritative information should not

depend on TRFCs.

Multiple versions are also necessary to accommodate the versions of

improved translation quality. That is, while improved RFCs will have

new RFC numbers, both the original and the improved translation must

share the same RFC number of the original ASCII one.

The TRFCs will have a name "TRFC NNNN-LLL-MM" where "NNNN" is the RFC

number of the original RFC, "LLL" is the language code of ISO 639

[ISO639] (ISO639 is not the two letter country code of ISO3166, of

course) and "MM" is the sequence number assigned by the RFC Editor to

identify different versions of translations. For example, the first

Japanese translation of RFC1543 will be named "TRFC 1543-JA-1".

TRFCs will generally have file names of "trfcNNNN-LLL-MM.txt" (plain

text) or "trfcNNNN-LLL-MM.ps" (PostScript).
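
The naming scheme can be sketched in Python as follows. This is only an
illustration: the class TrfcName and the function parse_trfc_name are
hypothetical, the numbers are written without zero padding to match the
example "TRFC 1543-JA-1", and the language code is copied into the file
name unchanged because this memo does not fix its letter case.

```python
# Sketch: parse and format TRFC names of the form "TRFC NNNN-LLL-MM".
import re
from typing import NamedTuple


class TrfcName(NamedTuple):
    rfc_number: int   # NNNN: number of the original RFC
    language: str     # LLL: ISO 639 language code
    sequence: int     # MM: sequence number assigned by the RFC Editor

    def title(self):
        return "TRFC %d-%s-%d" % (self.rfc_number, self.language, self.sequence)

    def file_name(self, extension="txt"):
        # "txt" for plain text, "ps" for PostScript
        return "trfc%d-%s-%d.%s" % (
            self.rfc_number, self.language, self.sequence, extension)


def parse_trfc_name(name):
    """Parse a name such as "TRFC 1543-JA-1" into its three parts."""
    match = re.fullmatch(r"TRFC (\d+)-([A-Za-z]+)-(\d+)", name)
    if match is None:
        raise ValueError("not a TRFC name: %r" % name)
    return TrfcName(int(match.group(1)), match.group(2), int(match.group(3)))


first_japanese = parse_trfc_name("TRFC 1543-JA-1")
print(first_japanese.file_name())      # trfc1543-JA-1.txt
print(first_japanese.file_name("ps"))  # trfc1543-JA-1.ps
```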

2. Editorial Policy

TRFCs are reviewed by the RFC Editor and possibly by other reviewers

he selects.

Usually, the review is only on the formalities described in this memo

and no further check will be done as to the quality of translation.

The result of the review may be to suggest to the author some

improvements to the document before publication.

In some cases it may be determined that the submitted document is not

appropriate material to be published as a TRFC.

The RFC Editor may make minor changes to the document, especially in

the areas of style and format, but on some occasions also to the

text. Sometimes the RFC Editor will undertake to make more

significant changes, especially when the format rules (see below) are

not followed. However, more often the memo will be returned to the

author for the additional work.

Due to various time pressures on the RFC Editorial staff the time

elapsed between submission and publication can vary greatly. It is

always acceptable to query (ping) the RFC Editor about the status of

a TRFC during this time (but not more than once a week). The two

weeks preceding an IETF meeting are generally very busy, so TRFCs

submitted shortly before an IETF meeting are most likely to be

published after the meeting.

3. Format Rules

To meet the distribution constraints, the following rules are established

for the two allowed formats for TRFCs: plain text and PostScript.

The RFC Editor attempts to ensure a consistent RFC style. It is much

easier to do this if the submission matches the style of the most

recent RFCs and TRFCs. Please do look at some recent RFCs and TRFCs

and prepare yours in the same style.

You must submit an editable online document to the RFC Editor. The

RFC Editor may require minor changes in format or style and will

insert the actual sequence number.

3a. Plain Text Format Rules

The character code is ISO-2022-INT-* [2022INT].

If printed on paper by common printers, standard page size,

excluding margins, is 7.2 by 10 inches.

Each page must be followed by a form feed on a line by itself.

Each line must be followed by carriage return and line feed.

No overstriking (or underlining) is allowed, unless the language

used needs some special characters represented only by

overstriking (current draft of ISO-2022-INT-* does not contain any

such characters).

These "height" and "width" constraints include any headers, footers,

page numbers, or left side indenting.

Use single spaced text within a paragraph, and one blank line

between paragraphs.

TRFCs in plain text Format must be submitted to the RFC Editor in

e-mail messages (or as online files) in the finished publication

format.
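
Some of the plain text rules above lend themselves to a mechanical check
before submission. The Python sketch below covers only the 7-bit
requirement, the CR LF line ending, the form feed on a line by itself,
and the absence of backspace-based overstriking. The function name
check_plain_text is hypothetical, and the overstriking check would need
to be relaxed for a language that genuinely requires overstriking.

```python
# Sketch: check a few of the plain text format rules on raw file bytes.
# Hypothetical helper; it does not check page size or paragraph spacing.

FORM_FEED = b"\x0c"
BACKSPACE = b"\x08"


def check_plain_text(data):
    """Return a list of human readable problems found in the data."""
    problems = []
    if any(byte > 0x7F for byte in data):
        problems.append("contains 8-bit bytes")
    if BACKSPACE in data:
        problems.append("contains backspace (overstriking)")
    lines = data.split(b"\n")
    if lines[-1] != b"":
        problems.append("last line is not terminated")
    for number, line in enumerate(lines[:-1], start=1):
        if not line.endswith(b"\r"):
            problems.append("line %d does not end with CR LF" % number)
        body = line.rstrip(b"\r")
        if FORM_FEED in body and body != FORM_FEED:
            problems.append(
                "line %d mixes a form feed with other text" % number)
    return problems
```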

3b. PostScript Format Rules

Standard page size is 8 1/2 by 11 inches.

Margin of 1 inch on all sides (top, bottom, left, and right).

ASCII characters in main text should have a point size of no less

than 10 points with a line spacing of 12 points.

ASCII characters in footnotes and graph notations no smaller than

8 points with a line spacing of 9.6 points.

Three fonts are acceptable: Helvetica, Times Roman, and Courier.

Plus their bold-face and italic versions. These are the three

standard fonts on most PostScript printers. Shape information of

other fonts must be included explicitly within the PostScript

text.

Prepare diagrams and images based on lowest common denominator

PostScript. Consider common PostScript printer functionality and

memory requirements.

The following PostScript commands should not be used:

initgraphics, erasepage, copypage, grestoreall, initmatrix,

initclip, banddevice, framedevice, nulldevice and renderbands.

These PostScript rules are likely to be changed and expanded as

experience is gained.

TRFCs in PostScript Format may be submitted to the RFC Editor in

e-mail messages (or as online files). If you plan to submit a

document in PostScript please consult the RFC Editor first.

4. Headers and Footers

There is the first page heading, the running headers, and the running

footers.

All headers and footers (except for the translated title) must be

coded with ASCII.

4a. First Page

On the first page there is no running header. The top of the

first page has the following items:

Network Working Group

The traditional heading for the group that founded the RFC

series. This appears on the first line on the left hand side

of the heading.

Request for Comments: NNNN-LLL-MM

Identifies this as a request for comments and specifies the

number. Indicated on the second line on the left side. The

actual value of "MM" is filled in at the last moment before

publication by the RFC Editor.

Author

The author's name (first initial and last name only) indicated

on the first line on the right side of the heading.

Organization

The author's organization, indicated on the second line on the

right side.

Translator

The translator's name (first initial and last name only)

preceded by a phrase: "Translated by" indicated on the third

line on the right side of the heading.

Organization

The translator's organization, indicated on the fourth line on

the right side.

Date

This is the Month and Year of the original RFC Publication

followed by the parenthesized publication date of the

translated version. For example:

January 1994 (translated on February 1994)

Indicated on the fifth line on the right side.

Updates or Obsoletes

If the original RFC Updates or Obsoletes another RFC, this is

indicated as third line on the left side of the heading.

Category

The category header of the TRFC is always Informational. This

is indicated on the third (if there is no Updates or Obsoletes

indication) or fourth line of the left side.

Original Category

The category of the original RFC, one of: Standards Track,

Informational, or Experimental. This is indicated on the

fourth (if there is no Updates or Obsoletes indication) or

fifth line of the left side.

Title

The title appears, centered, below the rest of the heading.

Translated Title

Translated title with non ASCII characters may follow the

original title.

If there are multiple authors or translators and if the multiple

authors or translators are from multiple organizations the right

side heading may have additional lines to accommodate them and to

associate the authors and translators with the organizations

properly.

4b. Running Headers

The running header in one line (on page 2 and all subsequent

pages) has the TRFC name on the left (RFC NNNN-LLL-MM), the

(possibly a shortened form) ASCII title centered, and the original

date (Month Year) on the right.

4c. Running Footers

The running footer in one line (on all pages) has the author's

last name on the left and the page number on the right ([Page N]).

5. Status Section

Each TRFC must include on its first page the "Status of this Memo"

section which contains a paragraph describing the type of the TRFC

first in English with ASCII. Then the status section must be

repeated in the language the RFC is translated into.

The content of this section will be one of the three following

statements.

Standards Track

"This memo provides information for the Internet community. This

memo does not specify an Internet standard of any kind. This memo

is a translation of RFC-NNNN. The quality of the translation is,

by no means, assured. Use at your own risk. The original

document: RFC-NNNN specifies an Internet standards track protocol

for the Internet community, and requests discussion and

suggestions for improvements. Please refer to the current edition

of the "Internet Official Protocol Standards" (STD 1) for the

standardization state and status of this protocol. Distribution

of this memo is unlimited. Modification of this memo to improve

the quality of translation and the distribution of the modified

result through the RFC Editor is also unlimited."

Experimental

"This memo provides information for the Internet community. This

memo does not specify an Internet standard of any kind. This memo

is a translation of RFC-NNNN. The quality of the translation is,

by no means, assured. Use at your own risk. The original memo:

RFC-NNNN defines an Experimental Protocol for the Internet

community. This memo does not specify an Internet standard of any

kind. Discussion and suggestions for improvement are requested.

Distribution of this memo is unlimited. Modification of this memo

to improve the quality of translation and the distribution of the

modified result through the RFC Editor is also unlimited."

Informational

"This memo provides information for the Internet community. This

memo does not specify an Internet standard of any kind. This memo

is a translation of RFC-NNNN. The quality of the translation is,

by no means, assured. Use at your own risk. The original memo:

RFC-NNNN provides information for the Internet community. This

memo does not specify an Internet standard of any kind.

Distribution of this memo is unlimited. Modification of this memo

to improve the quality of translation and the distribution of the

modified result through the RFC Editor is also unlimited."

6. Translation History Section

Each TRFC must have at the very end a section giving the brief

history of the translation and the translator's address, including

the name and postal address, the telephone number, (optional: a FAX

number) and the Internet e-mail address.

The section must be written in English and coded with ASCII.

7. Contact

To contact the RFC Editor send an email message to

"RFC-Editor@ISI.EDU".

8. RFC Index

Several organizations maintain TRFC Index files, generally using the

file name "rfc-index-LLL.txt". The contents of such a file copied

from one site may not be identical to that copied from another site.

9. Copyright Considerations

This memo does not address the issue of how the permission for the

translation can be obtained from the copyright holders of the

original RFCs, except that the translation of this memo and its

redistribution after translation are unlimited.

10. Security Considerations

This memo raises no security issues.

11. References

[2022INT]

(to be published as an Internet Draft with file name of

"draft-ohta-text-encoding-nn.txt"; RFC 1554 gives a

rough sketch of what it will look like)

[ISO639]

International Organization for Standardization (ISO),

"Code for the representation of names of languages",

International Standard, Ref. No. ISO 639:1988 (E/F)

[RFCAUTH]

Postel, J., "Instructions to RFC Authors", RFC 1543,

October 1993.

12. Author's Address

Masataka Ohta

Tokyo Institute of Technology

2-12-1, O-okayama, Meguro-ku,

Tokyo 152, JAPAN

Phone: +81-3-5499-7084

Fax: +81-3-3729-1940

EMail: mohta@cc.titech.ac.jp

M. Ohta Expires on August 4, 1994 [Page 10]

.

From apccirn-sec Tue Feb 1 22:47:04 1994

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)

id WAA08077; Tue, 1 Feb 1994 22:46:36 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 1 Feb 94 22:37:17 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9402011337.AA14379@necom830.cc.titech.ac.jp>

Subject: Instructions to RFC translators

To: apccirn-i18n@nic.nm.kr

Date: Tue, 1 Feb 94 22:37:15 JST

In-Reply-To: <no.id>; from "mohta" at Feb 1, 94 1:01 pm

X-Mailer: ELM [version 2.3 PL11]

>

> I'm going to post the finished draft (the slim one) this afternoon.

>

> Any objections or corrections?

>

> Masataka Ohta

> ------------------------------------------------------------------------

>

>

>

>

>

> INTERNET DRAFT APCCIRN-I18N

> draft-ohta-text-encoding-00.txt February 1994

>

>

> Internet Multilingual Text Encoding: ISO-2022-INT-*

>

> Status of this Memo

>

> This document is an Internet-Draft. Internet-Drafts are working

> documents of the Internet Engineering Task Force (IETF), its areas,

> and its working groups. Note that other groups may also distribute

> working documents as Internet-Drafts.

>

> Internet-Drafts are draft documents valid for a maximum of six

> months. Internet-Drafts may be updated, replaced, or obsoleted by

> other documents at any time. It is not appropriate to use Internet-

> Drafts as reference material or to cite them other than as a

> ``working draft'' or ``work in progress.''

>

> To learn the current status of any Internet-Draft, please check the

> 1id-abstracts.txt listing contained in the Internet-Drafts Shadow

> Directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com, or

> munnari.oz.au.

>

> Abstract

>

> APCCIRN internationalization group has, based on the experience with

> "ISO-2022-JP-2" (RFC 1554), designed a multilingual text encoding

> scheme, "ISO-2022-INT-1", as an extension of "ISO-2022-JP" (RFC 1468)

> and "ISO-2022-KR" (RFC 1557).

>

> The encoding is ASCII compatible and 7-bit, thus, can be used mixed

> with any ASCII compatible encoding. The encoding is designed to be

> as stateless as practically possible with ISO 2022. That is, no state

> information needs to be preserved between lines.

>

> "ISO-2022-INT-1" and its successors have an aggregated name: "ISO-

> 2022-INT-*".

>

> Introduction

>

> This memo describes a text encoding scheme: "ISO-2022-INT-1", which

> is intended to be a multilingual text encoding scheme of the Internet

> including, but not limited to, for electronic mail [RFC822] and

> network news [RFC1036]. The encoding is also useful in multilingual

> text files. The encoding is a multilingual extension of "ISO-2022-

> JP" [2022JP] and "ISO-2022-KR" [2022KR]. The encoding is supported

> by an Emacs based multilingual text editor: MULE [MULE].

>

>

>

> APCCIRN-I18N Expires on Aug 4, 1994 [Page 1]

> INTERNET DRAFT Internet Multilingual Text Encoding February 1994

>

>

> The name, "ISO-2022-INT-1", is intended to be used in the "charset"

> parameter field of MIME headers (see [MIME1] and [MIME2]).

>

> Description

>

> The text with "ISO-2022-INT-1" starts in ASCII [ASCII] designated to

> G0 invoked as GL and KS C 5601 [KSC5601] designated to G1, and

> switches to other character sets of ISO 2022 [ISO2022] through

> limited combinations of designation/invocation sequences. All the

> characters are encoded with 7 bits only.

>

> At the beginning of text, the existence of an announcer sequence:

> "ESC 2/0 4/2" and a designation/invocation sequence: "ESC 2/8 4/2 SI

> ESC 2/4 2/9 4/3 ESC 2/10 7/14 ESC 2/11 7/14" are (though omitted)

> assumed. The same designation/invocation sequence is also assumed

> (though unnecessary and, thus, omitted) at the beginning of each

> line. Thus, characters of 94 character sets are designated to G0 or

> G1 and invoked as GL by SI (shift in, "0/15") and SO (shift out,

> "0/14") each. Characters of 96 character sets are designated to G1

> and invoked as GL by SO. To make the encoding almost unique, a

> character set is designated only to either G0 or G1 and not to both.

>

> For example, the escape sequence "ESC 2/4 4/2" or "ESC $ B" indicates

> that the bytes following the escape sequence are Japanese JIS X

> 0208-1983 characters, which are encoded in two bytes each. A double

> byte sequence enclosed by SO and SI indicates a KS C 5601 [KSC5601]

> string unless other character sets are designated to G1. The escape

> sequence "ESC 2/13 4/1" or "ESC - A" indicates that ISO 8859-1 is

> designated to G1. After the designation, a character code "4/1" is

> interpreted to represent a character "A with acute", not ASCII "A".

>

> The following table gives the escape sequences and the character sets

> used in "ISO-2022-INT-1" messages. The reg# is the registration

> number in ISO's registry [ISOREG].

>

> 94 character sets
> reg#  character set     ESC sequence                designated to
> -----------------------------------------------------------------
> 6     ASCII             ESC 2/8 4/2      ESC ( B    G0
> 14    JIS X 0201-Roman  ESC 2/8 4/10     ESC ( J    G0
>
> 94*94 character sets
> reg#  character set     ESC sequence                designated to
> -----------------------------------------------------------------
> 42    JIS X 0208-1978   ESC 2/4 4/0      ESC $ @    G0
> 58    GB 2312-80        ESC 2/4 4/1      ESC $ A    G0
> 87    JIS X 0208-1983   ESC 2/4 4/2      ESC $ B    G0
> 149   KS C 5601-1987    ESC 2/4 2/9 4/3  ESC $ ) C  G1
> 159   JIS X 0212-1990   ESC 2/4 2/8 4/4  ESC $ ( D  G0
> 171   CNS 11643-1986-1  ESC 2/4 2/8 4/7  ESC $ ( G  G0
> 172   CNS 11643-1986-2  ESC 2/4 2/8 4/8  ESC $ ( H  G0
>
> 96 character sets
> reg#  character set     ESC sequence                designated to
> -----------------------------------------------------------------
> 100   ISO8859-1         ESC 2/13 4/1     ESC - A    G1
> 126   ISO8859-7(Greek)  ESC 2/13 4/6     ESC - F    G1

>
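[Editorial illustration, not part of the draft.] The tables above can be read as a lookup from escape sequence to character set. The following Python sketch splits one line into runs of bytes per character set. It assumes the initial state described in the draft (ASCII in G0, KS C 5601 in G1, G0 invoked); the names DESIGNATIONS and split_line are hypothetical, and the sketch does not convert the bytes to characters.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Bytes following ESC -> (character set, designated to, bytes per character).
DESIGNATIONS = {
    b"(B": ("ASCII", "G0", 1),
    b"(J": ("JIS X 0201-Roman", "G0", 1),
    b"$@": ("JIS X 0208-1978", "G0", 2),
    b"$A": ("GB 2312-80", "G0", 2),
    b"$B": ("JIS X 0208-1983", "G0", 2),
    b"$)C": ("KS C 5601-1987", "G1", 2),
    b"$(D": ("JIS X 0212-1990", "G0", 2),
    b"$(G": ("CNS 11643-1986-1", "G0", 2),
    b"$(H": ("CNS 11643-1986-2", "G0", 2),
    b"-A": ("ISO8859-1", "G1", 1),
    b"-F": ("ISO8859-7(Greek)", "G1", 1),
}


def split_line(line: bytes):
    """Split one line into (character set, raw bytes) runs.

    Assumes the line is valid: in particular, spaces and control
    characters appear only while ASCII is designated to G0.
    """
    g0 = ("ASCII", 1)            # initial G0 designation
    g1 = ("KS C 5601-1987", 2)   # initial G1 designation
    shifted = False              # True while G1 is invoked (after SO)
    runs = []
    i = 0
    while i < len(line):
        byte = line[i]
        if byte == ESC:
            for seq, (name, target, width) in DESIGNATIONS.items():
                if line.startswith(seq, i + 1):
                    if target == "G0":
                        g0 = (name, width)
                    else:
                        g1 = (name, width)
                    i += 1 + len(seq)
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
        elif byte == SO:
            shifted = True
            i += 1
        elif byte == SI:
            shifted = False
            i += 1
        else:
            name, width = g1 if shifted else g0
            chunk = line[i:i + width]
            if runs and runs[-1][0] == name:
                runs[-1] = (name, runs[-1][1] + chunk)
            else:
                runs.append((name, chunk))
            i += width
    return runs
```

For instance, split_line(b"abc \x1b$B$3$s\x1b(B def") yields an ASCII run, a JIS X 0208-1983 run of four bytes, and a second ASCII run.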

> Handling of code points not specified in each standard is

> implementation dependent. For further information about the

> character sets and the escape sequences, see [ISO2022] and [ISOREG].

> Some Asian standards are also described in chapters 3 and 4 of

> [LUNDE].

>

> If there is any G0 designation other than ASCII in text, there must

> be a switch back to ASCII before a space character "2/0" (but not

> necessarily before the "2/0" code of a 96 character set, which usually

> represents non-breaking space) or control characters such as tab or

> CRLF. If there is any G1 designation other than KS C [KSC5601] in

> text, there must be a switch back to KS C before the end of line. If

> there is any G1 invocation in text, there must be a switch back to G0

> invocation before a space character or control characters such as tab

> or CRLF. This means that the next line starts in the ASCII character

> set that was switched to before the end of the previous line.

>

> Though ISO 2022 [ISO2022] and related standards permit long-term,

> persistent states, "ISO-2022-INT-1" is designed so that no such

> state needs to be preserved between lines. Applications such as

> pagers and editors which randomly seek within a text file encoded

> with "ISO-2022-INT-1" can assume that the state is the same as that

> of the beginning of the text.

>
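[Editorial illustration, not part of the draft.] The per-line rule can be checked mechanically: a line is acceptable only if, at its end, ASCII is designated to G0, KS C 5601 is designated to G1, and G0 is invoked. The following Python sketch tracks just those three pieces of state. The function name ends_in_initial_state is hypothetical, and the check covers only the end-of-line condition, not the rules about spaces and control characters inside a line.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Escape sequence bodies (bytes after ESC) taken from the draft's tables.
G0_SEQUENCES = (b"(B", b"(J", b"$@", b"$A", b"$B", b"$(D", b"$(G", b"$(H")
G1_SEQUENCES = (b"$)C", b"-A", b"-F")


def ends_in_initial_state(line: bytes) -> bool:
    """Return True if the line leaves G0=ASCII, G1=KS C 5601, G0 invoked."""
    g0, g1, shifted = b"(B", b"$)C", False
    i = 0
    while i < len(line):
        byte = line[i]
        if byte == ESC:
            for seq in G0_SEQUENCES + G1_SEQUENCES:
                if line.startswith(seq, i + 1):
                    if seq in G0_SEQUENCES:
                        g0 = seq
                    else:
                        g1 = seq
                    i += 1 + len(seq)
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
        else:
            # Graphic bytes lie in 2/0..7/15, so they can never be mistaken
            # for ESC, SO or SI; scanning byte by byte is therefore safe
            # even inside double-byte text.
            if byte == SO:
                shifted = True
            elif byte == SI:
                shifted = False
            i += 1
    return g0 == b"(B" and g1 == b"$)C" and not shifted
```

A pager that seeks to an arbitrary line start can rely on this property only if every preceding line passes such a check.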

> The text will end in ASCII designated to G0.

>

> Left-to-right directionality is assumed if the text is displayed

> horizontally.

>

> Users of "ISO-2022-INT-1" must be aware that some common transport

> such as old Bnews in Japan can not relay a 7-bit value "7/15"

> (decimal 127), which is used to encode, say, "y with diaeresis" of

> ISO 8859-1.

>
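[Editorial illustration, not part of the draft.] The draft writes byte values in the column/row notation of ISO 2022, where "7/15" means column 7, row 15, that is, 7 * 16 + 15 = 127. The Python sketch below converts between that notation and byte values and shows why "y with diaeresis" lands on 7/15 once the eighth bit of its ISO 8859-1 code is dropped. All function names are hypothetical.

```python
def column_row(byte: int) -> str:
    """Format a 7-bit value in column/row notation, e.g. 0x1B -> '1/11'."""
    if not 0 <= byte <= 0x7F:
        raise ValueError("not a 7-bit value")
    return f"{byte >> 4}/{byte & 0x0F}"


def from_column_row(text: str) -> bytes:
    """Parse space-separated column/row codes; ESC, SI and SO go by name."""
    names = {"ESC": 0x1B, "SI": 0x0F, "SO": 0x0E}
    out = bytearray()
    for token in text.split():
        if token in names:
            out.append(names[token])
        else:
            column, row = token.split("/")
            out.append(int(column) * 16 + int(row))
    return bytes(out)


def latin1_as_7bit(char: str) -> int:
    """7-bit code of a right-half ISO 8859-1 character (used after SO)."""
    code = char.encode("latin-1")[0]
    if code < 0xA0:
        raise ValueError("not in the right-hand part of ISO 8859-1")
    return code & 0x7F


print(from_column_row("ESC 2/4 4/2"))            # the bytes ESC $ B
print(column_row(latin1_as_7bit("\u00ff")))      # y with diaeresis
print(column_row(latin1_as_7bit("\u00c1")))      # A with acute
```

The same arithmetic explains the earlier example in the draft: "A with acute" is 12/1 in 8-bit ISO 8859-1 and becomes 4/1 in 7-bit form.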

> Other restrictions are given in the Formal Syntax section below.

>

> Formal Syntax

>

> The notational conventions used here are identical to those used in

> STD11, RFC 822 [RFC822].

>

> The * (asterisk) convention is as follows:

>

> l*m something

>

> meaning at least l and at most m somethings, with l and m taking

> default values of 0 and infinity, respectively.

>

> text = *(line CRLF)

>

> line = *(single-byte-char /

> (*g0-segment reset-desig-seq) /

> g1-segment /

> g1-desig-seq )

> ; note: must end KS C

> ; designated to G1

>

> g0-segment = single-byte-g0-segment /

> double-byte-g0-segment

>

> single-byte-g0-segment = single-byte-g0-seq *single-byte-char

>

> double-byte-g0-segment = double-byte-g0-seq *(one-of-94 one-of-94)

>

> g1-segment = single-byte-g1-96-segment /

> double-byte-g1-segment

> ; note: an appropriate segment

> ; should be selected according

> ; to the current state of G1

> ; designation

>

> single-byte-g1-96-segment = SO *one-of-96 SI

>

> double-byte-g1-segment = SO *(one-of-94 one-of-94) SI

>

> reset-desig-seq = ESC "(" "B"

>

> single-byte-g0-seq = ESC "(" ("B" / "J")

>

> double-byte-g0-seq = (ESC "$" ("@" / "A" / "B")) /

> (ESC "$" "(" ("D" / "G" / "H"))

>

> g1-desig-seq = single-byte-g1-seq / double-byte-g1-seq

>

> single-byte-g1-seq = (ESC "-" ("A" / "F"))

>

> double-byte-g1-seq = ESC "$" ")" "C"

>

> CRLF = CR LF

>

> ; ( Octal, Decimal.)

>

> ESC = <ISO 2022 ESC, escape> ; ( 33, 27.)

>

> SI = <ISO 2022 SI, shift-in> ; ( 17, 15.)

>

> SO = <ISO 2022 SO, shift-out> ; ( 16, 14.)

>

> CR = <ASCII CR, carriage return>; ( 15, 13.)

>

> LF = <ASCII LF, linefeed> ; ( 12, 10.)

>

> one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)

>

> one-of-96 = <any one of 96 values> ; (40-177, 32.-127.)

>

> 7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)

>

> single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT

> including CRLF, and not including ESC, SI, SO>

>

> Mail System Considerations

>

> "ISO-2022-INT-1" is designed to be purely 7-bit, so that it can be

> used with any transport which conforms to STD 11, RFC822 [RFC822]

> without MIME, as is the current practice in Japan with

> "ISO-2022-JP" [2022JP].

>

> If "ISO-2022-INT-1" is used with MIME, MIME charset name may be given

> as follows:

>

> Content-Type: text/plain; charset=iso-2022-int-1

>

> Even if charset parameters are omitted, multilingual applications

> should still assume that "ISO-2022-INT-1" or its latest available

> successor (see the section "Future Extension Plan"), not the MIME

> default of US-ASCII, is used.

>
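[Editorial illustration, not part of the draft.] The fallback rule above can be sketched with Python's standard email package: use the charset parameter when the message carries one, and otherwise assume "iso-2022-int-1" rather than the MIME default. The helper name effective_charset and the constant DEFAULT_CHARSET are hypothetical; Python itself ships no codec for this charset, so the sketch only selects the label.

```python
from email import message_from_string

# Assumption for this sketch: the fallback the draft asks multilingual
# applications to apply when no charset parameter is present.
DEFAULT_CHARSET = "iso-2022-int-1"


def effective_charset(raw_message: str) -> str:
    """Return the declared charset (lower-cased) or the draft's fallback."""
    msg = message_from_string(raw_message)
    return msg.get_content_charset() or DEFAULT_CHARSET


labelled = "Content-Type: text/plain; charset=iso-2022-int-1\n\nbody\n"
unlabelled = "Subject: no MIME headers at all\n\nbody\n"
print(effective_charset(labelled))
print(effective_charset(unlabelled))
```

Both calls give the same label, which is the behaviour the paragraph above asks for.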

> The "ISO-2022-INT-1" encoding is already in 7-bit form, so it is not

> necessary to use a Content-Transfer-Encoding header. It should be

> noted that applying the Base64 or Quoted-Printable encoding will

> render the message unreadable in non-MIME-compliant software.

>

> "ISO-2022-INT-1" may also be used in mail headers. If bare STD11,

> RFC822 without MIME is used, appropriate quoting of special

> characters as "quoted string" might be necessary with structured

> headers, which might not be supported in all common environments.

> In MIME headers, both "B" and "Q" encoding could be useful with

> "ISO-2022-INT-1" text.

>

> Future Extension Plan

>

> Future extensions of "ISO-2022-INT-1" will be named "ISO-2022-INT-2",

> "ISO-2022-INT-3" and so on. MIME charset names beginning with "ISO-

> 2022-INT-" are reserved for them. The family of encodings has an

> aggregated name: "ISO-2022-INT-*".

>

> The extensions will be solely by adding extra character sets of ISO

> 2022, though other extensions such as for bidirectionality support

> are possible. To avoid duplicated assignment of escape sequences,

> formal ISO registry [ISOREG] will, in general, be required, which

> does not deny the future possibility of IANA registration of escape

> sequences for private use purposes.

>

> The current feature of an initial designation of KS C 5601 to G1 will

> be removed in versions in the near future. Users of ISO-2022-INT-1

> are recommended to explicitly designate KS C 5601 to G1.

>

> To minimize the number of character sets, those which are already

> covered by the larger character sets and not so widely used should

> not be added. For example, Katakana character set of "JIS X 0201-

> Kana" is omitted because the set is completely covered by "JIS X

> 0208-1978" and not used at all in the Internet community of Japan.

>

> In any event, the property of "ISO-2022-INT-1" that:

>

> Though ISO 2022 [ISO2022] and related standards permit long-term,

> persistent states, "ISO-2022-INT-1" is designed so that no such

> state needs to be preserved between lines. Applications such as

> pagers and editors which randomly seek within a text file encoded

> with "ISO-2022-INT-1" can assume that the state is the same as that

> of the beginning of the text.

>

> will be preserved.

>

> References

>

> [2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese

> Character Encoding for Internet Messages", RFC 1468, June

> 1993.

>

> [2022JP2] Ohta, M., and Handa, K., "ISO-2022-JP-2: Multilingual

> Extension of ISO-2022-JP", RFC 1554, December 1993.

>

> [2022KR] Choi, U., Chon, K., and Park, H., "Korean Character Encoding

> for Internet Messages", RFC 1557, December 1993.

>

> [ASCII] American National Standards Institute, "Coded character set

> -- 7-bit American national standard code for information

> interchange", ANSI X3.4-1986.

>

> [ISO2022] International Organization for Standardization (ISO),

> "Information processing -- ISO 7-bit and 8-bit coded

> character sets -- Code extension techniques", International

> Standard, Ref. No. ISO 2022-1986 (E).

>

> [ISOREG] International Organization for Standardization (ISO),

> "International Register of Coded Character Sets To Be Used

> With Escape Sequences".

>

> [KSC5601] Korea Industrial Standards Association, "Code for

> Information Interchange (Hangul and Hanja)," Korean

> Industrial Standard, 1987, Ref. No. KS C 5601-1987.

>

> [LUNDE] Lunde, K., "Understanding Japanese Information Processing",

> O'Reilly & Associates, Inc., ISBN 1-56592-043-0, 1993.

>

> [MIME1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet

> Mail Extensions) Part One: Mechanisms for Specifying and

> Describing the Format of Internet Message Bodies", RFC 1521,

> September 1993.

>

> [MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part

> Two: Message Header Extensions for Non-ASCII Text", RFC 1522,

> September 1993.

>

> [MULE] Nishikimi, M., Handa, K., and Tomura, S., "Mule: MULtilingual

> Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.

>

> [RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text

> Messages", STD 11, RFC 822, August 1982.

>

> [RFC1036] Horton, M., and Adams, R., "Standard for Interchange of

> USENET Messages", RFC 1036, AT&T Bell Laboratories, Center

> for Seismic Studies, December 1987.

>

> Acknowledgements

>

> This memo is the product of APCCIRN (Asian Pacific CCIRN)

> Internationalization group and reviewed by various people in a news

> group: fj.kanji and by a mailing list: jp-msg@iij.ad.jp. Many people

> have contributed. In particular, Prof. Eiichi Wada of Tokyo

> University and Ken Lunde of Adobe Systems, Inc. have helped us with

> their profound knowledge of ISO 2022 and related standards. Uhhyung

> Choi of Korea Advanced Institute of Science and Technology has

> helped to make the encoding upward compatible with ISO-2022-KR.

> Prof. Kilnam Chon of Korea Advanced Institute of Science and

> Technology and Prof. Jun Murai of Keio University have provided the

> framework of international cooperation. The authors wish to thank

> all the people who have helped to produce the memo.

>

> Security Considerations

>

> Security issues are not discussed in this memo.

>

> Authors' Addresses

>

> Masataka Ohta

> Tokyo Institute of Technology

> 2-12-1, O-okayama, Meguro-ku,

> Tokyo 152, JAPAN

>

> Phone: +81-3-5499-7084

> Fax: +81-3-3729-1940

> EMail: mohta@cc.titech.ac.jp

>

>

> Ken'ichi Handa

> Electrotechnical Laboratory

> Umezono 1-1-4, Tsukuba,

> Ibaraki 305, JAPAN

>

> Phone: +81-298-58-5916

> Fax: +81-298-58-5918

> EMail: handa@etl.go.jp

From apccirn-sec Sun Feb 27 21:50:24 1994

Received: from necom830.cc.titech.ac.jp by nic.nm.kr (8.6.4/8.6.4)

id VAA27934; Sun, 27 Feb 1994 21:50:14 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 27 Feb 94 21:39:57 +0859

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9402271240.AA19166@necom830.cc.titech.ac.jp>

Subject: IETF WG proposal

To: apccirn-i18n@nic.nm.kr

Date: Sun, 27 Feb 94 21:39:55 JST

In-Reply-To: <no.id>; from "mohta" at Feb 1, 94 10:45 pm

X-Mailer: ELM [version 2.3 PL11]

Dear members of APCCIRN I18N group;

After the successful publication of the Internet Draft on ISO-2022-INT-*,

I'm now trying to negotiate with IESG on the creation of the following

IETF working group.

I think our group can host the WG.

Any comments?

Masataka Ohta

Name:

Internationalization (i18n)

Areas:

USV & APP

Description of the Working Group:

The purpose of the i18n working group is to promote the

internationalization of the Internet.

The main goal of the working group is to develop a single text

encoding scheme useful for all the plain text in the world.

The group may address other issues which require technical

consideration about internationalization.

The group does not address politics of international coordination.

The working group is jointly operated by IETF and APCCIRN.

Goals:

Submit "Internet Multilingual Text Encoding: ISO-2022-INT-*" to the

IESG for consideration as a standard track document.

Submit an Informational RFC on why ISO 10646/UNICODE is inappropriate

as the single text encoding method in the world.

Submit "Mid- to long-term Architecture on Internet Text Encoding" to

the IESG for consideration as a standard track document.

Submit an Informational RFC of "instructions for RFC translators".

Internet Drafts:

The following two related Internet Drafts are posted today and will

soon be available.

draft-ohta-text-encoding-00.txt written by APCCIRN-I18N

Internet Multilingual Text Encoding: ISO-2022-INT-*

draft-ohta-translation-instr-00.txt written by me

Instructions to RFC Translators

The following related Internet Draft will soon be posted.

draft-ohta-mime-charset-names-00.txt written by me

MIME charset names for ISO 10646

From apccirn-sec Wed Mar 30 19:00:46 1994

Received: from cosmos.kaist.ac.kr by krnic.net (8.6.4/8.6.4)

id TAA26895; Wed, 30 Mar 1994 19:00:45 +0900

Received: from localhost (chon@localhost) by cosmos.kaist.ac.kr (8.6.4/8.6.4) id TAA15245 for ap-i18n@krnic.net; Wed, 30 Mar 1994 19:07:03 +0900

Date: Wed, 30 Mar 1994 19:07:03 +0900

From: Kilnam Chon <chon@cosmos.kaist.ac.kr>

Message-Id: <199403301007.TAA15245@cosmos.kaist.ac.kr>

To: ap-i18n@krnic.net

Subject: issue for ap-18n group

this is the issue for the i18n group of apccirn. i would like to spend some time

on this and other matters at the next apccirn meeting on june 17-18.

kilnam chon

------------------------------------------------------------------------

IESG Secretary writes:

>From root Fri Mar 25 12:29:56 1994

>To: Internet Architecture Board <iab@isi.edu>

>cc: The Internet Engineering Steering Group <IESG@CNRI.Reston.VA.US>

>cc: IETF-Announce:;

>Sender: ietf-announce-request@IETF.CNRI.Reston.VA.US

>From: IESG Secretary <iesg-secretary@CNRI.Reston.VA.US>

>Subject: Character Sets and other issues of Internationalization

>Date: Thu, 24 Mar 94 19:10:13 -0500

>X-Orig-Sender: scoya@CNRI.Reston.VA.US

>Message-ID: <9403241910.aa22138@IETF.CNRI.Reston.VA.US>

>

>

>Work in either character set (or coding) development or

>"internationalization" has major long-term architectural and policy

>implications for the Internet. It is clear that the work is important;

>it is clear that others, including several ISO/IEC JTC1 committees, are

>working parts of the issue. Much of the work and the success criteria

>for it are cultural and political, not engineering/technical.

>

>The IESG believes these issues need to be addressed by the IAB, and

>requests that they advise the IETF on architectural frameworks, and on

>what should be done within IETF and what should be done elsewhere.

>

>The IESG also requests the IAB to initiate liaisons with other groups

>(e.g. ISO/IEC JTC1 subgroups, especially SC2 and SC22, APCCIRN, RARE,

>CEN, French Ministry of Culture, etc.) as they believe would facilitate

>the work and reduce the odds of redundant or conflicting work and

>recommendations, and of concerned parties "shopping" for a standards

>body who can be persuaded to adopt approaches rejected elsewhere.

>

>Pending availability of this advice and recommendations, the IESG will

>refer any proposals to initiate standards-track character set work,

>other than requirements to narrowly profile existing and deployed

>standards for Internet use, to the IAB for your deliberations.

>

>

From apng-sec Tue Dec 13 23:53:27 1994

Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id XAA09935 for <apng-i18n@apng.org>; Tue, 13 Dec 1994 23:53:14 +0900

Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 13 Dec 94 23:52:53 +0859

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <9412131453.AA00250@necom830.cc.titech.ac.jp>

Subject: Apng-i18n charter

To: apng-i18n@apng.org

Date: Tue, 13 Dec 94 23:52:52 JST

X-Mailer: ELM [version 2.3 PL11]

Dear APNG-I18N members;

FYI, the following is the current charter of the group. Any comments,

questions or new proposals?

Masataka Ohta

=============================================================================

APNG Internationalization/Localization Working Group (apng-i18n)

Last updated on 1994.12.13

CHARTER

1. Coordinator(s):

M. Ohta <mohta@cc.titech.ac.jp>

TEL: +81-3-5734-3299

FAX: +81-3-5734-3415

2. Description of Working Group:

The purpose of the i18n working group is to promote the

internationalization of the Internet.

The main goal of the working group is to develop a single text

encoding scheme useful for all the plain text in the world,

where many Asia-Pacific-specific issues still remain.

The group may address other issues which require technical

consideration about internationalization.

The group does not handle politics on policy determination of

international coordination but may produce purely technical

guidelines for it.

3. Members:

Jimmy Hwang <jhwang@wiley.csusb.edu>,

M. Ohta <mohta@cc.titech.ac.jp>,

H.T. Koanatakool <htk@ipied.tu.ac.th>,

Trin Tantsetthi <trin@nwg.nectec.or.th>,

Jaekyung Song <jksong@cosmos.kaist.ac.kr>,

Woohyung Choi <whchoi@krnic.net>,

Kyuho Kim <kyuho@cosmos.kaist.ac.kr>,

APNG Secretariat <apng-sec@apng.org>,

Kilnam Chon <chon@cosmos.kaist.ac.kr>,

<handa@etl.go.jp>,

Abhaya Indurawa <abhaya@cse.mrt.ac.lk>,

<cheng@nwg.nectec.or.th>,

<shin@iij.ad.jp>,

<wschen@twnmoe10.edu.tw>,

Jun Murai <jun@wide.ad.jp>,

<nazo@sfc.wide.ad.jp>,

Jun Matsukata <jm@eng.isas.ac.jp>,

Shunichi Akazawa <akazawa@who.ch>,

Sunyoung Han <syhan@cosmos.kaist.ac.kr>,

<rong@watson.ibm.com>,

<ute@cc.noda.sut.ac.jp>,

<fuku@c1.kagu.sut.ac.jp>,

<lwbbs@shakti.ncst.ernet.in>,

Barry Greene <barry@singnet.com.sg>,

Lim Gek Meng <gmlim@singnet.com.sg>,

Ong Wee Cheong <ongwc@singnet.com.sg>,

Chang Wai Leong <cwl@singnet.com.sg>,

Lee Hyung-Seok <hyslee@coregate.kaist.ac.kr>,

Michell Chiang <michelle@technet.sg>,

Masaki Hirabaru <hi@nic.ad.jp>,

Suguru Yamaguchi <suguru@is.aist-nara.ac.jp>,

Akko Oka <oka@slab.ntt.jp>,

Glenn Mansfiend <glenn@aic.co.jp>,

Xiaoling Teng <ccteng@pkn.edu.cn>,

Susan S. Zhu <szhu@net.edu.cn>,

Haifeng Zhu <zhf@ns.net.edu.cn>,

P. T. Ho <hpt@cc.hku.hk>,

Shigeki Goto <goto@ntt-20.ntt.jp>,

Lawrence Law <cclaw@usthk.ust.hk>,

Hock-Koon Lim <lim@ctron.com>,

Shuichi Tashiro <tashiro@etl.go.jp>,

Qiming Li <liqm@bepc2.ihep.ac.cn>,

Raymond Poon <ccrpoon@cityu.edu.hk>,

Ming Lu <luming@tsinghua.edu.cn>

4. Mailing Lists:

General Discussion: apng-i18n@apng.org

To Subscribe: listserv@apng.org

Archive: apng.org:/apng/mail.archive/apng-i18n

5. Remark:

==============================================================================

From apng-sec Tue May 23 03:33:52 1995

Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id DAA23350 for <apng-i18n@cosmos.kaist.ac.kr>; Tue, 23 May 1995 03:33:48 +0900

Message-Id: <199505221833.DAA23350@cosmos.kaist.ac.kr>

Received: from ifi.unizh.ch by josef.ifi.unizh.ch

id <01499-0@josef.ifi.unizh.ch>; Mon, 22 May 1995 20:34:21 +0200

Subject: Re: UN: Unification Method

To: apng-i18n@cosmos.kaist.ac.kr

Date: Mon, 22 May 1995 20:34:20 +0200 (MET DST)

Cc: mduerst@ifi.unizh.ch, zhf@net.edu.cn, apng-cc@apng.org

In-Reply-To: <199505180837.RAA17044@necom830.cc.titech.ac.jp> from "Masataka Ohta" at May 18, 95 05:37:26 pm

X-Mailer: ELM [version 2.4 PL11]

MIME-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 8bit

Content-Length: 2982

From: Martin J Duerst <mduerst@ifi.unizh.ch>

Sender: mduerst@ifi.unizh.ch

Masataka Ohta wrote (in comments to a posting of mine):

>> The Macintosh is a good example that

>> uses no escape sequences at all and is multilingual to a higher degree

>> than any other widely available system.

>

>Mac with or without UNICODE is merely as good as EUC.

Already a very simple application such as Hypercard is highly

multilingual.

I really wonder what the Mac can't do that escape sequences can.

Could you give examples?

>> And many applications and data formats that are not directly

>> related to high-quality printing will need no escape sequences

>> and no additional information as it is available via fonts and

>> scripts on the Mac.

>

>Try Greek people use Latin alphabet only except on high-quality printing.

Greek and Latin are neatly separated in Unicode/ISO 10646,

so your example is not appropriate. The case in question, namely

e.g. reading simplified Chinese with a Unicode font that contains

the glyphs for traditional Chinese (in case both glyphs are so close

as to share the same code point), is better compared to reading

Latin in an Italic font vs. reading it in a Roman font.

>We are already needing the distinction ignored in Unicode even for

>low quality bitmap display. That's the fact of daily life. There

>are no room of discussion.

Low quality bitmap display introduces many distortions of characters,

esp. where they have many strokes. The additional distortions

introduced *in the worst case* by Unicode are not as big.

And there is in general no need to use the wrong font, whereas

the distortions due to low resolution bitmaps, on a low resolution

device, cannot be helped.

>> To have the same code for things that are considered the same

>> is a very important benefit of unification.

>

>The problem is that, even though sample character shapes in CNS, GB, JIS

>and KS C may have some correspondence, the code points cover different

>area of allowable shape variation.

The standards don't say what shape variations they cover. Basically,

whatever shapes a font designer comes up with that are identifiable and

accepted by the public in the circumstances they are used are okay.

The Japanese standard gives explicit, but not exhaustive examples

of shapes that fall under the same code point. I do not know about

the Chinese standards, but maybe somebody from China could

give this information.

In general, in the case of the characters unified in Unicode/ISO 10646,

the allowable shape variations of a unified character clearly overlap

to a high degree, even if the "center of gravity" of the shape regions,

i.e. the preferred glyph shape according to average typographic

practice, may not be the same.

>> Unicode uses this principle wherever possible.

>

>Unicode is completely broken in this sense. Unicode is unusable in

>multi-lingual environment.

Unicode is very useful in a multi-lingual environment, more than

any other character encoding. If you don't want to use it, that's

your problem, but not ours.

Regards, Martin.

From apng-sec Thu May 25 17:52:31 1995

Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id RAA16327 for <apng-i18n@cosmos.kaist.ac.kr>; Thu, 25 May 1995 17:51:20 +0900

Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Thu, 25 May 1995 17:46:32 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <199505250846.RAA07226@necom830.cc.titech.ac.jp>

Subject: Re: UN: Unification Method

To: mduerst@ifi.unizh.ch (Martin J Duerst)

Date: Thu, 25 May 95 17:46:30 JST

Cc: apng-i18n@cosmos.kaist.ac.kr, mduerst@ifi.unizh.ch, zhf@net.edu.cn,

apng-cc@apng.org

In-Reply-To: <199505221834.DAA23353@cosmos.kaist.ac.kr>; from "Martin J Duerst" at May 22, 95 8:34 pm

X-Mailer: ELM [version 2.3 PL11]

> >> The Macintosh is a good example that

> >> uses no escape sequences at all and is multilingual to a higher degree

> >> than any other widely available system.

> >

> >Mac with or without UNICODE is merely as good as EUC.

>

> Already a very simple application such as Hypercard is highly

> multilingual.

They are multilingual in a way that they can be configured to

be multiple single lingual instances, which is what EUC has already

done.

> I really wonder what the Mac can't do that escape sequences can.

> Could you give examples?

Just compare full ISO 2022 and EUC.

> >> And many applications and data formats that are not directly

> >> related to high-quality printing will need no escape sequences

> >> and no additional information as it is available via fonts and

> >> scripts on the Mac.

> >

> >Try Greek people use Latin alphabet only except on high-quality printing.

>

> Greek and Latin are neatly separated in Unicode/ISO 10646,

Yes, double standard.

> so your example is not appropriate.

Why don't you try to force Greek and Russian use ISO 8859/1 only?

> >We are already needing the distinction ignored in Unicode even for

> >low quality bitmap display. That's the fact of daily life. There

> >are no room of discussion.

>

> Low quality bitmap display introduces many distortions of characters,

> esp. where they have many strokes.

We have our own definition on what is the acceptable distortions.

> The additional distortions

> introduced *in the worst case* by Unicode are not as big.

We already judged it unacceptable.

> >> To have the same code for things that are considered the same

> >> is a very important benefit of unification.

> >

> >The problem is that, even though sample character shapes in CNS, GB, JIS

> >and KS C may have some correspondence, the code points cover different

> >area of allowable shape variation.

>

> The standards don't say what shape variations they cover.

So, you must supply that information, which is the problem.

> Basically,

> whatever shapes a font designer comes up with that are identifiable and

> accepted by the public in the circumstances they are used are okay.

Okay for monocultural environment.

> In general, in the case of the characters unified in Unicode/ISO 10646,

> the allowable shape variations of a unified character clearly overlap

> to a high degree,

Urrr, I don't think you have any real world expertise to judge it.

> Unicode is very useful in a multi-lingual environment, more than

> any other character encoding.

If you think so, use EUC.

Masataka Ohta

From apng-sec Fri May 26 00:12:49 1995

Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id AAA19271 for <apng-i18n@cosmos.kaist.ac.kr>; Fri, 26 May 1995 00:12:34 +0900

Message-Id: <199505251512.AAA19271@cosmos.kaist.ac.kr>

Received: from ifi.unizh.ch by josef.ifi.unizh.ch

id <00584-0@josef.ifi.unizh.ch>; Thu, 25 May 1995 17:12:51 +0200

Subject: Re: UN: Unification Method

To: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)

Date: Thu, 25 May 1995 17:12:50 +0200 (MET DST)

Cc: apng-i18n@cosmos.kaist.ac.kr, apng-cc@apng.org

In-Reply-To: <199505250846.RAA07226@necom830.cc.titech.ac.jp> from "Masataka Ohta" at May 25, 95 05:46:30 pm

X-Mailer: ELM [version 2.4 PL11]

MIME-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 8bit

Content-Length: 4674

From: Martin J Duerst <mduerst@ifi.unizh.ch>

Sender: mduerst@ifi.unizh.ch

Masataka Ohta wrote, in response to a contribution of mine:

>> >> The Macintosh is a good example that

>> >> uses no escape sequences at all and is multilingual to a higher degree

>> >> than any other widely available system.

>> >

>> >Mac with or without UNICODE is merely as good as EUC.

>>

>> Already a very simple application such as Hypercard is highly

>> multilingual.

>

>They are multilingual in a way that they can be configured to

>be multiple single lingual instances, which is what EUC already

>done.

>

>> I really wonder what the Mac can't do that escape sequences can.

>> Could you give examples?

>

>Just compare full ISO 2022 and EUC.

From what you are saying, I have to conclude that you are not

very familiar with the multilingual capabilities of the Mac.

In Hypercard, you can have Japanese, Chinese, Arabic, Hebrew,

Korean, and so on, in one and the same single field on a single

card, all with correct high-quality glyphs (True Type or Postscript).

And this just because Hypercard uses the basic text facilities of

the Mac OS, rather than trying to do better like some word processing

programs.

I would still like to hear what you think the Mac would do better

if it used Escape sequences. Please give actual examples. Just

referring to EUC doesn't help, as there is a big difference between

multiscript/multilingual Mac text processing and EUC based Unix

localization.
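
For readers unfamiliar with the two families of encodings being compared in this exchange, the following is a minimal sketch, assuming Python 3 and its bundled `iso2022_jp` and `euc_jp` codecs (neither is mentioned in the thread). It shows only the byte-level difference: the ISO 2022 style encoding switches character sets with escape sequences, while EUC fixes the designations in advance and marks JIS X 0208 bytes with the high bit. It says nothing about how the Mac OS represents text internally.

```python
# Illustrative only: Python's bundled codecs stand in for an
# "escape sequence" encoding (ISO-2022-JP) and a "fixed designation"
# encoding (EUC-JP). The sample text is an arbitrary choice.
ESC = b"\x1b"

text = "abc 日本語"

stateful = text.encode("iso2022_jp")  # switches sets with escape sequences
stateless = text.encode("euc_jp")     # set implied by the high bit of each byte


def kanji_payload(data: bytes) -> bytes:
    """Return the bytes between 'ESC $ B' (JIS X 0208) and 'ESC ( B' (ASCII)."""
    start = data.index(ESC + b"$B") + 3
    end = data.index(ESC + b"(B", start)
    return data[start:end]


payload = kanji_payload(stateful)

# EUC-JP carries the same JIS X 0208 code values with the high bit set.
euc_from_payload = bytes(b | 0x80 for b in payload)

print(stateful)
print(stateless)
print(euc_from_payload == stateless[4:])  # skip the 4 ASCII bytes "abc "
```

The escape sequences let a 7-bit stream announce an arbitrary number of character sets, at the cost of making the stream stateful; the EUC form is stateless but limited to the sets chosen in advance.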

>> >Try Greek people use Latin alphabet only except on high-quality printing.

>>

>> Greek and Latin are neatly separated in Unicode/ISO 10646,

>

>Yes, double standard.

There is no double standard. Claiming so shows that you are not

familiar with the principles used in Unicode/ISO 10646 and with

your own Japanese character standard JIS X 0208.

According to the shape criteria of Unicode, Latin, Greek, and

Cyrillic 'A' could have been unified. But for backward compatibility,

Unicode excluded unification of characters that have separate code

points in well used standards, so as to allow round-trip conversion.

JIS X 0208 is one of the few standards that contains code points

for all three. Unifying them would have meant that it would

be impossible to convert a text from JIS, SJIS, or Japanese EUC

encoding to Unicode and back without loss of information.
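
The round-trip argument above can be sketched as follows, assuming Python 3 and its bundled `euc_jp`, `shift_jis`, and `iso2022_jp` codecs. One detail is specific to those codecs and not stated in the message: they map the Latin letters of JIS X 0208 to the fullwidth forms (U+FF21 for "A"). The `collapsed` string is a hypothetical unification, shown only to illustrate the information loss.

```python
import unicodedata

# Latin (fullwidth form), Greek, and Cyrillic capital "A":
# near-identical shapes, three separate code points.
letters = "\uFF21\u0391\u0410"

for ch in letters:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))


def round_trips(text: str, codec: str) -> bool:
    """True if encoding to `codec` and decoding back returns `text` unchanged."""
    return text.encode(codec).decode(codec) == text


for codec in ("euc_jp", "shift_jis", "iso2022_jp"):
    print(codec, letters.encode(codec).hex(), round_trips(letters, codec))

# Hypothetical unification: collapse all three letters onto one code point.
collapsed = letters.translate({ord(ch): "A" for ch in letters})
print(collapsed)  # the three distinct JIS characters can no longer be told apart
```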

>> >We are already needing the distinction ignored in Unicode even for

>> >low quality bitmap display. That's the fact of daily life. There

>> >are no room of discussion.

>>

>> Low quality bitmap display introduces many distortions of characters,

>> esp. where they have many strokes.

>

>We have our own definition on what is the acceptable distortions.

>

>> The additional distortions

>> introduced *in the worst case* by Unicode are not as big.

>

>We already judged it unacceptable.

Is this "We" a pluralis maiestatis? Or have you done controlled

experiments? I would be interested to hear about them. The

only experiments I have heard about have been small scale,

but point out that the differences are ignored by most

subjects unless you give them very strong hints to help

them become aware of the differences.

>> >> To have the same code for things that are considered the same

>> >> is a very important benefit of unification.

>> >

>> >The problem is that, even though sample character shapes in CNS, GB, JIS

>> >and KS C may have some correspondence, the code points cover different

>> >area of allowable shape variation.

>>

>> The standards don't say what shape variations they cover.

>

>So, you must supply that information, which is the problem.

By saying that you have to supply some information, you are admitting

that the standards don't define it, and are contradicting your previous

statement.

>> Basically,

>> whatever shapes a font designer comes up with that are identifiable and

>> accepted by the public in the circumstances they are used are okay.

>

>Okay for monocultural environment.

It may be perfectly possible that a good font designer comes up

with a new font that is accepted in all CJK areas and doesn't

need glyph distinctions. On the other hand, it would be very

difficult to get Japanese used to e.g. a Long Song type of font,

even if it used Japanese glyph shapes.

>> In general, in the case of the characters unified in Unicode/ISO 10646,

>> the allowable shape variations of a unified character clearly overlap

>> to a high degree,

>

>Urrr, I don't think you have any real world expertise to judge it.

What real-world experience do you have? How many times have

you looked at font specimens from different sources, and found

that they don't agree, for quite some characters, on the details

you consider 'out of discussion'? I can give you examples, if

necessary.

Regards, Martin.

From apng-sec Fri May 26 15:16:20 1995

Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id PAA24196; Fri, 26 May 1995 15:16:10 +0900

Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)

id AA00866; Fri, 26 May 95 14:02:31 CST

From: "Zhu, Haifeng" <zhf@net.edu.cn>

Date: Fri, 26 May 95 01:48:34 CST

Message-Id: <612.zhf@net.edu.cn_POPMail/PC_3.2.2>

Reply-To: <zhf@net.edu.cn>

X-Popmail-Charset: English

To: mduerst@ifi.unizh.ch

Cc: mohta@necom830.cc.titech.ac.jp, apng-cc@apng.org, apng-i18n@apng.org

Subject: Re: UN: Unification Method

On Thu, 25 May 1995 16:29:56 +0200 (ME, Martin J Duerst wrote:

>Zhu, Haifeng long ago has indicated that we will discuss

>unification and related issues in apng-cc, and at that time

>as well as later when he in fact opened the discussion,

>there was no complaint. Also, it is clear that unification

>is related to the topic of this group.

>

>Regards, Martin.

Since this is also related to the scope of i18n, in a sense, I think

we can CC these mails to apng-i18n@apng.org too, if Mr. Ohta or i18n

agree.

Best Regards.

-- Haifeng --

Zhu,Haifeng

Coordinator of APNG-CC (Asia-Pacific Networking Group)

Dept. of Computer Sci.&Tech., Tsinghua University

Institute of Networking, Tsinghua University

Beijing 100084, People's Republic of China

Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173

Email: zhf@net.edu.cn

From apng-sec Fri May 26 15:40:42 1995

Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id PAA24317; Fri, 26 May 1995 15:40:24 +0900

Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Fri, 26 May 1995 15:30:55 +0859

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <199505260631.PAA12421@necom830.cc.titech.ac.jp>

Subject: Re: UN: Unification Method

To: zhf@net.edu.cn

Date: Fri, 26 May 95 15:30:54 JST

Cc: mduerst@ifi.unizh.ch, apng-cc@apng.org, apng-i18n@apng.org

In-Reply-To: <612.zhf@net.edu.cn_POPMail/PC_3.2.2>; from "Zhu, Haifeng" at May 26, 95 1:48 am

X-Mailer: ELM [version 2.3 PL11]

> Since this is also related to the scope of i18n, in a sense. I think

> we can CC these mail to apng-i18n@apng.org too, if Mr. Ohta or i18n

> agree.

No, use apng-i18n only. We should suspend the discussion 2 or 3 days

so that all interested parties can also register to apng-i18n ML.

Masataka Ohta

From apng-sec Fri May 26 16:02:51 1995

Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id PAA24429; Fri, 26 May 1995 15:58:44 +0900

Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)

id AA00978; Fri, 26 May 95 14:57:06 CST

From: "Zhu, Haifeng" <zhf@net.edu.cn>

Date: Fri, 26 May 95 02:43:10 CST

Message-Id: <458.zhf@net.edu.cn_POPMail/PC_3.2.2>

Reply-To: <zhf@net.edu.cn>

X-Popmail-Charset: English

To: mohta@necom830.cc.titech.ac.jp

Cc: mduerst@ifi.unizh.ch, apng-cc@apng.org, apng-i18n@apng.org

Subject: Re: UN: Unification Method

On Fri, 26 May 95 15:30:54 JST, Masataka Ohta wrote:

>> Since this is also related to the scope of i18n, in a sense. I think

>> we can CC these mail to apng-i18n@apng.org too, if Mr. Ohta or i18n

>> agree.

>

>No, use apng-i18n only. We should suspend the discussion 2 or 3 days

>so that all interested parties can also register to apng-i18n ML.

Why, if we concentrate on Chinese? If we use i18n, I'm afraid we will not be

discussing Chinese transfer methods based on unified coding, which some

experts insist on using.

Note that unified coding is also a way of transferring Chinese, so it could be

evaluated in this group.

-- Haifeng --

Zhu,Haifeng

Coordinator of APNG-CC (Asia-Pacific Networking Group)

Dept. of Computer Sci.&Tech., Tsinghua University

Institute of Networking, Tsinghua University

Beijing 100084, People's Republic of China

Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173

Email: zhf@net.edu.cn

From apng-sec Fri May 26 16:39:14 1995

Received: from toad.lake.cs.wwu.edu (toad.lake.cs.wwu.EDU [140.160.138.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id QAA24670 for <apng-i18n@apng.org>; Fri, 26 May 1995 16:39:08 +0900

Received: by toad.lake.cs.wwu.edu (5.0/SMI-SVR4)

id AA29243; Fri, 26 May 1995 00:36:14 -0700

Date: Fri, 26 May 1995 00:36:14 -0700

From: n8442161@toad.lake.cs.wwu.edu (Patrick Tuttle)

Message-Id: <9505260736.AA29243@toad.lake.cs.wwu.edu>

To: apng-i18n@apng.org

Subject: subscribe n8442161@toad.lake.cs.wwu.edu Patrick Tuttle

content-length: 55

subscribe n8442161@toad.lake.cs.wwu.edu Patrick Tuttle

From apng-sec Fri May 26 16:39:42 1995

Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id QAA24679; Fri, 26 May 1995 16:39:29 +0900

Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Fri, 26 May 1995 16:34:59 +0859

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <199505260735.QAA12879@necom830.cc.titech.ac.jp>

Subject: Re: UN: Unification Method

To: zhf@net.edu.cn

Date: Fri, 26 May 95 16:34:58 JST

Cc: mduerst@ifi.unizh.ch, apng-cc@apng.org, apng-i18n@apng.org

In-Reply-To: <458.zhf@net.edu.cn_POPMail/PC_3.2.2>; from "Zhu, Haifeng" at May 26, 95 2:43 am

X-Mailer: ELM [version 2.3 PL11]

> >> Since this is also related to the scope of i18n, in a sense. I think

> >> we can CC these mail to apng-i18n@apng.org too, if Mr. Ohta or i18n

> >> agree.

> >

> >No, use apng-i18n only. We should suspend the discussion 2 or 3 days

> >so that all interested parties can also register to apng-i18n ML.

>

> Why, if we concentrate on Chinese ?

Concentrate on Chinese? Then, use apng-cc only. Members of apng-i18n are

already notified of the existence of apng-cc.

> Noted that unifief coding is also a way of Chinese transfer, it could be

> evaluated in this group.

Sure. But, according to you, if the scope is communication in Chinese,

GB 2312 is a universal, fixed length encoding.

So, what are the remaining points?

Masataka Ohta

From apng-sec Mon May 29 06:18:40 1995

Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id GAA10309; Mon, 29 May 1995 06:18:31 +0900

Message-Id: <199505282118.GAA10309@cosmos.kaist.ac.kr>

Received: from ifi.unizh.ch by josef.ifi.unizh.ch

id <00902-0@josef.ifi.unizh.ch>; Sun, 28 May 1995 23:19:02 +0200

Subject: Re: UN: Scope of discussion

To: apng-i18n@apng.org, apng-cc@apng.org

Date: Sun, 28 May 1995 23:19:01 +0200 (MET DST)

X-Mailer: ELM [version 2.4 PL11]

MIME-Version: 1.0

Content-Type: text/plain; charset=US-ASCII

Content-Transfer-Encoding: 8bit

Content-Length: 2785

From: Martin J Duerst <mduerst@ifi.unizh.ch>

Sender: mduerst@ifi.unizh.ch

Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

and Zhu, Haifeng (zhf@net.edu.cn) have made some

comments on what should be discussed on which

mailing list.

I think we all agree that apng-cc is dedicated to

transfer of Chinese text, and apng-i18n to more

general issues, and that this shouldn't be changed.

The main problem seems to be that discussing unifi-

cation for Chinese, as we have set out to do in apng-cc,

can in many cases not so easily be separated from

other aspects, such as multilingual issues (which

we have already between Mandarin and Cantonese),

other scripts such as Latin and Greek (which are

part of the existing Chinese standards, and are used),

general advantages and disadvantages of unification

and Unicode (because many of them directly or

indirectly apply to Chinese) or even such specific

issues like glyph shapes in Japanese (because both

Masataka Ohta and I are more familiar with Japanese

than with Chinese).

All these issues are related to our main topic, and

therefore they will pop up from time to time. Getting

the greater picture is often advisable when trying

to make decisions.

>> Yes, I think it is needed to be discussed if concentrated on needs of Chinese

>> Internet communication, as the charter described "mixed/unified method".

>

>As long as it is unrelated to multilingual issues, that's OK.

>

>The problem is in Martin who unnecessarily confuse Chinese and non-Chinese

>issues.

I have just quickly re-read the mails in our thread. The result

was interesting. Many of the points that at a later stage were

criticized to be inappropriate for apng-cc started out as

unnecessary, unsubstantiated, and/or factually incorrect side-

remarks from the person who is most criticizing that the topics

are inappropriate once the arguments lie on the table.

[Just a bit of historic reference for those that have been on

apng-cc for a while: I remember a specific situation where

somebody opened a special mailing list at a point where it

turned out that he had run out of arguments. I wouldn't

like to get the same impression now (the only difference

being that the mailing list already exists).]

>Apng-cc has the specific purpose and is NOT the place of general

>discussion between Chinese people. And, Martin is not a Chinese.

Nice of Masataka Ohta to tell me (and the list).

Guess I now have to tell him that he isn't Chinese, either, but

Japanese. Guess I also could tell him that these facts are not

relevant to our mailing lists (but that I won't open a new mailing

list to discuss the issues; just to remove any doubts and get

the readers on equal settings for both of us, I will add here

that I am Swiss).

I apologize in advance to all those on the lists that already

knew the above facts, or didn't care anyway.

Regards, Martin.

From apng-sec Mon May 29 14:16:24 1995

Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id OAA12699; Mon, 29 May 1995 14:15:01 +0900

Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)

id AA00570; Mon, 29 May 95 12:49:24 CST

From: "Zhu, Haifeng" <zhf@net.edu.cn>

Date: Mon, 29 May 95 00:34:56 CST

Message-Id: <476.zhf@net.edu.cn_POPMail/PC_3.2.2>

Reply-To: <zhf@net.edu.cn>

X-Popmail-Charset: English

To: mduerst@ifi.unizh.ch

Cc: apng-cc@apng.org, apng-i18n@apng.org

Subject: Re: UN: Scope of discussion

On Sun, 28 May 1995 23:19:01 +0200 (ME, Martin J Duerst wrote:

>Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

>and Zhu, Haifeng (zhf@net.edu.cn) have made some

>comments on what should be discussed on which

>mailing list.

>

>I think we all agree that apng-cc is dedicated to

>transfer of Chinese text, and apng-i18n to more

>general issues, and that this shouldn't be changed.

>

>The main problem seems to be that discussing unifi-

>cation for Chinese, as we have set out to do in apng-cc,

>can in many cases not so easily be separated from

>other aspects, such as multilingual issues (which

>we have already between Mandarin and Cantonese),

>other scripts such as Latin and Greek (which are

>part of the existing Chinese standards, and are used),

>general advantages and disadvantages of unification

>and Unicode (because many of them directly or

>indirectly apply to Chinese) or even such specific

>issues like glyph shapes in Japanese (because both

>Masataka Ohta and I are more familiar with Japanese

>than with Chinese).

>All these issues are related to our main topic, and

>therefore they will pop up from time to time. Getting

>the greater picture is often advisable when trying

>to make decisions.

Agreed, they could be referred to if related to Chinese. Unification, especially

Unicode/10646 for Chinese transfer encoding, should be discussed in apng-cc.

-- Haifeng --

Zhu,Haifeng

Coordinator of APNG-CC (Asia-Pacific Networking Group)

Dept. of Computer Sci.&Tech., Tsinghua University

Institute of Networking, Tsinghua University

Beijing 100084, People's Republic of China

Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173

Email: zhf@net.edu.cn

From apng-sec Sun Jun 11 22:21:17 1995

Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id WAA11567; Sun, 11 Jun 1995 22:21:08 +0900

Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Sun, 11 Jun 1995 22:18:01 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <199506111318.WAA00763@necom830.cc.titech.ac.jp>

Subject: Agenda for APNG-I18N meeting at Honolulu

To: apng-all@apng.org, apng-i18n@apng.org, apng-cc@apng.org

Date: Sun, 11 Jun 95 22:18:00 JST

X-Mailer: ELM [version 2.3 PL11]

Dear members of APNG;

Below is the current agenda on the upcoming apng-i18n meeting:

Date: 1 July 1995(9:00 - 12:00)

Location: Sheraton Waikiki Hotel

1. General issues

2. Font CDROM Project by Shuichi Tashiro

3. Report of the work of APNG-CC by Prof. Hu

4. Report of APNG-CC RFC-to-be draft of "Chinese Encoding in the

Internet" by Prof. Hu

3 and 4 might be able to be merged.

Any comments?

Masataka Ohta

From apng-sec Mon Jun 12 18:39:56 1995

Received: from net.edu.cn (ns.net.edu.cn [166.111.1.10]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with SMTP id SAA17725; Mon, 12 Jun 1995 18:39:22 +0900

Received: from [166.111.1.115] by net.edu.cn (4.0/SMI-4.0)

id AA01298; Mon, 12 Jun 95 17:43:25 CST

From: "Zhu, Haifeng" <zhf@net.edu.cn>

Date: Mon, 12 Jun 95 17:31:22 CST

Message-Id: <15.zhf@net.edu.cn_POPMail/PC_3.2.2>

Reply-To: <zhf@net.edu.cn>

X-Popmail-Charset: English

To: mohta@necom830.cc.titech.ac.jp

Cc: apng-i18n@apng.org, apng-cc@apng.org

Subject: Re: Agenda for APNG-I18N meeting at Honolulu

On Sun, 11 Jun 95 22:18:00 JST, Masataka Ohta wrote:

> Date: 1 July 1995(9:00 - 12:00)

> Location: Sheraton Waikiki Hotel

>

> 1. General issues

> 2. Font CDROM Project by Shuichi Tashiro

> 3. Report of the work of APNG-CC by Prof. Hu

> 4. Report of APNG-CC RFC-to-be draft of "Chinese Encoding in the

> Internet" by Prof. Hu

>

>3 and 4 might be able to be merged.

>

>Any comments?

Prof. Hu is not in Beijing now, and the report of APNG-CC is now being

written. So, Prof Hu told me that he'd like to recommend Prof. Li Xing to

report on APNG-CC's work and RFC-to-be draft, and he'll preside over the APNG-CC

meeting.

Is it ok ?

-- Haifeng --

Zhu,Haifeng

Coordinator of APNG-CC (Asia-Pacific Networking Group)

Dept. of Computer Sci.&Tech., Tsinghua University

Institute of Networking, Tsinghua University

Beijing 100084, People's Republic of China

Tel: +86-1-2561144 ext 3492 Fax: +86-1-2564173

Email: zhf@net.edu.cn

From apng-sec Thu Jul 13 22:04:25 1995

Received: from necom830.cc.titech.ac.jp (necom830.cc.titech.ac.jp [131.112.32.132]) by cosmos.kaist.ac.kr (8.6.9H1/8.6.9) with ESMTP id WAA24424; Thu, 13 Jul 1995 22:00:25 +0900

Received: by necom830.cc.titech.ac.jp (8.6.11/necom-mx-rg); Thu, 13 Jul 1995 21:55:43 +0900

Date: Thu, 13 Jul 1995 21:55:43 +0900

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

Message-Id: <199507131255.VAA13313@necom830.cc.titech.ac.jp>

To: apng-all@apng.org, apng-cc@apng.org, apng-i18n@apng.org,

bal@umacmr2.umac.mo, ding@asiainfo.com, edith-wu@cuhk.hk,

hwpark@garam.kreonet.re.kr, j.boellaard@genie.com, jesmith@well.com,

kei@rd.nacsis.ac.jp, mcpong@hkusub.hku.hk, mingfung@cuhk.hk,

mohta@necom830.cc.titech.ac.jp, nschen@cc.nsysu.edu.tw,

oka@slab.ntt.jp, sean@hntp2.hinet.net, sstseng@cis.nctu.edu.tw,

tashiro@etl.go.jp, tsenglm@mbox.ee.ncu.edu.tw, xing@cernet.edu.cn

Subject: Draft Minutes of the APNG-I18N Meeting at Honolulu

Dear APNG members;

Please review the following draft minutes of the APNG-I18N meeting

at Honolulu.

Comments should be sent to

apng-i18n@apng.org

or

apng-cc@apng.org

Masataka Ohta

PS

This is a resent message. Those who have received the previous mail,

sorry for the wrong addresses of apng mailinglist.

------------------------------------------------------------------------

I18N WG Meeting Minutes (DRAFT)

Participants:

Shuichi Tashiro      ETL, JAPAN                         tashiro@etl.go.jp

Xing Li              CERNET                             xing@cernet.edu.cn

Man-Chi Pong         The Univ. of Hong Kong             mcpong@hkusub.hku.hk

Alex Lai             Univ. of Macau                     bal@umacmr2.umac.mo

Nian-Shing Chen      National Sun Yat-sen Univ. Taiwan  nschen@cc.nsysu.edu.tw

Shian-Shyong Tseng   TANet                              sstseng@cis.nctu.edu.tw

Edith Wu             The Chinese Univ. of Hong Kong     edith-wu@cuhk.hk

Atsuko Oka           NTT                                oka@slab.ntt.jp

Yusheng Ji           NACSIS, Japan                      kei@rd.nacsis.ac.jp

Jeff Smith           Bridge to Asia                     jesmith@well.com

Jerry Boellaard      COMTECH-Hawaii                     j.boellaard@genie.com

James Ding           Asia Info Services, Inc            ding@asiainfo.com

Kinming Fung         Chinese University of Hong Kong    mingfung@cuhk.hk

Masataka Ohta        Tokyo Inst. of Tech                mohta@necom830.cc.titech.ac.jp

HyoungWoo Park       SERI, KOREA                        hwpark@garam.kreonet.re.kr

Chen Shyang-yih      DCI                                sean@c2.hinet.net

Tseng Li-Ming        CC.MOE.TAIWAN                      tsenglm@ncuee.ncu.edu.tw

Chair: Masataka Ohta

Documents:

Chinese Character Encoding for Internet Message <DRAFT>

Some experts of APNG-CC

Agenda:

Solicit a Volunteer for Note-taking

Agenda Bashing

Presentation of Font CDROM Project (by Dr. Shuichi Tashiro)

APNG-CC Charter Review

APNG-CC Political Discussion (Final Decision)

APNG-CC Draft Review (by Prof. Li Xing)

New Work Items

APNG-CC Rescheduling

Election

Summary of the Discussion:

1. Font CD ROM project:

- CD-ROM Project

Concept:Font is necessary!

CD-ROM $200/disk

Original intention: To supply small company for TV GAME or ...

Not for Internet

Important Thing:

What type of font should be used?

We can use it immediately.

Respecting original language culture

Cheap! IPR free if possible

Question:

How many fonts already are there?

Can we use copyrighted fonts?

- maybe, make new font sets.

Let's discuss on the apng-i18n ML.

2. APNG-CC Charter Review

The following Charter of APNG-CC compiled by Zhu, Haifeng was

reviewed and approved.

Since there are more and more Chinese using the Internet, the Chinese

transfer encoding method should be developed. People in P.R.C., Taiwan,

Hong Kong and Singapore are using methods quite different from each other.

We hope to build a suitable mechanism and write an RFC-to-be Internet Draft

to solve this problem.

The work might include: study on available standards/non-standards,

feasibility study on how to mix/unify them including political/cultural

aspects, design of encoding method, write an RFC-to-be Internet Draft.

The work should be done as much as possible through email.

It was stressed that the charter says:

to build a suitable mechanism

and NOT "multiple suitable mechanisms" NOR "the suitable mechanism".

3. APNG-CC Political Discussion (Final Decision)

The wording in the current draft was reviewed and approved by

all the participants from both sides of the Taiwan Strait.

4. APNG-CC Draft Review

Prof. Li Xing presented an ISO-2022-CN draft.

The following issues were discussed:

Designation

Formally registered ones only

Sub scheme switching (Escape Sequences)

No consensus was formed.

Conformance v.s. Interoperability

have a minimum conformance of

ASCII,

GB2312

CNS 11643 planes 1, 2

It should be stated in the draft that, text beyond the

minimum conformance is not assured to be interoperable.

It was pointed out that GB 2312 font is copyright protected.

So, we agreed that APNG-I18N should strongly recommend P.R.C.

to make GB2312 font copyright free.

Treatment of other encoding mechanisms (HZ, EUC-GB, Big5, ISO 10646...)

It was agreed that the APNG-CC draft should neither recommend nor

discourage other encoding mechanisms. It may, instead,

give informational references.

The current paragraph which discourages HZ MUST be removed.

Liaison to HZ developing groups

The following are the locations of the HZ developing group:

ftp://ftp.edu.tw/

ftp://cnd.org/

ftp://ftp.ifcss.org/

http://www.ifcss.org/

ML: soft-author@ifcss.org

soft-author-request@ifcss.org

It was requested that "apng.net" should have an aliasing pointer

to "soft-author@ifcss.org".
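
The designation and shifting scheme discussed under item 4 can be sketched as below. This is a minimal illustration, not the draft's text: it covers only ASCII and GB 2312, it uses the `ESC $ ) A` designation with SO/SI shifting as later published in RFC 1922, and it assumes Python 3's bundled `gb2312` codec (EUC-CN) as the source of GB 2312 code values. Whether the draft presented at the meeting used identical sequences is not stated in the minutes. The function names are hypothetical.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"
GB2312_TO_G1 = ESC + b"$)A"  # designate GB 2312 as the SO (G1) set


def encode_line(line: str) -> bytes:
    """Encode one line (no newline) containing only ASCII and GB 2312 text."""
    out = bytearray()
    designated = False  # has GB 2312 been designated on this line?
    shifted = False     # are we currently in the SO (two-byte) state?
    for ch in line:
        if ch.isascii():
            if shifted:
                out += SI
                shifted = False
            out += ch.encode("ascii")
        else:
            if not designated:
                out += GB2312_TO_G1
                designated = True
            if not shifted:
                out += SO
                shifted = True
            # EUC-CN bytes with the high bit cleared give the 7-bit code.
            # Characters outside GB 2312 raise UnicodeEncodeError here.
            out += bytes(b & 0x7F for b in ch.encode("gb2312"))
    if shifted:
        out += SI  # return to ASCII before the line ends
    return bytes(out)


def encode_text(text: str) -> bytes:
    # Conservative choice in this sketch: repeat the designation on every line.
    return b"\n".join(encode_line(line) for line in text.split("\n"))


print(encode_text("A中文"))
print("中文".encode("hz"))  # HZ carries the same 7-bit code values, framed differently
```

Text outside the two sets handled here makes `encode_line` raise an error, which mirrors the point in the minutes that text beyond the minimum conformance is not assured to be interoperable.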

5. New Work Items

It was agreed that it might be a good idea to provide a separate, new

document "Implementation guidelines for ISO-2022-CN", which covers

information on:

free font locator

conversion tools

editor(s)

x related tools

But no volunteer was found.

Volunteers are still being sought on the mailing list.

6. APNG-CC Rescheduling

It will be good if a draft is finalized within a month. After two

weeks of review as an Internet Draft, it should be sent to the RFC

editor.

7. Election

While it would be a good idea to formally elect a chair of APNG-I18N

WG, no one has any idea about that. The current tentative chair

will consult the newly elected chair of APNG on the appropriate

procedure.

From apng-sec Tue Jun 18 21:25:44 1996

Return-Path: demizu@space.csl.sony.co.jp

Received: from space.csl.sony.co.jp (root@space.csl.sony.co.jp [133.138.1.86]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id VAA14097 for <apng-i18n@apng.org>; Tue, 18 Jun 1996 21:25:44 +0900

Received: from space.csl.sony.co.jp by space.csl.sony.co.jp (8.7.3/2.8Wb)

id MAA28158; Tue, 18 Jun 1996 12:26:39 GMT

Message-Id: <199606181226.MAA28158@space.csl.sony.co.jp>

From: Noritoshi Demizu <demizu@csl.sony.co.jp>

To: apng-i18n@apng.org

Subject: APNG i18n WG Home page

X-Mailer: Mew version 1.05+ on Emacs 19.28.4, Mule 2.3

Mime-Version: 1.0

Content-Type: Text/Plain; charset=us-ascii

Date: Tue, 18 Jun 1996 21:26:39 +0900

Sender: demizu@space.csl.sony.co.jp

Dear APNG i18n WG members,

APNG i18n WG home page has been compiled at

<URL:http://www.csl.sony.co.jp/person/demizu/apng-i18n/>.

Any comments are welcome.

To make this page more complete, could you send me the following

information which isn't on this page yet?

- WG meeting minutes which aren't on this page

- Sample texts for any charsets on the Charset page

- Any activities/products/pages related to i18n/l10n

(especially those in Asia-Pacific area)

Thank you very much.

Best Regards,

Noritoshi Demizu, Sony CSL

From apng-sec Wed Aug 7 01:43:15 1996

Return-Path: maeda@ulis.ac.jp

Received: from bach.ulis.ac.jp (bach.ulis.ac.jp [133.51.32.2]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with SMTP id BAA27011 for <apng-i18n@apng.org>; Wed, 7 Aug 1996 01:43:14 +0900

Received: from ulis.ac.jp (eboshi) by bach.ulis.ac.jp (4.2/6.4JAIN-ulis-bach2)

id AA21138; Wed, 7 Aug 96 01:45:05 JST

Message-Id: <9608061645.AA21138@bach.ulis.ac.jp>

To: apng-i18n@apng.org

Cc: maeda@ulis.ac.jp

Date: Wed, 07 Aug 1996 01:45:04 +0900

From: Akira MAEDA <maeda@ulis.ac.jp>

subscribe apng-i18n

From apng-sec Tue Nov 12 09:32:31 1996

Return-Path: mohta@necom830.hpcl.titech.ac.jp

Received: from necom830.hpcl.titech.ac.jp (necom830.hpcl.titech.ac.jp [131.112.32.132]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id JAA10129 for <apng-i18n@apng.org>; Tue, 12 Nov 1996 09:32:30 +0900

From: Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>

Message-Id: <199611120033.JAA05413@necom830.hpcl.titech.ac.jp>

Received: by necom830.hpcl.titech.ac.jp (8.6.11/TM2.1)

id JAA05413; Tue, 12 Nov 1996 09:33:33 +0900

Subject: APNG I18N WG Hong Kong meeting

To: apng-i18n@apng.org

Date: Tue, 12 Nov 96 9:33:32 JST

X-Mailer: ELM [version 2.3 PL11]

Dear APNG-I18N members;

Do you or your colleague have any topic about Internationalization and/or

Localization to be discussed/presented in the upcoming APNG meeting?

Proposals to initiate an action of defining how specific local

characters should be encoded on the Internet, like RFC 1922, are welcome.

Please reply to this mailing list or privately to me.

I can make a presentation on how the revision of JIS X 0208 went

and how JIS X 0213, the third and fourth level Kanji characters,

will be, if some of you are interested in them.

Masataka Ohta

From apng-sec Tue Nov 12 11:49:06 1996

Return-Path: tashiro@media.etl.go.jp

Received: from etlpost.etl.go.jp (etlpost.etl.go.jp [192.31.197.33]) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) with ESMTP id LAA10161 for <apng-i18n@apng.org>; Tue, 12 Nov 1996 11:49:03 +0900

Received: from etlpom.etl.go.jp by etlpost.etl.go.jp (8.6.9+2.4W/2.7W)

id LAA14056; Tue, 12 Nov 1996 11:51:03 +0900

Received: by etlpom.etl.go.jp (4.1/6.4J.6-ETLpom.MASTER)

id AA13062; Tue, 12 Nov 96 11:51:02 JST

Received: by media.etl.go.jp (SMI-8.6/6.4J.6-ETL.SLAVE)

id LAA27943; Tue, 12 Nov 1996 11:51:01 +0900

Message-Id: <199611120251.LAA27943@media.etl.go.jp>

From: tashiro@etl.go.jp (Shuichi TASHIRO)

To: apng-i18n@apng.org

Subject: Re: APNG I18N WG Hong Kong meeting

In-Reply-To: Your message of "Tue, 12 Nov 1996 09:33:32 JST"

References: <199611120033.JAA05413@necom830.hpcl.titech.ac.jp>

Mime-Version: 1.0

Content-Type: text/plain;charset="ISO-2022-JP"

Date: Tue, 12 Nov 1996 11:50:57 +0900

Sender: tashiro@media.etl.go.jp

> Do you or your colleague have any topic about Internationalization and/or

> Localization to be discussed/presented in the upcoming APNG meeting?

We are planning to hold a two-day symposium on multilingual information

processing in Singapore, sometime between April and June of 1997.

The title of the symposium is tentatively "International Symposium on

the Standardization of Multilingual Information Technologies"

Around 100 people will be invited from countries in Asia.

MITI will cover the cost of the conference venue and the travel expenses of

speakers (and maybe some participants).

I would like to announce this symposium and discuss its details

(program, theme, speakers, etc.) at the APNG meeting.

--

Shuichi Tashiro

Electrotechnical Laboratory

From apng-sec Tue Nov 12 18:18:20 1996

Return-Path: nakayama

Received: (from nakayama@localhost) by ins.apng.org (8.6.12+2.4W/3.4W-1.0) id SAA10225; Tue, 12 Nov 1996 18:18:20 +0900

Message-Id: <199611120918.SAA10225@ins.apng.org>

To: apng-i18n@apng.org

Subject: Check your e-mail address.

X-Mailer: Mew version 1.06 on Emacs 19.28.1, Mule 2.3

Mime-Version: 1.0

Content-Type: Text/Plain; charset=us-ascii

Date: Tue, 12 Nov 1996 18:18:19 +0900

From: Masaya Nakayama <nakayama>

I removed the following entries for the reasons shown.

# Jimmy Hwang <jhwang@wiley.csusb.edu> User Unknown

# Woohyung Choi <whchoi@krnic.net> User Unknown

# <lwbbs@shakti.ncst.ernet.in> User Unknown

# Hock-Koon Lim <lim@ctron.com> User Unknown

# Ming Lu <luming@tsinghua.edu.cn> User Unknown

# zhf@captain.net.tsinghua.edu.cn User Unknown

# <fuku@c1.kagu.sut.ac.jp> Host UnKnown

# Xiaoling Teng <ccteng@pkn.edu.cn> Host UnKnown

When you change your e-mail address, please update your

entry yourself.

We maintain the mailing lists with the majordomo system. If you are not

familiar with it, please send a mail to "listserv@apng.org" or

"majordomo@apng.org" with a "help" line in its body.

Thanks for your cooperation.

--

Masaya Nakayama, APNG secretariat

From apng-sec Fri May 16 17:52:48 1997

Return-Path: mohta@necom830.hpcl.titech.ac.jp

Received: from necom830.hpcl.titech.ac.jp (necom830.hpcl.titech.ac.jp [131.112.32.132]) by ins.apng.org (8.8.5/3.4W-1.0) with ESMTP id RAA09886; Fri, 16 May 1997 17:52:48 +0900 (JST)

From: Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>

Message-Id: <199705160852.RAA17108@necom830.hpcl.titech.ac.jp>

Received: by necom830.hpcl.titech.ac.jp (8.6.11/TM2.1)

id RAA17108; Fri, 16 May 1997 17:52:32 +0900

Subject: Kuala Lumpur APNG

To: apng-i18n@apng.org, apng-cc@apng.org

Date: Fri, 16 May 97 17:52:31 JST

X-Mailer: ELM [version 2.3 PL11]

Dear members of APNG I18N WG;

The next APNG meeting will be held at Kuala Lumpur, Malaysia on

June 27 and 28 just after INET'97.

If you have any topic related to the APNG I18N WG to be discussed there,

please let me know by e-mail, either to me or to apng-i18n@apng.org.

Masataka Ohta

Updated: 2012.8.19

Contact sec at InternetHistory.asia for further information.