日本語

I'm very fond of Akira Kurosawa, Stomu Yamashta and the Yellow Magic Orchestra, but you can keep sake and sushi! Yugh!

Japanese is traditionally a vertical language, written top-to-bottom in columns that run right-to-left, with multiple character sets and downright crazy rules. I really couldn't have thought up more comprehensive wackiness when I was out of my head in the sixties. And the seventies. Fortunately, the great concession is that Japanese is now often displayed horizontally left-to-right, making the developer's life a little easier.

 


 

For setting up English Windows for Japanese support, visit Ted Benson's page.

You do not need the Microsoft Office Japanese language settings to enter or display Japanese characters. Many fonts, especially Unicode fonts, will support them. However, to use the Japanese/Asian-specific properties, you need to select Japanese as one of the enabled languages in the Microsoft Office Language Settings.

The Japanese language uses code page 932 (CP_JAPAN), which is a double-byte character set (DBCS) encoding. VBA still uses code pages, so operations on multinational data, such as filenames returned by the Dir() function, can fail when a character is not in the active code page.
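As a rough illustration of what "double-byte" means in practice, the following Python sketch encodes text with Python's `cp932` codec and checks whether a string survives a given code page. The helper names `cp932_byte_length` and `fits_code_page` are made up for this example, and Python is used only for illustration; VBA will not behave identically.

```python
def cp932_byte_length(text: str) -> int:
    """Number of bytes the text occupies in code page 932."""
    return len(text.encode("cp932"))


def fits_code_page(text: str, code_page: str = "cp932") -> bool:
    """True if every character of the text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Latin letters take one byte each, kanji take two bytes each.
print(cp932_byte_length("ABC"))        # 3 characters, 3 bytes
print(cp932_byte_length("日本語"))      # 3 characters, 6 bytes

# The same Japanese filename cannot be represented in a Western code page.
print(fits_code_page("日本語.txt", "cp932"))
print(fits_code_page("日本語.txt", "cp1252"))
```

A check like `fits_code_page` is one way to spot, ahead of time, data that a code-page-based function is likely to mangle.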

 

Microsoft Access Properties

The MSDN topic "Language-Specific Properties and Methods" is a little out of date, and refers mainly to Office 2000. Property names and options have changed a little since. You really need to appreciate a little about the Japanese language and the Input Method Editor (IME) to understand the use of these properties. Really good XP IME operational information can be found on Gregg Tavares's pages.

- FELineBreak Property: This is also known as Asian LineBreak. This property only appears on Asian Access.
- FuriganaControl Property: Furigana are characters that explain the phonetic pronunciation of an ideogram, such as a Kanji (Chinese) character. This property only appears on Japanese Access.

This is a demo of Furigana in action. The FuriganaControl property of the top control holds the name of the lower control. I enter Kanji characters in the top control. When the data in the top control is updated, such as when focus moves away, the Furigana appears in the lower control.

 

 

- IMEHold Property: This simple Yes/No choice holds the Kanji Conversion Mode carried over from the PREVIOUS control. NO is the default.
- IMEMode Property: This sets which Kanji Conversion Mode should be used when the control has focus. NO CONTROL is the default.
- IMESentenceMode Property: This unusual mode controls the way that the IME handles groups of words, such as sentences. NORMAL is the default.
- KeyboardLanguage Property: This sets the keyboard language for the current control. The languages are those set in the Microsoft Office Language Settings.
- NumeralShapes Property: This sets the numeral types. The Japanese use the same numeral types as Europe and the US. However, they also use number names, which are not covered by this property.
- Orientation Property: This governs the overall orientation (right-to-left or left-to-right) of forms and reports, in the absence of a setting at control level. For Japanese use Left-to-Right.
- PostalAddress Property: This appears on Asian Access (I don't know why it does not appear on all versions). It allows the conversion of postcodes into address fragments. In design mode there is a wizard for this.
- ReadingOrder Property: This controls the right-to-left or left-to-right orientation of displayed text in a control. If you select CONTEXT it reads the first character and decides the orientation. As horizontal Japanese and Chinese both read left-to-right and share Kanji characters, this choice matters mainly for right-to-left scripts such as Arabic and Hebrew.
- ScrollBarAlign Property: This property selects whether the scrollbar should be on the left or right of the form or report, or whether it should follow the Orientation of the parent object. For Japanese use Left-to-Right.

 

Date Formats

If you use standard Windows formats such as Long Date (yyyy mm dd) or Short Date (yy/mm/dd), Access applies the user's Regional Settings and shows the Japanese date in the correct format.

However, and this is très unusual, the ddd, dddd, mmm and mmmm formats display in English, whereas other languages such as Russian, Greek, Polish, French, Spanish and Hungarian display the local language. I really don't know why this is.

The 24-hour clock format is used. Where a 12-hour clock is shown, 午前 is AM and 午後 is PM.

The SQL Server 2000 CONVERT function supports the Japanese short date format through style 11 (yy/mm/dd), and collations support Japanese ordering rules, together with 17 suffixes which define whether there is case, accent, width or kana sensitivity.

 

Time Zones

Japan (UTC+9, with no daylight saving) is nine hours ahead of the UK in winter and eight hours ahead during British Summer Time.

One of my applications restricted the User's PC clock to be between one hour behind UK to four hours ahead of UK, to cover Europe. The application worked on my Japanese test machine, but rejected the User's PC when ported onto a PC in Japan. I increased the time range to eight hours ahead.
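A clock-range check of this kind could be sketched as follows. This is not the original application's code: the function names and the default limits are assumptions for illustration, and fixed UTC offsets are used instead of a time zone database to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9), "JST")  # Japan: UTC+9 all year round
GMT = timezone(timedelta(hours=0), "GMT")  # UK in winter
BST = timezone(timedelta(hours=1), "BST")  # UK in summer


def hours_ahead(local_tz, reference_tz, when):
    """Hours by which local_tz is ahead of reference_tz at the instant `when`."""
    local_offset = when.astimezone(local_tz).utcoffset()
    reference_offset = when.astimezone(reference_tz).utcoffset()
    return (local_offset - reference_offset).total_seconds() / 3600


def clock_in_range(offset_hours, earliest=-1, latest=9):
    """Accept a PC whose clock offset from the UK lies inside the allowed window.

    The default window (one hour behind to nine hours ahead) is an assumed
    setting chosen so that Japan passes in both winter and summer.
    """
    return earliest <= offset_hours <= latest


now = datetime(2005, 1, 15, 12, 0, tzinfo=timezone.utc)
print(hours_ahead(JST, GMT, now))   # Japan against UK winter time
print(hours_ahead(JST, BST, now))   # Japan against UK summer time
print(clock_in_range(hours_ahead(JST, GMT, now)))
```

The point of the sketch is that the upper limit has to cover the larger of the two seasonal offsets, otherwise the check passes in summer and fails in winter.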

 

Calendar

Windows in Japan has three calendars:

- English Gregorian
- Japanese Gregorian
- Japanese Era

The Regional And Language Options property sheet allows the user to:

- Select an alternative calendar (if applicable to the selected locale).
- Define a two-digit year range for each one of the available calendars.
- Define a default long-date and short-date formatting for each available calendar type.

Japanese Government offices usually require official documents to have their dates in Emperor Era years rather than the Gregorian calendar. Eras now only change when the Emperor changes. At the time of writing, the current era, that of Akihito, started in 1989 and is called Heisei (Achieving Peace). The era year is calculated by subtracting 1989 from the Gregorian year and adding 1, so 2005 - 1989 + 1 = 17, written H17.

Imperial dates are formatted with the name of the era followed by year, month and day. The Japanese characters for year (年), month (月) and day (日) are used as separators. The date 2003-01-24 (ISO 8601 yyyy-mm-dd) might therefore be written in Japanese western style as 2003年1月24日, and in Japanese imperial style as 平成15年1月24日. The year 2003 is the 15th year of the era Heisei (平成), which began in 1989 as year 1.
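The era arithmetic and the two date styles can be sketched in a few lines of Python. The function names are hypothetical, and the sketch deliberately follows the simple year-based rule described above: it only knows about Heisei and ignores the fact that the era began part-way through 1989 (on 8 January).

```python
from datetime import date

HEISEI_START_YEAR = 1989  # Heisei year 1


def heisei_year(gregorian_year: int) -> int:
    """Era year by the simple rule: Gregorian year - 1989 + 1."""
    if gregorian_year < HEISEI_START_YEAR:
        raise ValueError("year is before the Heisei era")
    return gregorian_year - HEISEI_START_YEAR + 1


def format_western(d: date) -> str:
    """Japanese western style, e.g. 2003年1月24日."""
    return f"{d.year}年{d.month}月{d.day}日"


def format_imperial(d: date) -> str:
    """Japanese imperial style, e.g. 平成15年1月24日."""
    return f"平成{heisei_year(d.year)}年{d.month}月{d.day}日"


d = date(2003, 1, 24)
print(format_western(d))
print(format_imperial(d))
print(f"H{heisei_year(2005)}")
```

A date picker that has to offer both calendars could keep the Gregorian date internally and apply a formatter like this only for display.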

This makes applications more difficult when we are providing "date pickers" to minimise data entry errors and Y2K errors.

 

Currency/Numbers

Currency/Numeric format is similar to US/UK format, except for the Yen symbol, e.g. ¥123,456,789.00

Japanese usually use "English" digits, but can use their own too. Note that they have an extra digit for 10.

〇, 一, 二, 三, 四, 五, 六, 七, 八, 九, 十
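Because there is a separate symbol for ten, these numerals are not a simple digit-for-digit substitution. The following Python sketch converts small numbers in the style used for days, months and era years; the function name `to_kanji` and the 0 to 99 limit are choices made for this example only.

```python
KANJI_DIGITS = "〇一二三四五六七八九"
KANJI_TEN = "十"


def to_kanji(n: int) -> str:
    """Write 0..99 with Japanese numerals, using 十 for the tens place."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0 to 99")
    if n < 10:
        return KANJI_DIGITS[n]
    tens, units = divmod(n, 10)
    # 10..19 start with a bare 十; 20 and above put the tens digit in front.
    result = ("" if tens == 1 else KANJI_DIGITS[tens]) + KANJI_TEN
    if units:
        result += KANJI_DIGITS[units]
    return result


for number in (1, 10, 13, 15, 24):
    print(number, to_kanji(number))
```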

 

 

Fonts and Glyphs

As you can see in the above example, don't use the fonts whose names begin with "@", as they tip the Japanese characters on their side. These variants are intended for vertical text layout.

It seems obvious to use a Unicode font such as Arial Unicode MS.

Arial and Tahoma seem to have a wide scope. However, it should be noted that some common fonts may only produce Japanese characters on a Japanese PC, as the glyph-set for that font has the Japanese extensions only on Japanese Windows machines. As most fonts carry English characters you may find the following:

Text Language    Windows Language    Result
English          English             OK
English          Japanese            OK
Japanese         Japanese            OK
Japanese         English             Fail

 

You must ensure that all the fonts used by your application are on all target machines. Where I write the date 平成十五年一月十三日, I used MS Mincho. If you don't have this, it may not be displayed. The font rules are complex and many fonts have fall-back substitutes, so the characters could still be displayed, but with indeterminate accuracy.

There are also Unicode fonts such as MS Unicode. It would be preferable to have applications use a Unicode font to reduce local issues when the required fonts are not available on the user's machine.

Font selection and font licensing also need consideration. There are four font licensing levels:

- Fonts can be embedded in documents and permanently installed on the remote system.
- Fonts can be embedded in documents, but must only be installed temporarily on the remote system.
- Fonts can be embedded in documents, but must only be installed temporarily on the remote system. Documents can only be opened as read-only.
- Fonts cannot be embedded in a document.

Care must be taken when distributing fonts to ensure that all are correctly licensed and the customer does not incur unnecessary costs, or infringe the font license.

It is important that applications use a small number of versatile fonts. For multi-language applications, the use of new Unicode fonts must seriously be considered.

For Japanese-only applications, the font choice could be MS UI Gothic or MS Gothic.

TrueType fonts are a further issue. Font names cannot easily be hard-coded, as they are expressed in the local language.

The Japanese language uses English, Japanese and Chinese characters. English (Romaji) characters can get by with 5x7 pixels, Japanese (Hiragana and Katakana) need 16x16 pixels, and Chinese (Kanji) need 24x24 pixels. Not only do Kanji need space but their proportions are different.

Sample text for each script (shown at 8, 10, 12 and 14 point on the original page):

English     abcdefghijklmnopqrstuvwxyz
Japanese    あぃいぅうぇえぉおかがきぎくぐけァアィイゥウェエォオカガキギクグ
Chinese     乿偓偊偣偕偐偲做偟健倦偈偶偽偖偌倐偆偱偦偁偅偸偧側偬

In multi-language applications, it is a shame to have large fields everywhere just because one language requires it. One way around this that I have used is to implement two minor changes:

- When a user clicks on a field with Japanese data, the OnClick event sends Shift+F2, which opens the Zoom box as an in-line editor. The user can select the font size, and this will be remembered by the application.
- The Japanese data is automatically displayed at a larger font size, by reading the Unicode code points of the characters.
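The second idea, choosing a font size from the Unicode values of the characters, might look roughly like this in Python. The block ranges are the standard Unicode blocks for Hiragana, Katakana and the main CJK ideographs; the function names and the point sizes are placeholders for illustration, not the values used in the original application.

```python
def script_of(ch: str) -> str:
    """Classify one character by its Unicode block."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def font_size_for(text: str, base: int = 8, kana: int = 10, kanji: int = 12) -> int:
    """Pick the largest size demanded by any script found in the text."""
    scripts = {script_of(ch) for ch in text}
    if "kanji" in scripts:
        return kanji
    if scripts & {"hiragana", "katakana"}:
        return kana
    return base


print(font_size_for("abc"))
print(font_size_for("カタカナ"))
print(font_size_for("日本語 text"))
```

Only the fields that actually hold Japanese data then get the larger size, so the rest of the form can stay compact.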

 

Capitalisation

Capitalisation has no meaning in Japanese, so DON'T use the UCase() function; it will either crash or produce gibberish.

 

Line and Word Breaks

The Japanese language does not necessarily use spaces to indicate distinction between words.

Japanese line breaking is based on the kinsoku rules : you can break lines between any two characters, with several exceptions :

  1. A line of text cannot end with any leading characters, such as opening quotation marks, opening parentheses, and currency signs, that shouldn't be separated from succeeding characters.
  2. A line of text cannot begin with any following characters, such as closing quotation marks, closing parentheses, and punctuation marks, that shouldn't be separated from preceding characters.
  3. Certain overflow characters (such as punctuation characters) are allowed to extend beyond the right margin for horizontal text or below the bottom margin for vertical text.
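Rules 1 and 2 can be illustrated with a small greedy line breaker in Python. This is a simplified sketch, not a full kinsoku implementation: the two character sets are short samples chosen for the example, every character is counted as one column, and instead of letting punctuation hang past the margin (rule 3) it moves the break point to the left.

```python
# Characters that must not END a line (rule 1): openers and currency signs.
NO_LINE_END = set("「『(〔[{¥$")
# Characters that must not START a line (rule 2): closers and punctuation.
NO_LINE_START = set("」』)〕]}、。,.!?")


def break_lines(text: str, width: int) -> list:
    """Split text into lines of at most `width` characters, honouring rules 1 and 2."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width  # candidate break sits just before text[end]
        # Move the break left while it would leave a forbidden character
        # at the start of the next line or at the end of this one.
        while end > start + 1 and (
            text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


for line in break_lines("彼は「こんにちは」と言った。", 3):
    print(line)
```

With a width of 3, the opening bracket is pushed down to start the second line rather than ending the first, and the closing bracket and full stop are kept with the characters before them.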

 

Sorting

Japanese data freely uses all its character sets. The main sort order is Shift-JIS, which sorts Romaji (Latin), Hiragana and Katakana in phonetic order, but Kanji in radical order. Sort that one out!

Japanese can also be sorted in Unicode order, which is perhaps preferable when dealing with multi-language data.
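The difference between the two orders can be seen with a small Python experiment. It approximates "Shift-JIS order" by comparing the raw code page 932 bytes, which is an assumption made for this sketch; a real database collation applies further rules on top. The function names are hypothetical.

```python
def sort_unicode(items):
    """Sort by Unicode code point (Python's default string order)."""
    return sorted(items)


def sort_shift_jis(items):
    """Sort by the bytes each string has in code page 932."""
    return sorted(items, key=lambda s: s.encode("cp932"))


scripts = ["漢字", "カタカナ", "ひらがな", "Romaji"]
print(sort_unicode(scripts))
print(sort_shift_jis(scripts))

# The scripts come out in the same sequence either way, but individual
# kanji do not: 亜 is the first kanji in the JIS table, while 一 is the
# first ideograph in the Unicode block.
kanji = ["亜", "一"]
print(sort_unicode(kanji))
print(sort_shift_jis(kanji))
```

So the choice of sort order mostly shows up inside the Kanji, which is exactly where users are most likely to notice an unexpected sequence.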

It must be remembered that this may not just be an issue with Microsoft Access, but may also affect the Enterprise Server and SQL, using the international extensions of the ORDER BY statement. SQL Server sort orders depend on which Service Packs have been installed.

 

Find

Up to now most of us think that there are only two Find options:

- Case Sensitive
- Case Insensitive

But Japanese Microsoft Office Find options include:

- Case Sensitive. When switched off, the search does not distinguish between uppercase and lowercase characters.
- Width Sensitive. When switched off, the search does not distinguish between full-width and half-width characters.
- Hiragana/Katakana Sensitive. When switched off, the search does not distinguish between Hiragana and Katakana characters.
- Match contractions (yo-on, sokuon). Searches without distinguishing between characters with diphthongs or double consonants and plain characters.
- Match minus/dash (cho-on). Searches without distinguishing between minus signs, dashes and long vowel sounds.
- Match 'repeat character' marks. Searches without distinguishing between repeat character marks.
- Match variant-form kanji (itaiji). Searches without distinguishing between standard and non-standard ideographs.
- Match old kana forms. Searches without distinguishing between new and old kana.
- Match cho-on used for vowels. Searches without distinguishing between characters with long vowel sounds and plain characters.
- Match di/zi, du/zu. Searches without distinguishing between ぢ and じ, or づ and ず.
- Match ba/va, ha/fa. Searches without distinguishing between バ and ヴァ, or ハ and ファ.
- Match tsi/thi/chi, dhi/zi. Searches without distinguishing between ツィ, ティ and チ, or ディ and ジ.
- Match hyu/fyu, byu/vyu. Searches without distinguishing between ヒュ and フュ, or ビュ and ヴュ.
- Match se/she, ze/je. Searches without distinguishing between セ and シェ, or ゼ and ジェ.
- Match ia/iya. Searches without distinguishing between ア and ヤ following イ-row and エ-row characters.
- Match ki/ku. Searches without distinguishing between キ and ク before サ-row characters.
- Punctuation characters. Searches without distinguishing between punctuation characters.
- Whitespace characters. Searches without distinguishing between characters used as blank spaces, such as full-width spaces, half-width spaces and tabs.

Although Microsoft Word supports the above, Microsoft Access does not. However the find options within OLE DB providers and Enterprise SQL could be explored.
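The first three options (case, width and Hiragana/Katakana) are simple enough to approximate in application code by folding both strings before comparing them. The Python sketch below is one way to do that, under stated assumptions: NFKC normalisation stands in for width folding (it also changes some other compatibility characters), the function names are hypothetical, and none of the phonetic options in the list are attempted.

```python
import unicodedata


def katakana_to_hiragana(text: str) -> str:
    """Map Katakana ァ..ヶ onto the Hiragana 0x60 code points below them."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch for ch in text
    )


def fold(text: str, case: bool = True, width: bool = True, kana: bool = True) -> str:
    """Fold away the distinctions the caller wants the search to ignore."""
    if width:
        text = unicodedata.normalize("NFKC", text)  # full/half-width forms
    if case:
        text = text.casefold()
    if kana:
        text = katakana_to_hiragana(text)
    return text


def find(needle: str, haystack: str, **options) -> bool:
    """Substring search that ignores the folded distinctions."""
    return fold(needle, **options) in fold(haystack, **options)


print(find("かたかな", "ｶﾀｶﾅ"))                 # half-width Katakana vs Hiragana
print(find("abc", "ＡＢＣ"))                     # full-width capitals vs plain lowercase
print(find("カタカナ", "かたかな", kana=False))  # kana distinction kept
```

For large tables it would be cheaper to store a folded copy of the searchable column than to fold every row at query time.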

 

Paper Size

The Japanese use metric paper sizes and their "A" sizes are ISO standard.

However, their "B" paper sizes are not the same as ISO paper sizes. For instance, ISO B5 is 176 mm x 250 mm, whereas Japanese B5 is 182 mm x 257 mm.

 

Database Drivers

Always use the latest ODBC and OLE DB drivers. Ensure that the latest client tools, e.g. Oracle Client or MDAC, are installed. The later the revision, the more likely it is to cope with Unicode, languages and locales. Be careful when rolling these out, as they can easily be affected by Regional Settings. I have had a driver produce numbers 100 times too large, but only in one country. Once reinstalled in English, then switched back to the local language, it was fine. Another driver had its name translated into the local language, so that the application could not find it.

Not only may your application be affected by an inadequate environment, but your tools, such as Enterprise Manager, may be affected too.

 

Data Type

Japanese data can be stored in Access text, SQL Server NVarChar, and Oracle 9i VarChar type.

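In the case of SQL Server, remember to prefix string literals with N (for example N'...') to retain the characters. Without the prefix the literal is treated as non-Unicode data and converted to the database's code page, and any character that code page cannot represent is replaced with a question mark. This happens even if your client passes Unicode.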
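The effect of a missing N prefix can be imitated outside the database. The Python sketch below only simulates the code page conversion; it does not talk to SQL Server, the function names are made up, and the choice of `cp1252` as the database code page is an assumption standing in for a typical Western-collation database.

```python
def store_without_n_prefix(text: str, code_page: str = "cp1252") -> str:
    """Imitate a literal written as '...': squeezed through the code page."""
    return text.encode(code_page, errors="replace").decode(code_page)


def store_with_n_prefix(text: str) -> str:
    """Imitate a literal written as N'...': kept as Unicode (UTF-16)."""
    return text.encode("utf-16-le").decode("utf-16-le")


# In T-SQL terms, with a hypothetical table and column:
#   INSERT INTO Customers (Name) VALUES ('日本語')    -- stored as ???
#   INSERT INTO Customers (Name) VALUES (N'日本語')   -- stored intact
print(store_without_n_prefix("日本語"))
print(store_with_n_prefix("日本語"))
print(store_without_n_prefix("日本語", "cp932"))
```

The last line shows why the problem can stay hidden on a Japanese test database: when the code page happens to be 932, the unprefixed literal survives.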

 

Shift-JIS

If you were too young to take drugs in the sixties, then get your head around Shift-JIS; it has a similar effect to the early hallucinogens.

There is a lot written about this, but not much that I can understand. Therefore, I have written a table-driven solution based on this TABLE, and you can load a working example with this Access XP application: ShiftJIS_PaulMillennas.zip
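For readers who just want a feel for how the encoding hangs together, here is a short Python sketch that walks a Shift-JIS byte string and splits it into characters. It is not the table-driven solution mentioned above; it relies only on the well-known lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC), and the function names are hypothetical.

```python
def is_lead_byte(b: int) -> bool:
    """True if the byte starts a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_shift_jis(data: bytes) -> list:
    """Split a Shift-JIS byte string into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        size = 2 if is_lead_byte(data[i]) else 1
        chunk = data[i:i + size]
        if len(chunk) < size:
            raise ValueError("truncated double-byte character")
        chars.append(chunk)
        i += size
    return chars


data = "ABｶ日本".encode("cp932")
for chunk in split_shift_jis(data):
    print(chunk.hex(), chunk.decode("cp932"))
```

ASCII letters and half-width Katakana each occupy one byte, while Kanji occupy two, which is why byte counts and character counts disagree in any code that treats Shift-JIS data as plain bytes.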