The Windows Console doesn’t support Unicode. It does, however, support Double Byte Character Sets (DBCS) through code pages. Once the system locale has been changed, the Console can display Japanese, Korean, and Chinese text.
Terminology
UTF-8 and UTF-16 are both encodings of Unicode. However, it’s common on Windows to refer to UTF-16 as “Unicode” and to UTF-8 as “UTF-8”. I will follow this convention. DBCS (Double Byte Character Set) is the only type of MBCS (Multi Byte Character Set) supported by legacy (i.e. non-Unicode) Windows applications. Japanese, Chinese, and Korean are supported via DBCS encodings. None of these DBCS encodings are Unicode, and all of them are Microsoft’s own variants of other standards.
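To make the terminology concrete, here is a minimal Python sketch that encodes one sample string three ways. The sample text and the `encoded_sizes` helper are my own illustration, and `cp932` is Python's name for its implementation of code page 932, which I am assuming is close enough to the Windows one for this purpose.

```python
# Compare how the same Japanese text is stored in each encoding.
text = "日本語"  # sample string; any Japanese text would do

ENCODINGS = {
    "Unicode (UTF-16 LE)": "utf-16-le",
    "UTF-8": "utf-8",
    "DBCS (code page 932)": "cp932",
}


def encoded_sizes(s):
    """Return {label: byte length} for each encoding listed above."""
    return {label: len(s.encode(codec)) for label, codec in ENCODINGS.items()}


for label, codec in ENCODINGS.items():
    raw = text.encode(codec)
    # bytes.hex(sep) needs Python 3.8 or later.
    print(f"{label:22} {len(raw)} bytes  {raw.hex(' ')}")
```

For these three characters, UTF-16 and code page 932 both use two bytes per character, while UTF-8 uses three.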
Code Pages Supported by Windows
Windows supports four Double Byte Character Set code pages:
- 932 (Japanese Shift-JIS)
- 936 (Simplified Chinese GBK)
- 949 (Korean Unified Hangul Code)
- 950 (Traditional Chinese Big5)
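As a rough illustration of the four code pages above, the following Python sketch round-trips a short sample through each one. The codec names are Python's, not Windows', and the sample strings and the `round_trip` helper are assumptions I have added for the example.

```python
# Windows code page number -> Python codec name.
DBCS_CODE_PAGES = {
    932: "cp932",  # Japanese Shift-JIS
    936: "cp936",  # Simplified Chinese GBK (Python treats this as an alias of "gbk")
    949: "cp949",  # Korean
    950: "cp950",  # Traditional Chinese Big5
}

# One short sample per code page, chosen only for illustration.
SAMPLES = {932: "日本語", 936: "简体中文", 949: "한국어", 950: "繁體中文"}


def round_trip(code_page):
    """Encode the sample for a code page, then decode it again."""
    codec = DBCS_CODE_PAGES[code_page]
    raw = SAMPLES[code_page].encode(codec)
    return raw, raw.decode(codec)


for cp in DBCS_CODE_PAGES:
    raw, back = round_trip(cp)
    print(cp, raw.hex(" "), back)
```

Each sample character takes two bytes in its own code page, which is where the "double byte" name comes from.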
The available code pages are determined by your System Locale. If your System Locale is set to “English (United States)”, then these code pages will be unavailable to you. In this post, I will only be covering Japanese, since it’s the only language with which I have any familiarity. The steps and results would be similar for the other languages.
How to Change System Locale
To change your system locale, go into “Change date, time, or number formats”:
Select the Administrative tab and click “Change system locale”. Select the new system locale, click OK, and reboot. The change does not take effect until the system has been restarted.
Windows Console Font and Code Page
The font typically recommended for Japanese output is MS Gothic. I have, however, found that Japanese text displays with the Terminal font selected, but it’s entirely possible that the UI is lying to me.
To change the Windows Console code page, use the chcp command. chcp with no arguments will display the active code page.
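The same information is available programmatically. This is a minimal sketch that calls the Win32 console functions through `ctypes`. The two helper names are my own, and I am assuming that setting both the input and output code pages is roughly what `chcp` does. Off Windows the helpers do nothing.

```python
import ctypes
import sys


def get_console_code_pages():
    """Return (input_cp, output_cp) for the attached console, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


def set_console_code_page(code_page):
    """Set the console input and output code pages. Returns True on success."""
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.windll.kernel32
    ok_in = bool(kernel32.SetConsoleCP(code_page))
    ok_out = bool(kernel32.SetConsoleOutputCP(code_page))
    return ok_in and ok_out


print(get_console_code_pages())
```

Calling `set_console_code_page(932)` from a script would then be the approximate equivalent of typing `chcp 932` at the prompt.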
Code Page 932 (Japanese Shift-JIS)
With the code page set to 932 (Japanese Shift-JIS), the path separator character will display as the Yen symbol. This is because Shift-JIS matches ASCII in the lower 7 bits except at two positions: the backslash (0x5C) becomes the Yen sign, and the tilde (0x7E) becomes an overline. Only the glyph changes; the underlying byte is the same, so paths continue to work. Japanese file names will display in Japanese, as will text saved as Unicode. Japanese text saved as UTF-8 will display as gibberish.
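The gibberish is easy to reproduce without a console. This Python sketch decodes UTF-8 bytes as if they were code page 932, which is my assumption about what the Console is effectively doing; the sample string is made up for the example.

```python
text = "日本語"  # sample string for illustration

utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("cp932")

# Interpret each byte stream the way a code page 932 console would.
# errors="replace" stands in for bytes that are not valid in code page 932.
garbled = utf8_bytes.decode("cp932", errors="replace")
correct = sjis_bytes.decode("cp932")

print(repr(garbled))  # mojibake: the bytes are valid UTF-8 but read as Shift-JIS
print(correct)

# Byte 0x5C still decodes to the backslash code point under cp932;
# the Yen sign is a matter of which glyph the font draws for it.
backslash = b"\x5c".decode("cp932")
print(backslash == "\\")
```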
Code Page 65001 (UTF-8)
I have found that setting the code page to 65001 (UTF-8) will sometimes work. When it does, Japanese filenames, Japanese Unicode file content, and Japanese UTF-8 content all display correctly. However, when I experimented with this, it stopped working after I had changed fonts and code pages a few times. My final impression is that it should work, but that the Console has some bugs in this regard.
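For anyone who wants to repeat the experiment, here is a small Python sketch that creates the three kinds of test input: a file with a Japanese name, a file with “Unicode” (UTF-16) content, and a file with UTF-8 content. The file names, the sample text, and the `write_test_files` helper are all hypothetical, not the files I used.

```python
import tempfile
from pathlib import Path

text = "日本語のテキスト\r\n"  # sample content for illustration


def write_test_files(folder):
    """Create the three test files in folder and return their names."""
    folder = Path(folder)
    # A Japanese file name; its content does not matter here.
    (folder / "日本語.txt").write_bytes(b"")
    # "Unicode" in the Windows sense: UTF-16 little-endian with a byte order mark.
    (folder / "unicode.txt").write_bytes(b"\xff\xfe" + text.encode("utf-16-le"))
    # UTF-8 without a byte order mark.
    (folder / "utf8.txt").write_bytes(text.encode("utf-8"))
    return sorted(p.name for p in folder.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    names = write_test_files(tmp)
    print(names)
```

To try it in the Console, pass a folder of your own instead of the temporary directory, then run `chcp 65001` followed by `dir` and `type` on each file.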
Here’s a screen shot of the Console after code page 65001 stopped working as expected:
References
- Code Pages Supported by Windows (MSDN)
- Encodings of Japanese (sci.lang.japan FAQ)