utf8 [2011/10/13 23:55] (current)
====== UTF-8 at grml ======

===== General information =====

Starting with version 1.0, grml uses UTF-8 as its default encoding. The main reason for this change is that all other major Linux distributions (including [[http://www.debian.org/releases/etch/i386/release-notes/ch-whats-new.en.html|Debian]]) nowadays use UTF-8 as well.

===== UTF-8 on grml =====

==== Bootoptions ====

By default grml uses the UTF-8 encoding. To disable UTF-8 as the default encoding, boot with:

<code>
grml lang=$LANG-iso
</code>

...where $LANG is the code of the desired language. For example, use 'lang=de-iso' to get a German keyboard layout and ISO-8859-15 encoding.
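To find out which encoding a running system was booted with, you can look for the lang= option on the kernel command line. The following is a minimal POSIX sh sketch, assuming the boot options are visible in /proc/cmdline; the function name boot_lang_option is hypothetical and not part of grml.

<code bash>
# Hypothetical helper (not part of grml): print the lang= boot option and
# whether the ISO variant (suffix -iso) was requested.
boot_lang_option() (
    # Use the argument if given, otherwise the running kernel's command line.
    cmdline=${1:-$(cat /proc/cmdline 2>/dev/null)}
    # One boot option per line; keep the first lang=... value without prefix.
    value=$(printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^lang=//p' | head -n 1)
    if [ -z "$value" ]; then
        echo "no lang= boot option found" >&2
        exit 1
    fi
    case $value in
        *-iso) echo "$value: ISO encoding requested" ;;
        *)     echo "$value: UTF-8 (default)" ;;
    esac
)
</code>

Called without arguments it inspects the running system; passing a string makes it easy to try out, e.g. ''boot_lang_option 'quiet lang=de-iso'''.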

==== Main configuration ====

grml uses /etc/default/locale as the configuration file for locale environment settings, as suggested and supported by Debian. To change the settings in this file, either edit it manually with the editor of your choice or use grml-setlang, the dialog-based frontend shipped with grml. Make sure that your login shell sources the file so that the configuration is available in your environment (grml's zsh does this via /etc/zsh/zshenv).
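/etc/default/locale holds plain shell variable assignments such as LANG=de_DE.UTF-8. Below is a minimal sketch of how a POSIX login shell could load and export them; load_default_locale is a hypothetical name, and this is not the code grml's /etc/zsh/zshenv actually uses.

<code bash>
# Hypothetical helper: source a locale file (default /etc/default/locale)
# and export every variable it assigns.
load_default_locale() {
    file=${1:-/etc/default/locale}
    [ -r "$file" ] || return 1
    set -a          # auto-export every variable assigned while sourcing
    . "$file"
    set +a          # back to normal (non-exporting) assignments
}
</code>

Using ''set -a'' avoids having to list the locale variables by hand: whatever the file sets (LANG, LANGUAGE, LC_*) ends up in the environment of programs started afterwards.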

==== Main troubleshooting within a few seconds ====

  * Check the environment settings via 'env | grep -i utf' and, on the plain console, the keyboard mode via 'kbd_mode'. Change the environment settings globally via "grml-setlang $LANG[-iso]" (make sure to log in again, or at least restart the shell, so that the environment variables set in /etc/default/locale are read). Change the keyboard mode on the plain console via 'kbd_mode -a' (ASCII mode) or 'kbd_mode -u' (UTF-8 mode).
  * Use unicode_start and unicode_stop on the plain console to switch between Unicode and 8-bit mode.
  * If an application still has problems in UTF-8 or ISO mode, check whether it has a command-line switch that enables or disables UTF-8 (like 'screen -U' or 'uxterm') or a configuration file that can be adjusted to the appropriate mode. If that does not work, use luit as a wrapper, for example ''LANG=de_DE.iso88591 luit centericq''.
  * French keyboards on the console: before running ''loadkeys fr'', take a look at /usr/share/keymaps/i386/azerty/ to select the correct map. With ''loadkeys fr'' you get neither accents nor €; the correct choice is ''loadkeys fr-latin9''.
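'env | grep -i utf' shows which variables mention UTF-8, but which one actually takes effect depends on precedence: for character handling, POSIX lets LC_ALL override LC_CTYPE, which in turn overrides LANG. A minimal sketch that reports the winning value; ctype_locale_info is a hypothetical helper name.

<code bash>
# Hypothetical helper: show the locale that governs character handling
# (precedence LC_ALL > LC_CTYPE > LANG; empty values count as unset)
# and whether it names a UTF-8 codeset.
ctype_locale_info() (
    locale=${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}
    case $locale in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) enc=UTF-8 ;;
        *) enc=non-UTF-8 ;;
    esac
    echo "$locale ($enc)"
)
</code>

A LANG of de_DE.UTF-8 is therefore of no use if LC_ALL is set to an ISO locale; the helper makes such conflicts visible at a glance.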

===== Useful software for working with UTF-8 =====

==== Debian packages ====

  * console-setup: set up the font and the keyboard on the console
  * dynafont: module for the konwert package which loads UTF-8 fonts dynamically
  * utf8-migration-tool: Debian UTF-8 migration wizard (not shipped with grml for space reasons)
  * xutils: provides /usr/bin/luit (a filter that can be run between an arbitrary application and a UTF-8 terminal emulator)

==== Tools ====

  * ascii2uni: convert 7-bit ASCII representations to UTF-8 Unicode [package uni2ascii]
  * isutf8: check whether files are valid UTF-8 [package moreutils]
  * uni2ascii: convert UTF-8 Unicode to various 7-bit ASCII representations [package uni2ascii]
  * utf8tolatin1: read UTF-8 encoded text on stdin and write Latin-1 (ISO-8859-1) encoded text on stdout [package o3read]
  * uxterm: X terminal emulator for Unicode (UTF-8) environments [package xterm]
  * vt-is-UTF8: check whether the current VT is in UTF-8 or byte mode [package console-tools]
  * iconv: convert files from one encoding to another [package libc6]
  * unicode_start: put the console into Unicode mode [package console-tools]
  * unicode_stop: take the console out of Unicode mode (i.e. put it into 8-bit mode) [package console-tools]
  * luit: a filter that can be run between an arbitrary application and a UTF-8 terminal emulator [package xutils]
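If isutf8 from moreutils is not installed, iconv can serve as a rough substitute: converting from UTF-8 to UTF-8 fails with a non-zero exit status on invalid byte sequences. This sketch assumes GNU libc's iconv; is_valid_utf8 is a hypothetical name and not the moreutils tool.

<code bash>
# Hypothetical isutf8-like check built on iconv. Returns non-zero if any
# file is not valid UTF-8 (an unreadable file is reported as invalid too).
is_valid_utf8() (
    status=0
    for f in "$@"; do
        # iconv exits non-zero when it meets an invalid byte sequence
        if iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            echo "$f: valid UTF-8"
        else
            echo "$f: NOT valid UTF-8"
            status=1
        fi
    done
    exit "$status"
)
</code>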

===== Software known to have problems with UTF-8 =====

Some software in grml's package selection does not (yet) support UTF-8/Unicode. If you find software with problems in a UTF-8 environment that is not listed here yet, please feel free to extend the list.

  * mrxvt
  * aterm
  * centericq (check [[http://repo.or.cz/w/centerim.git|centerim]] for updates, or use the package centericq-utf8 or run centericq via luit)

Note: on grml, software known to cause problems with UTF-8 is started in an ISO-8859-15 environment through the luit wrapper; see the shell configuration part below for details.

===== Tips for working with UTF-8 =====

==== Shell configuration ====

grml's zsh configuration provides a function named isutfenv:

<code>
# Return success (0) if LANG, CHARSET or LANGUAGE mentions UTF-8
isutfenv () {
        case "$LANG $CHARSET $LANGUAGE" in
                (*utf*) return 0 ;;
                (*UTF*) return 0 ;;
                (*) return 1 ;;
        esac
}
</code>

You can use this function to check whether you are in a UTF-8 environment and to work around problematic software, for example:

<code>
# zsh only: the (#b) backreference flag requires 'setopt extendedglob'
isutfenv && [ -n "$LANG" ] && alias mrxvt="LANG=${LANG/(#b)(*)[.@]*/$match[1].iso885915} luit mrxvt"
</code>
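The zsh expansion in the alias above replaces the codeset of $LANG with iso885915. Shells without zsh's (#b) flag can get a similar result with ${var%%pattern}; the following is a rough POSIX sh variant, not an exact equivalent, and iso_locale_for is a hypothetical helper. It removes everything from the first '.' or '@', so for names like de_DE.UTF-8 it yields de_DE.iso885915, and it also drops a modifier such as @euro.

<code bash>
# Hypothetical helper: turn a locale name into its ISO-8859-15 variant by
# cutting off codeset and modifier (from the first '.' or '@') and
# appending ".iso885915".
iso_locale_for() {
    printf '%s.iso885915\n' "${1%%[.@]*}"
}
</code>

In a POSIX shell the mrxvt workaround could then read ''alias mrxvt="LANG=$(iso_locale_for "$LANG") luit mrxvt"''.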

Some further snippets from [[http://grml.org/zsh/|grml's zsh configuration]]:

<code>
  # Switch the terminal's character set with escape sequences:
  # ESC % @ selects the default (ISO) set, ESC % G selects UTF-8
  alias term2iso="echo 'Setting terminal to iso mode' ; echo -e '\e%@'"
  alias term2utf="echo 'Setting terminal to utf-8 mode'; echo -e '\e%G'"

  # Rewrite the UTF-8 settings in the environment to ISO-8859-15 ...
  alias utf2iso='if isutfenv ; then
   for ENV in $(env | grep UTF) ; do
       eval export "$(echo $ENV | sed "s/UTF-8/iso885915/")"
   done
   fi'
  # ... and back to UTF-8 (only when not already in a UTF-8 environment)
  alias iso2utf='if ! isutfenv ; then
   for ENV in $(env | grep "\.iso") ; do
       eval export "$(echo $ENV | sed "s/iso.*/UTF-8/")"
   done
   fi'
</code>
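The utf2iso alias parses the output of env and evals it. As an alternative, here is a sketch that only touches the standard locale variables, assuming a POSIX sh; utf2iso_vars is a hypothetical name. Like the mrxvt alias, it replaces the codeset with iso885915 and drops any @modifier.

<code bash>
# Hypothetical function: switch the standard locale variables from a UTF-8
# codeset to ISO-8859-15, leaving unset or non-UTF-8 variables alone.
utf2iso_vars() {
    for var in LANG LC_ALL LC_CTYPE LC_MESSAGES LC_COLLATE LC_TIME LC_NUMERIC LC_MONETARY; do
        eval "val=\${$var-}"              # indirect read of $var
        case $val in
            *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*)
                # keep language_TERRITORY, replace codeset and modifier
                eval "$var=\${val%%[.@]*}.iso885915"
                export "$var"
                ;;
        esac
    done
}
</code>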

==== Get real UTF-8 support on plain console ====

On the plain console (tty1, tty2, ..., not xterm and other terminal emulators) grml uses the font Uni3-Terminus16 from the Debian package console-terminus. While it works well enough for common work on the console, it is not a fully UTF-8 capable font. To get real UTF-8 support on your console, use dynafont. Please note that this slows down your terminal because it loads the required fonts dynamically.

The most common usage of dynafont:

<code>
# filterm - dynafont
</code>

If you are not using a framebuffer console, you can use this command instead:

<code>
# filterm - 512bold+dynafont
</code>

If the keyboard uses an ISO-8859-x encoding, filterm can convert from ISO to UTF-8 on the fly, e.g. for ISO-8859-2:

<code>
# filterm iso2-UTF8 dynafont
</code>
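Which of the first two filterm invocations fits depends on whether the console runs on a framebuffer. A sketch that only prints the matching command; testing for the device node /dev/fb0 is an assumption (your framebuffer device may differ), and dynafont_command is a hypothetical name.

<code bash>
# Hypothetical helper: print the dynafont invocation for this console.
# Assumes a framebuffer console exposes /dev/fb0 (override via $1).
dynafont_command() (
    fbdev=${1:-/dev/fb0}
    if [ -e "$fbdev" ]; then
        echo "filterm - dynafont"
    else
        echo "filterm - 512bold+dynafont"
    fi
)
</code>

As root you could then start the chosen variant with ''$(dynafont_command)''.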

==== Converting files ====

Convert files from UTF-8 to ISO-8859-15 (the -c option silently drops characters that cannot be represented in the target encoding):

<code>
% iconv -c -f utf8 -t iso-8859-15 < utffile > isofile
</code>

and vice versa:

<code>
% iconv -f iso-8859-15 -t utf8 < isofile > utffile
</code>
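To convert many files at once, the iconv call can be wrapped in a loop. A minimal sketch, assuming the inputs are ISO-8859-15 and that writing each result next to the original as FILE.utf8 is acceptable (the function name iso2utf_files and the naming scheme are hypothetical):

<code bash>
# Hypothetical helper: convert each ISO-8859-15 FILE to UTF-8 as FILE.utf8.
# Originals stay untouched; a failed conversion removes the partial output.
iso2utf_files() (
    rc=0
    for f in "$@"; do
        if iconv -f ISO-8859-15 -t UTF-8 "$f" > "$f.utf8"; then
            echo "converted: $f -> $f.utf8"
        else
            rm -f "$f.utf8"
            echo "failed: $f" >&2
            rc=1
        fi
    done
    exit "$rc"
)
</code>

Writing to a separate file keeps the original available until you have checked the result, for example with isutf8.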

==== Test UTF-8 capabilities of terminal ====

<code>
wget http://www.linux-cjk.net/Console/garabik/UTF-8-demo.txt.gz
zcat UTF-8-demo.txt.gz
</code>
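If the demo file cannot be downloaded, a few characters printed as raw UTF-8 bytes give a quick first impression. The sketch uses octal escapes so that it does not depend on the encoding of the script itself; utf8_sample is a hypothetical name. In a terminal that handles UTF-8, the first line should show ä ö ü ß € and the second line two box-drawing characters (─ │); several odd symbols per character usually point to a terminal in ISO mode.

<code bash>
# Hypothetical helper: print a few non-ASCII characters as UTF-8 bytes.
utf8_sample() {
    # a-umlaut, o-umlaut, u-umlaut, sharp s, euro sign
    printf '\303\244 \303\266 \303\274 \303\237 \342\202\254\n'
    # box drawings light horizontal, box drawings light vertical
    printf '\342\224\200 \342\224\202\n'
}
</code>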

==== Running terminal in ISO mode on a UTF-8 system ====

You might notice problems with the terminal if your local system uses UTF-8 whereas the remote system you are connecting to (for example via SSH) uses an ISO encoding (ISO-8859-1 or ISO-8859-15, for example). To get a Unicode terminal under X on your local system, invoke uxterm (a wrapper around xterm(1) that starts it with the 'UXTerm' X resource class set if necessary). If you want a plain ISO terminal even though your local system and its environment use UTF-8, invoke /usr/bin/iso-term on your grml system. iso-term is a simple wrapper script that changes all UTF-8 settings in your environment (check by running 'env') to iso885915 and then invokes x-terminal-emulator. Finally, you can also start your applications via the luit wrapper, for example 'LANG=de_DE.iso88591 luit zsh'.
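The luit approach can be wrapped in a small function, for example to log into an ISO-8859-15 host from a UTF-8 terminal with ''run_iso ssh host''. This is a sketch assuming luit is installed and the derived ISO locale (e.g. de_DE.iso885915) is generated on the local system; run_iso is a hypothetical name. LC_ALL and LC_CTYPE are cleared for the call because they would otherwise take precedence over LANG.

<code bash>
# Hypothetical wrapper: run a command through luit in the ISO-8859-15
# variant of the current locale (de_DE.UTF-8 -> de_DE.iso885915).
run_iso() (
    iso_lang=${LANG%%[.@]*}.iso885915
    # Empty LC_ALL/LC_CTYPE are ignored, so LANG decides the encoding.
    LC_ALL= LC_CTYPE= LANG=$iso_lang luit "$@"
)
</code>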

===== Further useful resources =====

  * [[http://wiki.debian.org/UTF8BrokenApps|UTF8 Broken Apps @ wiki.debian.org]]
  * [[http://www.gentoo-wiki.info/UTF-8|HOWTO Make your system use unicode/utf-8 @ Gentoo-Wiki]]
  * [[http://www.cl.cam.ac.uk/~mgk25/unicode.html|UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn]]
  * [[http://www.linux-cjk.net/Console/garabik/garabik.howto.html|Step by step introduction to switching your debian installation to utf-8 encoding.]]
 
utf8.txt · Last modified: 2011/10/13 23:55 (external edit)
 