UTF-8 at grml

General information

Starting with version 1.0 grml uses UTF-8 as default encoding. The main reason for this change is that all the other major Linux distributions (including Debian) use UTF-8 as well nowadays.

UTF-8 on grml

Boot options

By default grml uses the UTF-8 encoding. To disable UTF-8 as default encoding just boot using:

grml lang=$LANG-iso

… where $LANG is the language code. For example, boot with 'lang=de-iso' to get a German keyboard layout and ISO8859-15 encoding.

Main configuration

grml uses /etc/default/locale as the configuration file for locale-related environment settings, as suggested and supported by Debian. To change settings in this file, either edit it manually with the editor of your choice or use the dialog-based frontend grml-setlang on your grml system. Make sure that your login shell sources the file so that the configuration is available within your environment (grml's zsh does this via /etc/zsh/zshenv).
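
For shells that do not source the file on their own, the step can be sketched as follows. This is only an illustration: the helper name load_locale_file is hypothetical and not part of grml, and the sketch assumes the file contains plain VAR="value" assignments.

```shell
# Hypothetical helper (not shipped by grml): source a locale file and
# export every variable it assigns. Defaults to /etc/default/locale.
load_locale_file() {
    file=${1:-/etc/default/locale}
    [ -r "$file" ] || return 1   # nothing to do if the file is missing
    set -a                       # auto-export all variables assigned from here on
    . "$file"
    set +a                       # back to normal assignment behaviour
}
```

Calling load_locale_file from a shell startup file would make LANG and friends visible to all programs started from that shell.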

Main troubleshooting within a few seconds

  • Check the environment settings via 'env | grep -i utf' and the keyboard mode on plain console using 'kbd_mode'. Change the environment settings globally via 'grml-setlang $LANG[-iso]' (make sure you log in again or at least restart the shell so that the environment variables set in /etc/default/locale are read). Change the keyboard mode on plain console via 'kbd_mode -a' (ASCII mode) and 'kbd_mode -u' (UTF-8 mode).
  • Use unicode_start and unicode_stop on plain console to switch between different settings.
  • If an application still has problems in UTF-8/ISO mode, check whether it has a command-line switch which enables or disables UTF-8 (like 'screen -U', 'uxterm',…) or a configuration file which can be adjusted to the appropriate mode. If that does not work, use luit as a wrapper: just invoke something like 'LANG=de_DE.iso88591 luit centericq'.
  • Console: before running 'loadkeys fr', take a look at /usr/share/keymaps/i386/azerty/ to select the correct keymap. With 'loadkeys fr' you do not get accents and €; the correct choice is 'loadkeys fr-latin9'.
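
The first checks from the list above can be bundled into one small function. This is a rough sketch, not a grml tool: the name utf_check is made up for illustration, and the kbd_mode call is only attempted if the program is installed (it is expected to fail outside a plain console).

```shell
# Hypothetical helper: show the UTF-8 related state of the current session.
utf_check() {
    echo "Environment variables mentioning UTF-8:"
    env | grep -i 'utf' || echo "  (none)"
    # 'locale charmap' prints the character set of the active locale
    if command -v locale >/dev/null 2>&1; then
        echo "Active charmap: $(locale charmap 2>/dev/null)"
    fi
    # kbd_mode only makes sense on a plain console (tty1, tty2, ...)
    if command -v kbd_mode >/dev/null 2>&1; then
        kbd_mode 2>/dev/null || echo "kbd_mode failed (not on a plain console?)"
    fi
}
```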

Useful software for working with UTF-8

Debian packages

  • console-setup: Setup the font and the keyboard on the console
  • dynafont: Module for konwert package which loads UTF-8 fonts dynamically
  • utf8-migration-tool: Debian UTF-8 migration wizard (not shipped by grml due to space reasons)
  • xutils: provides /usr/bin/luit (a filter that can be run between an arbitrary application and a UTF-8 terminal emulator)

Tools

  • ascii2uni: convert 7-bit ASCII representations to UTF-8 Unicode [package uni2ascii]
  • isutf8: check whether files are valid UTF-8 [package moreutils]
  • uni2ascii: convert UTF-8 Unicode to various 7-bit ASCII representations [package uni2ascii]
  • utf8tolatin1: reads utf-8 encoded text on stdin and writes latin1 (iso8859-1) encoded text on stdout [package o3read]
  • uxterm: X terminal emulator for Unicode (UTF-8) environments [package xterm]
  • vt-is-UTF8: check whether current VT is in UTF8- or byte-mode [package console-tools]
  • iconv: Convert encoding of given files from one encoding to another [package libc6]
  • unicode_start: put the console in Unicode mode [package console-tools]
  • unicode_stop: put the console out of Unicode mode (i.e. in 8-bit mode) [package console-tools]
  • luit: a filter that can be run between an arbitrary application and a UTF-8 terminal emulator [package xutils]

Software known to have problems with UTF-8

grml's package selection includes software that does not (yet) support UTF-8/Unicode. If you find software that has problems in a UTF-8 environment and is not yet listed here, please feel free to extend the list.

  • mrxvt
  • aterm
  • centericq (check centerim for updates or use package centericq-utf8 or centericq via luit)

Notice: on grml, software known to cause problems with UTF-8 is started within an ISO8859-15 environment plus the luit wrapper; see the following shell configuration part for details.

Tips when working with UTF-8

Shell configuration

grml's zsh configuration provides a function named isutfenv:

isutfenv () {
        case "$LANG $CHARSET $LANGUAGE" in
                (*utf*) return 0 ;;
                (*UTF*) return 0 ;;
                (*) return 1 ;;
        esac
}

You can use this function to check whether you have a UTF-8 environment and work around problematic software via something like:

isutfenv && [ -n "$LANG" ] && alias mrxvt="LANG=${LANG/(#b)(*)[.@]*/$match[1].iso885915} luit mrxvt"
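
The substitution in the alias above relies on zsh's extended globbing. For other shells, a similar result can be sketched with plain POSIX parameter expansion. The function name iso_locale is hypothetical, and unlike the zsh pattern this variant cuts the locale name at the first '.' or '@'.

```shell
# Hypothetical helper: turn a locale name into its ISO-8859-15 counterpart,
# e.g. de_DE.UTF-8 -> de_DE.iso885915. Uses $LANG if no argument is given.
iso_locale() {
    loc=${1:-$LANG}
    # ${loc%%[.@]*} removes everything from the first '.' or '@' onwards
    printf '%s.iso885915\n' "${loc%%[.@]*}"
}
```

With this in place, something like 'LANG=$(iso_locale) luit mrxvt' would start the program in an ISO environment, provided luit and the matching locale are installed.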

Some further snippets from grml's zsh configuration:

  alias term2iso="echo 'Setting terminal to iso mode' ; echo -e '\033%@'"
  alias term2utf="echo 'Setting terminal to utf-8 mode'; echo -e '\033%G'"

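  # Switch the UTF-8 locale variables of the current shell to ISO-8859-15
  alias utf2iso='if isutfenv ; then
   for ENV in `env | grep UTF` ; do
       eval export "$(echo $ENV | sed "s/UTF-8/iso885915/")"
   done
   fi'
  # Switch the ISO locale variables of the current shell back to UTF-8
  alias iso2utf='if ! isutfenv ; then
   for ENV in `env | grep "\.iso"` ; do
       eval export "$(echo $ENV | sed "s/iso.*/UTF-8/")"
   done
   fi'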

Get real UTF-8 support on plain console

On plain console (tty1, tty2,… not xterm and the like!) grml uses the font Uni3-Terminus16 from the Debian package console-terminus. While it works reasonably well for common work on the console, it is not a real UTF-8 capable font. To get real UTF-8 support on your console use dynafont. Please note that this slows down your terminal as it loads the required fonts dynamically.

The most common way to use dynafont:

# filterm - dynafont

If you are not using framebuffer, you can use this command:

# filterm - 512bold+dynafont

If the keyboard uses an ISO-8859-x encoding, filterm can also convert ISO→UTF on the fly, e.g.:

# filterm iso2-UTF8 dynafont

Converting files

Convert files from Unicode / UTF to ISO:

% iconv -c -f utf8 -t iso-8859-15 < utffile > isofile

and vice versa:

% iconv -f iso-8859-15 -t utf8 < isofile > utffile
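
When it is unclear whether a file has already been converted, running the second command twice would garble it. The following sketch guards against that by first asking iconv whether the file is valid UTF-8. The function name to_utf8 is hypothetical, and the sketch assumes that every file which is not valid UTF-8 is ISO-8859-15.

```shell
# Hypothetical helper: convert FILE from ISO-8859-15 to UTF-8 in place,
# unless it already is valid UTF-8 (pure ASCII counts as valid UTF-8).
to_utf8() {
    file=$1
    # iconv exits non-zero if the input is not valid in the source encoding
    if iconv -f utf-8 -t utf-8 "$file" > /dev/null 2>&1; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    if iconv -f iso-8859-15 -t utf-8 "$file" > "$tmp"; then
        cat "$tmp" > "$file"     # overwrite content, keep owner and permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A loop such as 'for f in *.txt; do to_utf8 "$f"; done' could then be run repeatedly without converting any file twice.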

Test UTF-8 capabilities of terminal

wget http://www.linux-cjk.net/Console/garabik/UTF-8-demo.txt.gz
zcat UTF-8-demo.txt.gz
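
If the demo file cannot be downloaded, a much smaller local check is possible. The sketch below prints three non-ASCII characters from their UTF-8 byte sequences, so the script itself stays plain ASCII. The function name utf8_sample is made up for this example.

```shell
# Hypothetical helper: print a-umlaut, the euro sign and a check mark,
# each given as octal escapes of its UTF-8 bytes.
utf8_sample() {
    #       a-umlaut   euro          check mark
    printf '\303\244 \342\202\254 \342\234\223\n'
}
```

On a terminal in UTF-8 mode this should show three single characters separated by spaces; in ISO mode each character appears as two or three garbled ones instead.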

Running terminal in ISO mode on a UTF-8 system

You might notice problems with the terminal if your local system uses UTF-8 whereas the remote system you are connecting to (for example via SSH) uses ISO mode (iso8859-1[5] for example). To get an appropriate terminal when running X on your local system, just invoke uxterm (a wrapper around the xterm(1) program that invokes the latter program with the 'UXTerm' X resource class set if necessary). If you want a plain ISO terminal even though your local system and its environment use UTF-8, invoke /usr/bin/iso-term on your grml system. iso-term is a simple wrapper script that changes all the UTF-8 settings inside your environment (check by running 'env') to iso885915 and then invokes x-terminal-emulator. Finally, you can also invoke your applications via the luit wrapper, for example via 'LANG=de_DE.iso88591 luit zsh'.
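
The idea behind such a wrapper can be sketched as follows. This is not the iso-term script shipped by grml but a simplified illustration under assumptions: the function name run_in_iso is hypothetical, and only the four variables listed in the loop are considered.

```shell
# Hypothetical helper: run a command with the UTF-8 locale variables
# rewritten to iso885915, leaving the calling shell untouched.
run_in_iso() {
    (
        for name in LANG LC_ALL LC_CTYPE LC_MESSAGES; do
            eval "value=\${$name:-}"        # read the variable named in $name
            case $value in
                *.[Uu][Tt][Ff]*)            # e.g. de_DE.UTF-8 or de_DE.utf8
                    export "$name=${value%%.*}.iso885915" ;;
            esac
        done
        exec "$@"                           # replace the subshell with the command
    )
}
```

For example, 'run_in_iso x-terminal-emulator' would start a terminal with an ISO environment. Because the changes happen in a subshell, the variables of the calling shell keep their UTF-8 values.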

Further useful resources

 
utf8.txt · Last modified: 2011/10/13 23:55 (external edit)
 