Using UTF-8 Characters

This page is outdated and no longer receives updates!

Requirements for Using UTF-8

NOTE: Use OpenMRS version 1.6 or later. These versions have also been fixed: 1.5.1, 1.4 Beta 2 (aka 1.4.0.22), and 1.3.4 (see ticket:965, ticket:365, ticket:1539, ticket:1943 for updates).

Modify Hibernate Database Connection URL

Modify the database connection URL so that the JDBC driver knows we are passing UTF-8 characters.

Append this string to the connection.url property in your runtime properties file:

&useUnicode=true&characterEncoding=UTF-8

The connection will become something like this:

connection.url=jdbc:mysql://localhost:3306/openmrs_test?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8
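The string above starts with "&" because it assumes the URL already has a query part. As a minimal sketch of that rule (the class and method names here are hypothetical and not part of OpenMRS), a helper could pick the separator like this:

```java
class ConnectionUrlExample {

    // The two parameters named in this section, without a leading separator.
    static final String UTF8_PARAMS = "useUnicode=true&characterEncoding=UTF-8";

    // Hypothetical helper: appends the UTF-8 parameters, using '?' when the
    // URL has no query part yet and '&' when it already has one.
    static String withUtf8(String url) {
        if (url.contains("characterEncoding=")) {
            return url; // an encoding is already configured; leave it alone
        }
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + UTF8_PARAMS;
    }

    public static void main(String[] args) {
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/openmrs_test?autoReconnect=true"));
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/openmrs_test"));
    }
}
```

In practice the URL is usually edited by hand in the runtime properties file; the sketch only illustrates which separator is needed.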

Modify Tomcat Server Configuration

From Tomcat UTF-8

Modify Tomcat's server.xml file and add the following attribute to the connector tag:

URIEncoding="UTF-8"

The end result would be something like this:

Modify MySQL configuration

From MySQL: Configuring the Character Set

Make sure that your MySQL installation is set up to use UTF-8, which is not necessarily the default character set in a MySQL installation. To create the openmrs database so that its tables use a given international default character set and collation for data storage, use a CREATE DATABASE statement like this:

Add the following lines to the "[mysqld]" and "[mysql]" sections of the MySQL configuration file, adding the sections if necessary. For Ubuntu, the file is located in /etc/mysql/my.cnf.

For interested developers

These are the changes we had to make to our code:

Modify OpenMRS Web Configuration

Added the following line to OpenMRS web.xml:

Modify OpenMRS Header File

Added the following as the first line in headerFull.jsp and headerMinimal.jsp:

<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>

Check the current setting

To confirm the current character set, modify maintenance/systemInfo.jsp by adding these lines:

Modified Controller to make correct urls

Modify the AddPersonController.java: 23

From: return "?addName=" + name + ...
To: return "?addName=" + URLEncoder.encode(name, "UTF-8") + ...
Notice that "name" needs to be explicitly encoded because the function returns the redirection URL to the client. Other occurrences of this pattern also need to be encoded as necessary.
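To illustrate what the encoding step does to a non-ASCII name, here is a minimal, self-contained sketch. The class and method names are hypothetical and this is not the actual AddPersonController code; only the URLEncoder.encode(name, "UTF-8") call is taken from the change above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class RedirectUrlExample {

    // Hypothetical helper: builds the query part of a redirect URL,
    // percent-encoding the name as UTF-8 bytes.
    static String buildRedirectQuery(String name) throws UnsupportedEncodingException {
        return "?addName=" + URLEncoder.encode(name, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Unicode escapes keep this source file independent of the editor's encoding.
        String name = "Jos\u00E9 N\u00FA\u00F1ez";
        // Non-ASCII characters become %XX sequences; the space becomes '+'.
        System.out.println(buildRedirectQuery(name));
    }
}
```

Without the encoding step, the raw characters would be placed in the URL and their interpretation would depend on the client and server defaults.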

Messages.properties files

You can now put Unicode characters directly into messages_XX.properties. You no longer have to type Unicode escape codes into strings (remember how these files used to be full of \u00E9, for example?). The key, however, is to make sure that your file editor saves the messages_XX.properties file as UTF-8. If you open messages_fr.properties in Eclipse and see garbage, change the default character encoding of your editor, or try a different application such as Notepad++.

java.util.Properties

The Java Properties class has two methods, load() and store(), that read and write ISO-8859-1 instead of UTF-8 when given a byte stream. They were used in a couple of places, most notably when loading message string files. OpenmrsUtil now has alternative methods that perform the same function but read and write using UTF-8.
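The general idea behind such UTF-8-aware methods can be sketched as follows. This is an illustration only, not the OpenmrsUtil implementation: the class and method names are hypothetical, and it assumes Java 7 or later for StandardCharsets (the Reader and Writer overloads of load() and store() exist since Java 6).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Utf8PropertiesExample {

    // Hypothetical helper: decodes the bytes as UTF-8 before Properties parses them.
    static Properties loadUtf8(InputStream in) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    // Hypothetical helper: writes the properties as UTF-8 text.
    static void storeUtf8(Properties props, OutputStream out, String comment) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        props.store(writer, comment);
        writer.flush(); // the caller remains responsible for closing 'out'
    }

    public static void main(String[] args) throws IOException {
        Properties original = new Properties();
        original.setProperty("greeting", "caf\u00E9");

        // Round-trip through an in-memory buffer instead of a real file.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        storeUtf8(original, buffer, "round-trip demo");
        Properties reloaded = loadUtf8(new ByteArrayInputStream(buffer.toByteArray()));

        System.out.println(reloaded.getProperty("greeting"));
    }
}
```

Wrapping the stream in a Reader or Writer with an explicit charset is what moves the decoding decision out of Properties and into the caller's hands.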

Reading in UTF-8 files

There is an interesting Java issue involving reading a UTF-8 file. Some applications (including Microsoft Notepad) add a special character called a byte order mark (BOM) to the beginning of a UTF-8 file, but Java's standard UTF-8 decoding does not strip it, so it appears as an unexpected character at the start of the content:

http://bugs.sun.com/view_bug.do?bug_id=4508058

A workaround that *appears* to fix the issue (I haven't done serious testing) is to wrap the file input stream in an org.apache.velocity.io.UnicodeInputStream instance, which is specifically designed to be BOM-aware. (There is also a BOMInputStream in Apache Commons IO, but only in a later version of the jar than the one included in OpenMRS 1.6.)
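For readers who want to see what BOM-aware reading involves, here is a minimal sketch that uses only the JDK. It is a different technique from the two library classes mentioned above, the class and method names are hypothetical, and it handles only the UTF-8 BOM (bytes EF BB BF), not the UTF-16 or UTF-32 variants.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BomSkippingExample {

    // Hypothetical helper: returns a UTF-8 reader over the stream,
    // dropping a leading UTF-8 byte order mark if one is present.
    static Reader openUtf8(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer.
        int read = 0;
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream is shorter than three bytes
            }
            read += n;
        }

        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // If the bytes were ordinary content, push them back so nothing is lost.
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'};
        Reader reader = openUtf8(new ByteArrayInputStream(withBom));
        StringBuilder text = new StringBuilder();
        for (int c = reader.read(); c != -1; c = reader.read()) {
            text.append((char) c);
        }
        System.out.println(text);
    }
}
```

A stream that starts without a BOM is passed through unchanged, so the helper can be applied to every UTF-8 file regardless of which editor produced it.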