Requirements for Using UTF-8
NOTE: Use OpenMRS version 1.6 or later. These versions have also been fixed: 1.5.1, 1.4 Beta 2 (aka 1.4.0.22), and 1.3.4. (See ticket:965, ticket:365, ticket:1539, ticket:1943 for updates.)
Modify Hibernate Database Connection URL
Modify the database connection URL so that the JDBC driver knows the application is sending and receiving UTF-8 characters.
Append this string to the connection.url property in your runtime properties file.
Code Block |
---|
&useUnicode=true&characterEncoding=UTF-8 |
The connection will become something like this:
Code Block |
---|
connection.url=jdbc:mysql://localhost:3306/openmrs_test?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8 |
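As a rough illustration of the change above, the following sketch appends the two parameters to a JDBC URL only when they are missing. The class `Utf8JdbcUrl` and its methods are hypothetical helpers written for this page, not part of OpenMRS or the MySQL driver. In practice you would simply edit the runtime properties file by hand.

```java
/** Hypothetical helper for illustration only; not part of OpenMRS. */
final class Utf8JdbcUrl {

    /** Returns the URL with useUnicode and characterEncoding added if absent. */
    static String withUtf8(String url) {
        String result = addParam(url, "useUnicode", "true");
        return addParam(result, "characterEncoding", "UTF-8");
    }

    private static String addParam(String url, String key, String value) {
        int q = url.indexOf('?');
        if (q >= 0) {
            // Check whether the key is already among the query parameters.
            for (String pair : url.substring(q + 1).split("&")) {
                if (pair.startsWith(key + "=")) {
                    return url; // already set; leave the existing value alone
                }
            }
        }
        String separator;
        if (q < 0) {
            separator = "?"; // no query string yet
        } else if (url.endsWith("?") || url.endsWith("&")) {
            separator = ""; // a separator is already in place
        } else {
            separator = "&";
        }
        return url + separator + key + "=" + value;
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/openmrs_test?autoReconnect=true";
        // prints jdbc:mysql://localhost:3306/openmrs_test?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8
        System.out.println(withUtf8(url));
    }
}
```

Calling `withUtf8` on a URL that already carries both parameters returns it unchanged, so the helper can be applied repeatedly without duplicating them.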
Modify Tomcat Server Configuration
From Tomcat UTF-8
Modify Tomcat's server.xml file and add the following attribute to the connector tag:
Code Block |
---|
URIEncoding="UTF-8" |
The end result would be something like this:
Code Block |
---|
<Connector port="8080" maxHttpHeaderSize="8192" ... URIEncoding="UTF-8" /> |
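To see why the decoding charset matters, the sketch below decodes the same percent-encoded request parameter twice, once as UTF-8 and once as ISO-8859-1 (the fallback that older Tomcat versions use when URIEncoding is not set). It uses only the JDK's `URLDecoder`. The class name `UriDecodingDemo` and the sample value are illustrative assumptions, and Tomcat's own decoding code is not shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Illustration only: mimics decoding a query parameter with two charsets. */
final class UriDecodingDemo {

    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "Jos%C3%A9"; // "Jos\u00E9" percent-encoded as UTF-8

        // Decoding with the charset the client used restores the original text.
        System.out.println(decode(raw, "UTF-8").equals("Jos\u00E9")); // true

        // Decoding as ISO-8859-1 turns the two UTF-8 bytes into two
        // separate characters (U+00C3 and U+00A9), i.e. mojibake.
        System.out.println(decode(raw, "ISO-8859-1").equals("Jos\u00C3\u00A9")); // true
    }
}
```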
Modify MySQL configuration
From MySQL: Configuring the Character Set
Make sure that your MySQL installation is set up to use UTF-8, which is not necessarily the default character set in a MySQL installation. To create the openmrs database so that its tables use a given international default character set and collation for data storage, use a CREATE DATABASE statement like this:
Code Block |
---|
CREATE DATABASE openmrs DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci; |
Add the following lines to the "[mysqld]" and "[mysql]" sections of the MySQL configuration file, adding the sections if necessary. For Ubuntu, the file is located in /etc/mysql/my.cnf.
Code Block |
---|
[mysqld] |
character-set-server=utf8 |
collation-server=utf8_general_ci |
[mysql] |
default-character-set=utf8 |
For interested developers
These are the changes we had to make to our code:
Modify OpenMRS Web Configuration
Added the following filter definition and mapping to the OpenMRS web.xml:
Code Block |
---|
<filter> <filter-name>charsetFilter</filter-name> <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class> <init-param> <param-name>encoding</param-name> <param-value>UTF-8</param-value> </init-param> <init-param> <param-name>forceEncoding</param-name> <param-value>true</param-value> </init-param> </filter> <filter-mapping> <filter-name>charsetFilter</filter-name> <url-pattern>/*</url-pattern> </filter-mapping> |
Modify OpenMRS Header File
Added the following as the first line in headerFull.jsp and headerMinimal.jsp:
<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>
Check the current setting
To confirm the current character set, modify maintenance/systemInfo.jsp by adding these lines:
Code Block |
---|
<tr> <td>Default Charset</td> <td><%= java.nio.charset.Charset.defaultCharset() %></td> </tr> |
Modify Controller to Build Correct URLs
Modify AddPersonController.java:23
From: return "?addName=" + name + ...
To: return "?addName=" + URLEncoder.encode(name, "UTF-8") + ...
Note that "name" must be explicitly encoded because the function returns the redirect URL to the client. Other similar patterns also need to be encoded as necessary.
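A minimal, self-contained sketch of the same pattern follows. The class `RedirectUrlDemo` and the method `redirectQuery` are hypothetical names used only for this example. The real controller builds a longer query string, which is elided in the text above and not reproduced here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustration only: encodes a user-supplied name before placing it in a redirect URL. */
final class RedirectUrlDemo {

    static String redirectQuery(String name) {
        try {
            // Encode the value so non-ASCII and reserved characters survive the redirect.
            return "?addName=" + URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints ?addName=Jos%C3%A9+Mu%C3%B1oz
        System.out.println(redirectQuery("Jos\u00E9 Mu\u00F1oz"));
        // prints ?addName=A%26B
        System.out.println(redirectQuery("A&B"));
    }
}
```

Encoding also protects reserved characters such as "&", which would otherwise be read as the start of another parameter.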
Messages.properties files
You can now put Unicode characters directly into messages_XX.properties. You no longer have to type Unicode escape codes into strings (remember how these files used to be full of \u00E9, for example?). The key, however, is to make sure that your editor saves the messages_XX.properties file as UTF-8. If you open messages_fr.properties in Eclipse and see garbage, change the default character encoding of your editor, or try a different application such as Notepad++ if necessary.
java.util.Properties
The Java Properties class has two stream-based methods, load(InputStream) and store(OutputStream, String), that read and write ISO-8859-1 rather than UTF-8. They were used in a couple of places, most notably when loading message string files. There are now alternative methods in OpenmrsUtil that perform the same function but read and write using UTF-8.
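The OpenmrsUtil method names are not given here, so the sketch below shows the general idea with the JDK alone (Java 7 or later for `StandardCharsets`): passing a `Reader` with an explicit charset to `Properties.load` instead of a raw `InputStream`. The class `Utf8PropertiesDemo` and its methods are hypothetical and are not the OpenMRS implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Illustration only: compares UTF-8 and default (ISO-8859-1) property loading. */
final class Utf8PropertiesDemo {

    /** Loads properties, decoding the stream as UTF-8. */
    static Properties loadUtf8(InputStream in) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(in, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Loads properties the stream-based way, which decodes as ISO-8859-1. */
    static Properties loadLegacy(InputStream in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A one-line properties file saved as UTF-8 with a literal accented character.
        byte[] file = "greeting=caf\u00E9\n".getBytes(StandardCharsets.UTF_8);

        String good = loadUtf8(new ByteArrayInputStream(file)).getProperty("greeting");
        String bad = loadLegacy(new ByteArrayInputStream(file)).getProperty("greeting");

        System.out.println(good.equals("caf\u00E9"));       // true
        System.out.println(bad.equals("caf\u00C3\u00A9"));  // true: each UTF-8 byte became its own character
    }
}
```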
Reading in UTF-8 files
There is an interesting Java issue involving reading in a UTF-8 file. Some applications (including Microsoft Notepad) add a special character called a byte order mark (BOM) to the beginning of a UTF-8 file, but Java's UTF-8 decoder does not strip it, so it shows up as an unexpected character (U+FEFF) at the start of the data:
http://bugs.sun.com/view_bug.do?bug_id=4508058
A workaround that *appears* to fix the issue (I haven't done serious testing) is to wrap the file input stream in an org.apache.velocity.io.UnicodeInputStream instance, which is specifically designed to be BOM-aware. (There is also a BOMInputStream in Apache Commons IO, but only in a later version of the jar than the one included in OpenMRS 1.6.)
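For readers who want to see what BOM-aware reading involves, here is a JDK-only sketch. It is not the Velocity or Commons IO implementation, and `BomSkipper` is a hypothetical class written for this page. It peeks at the first three bytes and pushes them back unless they are the UTF-8 BOM (EF BB BF).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustration only: skips a leading UTF-8 byte order mark, if present. */
final class BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM, or at the start if there is none. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return pushback;
    }

    /** Convenience wrapper: decodes a byte array as UTF-8, ignoring a leading BOM. */
    static String decodeUtf8(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int r;
            while ((r = in.read(buf)) != -1) {
                out.write(buf, 0, r);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', '=', 'v'};
        System.out.println(decodeUtf8(withBom)); // prints k=v
    }
}
```

Without the skip, decoding the same bytes directly yields a string that starts with U+FEFF, which is how a stray character can end up glued to the first key of a properties file.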