Right to Left Text Support in Ibex
Ibex will render Arabic and Hebrew text. Support for right to left (rtl) text includes:
- mirroring - where a character such as '(' is replaced by its mirror image so that it is still correct when the text is read right to left;
- shaping of Arabic text - where a character changes shape depending on its surrounding characters;
- support for the Unicode Bidirectional Algorithm, including the explicit embedding and override characters LRO (U+202D), RLO (U+202E), LRE (U+202A), RLE (U+202B) and PDF (U+202C).
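A minimal XSL-FO sketch of mirroring and of the explicit embedding and override characters follows. The fragments are assumed to sit inside an fo:flow like the one in the full example in this section; the expected display described in the comments follows the Unicode Bidirectional Algorithm and has not been checked against Ibex output.

```xml
<!-- Mirroring: the text is MEEM, space, '(', KAF, ')'. Because the line
     is laid out right to left, each parenthesis should be drawn as its
     mirror image so that the pair still encloses KAF correctly. -->
<fo:block>&#x0645; (&#x0643;)</fo:block>

<!-- Override: RLO (U+202E) ... PDF (U+202C) forces the Latin letters
     "abc" to be laid out right to left, so they should display as "cba". -->
<fo:block>&#x202E;abc&#x202C;</fo:block>

<!-- Embedding: without LRE (U+202A) ... PDF (U+202C), the neutral "++"
     in this right to left line would be placed on the wrong side of
     "C" (displayed as "++C"); the embedding keeps it as "C++". -->
<fo:block>&#x0645;&#x0643; &#x202A;C++&#x202C;</fo:block>
```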
Text is read from the XSL-FO file in natural order, by which we mean the order in which the characters would be written. This means
that for a line of Arabic text (i.e. right to left) the first character on the line, which is displayed at the right-hand end,
is the first character in the XML.
Ibex can determine the direction of text from the letters which make up the text. It is not necessary to use
direction="rtl" to specify text direction.
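The sketch below illustrates both points. The characters are stored in the order in which they are written and no direction attribute is set. It assumes that Ibex takes the direction of the line from its first letter, as the Unicode Bidirectional Algorithm does, and that the fragment sits inside an fo:flow.

```xml
<!-- No direction="rtl" is specified. The first letter, 'A', is left to
     right, so the line runs left to right. The Arabic run is stored in
     writing order (MEEM, space, KAF) and should be displayed reversed:
     KAF on the left, MEEM on the right. -->
<fo:block>Arabic letters: &#x0645; &#x0643;</fo:block>
```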
Specifying writing-mode="rl-tb" tells Ibex which side of an element is the start edge. This
affects (a) the order in which fo:table-cell elements are positioned across the fo:table-row, and (b) the
effect of properties such as border-start-width. When writing-mode="lr-tb" the start edge is the left-hand edge;
when writing-mode="rl-tb" the start edge is the right-hand edge.
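A minimal sketch of the effect on a table, written by analogy with the full example in this section (the cell text is placeholder): with writing-mode="rl-tb" on the enclosing fo:block-container, the first cell should be positioned at the right-hand end of the row, and the border-start-* properties should apply to the cell's right-hand edge.

```xml
<fo:block-container writing-mode="rl-tb">
  <fo:table>
    <fo:table-body>
      <fo:table-row>
        <!-- First cell: expected at the right-hand end of the row,
             with its start border drawn on its right-hand edge. -->
        <fo:table-cell border-start-width="2pt"
                       border-start-style="solid"
                       border-start-color="black">
          <fo:block>first</fo:block>
        </fo:table-cell>
        <!-- Second cell: expected to the left of the first. -->
        <fo:table-cell>
          <fo:block>second</fo:block>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>
</fo:block-container>
```

With writing-mode="lr-tb" the same markup would place "first" on the left and draw its start border on the left-hand edge.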
For clarity the Arabic text in the example which follows is entered as Unicode values. This prevents your browser from
applying any formatting and makes the example easier to follow. In normal usage Arabic text would be entered as characters, not Unicode values.
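As an illustration, the two blocks below contain exactly the same characters once the XML is parsed: the first enters the letters as Unicode values (numeric character references), the second as the characters themselves. The second form relies on the file being saved in the encoding named in the XML declaration (utf-8 in the example).

```xml
<!-- Unicode values (numeric character references) -->
<fo:block>&#x0645; &#x0643;</fo:block>
<!-- The same two characters entered directly -->
<fo:block>م ك</fo:block>
```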
Example
The following code displays two Arabic characters separated by a space:
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-height="29.7cm"
        page-width="21cm" margin="2cm">
      <fo:region-body margin-top="3cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block-container writing-mode="rl-tb">
        <fo:block font-family="arial" font-size="30pt">
          &#x0645; &#x0643;
        </fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
The two characters used are:
| Unicode Value | Name | Appearance |
|---|---|---|
| U+0645 | ARABIC LETTER MEEM | م |
| U+0643 | ARABIC LETTER KAF | ك |
The Unicode information comes from
http://www.unicode.org/charts/PDF/U0600.pdf
Ordering
The two characters appear in the above FO in the order U+0645 U+0643. When Ibex renders the PDF it recognises that the text is Arabic and
reverses the order of the characters for display, so MEEM (U+0645) appears at the right-hand end and KAF (U+0643) to its left, still separated by the space.
Shaping
The example above has a space between the two characters. If this space is removed, script shaping takes place and the
glyphs are changed to reflect the position of each character in the word. Each character has up to four possible forms:
- initial - when the character is the first character in the word;
- medial - when the character is in the middle of the word;
- final - when the character is the last character in the word;
- isolated - when the character is by itself.
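To see shaping, the block in the example can be changed so that the two letters are adjacent. This fragment is a sketch that reuses the font settings from the example:

```xml
<!-- MEEM immediately followed by KAF, with no space: the letters form
     one word, so Ibex should join them using the initial form of
     MEEM and the final form of KAF. -->
<fo:block font-family="arial" font-size="30pt">&#x0645;&#x0643;</fo:block>
```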
When the space between the two letters is removed, Ibex applies script shaping and draws the two letters joined together as a single word.
Script shaping has changed each of the characters as shown in this table:
| Original Unicode Value | Old Name | Old Appearance | New Unicode Value | New Name | New Appearance |
|---|---|---|---|---|---|
| U+0645 | ARABIC LETTER MEEM | م | U+FEE3 | ARABIC LETTER MEEM INITIAL FORM | ﻣ |
| U+0643 | ARABIC LETTER KAF | ك | U+FEDA | ARABIC LETTER KAF FINAL FORM | ﻚ |
Which letter is converted to the initial or final form is determined by reading right to left, so the rightmost character
in a word takes the initial form and the leftmost character takes the final form.
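Extending the example to a three-letter word shows the medial form as well. The fragment below spells MEEM, KAF, MEEM; the comment lists the presentation forms expected from the Unicode Arabic Presentation Forms-B block, not forms captured from Ibex output.

```xml
<!-- Logical order: U+0645 U+0643 U+0645 (no spaces). Reading right
     to left, the expected presentation forms are:
       U+0645 -> U+FEE3 ARABIC LETTER MEEM INITIAL FORM (rightmost)
       U+0643 -> U+FEDC ARABIC LETTER KAF MEDIAL FORM
       U+0645 -> U+FEE2 ARABIC LETTER MEEM FINAL FORM (leftmost) -->
<fo:block font-family="arial" font-size="30pt">&#x0645;&#x0643;&#x0645;</fo:block>
```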
.NET Implementation Details
The .NET version of Ibex ships with an assembly called something like ibexshaping20.dll. The exact name will depend
on the .NET framework you are using and whether your code is compiled for 32 or 64 bits (see the Ibex manual for details).
This assembly contains a C++ wrapper for the Windows Uniscribe API, which is used to do the shaping. The assembly is loaded
using reflection, so if your application will not process any right to left text you do not need to deploy it.
Java Implementation Details
Script shaping in Java requires the IBM ICU cross-platform Unicode-based globalization library (ICU4J), which IBM distributes free of charge.
The JAR file required is called icu4j-3_8_1.jar; it can be downloaded from
http://icu-project.org/userguide/icufaq.html.
The BIDI algorithm can also be tested online.
Ibex attempts to load the classes required for script shaping from the classpath. If it finds icu4j-3_8_1.jar on the
classpath, Ibex will do script shaping; otherwise it will not. So a command line like the one below will work (it uses the Windows ';' classpath separator; on Linux or macOS separate the entries with ':'):
java -classpath ibex-4.3.3.jar;icu4j-3_8_1.jar ibex.Run -xml arabic.fo -pdf test.pdf
Versions
Bidirectional text processing is available in Ibex Professional versions, not in Ibex standard versions.
Bidirectional text processing is not available when using .NET Framework 1.0.
Known Issues
In some cases formatting the same Arabic text with different applications will not produce identical results. Specifically,
we have seen instances where Internet Explorer and Mozilla display the same Arabic text differently.
When using .NET, Ibex uses the Windows Uniscribe API for character shaping, so it should produce the same visual results
as Internet Explorer, which also uses Uniscribe.
When using Java, Ibex uses the IBM ICU cross-platform Unicode based globalization library, and so should produce
the same results as Mozilla.