Right-to-Left Text Support in Ibex

Ibex renders Arabic and Hebrew text. Support for right-to-left (RTL) text includes:

  • mirroring - where a character such as '(' is reversed so that it is still correct when the text is read right to left;

  • shaping of Arabic text - where a character changes shape depending on its surrounding characters;

  • support for the Unicode Bidirectional Algorithm, including the explicit embedding and override characters LRO, RLO, LRE, RLE and PDF.
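A minimal Java sketch of two items from the list above: the code points of the five explicit formatting characters (these are fixed by the Unicode standard) and mirroring of a character inside a right-to-left run. The class name BidiBasics and the small mirror table are hypothetical; a real implementation would use the full Bidi_Mirroring_Glyph data (for example from ICU) rather than a hand-written map, and this is not how Ibex itself is implemented. Java 9 or later is assumed for Map.of.

```java
import java.util.Map;

// Hypothetical sketch, not Ibex code: explicit bidi formatting characters
// and a tiny mirroring table.
final class BidiBasics {
    // Explicit embedding and override characters of the Unicode Bidirectional Algorithm.
    static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
    static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING
    static final char LRO = '\u202D'; // LEFT-TO-RIGHT OVERRIDE
    static final char RLO = '\u202E'; // RIGHT-TO-LEFT OVERRIDE

    // Assumption: a small subset of mirror pairs, for illustration only.
    private static final Map<Character, Character> MIRRORS = Map.of(
        '(', ')', ')', '(',
        '[', ']', ']', '[',
        '{', '}', '}', '{',
        '<', '>', '>', '<');

    /** Returns the character to display for c when it appears in a right-to-left run. */
    static char mirrorInRtl(char c) {
        // Character.isMirrored reports the Unicode Bidi_Mirrored property.
        if (!Character.isMirrored(c)) {
            return c;
        }
        return MIRRORS.getOrDefault(c, c);
    }

    /** Wraps text in RLO ... PDF so it is laid out right to left whatever its letters are. */
    static String forceRtl(String text) {
        return RLO + text + PDF;
    }

    public static void main(String[] args) {
        System.out.println(mirrorInRtl('(')); // prints )
        System.out.println(mirrorInRtl('a')); // prints a
    }
}
```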

Text is read from the XSL-FO file in its natural order (often called logical order), by which we mean the order in which the characters would be written. For a line of Arabic text, which reads right to left, this means that the first character on the line, displayed at the right-hand end, is the first character in the XML.

Ibex can determine the direction of text from the letters which make up the text. It is not necessary to use direction="rtl" to specify text direction.
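As an illustration (not Ibex's own code), the JDK's java.text.Bidi class implements the Unicode Bidirectional Algorithm and can choose the paragraph direction from the first strong character, which is the same idea as working out direction from the letters. DetectDirection is a hypothetical helper name.

```java
import java.text.Bidi;

// Illustration only: detect paragraph direction from the text itself.
final class DetectDirection {
    /** True if the base direction, taken from the first strong character, is right to left. */
    static boolean isRtlParagraph(String text) {
        // DIRECTION_DEFAULT_LEFT_TO_RIGHT: use the first strong character,
        // falling back to left to right when there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        System.out.println(isRtlParagraph("\u0645 \u0643")); // true: MEEM is a strong RTL letter
        System.out.println(isRtlParagraph("abc"));           // false
        System.out.println(isRtlParagraph("123"));           // false: no strong character
    }
}
```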

Specifying writing-mode="rl-tb" tells Ibex which side of an element is the start edge. This affects (a) the order in which fo:table-cell elements are positioned across the fo:table-row, and (b) the effect of properties such as border-start-width. When writing-mode="lr-tb" the start edge is the left-hand edge; when writing-mode="rl-tb" the start edge is the right-hand edge.
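The following is a hypothetical Java sketch, not Ibex's layout code, of what the start edge means for a table row: with lr-tb the cells are placed from the left edge, with rl-tb from the right edge, and border-start-width refers to the corresponding physical edge. The class and method names, and the use of plain numbers for widths, are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: how the start edge decides where table cells go.
final class StartEdge {
    enum WritingMode { LR_TB, RL_TB }

    /** The physical edge that "start" refers to, e.g. for border-start-width. */
    static String startEdge(WritingMode mode) {
        return mode == WritingMode.LR_TB ? "left" : "right";
    }

    /** X offset (measured from the left of the row) of each cell, in document order. */
    static double[] cellOffsets(double[] widths, double rowWidth, WritingMode mode) {
        double[] x = new double[widths.length];
        double advance = 0; // distance already used up from the start edge
        for (int i = 0; i < widths.length; i++) {
            x[i] = mode == WritingMode.LR_TB
                ? advance                          // first cell touches the left edge
                : rowWidth - advance - widths[i];  // first cell touches the right edge
            advance += widths[i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[] widths = {2, 3, 5};
        System.out.println(Arrays.toString(cellOffsets(widths, 10, WritingMode.LR_TB))); // [0.0, 2.0, 5.0]
        System.out.println(Arrays.toString(cellOffsets(widths, 10, WritingMode.RL_TB))); // [8.0, 5.0, 0.0]
    }
}
```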

For clarity, the Arabic text in the example which follows is entered as numeric character references (Unicode values). This prevents your browser from reordering or shaping the Arabic text in the example and makes the example easier to follow. In normal usage Arabic text would be entered as characters, not Unicode values.
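To show that the two ways of entering text are equivalent, this sketch parses a small XML fragment with the JDK's standard DOM parser: the character references &#x0645;&#x0020;&#x0643; come out as exactly the same string as the literal letters. The element name block and the helper CharRefs.blockText are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: character references and literal characters parse to the same text.
final class CharRefs {
    /** Parses xml and returns the text content of its root element. */
    static String blockText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String escaped = "<block>&#x0645;&#x0020;&#x0643;</block>";
        String literal = "<block>\u0645 \u0643</block>";
        System.out.println(blockText(escaped).equals(blockText(literal))); // true
    }
}
```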

Example

The following code displays two Arabic characters separated by a space:

<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-height="29.7cm"
        page-width="21cm" margin="2cm">
      <fo:region-body margin-top="3cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block-container writing-mode="rl-tb">
        <fo:block font-family="arial" font-size="30pt">
          &#x0645;&#x0020;&#x0643;
        </fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
      

The two characters used are:

Unicode Value   Name                 Appearance
U+0645          ARABIC LETTER MEEM   م
U+0643          ARABIC LETTER KAF    ك

The Unicode information comes from http://www.unicode.org/charts/PDF/U0600.pdf
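The names in the table can also be checked with the JDK: since Java 7, Character.getName returns the Unicode name of a code point. The helper describe is a hypothetical name used only for this sketch.

```java
// Sketch: look up Unicode character names with the standard library.
final class CharNames {
    /** Formats a code point as "U+XXXX NAME". */
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0645)); // U+0645 ARABIC LETTER MEEM
        System.out.println(describe(0x0643)); // U+0643 ARABIC LETTER KAF
    }
}
```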

Ordering

The two characters appear in the above FO in the order U+0645 U+0643. When Ibex renders the PDF it recognises that the text is Arabic and reverses the order of the characters for display: U+0645, the first character in the FO, appears at the right-hand end and U+0643 at the left-hand end, with the space between them.
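A sketch of this reordering using the JDK's java.text.Bidi, as an illustration of the Unicode Bidirectional Algorithm rather than Ibex's implementation: each character gets an embedding level (odd levels are right to left), and Bidi.reorderVisually reverses the right-to-left runs to give the left-to-right display order. The class VisualOrder is hypothetical, and the sketch applies neither mirroring nor shaping.

```java
import java.text.Bidi;

// Illustration only: convert logical order to left-to-right display order.
final class VisualOrder {
    static String toVisual(String logical) {
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int n = logical.length();
        byte[] levels = new byte[n];
        Character[] chars = new Character[n];
        for (int i = 0; i < n; i++) {
            levels[i] = (byte) bidi.getLevelAt(i); // odd level means right to left
            chars[i] = logical.charAt(i);
        }
        // Reverses runs according to their levels; no mirroring or shaping here.
        Bidi.reorderVisually(levels, 0, chars, 0, n);
        StringBuilder sb = new StringBuilder(n);
        for (Character c : chars) {
            sb.append(c.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // MEEM, space, KAF in logical order is displayed as KAF, space, MEEM.
        System.out.println(toVisual("\u0645 \u0643").equals("\u0643 \u0645")); // true
    }
}
```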

Shaping

In the example above there is a space between the two characters. If this space is removed, script shaping takes place and the glyphs change to reflect the position of each character in the word. Most Arabic letters have four possible forms (a few, such as ALEF, join only to the preceding letter and so have only isolated and final forms):

  • initial - when the character is the first character in the word;
  • medial - when the character is in the middle of the word;
  • final - when the character is the last character in the word;
  • isolated - when the character is by itself.
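A simplified Java sketch of how the four forms can be chosen for text held in logical order. It is not Ibex's algorithm (Ibex delegates shaping to Uniscribe or ICU). Assumptions, also noted in the code: only U+0620 to U+064A count as Arabic letters, combining marks (harakat) are not handled and would break joining, and the set of letters that join only to the preceding letter is a partial list.

```java
import java.util.Arrays;
import java.util.Set;

// Simplified sketch: choose the contextual form of each Arabic letter.
final class ContextualForms {
    enum Form { ISOLATED, INITIAL, MEDIAL, FINAL, NONE }

    // Assumption: partial list of letters that join only to the preceding letter
    // (ALEF variants, TEH MARBUTA, DAL, THAL, REH, ZAIN, WAW).
    private static final Set<Character> RIGHT_JOINING = Set.of(
        '\u0622', '\u0623', '\u0624', '\u0625', '\u0627', '\u0629',
        '\u062F', '\u0630', '\u0631', '\u0632', '\u0648');

    private static boolean isArabicLetter(char c) {
        return c >= '\u0620' && c <= '\u064A';
    }

    // Can connect to the following letter (dual-joining letters only).
    private static boolean joinsForward(char c) {
        return isArabicLetter(c) && c != '\u0621' && !RIGHT_JOINING.contains(c);
    }

    // Can connect to the preceding letter (HAMZA, U+0621, never joins).
    private static boolean joinsBackward(char c) {
        return isArabicLetter(c) && c != '\u0621';
    }

    static Form[] forms(String logical) {
        int n = logical.length();
        Form[] result = new Form[n];
        for (int i = 0; i < n; i++) {
            char c = logical.charAt(i);
            if (!isArabicLetter(c)) {
                result[i] = Form.NONE; // spaces, Latin text, etc.
                continue;
            }
            boolean prev = joinsBackward(c) && i > 0 && joinsForward(logical.charAt(i - 1));
            boolean next = joinsForward(c) && i + 1 < n && joinsBackward(logical.charAt(i + 1));
            if (prev && next) result[i] = Form.MEDIAL;
            else if (prev) result[i] = Form.FINAL;
            else if (next) result[i] = Form.INITIAL;
            else result[i] = Form.ISOLATED;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(forms("\u0645\u0643")));  // [INITIAL, FINAL]
        System.out.println(Arrays.toString(forms("\u0645 \u0643"))); // [ISOLATED, NONE, ISOLATED]
    }
}
```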

When the space between the two letters is removed (so that the block contains &#x0645;&#x0643;), Ibex applies script shaping and the two letters are joined into a single word.

Script shaping has changed each of the characters as shown in this table:

Old Value  Old Name             Old Appearance  New Value  New Name                         New Appearance
U+0645     ARABIC LETTER MEEM   م               U+FEE3     ARABIC LETTER MEEM INITIAL FORM  ﻣ
U+0643     ARABIC LETTER KAF    ك               U+FEDA     ARABIC LETTER KAF FINAL FORM     ﻚ

Which letter takes the initial or final form is determined in logical order, which for Arabic runs right to left: the rightmost character in a word (the first one in the XML) takes the initial form, and the leftmost character (the last one in the XML) takes the final form.
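The shaping table above follows a regular pattern in the Arabic Presentation Forms-B block: for a letter that can join on both sides, the isolated, final, initial and medial forms sit at consecutive code points. The hypothetical sketch below uses that pattern for MEEM and KAF only; it illustrates the mapping in the table and is not how Uniscribe or ICU perform shaping internally.

```java
import java.util.Map;

// Hypothetical sketch: base letter + contextual form -> presentation form code point.
final class PresentationForms {
    // Declared in the same order as the forms appear in the Presentation Forms-B block.
    enum Form { ISOLATED, FINAL, INITIAL, MEDIAL }

    // Code point of the ISOLATED form; FINAL, INITIAL and MEDIAL follow at +1, +2, +3.
    // Assumption: only MEEM and KAF are listed.
    private static final Map<Character, Integer> ISOLATED = Map.of(
        '\u0645', 0xFEE1,  // MEEM
        '\u0643', 0xFED9); // KAF

    static int shape(char letter, Form form) {
        Integer base = ISOLATED.get(letter);
        if (base == null) {
            throw new IllegalArgumentException("no table entry for U+" + Integer.toHexString(letter));
        }
        return base + form.ordinal(); // enum order matches the block layout
    }

    public static void main(String[] args) {
        int meem = shape('\u0645', Form.INITIAL);
        System.out.printf("U+%04X %s%n", meem, Character.getName(meem)); // U+FEE3 ARABIC LETTER MEEM INITIAL FORM
    }
}
```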

.NET implementation details

The .NET version of Ibex ships with an assembly with a name such as ibexshaping20.dll. The exact name depends on the .NET Framework version you are using and on whether your code is compiled for 32 or 64 bits (see the Ibex manual for details). This assembly contains a C++ wrapper for the Windows Uniscribe API, which is used to do the shaping. The assembly is loaded using reflection, so if your application will not process any right-to-left text you do not need to deploy it.

Java implementation details

Script shaping in Java requires the IBM ICU cross-platform Unicode-based globalization library, which IBM distributes free of charge. The required JAR file is icu4j-3_8_1.jar, which can be downloaded from http://icu-project.org/userguide/icufaq.html.

Ibex attempts to load the classes it needs for script shaping from the classpath. If it finds icu4j-3_8_1.jar on the classpath, Ibex performs script shaping; otherwise it does not. A classpath like the one below will work (the ';' separator is for Windows; on Unix-like systems use ':' instead):

 
    java -classpath ibex-4.3.3.jar;icu4j-3_8_1.jar ibex.Run -xml arabic.fo -pdf test.pdf
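The "use it if it is on the classpath" behaviour can be sketched with plain reflection. This is a hypothetical illustration, not Ibex's code: which class Ibex actually probes is not stated here, so the sketch assumes ICU4J's com.ibm.icu.text.ArabicShaping.

```java
// Hypothetical sketch of optional-dependency detection via reflection.
final class OptionalShaping {
    // Assumption: the probed class is ICU4J's Arabic shaper.
    static final String ICU_SHAPER = "com.ibm.icu.text.ArabicShaping";

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only check that the class can be found and loaded.
            Class.forName(className, false, OptionalShaping.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(ICU_SHAPER)) {
            System.out.println("ICU4J found: script shaping enabled");
        } else {
            System.out.println("ICU4J not found: text will not be shaped");
        }
    }
}
```

Run with and without the ICU JAR on the classpath to see the two branches.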
 

Versions

Bidirectional text processing is available in Ibex Professional Edition, not in Ibex Standard Edition.

Bidirectional text processing is not available when using .NET Framework 1.0.

Known issues

Different applications do not always format the same Arabic text identically. For example, we have seen Internet Explorer and Mozilla display the same Arabic text differently.

When using .NET, Ibex uses the Windows Uniscribe API for character shaping, so it should produce the same visual results as Internet Explorer, which also uses Uniscribe.

When using Java, Ibex uses the IBM ICU cross-platform Unicode based globalization library, and so should produce the same results as Mozilla.