Friday, 5 November 2010

XeTeX: Breaking long lines of Japanese (and Chinese) text.

While LaTeX/XeTeX is very good at hyphenating words in Latin-script languages, I recently ran into overfull boxes when typesetting a long paragraph made up entirely of Japanese characters:

母親が育児を放棄した末、大阪市西区のマンションで幼い姉弟が亡くなった事件。多くの住民が異変に気づきながら、児童相談所(児相)に通報したのは1人だけだった。複数の人が通報していれば、児相の危機感も強まったかもしれない。なぜ通報をためらった


When you compile this with XeTeX, you get an "Overfull \hbox" warning, and the output shows one ugly Japanese line that runs off the edge of the page and is cut off:

Rendering of a Japanese line that cannot be hyphenated
This clearly needs fixing, because most of the time your data will come as long paragraphs without any line breaks.

According to XeTeX-notes.pdf, line breaking for this kind of text can be activated in XeTeX with:

\XeTeXlinebreaklocale"en"

The locale you choose doesn't matter much ("ja" or "zh" work just as well). What matters is that setting one enables XeTeX's own line breaking, which can break overlong lines where the hyphenation mechanism cannot do its job (a long run of Japanese or Chinese characters, for instance).

By adding this to your preamble, here's what you get:

A Japanese paragraph with line breaks.
Et voilà, lines are broken so all the characters fit in the page width!
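Putting it together, here is a minimal document you can compile with xelatex to reproduce the fix. It is a sketch under a few assumptions: the font name is a hypothetical choice (use any installed font that covers Japanese), and the \XeTeXlinebreakskip line is an optional extra not mentioned above, adding a little stretchable glue at each break point so justified lines can stretch slightly.

```latex
% Minimal sketch: compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
% Assumption: hypothetical font choice; any installed font with
% Japanese (or Chinese) glyphs will do.
\setmainfont{Noto Serif CJK JP}

% Enable XeTeX's locale-based line breaking, so lines can break
% between CJK characters even though the text contains no spaces.
\XeTeXlinebreaklocale "en"
% Optional (assumption, not from the post): stretchable glue inserted
% at each such break point; "0pt plus 1pt" is a common choice.
\XeTeXlinebreakskip = 0pt plus 1pt

\begin{document}
母親が育児を放棄した末、大阪市西区のマンションで幼い姉弟が亡くなった事件。多くの住民が異変に気づきながら、児童相談所(児相)に通報したのは1人だけだった。
\end{document}
```

Commenting out the \XeTeXlinebreaklocale line should bring back the overfull-box warning, which is a quick way to confirm that this setting is what fixes the layout.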

PS: Thanks to Google News Japan for providing the example text.

2 comments:

  1. Thanks for this - my very ugly-looking bibliography has been rescued!

    1. Glad it helps! Took me a while to find this one :)
