Segmentation

Segmentation is a feature that breaks paragraphs into sentences to make translation work easier. This means that a single paragraph of continuous text is split into several sentences. Segmentation rules decide where the breaks occur. Sisulizer uses the Segmentation Rules Exchange (SRX) standard to specify the segmentation rules. To view and edit the rules, choose Tools | General from the menu and select the Segmentation sheet.

For the complete SRX documentation, go to http://www.lisa.org/standards/srx/srx.html.

Regular Expressions

SRX uses regular expressions to describe the rules. Regular expressions are a powerful way to describe string patterns.

For documentation of the regular expression syntax, go to http://icu.sourceforge.net/userguide/regexp.html
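
As a rough illustration of how such a pattern reads, the sketch below uses Python's re module rather than the ICU engine documented at the link above. The two syntaxes should agree for a pattern this simple, but treat that as an assumption for anything more complex. The name find_break_patterns and the sample sentence are made up for this example.

```python
import re

# One or more sentence-ending marks followed by one or more
# white space characters (space, tab, or new line).
BREAK_PATTERN = re.compile(r"[\.\?!]+\s+")


def find_break_patterns(text):
    """Return (start, end, matched_text) for every place the pattern matches."""
    return [(m.start(), m.end(), m.group()) for m in BREAK_PATTERN.finditer(text)]


if __name__ == "__main__":
    for start, end, matched in find_break_patterns("Is it late? Yes! Let's go."):
        print(start, end, repr(matched))
```

The final period in the sample sentence is not reported because no white space follows it.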

Examples

Here are a few example rules:

Rule type: Break
Before break: [\.\?!]+
After break: \s+
Language: All
Description: A break occurs when one or more periods, question marks, or exclamation marks are followed by one or more white space characters (space, tab, or new line). For example:
Skiing is fun. Swimming is fun too.
Here the break pattern matches the period and the space after "Skiing is fun".

Rule type: Break
Before break: [。\.\?!]+
After break: \s+
Language: Japanese
Description: A break occurs when one or more Asian full stops (Unicode 0x3002), periods, question marks, or exclamation marks are followed by one or more white space characters. For example:
私は東京に住んでいます。 東京は大きいです。
Here the break pattern matches the first full stop and the space that follows it.

Rule type: Exception
Before break: [a-zà-ö0-9]\.
After break: \s+[a-zà-ö]
Language: All
Description: Disables a break when a lower case character or a number is followed by a period, one or more white space characters, and one lower case character. For example:
Raaka-aineena voidaan käyttää esim. vanhoja autonrenkaita.
Here the pattern matches "m. v" in "esim. vanhoja". It is not treated as a break even though a break rule would otherwise indicate one.

Rule type: Exception
Before break: (^|[\s\(\[])Mr.
After break: \s+
Language: English
Description: Disables a break whenever the abbreviation Mr. appears either at the beginning of the sentence or after white space, a parenthesis, or a bracket. For example:
The British Prime Minister is Mr. Blair.
Here the pattern matches " Mr. " before "Blair". It is not treated as a break even though a break rule would otherwise indicate one.
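
The way break rules and exceptions interact can be sketched in a few lines of Python. This is a minimal illustration, not Sisulizer's implementation: it assumes the common SRX convention that rules are tried in order at each position and the first rule that matches decides, which is why the exceptions are listed before the break rules. The names Rule, RULES, and segment are hypothetical, Python's re module stands in for the ICU engine, and the period in Mr. is escaped here so that it matches literally.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True = break rule, False = exception
    before: str     # must match the text that ends at the candidate position
    after: str      # must match the text that starts at the candidate position
    language: str   # "All" or a language name


# Exceptions come first so that they win over the break rules.
RULES = [
    Rule(False, r"(^|[\s\(\[])Mr\.", r"\s+", "English"),
    Rule(False, r"[a-zà-ö0-9]\.", r"\s+[a-zà-ö]", "All"),
    Rule(True, r"[。\.\?!]+", r"\s+", "Japanese"),
    Rule(True, r"[\.\?!]+", r"\s+", "All"),
]


def segment(text, language):
    """Split text into sentences using the rules that apply to the language."""
    active = [
        (r.is_break, re.compile(r"(?:" + r.before + r")\Z"), re.compile(r.after))
        for r in RULES
        if r.language in ("All", language)
    ]
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in active:
            # A rule applies when "before" matches right up to pos
            # and "after" matches starting at pos.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides; skip the rest
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


if __name__ == "__main__":
    print(segment("Skiing is fun. Swimming is fun too.", "English"))
    print(segment("The British Prime Minister is Mr. Blair.", "English"))
```

With these rules the first sentence pair is split in two, while the sentence containing Mr. stays in one piece because the exception matches before any break rule is tried.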

Sisulizer contains both generic (language-independent) segmentation rules and language-specific rules. You can add your own rules or remove built-in rules.

Applied sources

Segmentation is available with the following source types: HTML files, XML files, and database data. It is turned off by default. To turn it on, use the source's property dialog.