Segmentation is a feature that breaks paragraphs into sentences to make translation easier. A single paragraph of continuous text is split into several sentences, and segmentation rules decide where the breaks occur. Sisulizer uses the Segmentation Rules Exchange (SRX) standard to specify the segmentation rules. Choose the Tools | General menu and select the Segmentation sheet to view and edit the segmentation rules.
The complete SRX documentation is available at http://www.lisa.org/standards/srx/srx.html.
SRX uses regular expressions to describe the rules. Regular expressions are a powerful way to describe string patterns.
For documentation of the regular expression syntax, go to http://icu.sourceforge.net/userguide/regexp.html
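To give a feel for how a rule pairs two regular expressions, here is a minimal Python sketch that reads a simplified SRX-style fragment. The element and attribute names (`languagerule`, `rule`, `break`, `beforebreak`, `afterbreak`) follow the SRX 2.0 vocabulary as I understand it; the header, namespace, and language map that a real SRX file carries are omitted, and `load_rules` is a hypothetical helper, not part of Sisulizer.

```python
import xml.etree.ElementTree as ET

# Simplified SRX-style fragment (assumption: real SRX files also carry
# a header, an XML namespace, and a language map, all omitted here).
SRX_FRAGMENT = r"""
<languagerule languagerulename="Default">
  <rule break="no">
    <beforebreak>[a-z0-9]\.</beforebreak>
    <afterbreak>\s+[a-z]</afterbreak>
  </rule>
  <rule break="yes">
    <beforebreak>[\.\?!]+</beforebreak>
    <afterbreak>\s+</afterbreak>
  </rule>
</languagerule>
"""

def load_rules(xml_text):
    """Return a list of (is_break, before_pattern, after_pattern) tuples."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for rule in root.findall("rule"):
        # Assumed default: a rule without a break attribute is a break rule.
        is_break = rule.get("break", "yes") == "yes"
        before = rule.findtext("beforebreak", default="")
        after = rule.findtext("afterbreak", default="")
        rules.append((is_break, before, after))
    return rules

rules = load_rules(SRX_FRAGMENT)
for is_break, before, after in rules:
    kind = "Break" if is_break else "Exception"
    print(kind, before, after)
```

Each rule is a pair of patterns: one that must match the text just before a candidate position and one that must match the text just after it. A rule marked `break="no"` acts as an exception.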
Here are a few rule examples:
| Rule type | Before break | After break | Language | Description |
|---|---|---|---|---|
| Break | `[\.\?!]+` | `\s+` | All | A break occurs when one or more periods, question marks, or exclamation marks are followed by one or more white space characters (space, tab, or new line). For example: Skiing is fun. Swimming is fun too. The break pattern is the period and space after the first "fun". |
| Break | `[。\.\?!]+` | `\s+` | Japanese | A break occurs when one or more Asian full stops (Unicode 0x3002), periods, question marks, or exclamation marks are followed by one or more white space characters. For example: 私は東京に住んでいます。東京は大きいです。 The break pattern is at the first 。. |
| Exception | `[a-zà-ö0-9]\.` | `\s+[a-zà-ö]` | All | Disables a break when a lower case character or a number is followed by a period, one or more white space characters, and a lower case character. For example: Raaka-aineena voidaan käyttää esim. vanhoja autonrenkaita. The text "esim. vanhoja" matches the pattern, so there is no break there even if a break rule would indicate one. |
| Exception | `(^\|[\s\(\[])Mr.` | `\s+` | English | Disables a break whenever the abbreviation Mr. appears either at the beginning of the sentence or after white space, a parenthesis, or a bracket. For example: The British Prime Minister is Mr. Blair. The text "Mr. " matches the pattern, so there is no break there even if a break rule would indicate one. |
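The rules in the table can be tried out with a short Python sketch. It is an illustration only, not Sisulizer's implementation: it assumes that rules are checked in order at every position and that the first matching rule decides, so exceptions are listed before break rules. The `RULES` list, the `segment` function, and the language codes `"en"` and `"ja"` are hypothetical names chosen for this example, and Python's `re` module stands in for the ICU regular expression engine.

```python
import re

# (is_break, before_pattern, after_pattern, language); None means "All".
# Exceptions come first so that they win over the break rules.
RULES = [
    (False, r"[a-zà-ö0-9]\.", r"\s+[a-zà-ö]", None),
    (False, r"(^|[\s\(\[])Mr.", r"\s+", "en"),
    (True, r"[\.\?!]+", r"\s+", None),
    (True, r"[。\.\?!]+", r"\s+", "ja"),
]

def segment(text, language):
    """Split text into sentences using the generic rules plus the
    rules for the given language (assumed first-match-wins order)."""
    active = [
        # \Z anchors the "before" pattern to the end of the text so far.
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after, lang in RULES
        if lang is None or lang == language
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        head = text[:pos]
        for is_break, before, after in active:
            if before.search(head) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides this position
    segments.append(text[start:].strip())
    return segments

print(segment("Skiing is fun. Swimming is fun too.", "en"))
print(segment("The British Prime Minister is Mr. Blair. He lives in London.", "en"))
```

With the English rules active, the sentence about Mr. Blair stays in one piece. With any other language code the `Mr.` exception is not loaded and the generic break rule splits after "Mr.".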
Sisulizer contains both generic segmentation rules (i.e. language independent rules) and language specific rules. You can add your own rules or remove built-in rules.
Segmentation is available with the following source types: HTML files, XML files, and database data. It is turned off by default. To turn it on, use the source's property dialog.