Sisulizer's Kaboom - The conversion utility

What is Sisulizer's Kaboom?
Sisulizer's Kaboom is a converter utility for files and Clipboard
data in text format. The converter is essential for your daily
development and localization work. You can download and
use Kaboom for free.
- The file converter in Kaboom fully supports ANSI,
UNICODE, and DBCS code pages.
- Multi-Converter. Converts a list of files in a batch.
- The Clipboard converter fully supports UNICODE code pages.
- The Clipboard converter's text input field is available for
the current ANSI codepage (8-bit) of your Windows installation.
- Support for drag'n'drop of files and text.
- NEW in 3.0: Support for script-based plug-in filters. Please contact
sales (at) sisulizer.com if you have a custom filter request.
Kaboom is a classic Visual Basic application with multi-language
string resources localized with Sisulizer. Currently, Kaboom has
English and German strings. Your operating system chooses the
language displayed at startup.
Please donate and get the command-line option
If you like Sisulizer's Kaboom, you can support the development of new features
with a donation. Donate whatever it is worth to you. (A common donation is between USD 15 and USD 25.)
As a small thank-you, we send you instructions on how to use
Kaboom's command-line option. With these instructions,
you can use Kaboom as a command-line tool, e.g. to convert multiple files from other
programs or batch files. Please be aware that sending the instructions to you is a manual process and
can only be done during our business hours.
Download
| Product | Format | Date | Size in MB |
| Kaboom 3 | Setup-EXE | 05/26/2009 | 2 |
System Requirements
Sisulizer's Kaboom needs an operating system with full UNICODE
support like Windows 2000, Windows 2003, Windows XP, Windows
Vista, Windows 7 or better.
Online Manual
After installation, Kaboom is ready to use. You can start it
from the Windows Start menu. On opening, Kaboom shows its main
menu with three entries.

- File Converter. This opens the file converter
in Kaboom.
- Clipboard Converter. This opens the Clipboard
converter.
- NEW in 2.5: Multi-Converter. This menu entry opens the batch mode of Kaboom.
You can close Kaboom with the red x-button in the title bar. The F1 key opens
the manual.
File Converter
The file converter allows you to convert text files stored in
one code page into another. You can convert a file written on
a Japanese computer using Shift-JIS into a UNICODE file. Kaboom
checks which conversions are available on your computer and offers
these for your use.
Converting a file

Converting a file with Kaboom is simple. Just follow these steps:
- Select the text file in the foreign code page with the three-dots
button "...". Kaboom allows you to browse and find the file
you want to convert.
- If your file is marked with a BOM (Byte-Order-Mark), then
Kaboom selects the correct code page for you. Most files you
convert will not have a BOM. Therefore, you must select the
correct code page with the Code page group
and Code page options for the source file.
The sorting of the code pages should help you to find the correct
one.
- In the preview, you can see right away if you selected the
correct code page.
- Sometimes files have additional unique specifications. For
example, the line feed encoding of UNIX and Macintosh files
is different. The Additional Filter feature
takes care of this. BASE64 encoding is sometimes used in e-mails.
- Kaboom creates a name for the target file for you. You can
use the three-dots "..." button if you want to choose a different
name for "Target Filename".
- Using the Code page group and Code
page options, you can select the code page of the target
file.
- If the target code page is UTF-7, UTF-8, UTF-16-LE or UTF-16-BE,
then the target file can have a BOM (Byte-Order-Mark). The checkbox
allows you to write a BOM or not.
- You can now click the Convert button to start
the conversion.
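The steps above amount to decoding the file with the source code page and encoding it again with the target code page, optionally with a BOM in front. The following is a minimal sketch of that idea in Python, not Kaboom's own code; the function name `convert_bytes` and the codec names are assumptions taken from Python's standard library.

```python
import codecs

# BOM signatures for the UTF targets that Python's codecs module defines.
_BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}

def convert_bytes(data: bytes, source_cp: str, target_cp: str,
                  write_bom: bool = False) -> bytes:
    """Decode data from source_cp and re-encode it in target_cp (hypothetical helper)."""
    text = data.decode(source_cp)      # interpret the bytes in the source code page
    encoded = text.encode(target_cp)   # store the same characters in the target code page
    if write_bom and target_cp in _BOMS:
        encoded = _BOMS[target_cp] + encoded
    return encoded

# Example: a Shift-JIS text converted to UTF-8 with a BOM.
sjis_data = "日本語".encode("shift_jis")
utf8_data = convert_bytes(sjis_data, "shift_jis", "utf-8", write_bom=True)
```

Reading a real file would only add `open(path, "rb").read()` before the call and a binary write after it.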
Attention: Not all conversions make sense! For
example, if you convert an 8-bit file written with a Cyrillic
code page like KOI8 into an 8-bit file for code page 1252 (Windows
Western), the information will be lost. However, you can convert
the file into a Cyrillic file for the Macintosh (Code page 10007)
or a UNICODE format like UTF-7, UTF-8, or UTF-16. The UNICODE
formats are always a good choice because they can represent more than
a million different characters, while ANSI files can only have 256 different
characters.
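The warning above can be checked with a small experiment. This sketch uses Python's codecs (`koi8_r`, `cp1252`, `mac_cyrillic`, `utf-8`) as stand-ins for the code pages named in the text; `can_convert` is a hypothetical helper, and Kaboom itself may substitute characters instead of reporting an error.

```python
def can_convert(data: bytes, source_cp: str, target_cp: str) -> bool:
    """Return True if every character of data exists in the target code page."""
    try:
        data.decode(source_cp).encode(target_cp)
        return True
    except UnicodeError:
        return False

koi8_data = "Привет".encode("koi8_r")   # Cyrillic text stored as KOI8-R bytes

to_western = can_convert(koi8_data, "koi8_r", "cp1252")        # Cyrillic letters are missing in 1252
to_mac = can_convert(koi8_data, "koi8_r", "mac_cyrillic")      # Macintosh Cyrillic has them
to_utf8 = can_convert(koi8_data, "koi8_r", "utf-8")            # UNICODE always works
```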
Background Info
What is a code page and why
is it needed?
Code pages are necessary because ANSI files have only 8 bits to store a character (char).
This means there are only 256 possible characters, not nearly enough for all languages
of the world.
The American charset needs only 128 different chars = 7-bit. Because computers
work with 8-bit bytes, one more bit was available;
thus another 128 values can be used for chars.
On MS-DOS systems, some of these values were used for drawing boxes and lines. With
Windows, these boxes and lines were removed from the charsets and
more foreign chars were added. For most Western languages
like English, French, German, and others, these additional chars are sufficient. For example, the German charset
needs only seven extra chars in addition to the US charset, leaving enough space
for special chars from Spain, Norway, and so forth.
However, for other scripts, such as Cyrillic, the space was not big enough. Code pages
fill that gap. A code page in Windows is nothing more than
a table that assigns other characters to the upper 128 values. For example, instead
of the German umlaut Ü, a Cyrillic Ь appears. Both of these characters have
the byte value 220. Thus, if the Windows code page 1252 is selected, a Ü
appears, while with the Russian Windows code page 1251 Ь (soft sign) is displayed.
If code pages are used, the system cannot show Ü and Ь on the same
display. This is only possible if UNICODE is used. For example, this page uses
UNICODE (UTF-8) to display both chars.
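The effect of a code page can be reproduced in a few lines: one and the same byte is decoded once as Western (1252) and once as Cyrillic (1251). This is an illustrative sketch; the byte values 220 and 216 are taken from Python's `cp1252` and `cp1251` tables.

```python
raw = bytes([220])                  # one byte, value 220 (0xDC)

western = raw.decode("cp1252")      # Windows Western
cyrillic = raw.decode("cp1251")     # Windows Cyrillic

# The Cyrillic letter sha sits at byte value 216 in code page 1251.
sha = bytes([216]).decode("cp1251")

# Only a UNICODE encoding can store both characters in one text.
both_utf8 = (western + cyrillic).encode("utf-8")
```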
While this solves the problem for most languages, the code page technique
does not help languages with more than 128 special characters, such as
Japanese, Korean and Chinese. For these languages, DBCS (double-byte character sets) is available.
While the lower 128 characters are still the same as in US code pages, the upper
128 are specially encoded. In this system, a byte from the upper 128 values can start
a multi-byte sequence. This means that one character is stored in one or more
bytes. For example, in Japanese Shift-JIS, one character uses one or two bytes.
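A quick way to see the mixed byte lengths is to encode single characters and count the bytes. The sketch below uses Python's `shift_jis` codec as an assumed stand-in for the Japanese code page.

```python
# Number of bytes each character occupies in Shift-JIS.
ascii_len = len("A".encode("shift_jis"))     # lower 128 chars stay single-byte
kanji_len = len("日".encode("shift_jis"))     # a kanji needs a lead byte plus a trail byte
word_len = len("日本語".encode("shift_jis"))   # three kanji
```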
Thus, if a person writes a text file on her or his computer and does not
use UNICODE to save it, the current code page is used. If this file
is given to someone with some other current codepage, the file is not displayed
correctly. So, if you are in Western Europe or the USA, and you get a text file
from someone in Greece, Turkey, China, or Japan, the chances are high that
the file is useless to you. Kaboom can fix these problems. Simply convert
the file into UNICODE and print, edit, or use the file in any way, without losing
information. If you edit the file and you want to return it with your changes,
simply convert the file back into the code page that the receiver needs. Kaboom makes the entire process easy and quick.
Background Info
What is a BOM?
BOM is an acronym for byte-order-mark. A BOM describes the order in which
a sequence of bytes is stored. The mark is stored
at the beginning of a text file to tell the reading application the order
in which the bytes are organized, as big-endian or little-endian. The BOM also
indicates whether the text is stored as 16- or 32-bit UNICODE. In addition, a BOM
is used to mark UTF-7 and UTF-8 files. These files are 8-bit files
that use an encoding to store 16-bit characters. Therefore, the name BOM for
these kinds of files is a bit misleading. Because it is convenient to know
the file format, a BOM can be used to mark the format inside the file.
If a file is read by an application not aware of BOMs, the application shows
the bytes used to mark the file as data. In this case, you can use
Kaboom to read a file with a BOM and convert the file into a file without
a BOM.
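Detecting and removing a BOM can be sketched as a comparison of the first bytes of a file with the known signatures. The helper name `split_bom` is hypothetical; the signatures come from Python's `codecs` module, and the UTF-7 mark is left out of this sketch.

```python
import codecs

# Longer signatures first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
_BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def split_bom(data: bytes):
    """Return (encoding name or None, data without the BOM)."""
    for bom, name in _BOM_TABLE:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data

encoding, payload = split_bom(b"\xff\xfeA\x00")
```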
Background Info
What is little-endian and big-endian?
There are two types of byte-ordering: big- and little-endian. Intel processors
use the little-endian order; this means the least significant byte
of a number is stored first and the most significant byte last. This is the
opposite of how we usually write numbers: if we write a number like 4711, the
most significant digit is 4 (= 4000) and is on the left side, which corresponds
to big-endian order. A BOM (Byte Order Mark) in text files indicates to the
application the direction in which to read the numbers.
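The two byte orders can be made visible by storing the number 4711 (hexadecimal 1267) in two bytes. This is a small illustration in Python, not part of Kaboom.

```python
number = 4711                            # 0x1267

big = number.to_bytes(2, "big")          # most significant byte first: 12 67
little = number.to_bytes(2, "little")    # least significant byte first: 67 12

# The same applies to 16-bit UNICODE text: the letter A is code point 0x0041.
a_little = "A".encode("utf-16-le")
a_big = "A".encode("utf-16-be")
```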
Clipboard Converter
The Clipboard converter knows the following conversions and filters.
Converting Clipboard data

Converting a text with Kaboom is simple. Just follow these steps:
- Select the text you want to convert in some other application,
such as Windows Notepad, and use the shortcut Ctrl+c to copy
it to the Windows Clipboard.
- Switch to Kaboom's Clipboard converter and, from the Source
list, select one of the Clipboard sources.
- In the Filter group and Filter lists,
select the conversion you want.
- Click the Convert button.
- The converted text appears in the Target
area. If you have selected Clipboard in the Target
list, the result is already copied to the Clipboard. If you
select Preview, you can click the Copy
button to transfer the result to the Clipboard. Copying to the
Clipboard also changes the content of the Source
area.
- Select the application where you want to paste the text, such
as Notepad.exe. You can press Ctrl+v on the keyboard to paste
the converted text into Notepad.
If you want to manually enter text into Kaboom, you must select
Input Field (ANSI) in the Source
list. Now, Kaboom can convert what you type into the text field
if you click the Convert button. If you want
to add text to existing text in the source text field, you should
use the standard Windows shortcut for the Clipboard like the Ctrl+v
keys on the keyboard. You can only use your current code page
in the input field. If you want to use UNICODE, please select
one of the Clipboard types from the Source list.
Available filters in Kaboom
Char Filters
Clean Up String
Replaces white chars (characters) in a string with underscore
chars _.
Lower Case
Changes all upper chars to lower chars.
Make Caps
Makes the first char of every word in the string upper case.
Remove White Chars
Removes all punctuation and other white chars from the input. In Kaboom,
white chars are the following chars: <Blank><Tab><CR><LF>,;:./(){}[]<>+-~#*&%$§!=\'"
Tabs to Blanks
Changes tab chars into blank chars.
Upper Case
Changes all lower case chars to upper case chars.
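The char filters can be approximated in a few lines. This is a sketch under assumptions: the set of white chars is copied from the list above, each white char is handled individually, and a word is taken to be anything separated by a blank. Kaboom's exact rules may differ.

```python
# White chars as listed in the manual: blank, tab, CR, LF and punctuation.
WHITE_CHARS = set(" \t\r\n,;:./(){}[]<>+-~#*&%$§!=\\'\"")

def clean_up_string(s: str) -> str:
    """Replace every white char with an underscore."""
    return "".join("_" if c in WHITE_CHARS else c for c in s)

def remove_white_chars(s: str) -> str:
    """Drop every white char."""
    return "".join(c for c in s if c not in WHITE_CHARS)

def tabs_to_blanks(s: str) -> str:
    """Replace each tab with one blank."""
    return s.replace("\t", " ")

def make_caps(s: str) -> str:
    """Upper-case the first char of every blank-separated word."""
    return " ".join(w[:1].upper() + w[1:] for w in s.split(" "))
```

Lower Case and Upper Case correspond to `str.lower()` and `str.upper()`.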
Checksums
CRC16
The filter calculates the CRC16 checksum for the string in the
source field.
CRC32
The filter calculates the CRC32 checksum for the string in the
source field.
Internet Checksum
The filter calculates a so-called Internet checksum for the string
in the source field.
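For orientation, here is how two of these checksums can be computed. The sketch uses `zlib.crc32` for CRC32 and the 16-bit one's complement sum described in RFC 1071 for the Internet checksum. Which variants and which byte encoding Kaboom uses is not stated here, so treat both as assumptions; CRC16 is omitted because several incompatible variants exist.

```python
import zlib

def crc32_hex(data: bytes) -> str:
    """CRC32 of data as eight hexadecimal digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```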
Code page Filters
Char to OEM
Converts a string from an ANSI char set into a char set used
in a DOS session.
UTF-16 to UTF-7
Converts a text in UTF-16 into UTF-7. The target field shows
the escape chars used in UTF-7 instead of interpreting them.

UTF-16 to UTF-8
Converts a text in UTF-16 into UTF-8. The target field shows
the escape chars used in UTF-8 instead of interpreting them.
OEM to Char
Converts a string from the char set used in a DOS session into
ANSI char set.
UTF-7 to UTF-16
Converts a text using UTF-7 escaped into UTF-16. The source field
shows the escape chars used in UTF-7, instead of interpreting
them.
UTF-8 to UTF-16
Converts a text using UTF-8 escaped into UTF-16. The source field
shows the escape chars used in UTF-8, instead of interpreting
them.
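The code page filters can be illustrated with Python's codecs. In this sketch, the "uninterpreted" view is imitated by showing the UTF-8 bytes as if they were code page 1252 text, and `cp437` stands in for the OEM (DOS) char set; both choices are assumptions, since the OEM code page depends on the Windows installation.

```python
text = "Grüße"

utf8_bytes = text.encode("utf-8")
utf7_bytes = text.encode("utf-7")

# What an ANSI-only viewer shows when it does not interpret the UTF-8 sequences.
utf8_uninterpreted = utf8_bytes.decode("cp1252")

# Char to OEM and OEM to Char for one character.
oem_byte = "Ü".encode("cp437")
back_to_char = oem_byte.decode("cp437")
```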
Code page Finder
This group does not contain classic filters. The functions here
are service functions to find a code page for a number and vice
versa.
Code page Name from Code page Number
This function finds the code page number used by Windows for
a code page name; for example, "shift_jis" or "shift-jis" results
in 932. For some code pages, Kaboom knows more than one name.
Code page Number from Code page Name
This function finds the code page name used by Windows for a
code page number; for example, 932 results in "shift_jis". While
there can be more than one name for one code page number, Kaboom
returns the name used in the headers of Mime or HTML-files.
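Both lookups are plain table lookups. The sketch below uses a tiny hand-written table with a few Windows code page identifiers; the table, the helper names, and the name normalization are illustrative assumptions and not Kaboom's data.

```python
# A few Windows code page numbers and the names used in MIME or HTML headers.
CODE_PAGES = {
    932: "shift_jis",
    1251: "windows-1251",
    1252: "windows-1252",
    10007: "x-mac-cyrillic",
    65001: "utf-8",
}

def _normalize(name: str) -> str:
    """Treat 'shift-jis' and 'Shift_JIS' as the same name."""
    return name.strip().lower().replace("-", "_")

_NAME_TO_NUMBER = {_normalize(name): number for number, name in CODE_PAGES.items()}

def number_for_name(name: str):
    return _NAME_TO_NUMBER.get(_normalize(name))

def name_for_number(number: int):
    return CODE_PAGES.get(number)
```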
Filenames
This group is also not a classic filter. Nevertheless, the functions
can be sometimes handy in your daily development work.
Calc full filename
This filter can convert a filename like
c:\windows\system32\..\..\test\test.dat
into
c:\test\test.dat.
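The same normalization is available in Python's standard library. This short sketch uses `ntpath.normpath`, which applies Windows path rules on any platform; the short/long filename and UNC functions need the Windows API and are not shown.

```python
import ntpath

full = ntpath.normpath(r"c:\windows\system32\..\..\test\test.dat")
```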
Long Filename to Short
Modern Windows uses long filenames. However, sometimes the short
8.3 filename representation is needed. This function finds the
short filename.
Path with Drive to UNC
This filter finds the UNC representation of a network path using
a drive letter.
Short Filename to Long
Modern Windows uses long filenames. Sometimes the short 8.3 filename
is given. This function finds the long filename.
Hex Decoder
Hex-Stream
This filter changes a string with hexadecimal numbers into characters.
Hex Encoder
Hex-Dump
This filter changes the character values into their hexadecimal
representation. The output is formatted in columns
and rows so a human can easily read them. There is no decoder
for this format.
Hex-Stream
This filter changes the character values into their hexadecimal
representation.
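The hex filters can be sketched as follows. The function names, the default encoding, and the dump layout (offset, hex columns, printable chars) are assumptions for illustration; Kaboom's exact output format is not described here.

```python
def hex_stream(text: str, encoding: str = "cp1252") -> str:
    """Encode text as a continuous string of hexadecimal digits."""
    return text.encode(encoding).hex().upper()

def hex_stream_decode(stream: str, encoding: str = "cp1252") -> str:
    """Turn a string of hexadecimal digits back into characters."""
    return bytes.fromhex(stream).decode(encoding)

def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes in rows: offset, hex values, printable characters."""
    pad = width * 3 - 1
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        printable = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {printable}")
    return "\n".join(lines)
```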
Internet Decoder
International Domain Names (IDNA/PunyCode)
There is a new standard for using special chars in domain names called
IDNA. It is used if you want to register a domain name with special chars,
like Japanese characters, Spanish or French accents, or German umlauts. You
can use this filter to remove the computer coding and see the
text in human-readable form. Please be aware that this part of Kaboom is
ANSI-based. Some IDNAs from China do not render correctly on some
Western computers and vice versa.
Mail Data Base64
Base64 encoding is sometimes used in the body of e-mails.
Mail Data Quoted Printable
Quoted printable is found in the body part of e-mails. QP encodes
special chars in a way that they can be transported as 7-bit ASCII.
Mail Header Quoted Binary (RFC1522)
Binary (Base64) encoding is found in the header part of e-mails.
It encodes special chars in a way that they can be transported as
7-bit ASCII.
Mail Header Quoted Printable (RFC1522)
Quoted printable encoding is found in the header part of e-mails.
QP encodes special chars in a way that they can be transported as
7-bit ASCII.
URL
A URL in the browser encodes special chars; for example, <Blanks>
become %20. Some spammers try to use this to deceive you. If you
see a URL encoded this way in your e-mail, you may not know where
it links to. Kaboom can decode this for you.
CERs and NCRs into Chars
This changes Character Entity References (CER) and Numeric Character
References (NCR) into UNICODE chars. CER and NCR are used in HTML
to describe special characters like umlauts, accented chars, or
signs like < > & and so on. Kaboom supports all 252
CERs defined for UNICODE, like &Uuml; = Ü, &trade; = ™,
&Aacute; = Á, &Omega; = Ω, etc. This filter can be used
to decode strings encoded with AntiHarvest, Character Entity Reference
(CER), and HTML Character Entity Reference (CER) to their originals.

Internet Encoder
AntiHarvest (complete NCR)
AntiHarvest changes every char in the input field into Numeric
Character Reference (NCR). NCR is used in HTML to describe special
characters like umlauts, accented chars, or signs like < >
& and so on. Usually, only the special chars are encoded
as NCR. The AntiHarvest filter encodes all chars of the string.
The result can be used for links to e-mail addresses on web sites.
This helps to protect your e-mail address from e-mail harvesters
visiting your web site to grab e-mail addresses. The grabbed addresses
would be used to send spam to your mailbox.
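The idea behind AntiHarvest fits in one line: every character becomes a decimal NCR. This sketch shows the encoding and, with `html.unescape` from Python's standard library, the way back; the function name `anti_harvest` is hypothetical.

```python
import html

def anti_harvest(text: str) -> str:
    """Encode every character as a decimal Numeric Character Reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

encoded = anti_harvest("a@b.c")
decoded = html.unescape(encoded)    # a browser performs this step when rendering
```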
International Domain Names (IDNA/PunyCode)
There is a new standard for using special chars in URLs called
IDNA. If you want to register a domain name with special chars,
like Japanese, Spanish or French accents, or German umlauts, you
can use this filter to get the actual text to register. You can
use only special chars that your system can display in
your current ANSI char set.
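To see what the registered form of such a name looks like, Python's built-in `idna` codec can be used. This is only an illustration with a made-up domain; the codec follows the older IDNA 2003 rules, and Kaboom's own implementation may differ in details.

```python
domain = "bücher.example"            # hypothetical domain name with an umlaut

registered = domain.encode("idna")   # the ASCII form that is actually registered
readable = registered.decode("idna") # back to the human-readable form
```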
Mail Data Base64
Base64 encoding is sometimes used in the body of e-mails.
Mail Data Quoted Printable
Quoted printable is found in the body part of e-mails. QP encodes
special chars in a way that they can be transported as 7-bit ASCII.
Mail Header Quoted Binary (RFC1522)
Binary (Base64) encoding is found in the header part of e-mails.
It encodes special chars in a way that they can be transported as
7-bit ASCII.
Mail Header Quoted Printable (RFC1522)
Quoted printable encoding is found in the header part of e-mails.
QP encodes special chars in a way that they can be transported as
7-bit ASCII.
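The four mail encodings can be tried out with Python's standard library (`base64`, `quopri`, `email.header`). The sample text, the ISO-8859-1 charset, and the helper `decode_mail_header` are assumptions chosen for the illustration.

```python
import base64
import quopri
from email.header import decode_header

body = "Grüße".encode("iso-8859-1")

# Mail data: Base64 and quoted printable.
b64 = base64.b64encode(body)
qp = quopri.encodestring(body)

# Mail headers: encoded words with B (Base64) or Q (quoted printable).
header_b = "=?iso-8859-1?B?" + b64.decode("ascii") + "?="
header_q = "=?iso-8859-1?Q?Gr=FC=DFe?="

def decode_mail_header(value: str) -> str:
    """Decode an encoded-word header into readable text."""
    parts = []
    for data, charset in decode_header(value):
        parts.append(data.decode(charset or "ascii") if isinstance(data, bytes) else data)
    return "".join(parts)
```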
Character Entity Reference (CER)
This changes special chars in the input to their character entity
reference (CER). CER is used in HTML to describe special characters
like umlauts, accented chars, or signs like < >, &,
and so on. Kaboom supports all 252 CERs defined for UNICODE,
like Ü = &Uuml;, ™ = &trade;, Á = &Aacute;,
Ω = &Omega;, and so forth.
HTML Character Entity Reference (CER)
This changes special chars in the input to their character entity
reference (CER). CER is used in HTML to describe special characters
like umlauts, accented chars, or signs like < >, &,
and so on. Kaboom supports all 252 CERs defined for UNICODE,
like Ü = &Uuml;, ™ = &trade;, Á = &Aacute;,
Ω = &Omega;, and so forth, making it useful for HTML files.
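Encoding to CERs is a lookup from code point to entity name. The sketch below uses the HTML 4 entity table that ships with Python (`html.entities.codepoint2name`); whether Kaboom uses the same table, and how its two CER filters differ, is not stated here, so `to_cer` is only an approximation.

```python
from html.entities import codepoint2name

def to_cer(text: str) -> str:
    """Replace every char that has a named entity with its CER."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)

encoded = to_cer("Ü™ÁΩ")
```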
URL
A URL in the browser encodes special chars; for example, <Blanks>
become %20. Some spammers try to use this to deceive you. If you
see a URL encoded this way in your e-mail, you may not know where
it links to. Kaboom can encode text in this format for you.
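URL encoding and decoding are available in Python's `urllib.parse`. The example URLs are made up; the second one shows how percent-encoding can hide an otherwise readable address.

```python
from urllib.parse import quote, unquote

encoded = quote("my file.html")                        # the blank becomes %20
revealed = unquote("http://example.com/%77%77%77")     # %77 is the letter w
```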
Line Feeds
CR to CRLF / CRLF to CR / CRLF to LF / LF to CRLF
Different operating systems have different new line definitions.
While Windows uses CRLF (Carriage Return plus Line Feed), UNIX
uses only LF, and classic Macintosh systems use only CR. Sometimes you must work with a UNIX document where
everything seems to be printed in one line in Windows Notepad.
These filters solve the problem.
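The line feed filters are simple replacements; the only care needed is not to touch line ends that are already CRLF. A minimal sketch with hypothetical function names:

```python
import re

def lf_to_crlf(text: str) -> str:
    """UNIX to Windows; existing CRLF pairs are left unchanged."""
    return re.sub(r"(?<!\r)\n", "\r\n", text)

def cr_to_crlf(text: str) -> str:
    """Classic Macintosh to Windows; existing CRLF pairs are left unchanged."""
    return re.sub(r"\r(?!\n)", "\r\n", text)

def crlf_to_lf(text: str) -> str:
    return text.replace("\r\n", "\n")

def crlf_to_cr(text: str) -> str:
    return text.replace("\r\n", "\r")

def crlf_to_br(text: str) -> str:
    """Turn every Windows new line into an HTML <br> tag."""
    return text.replace("\r\n", "<br>")
```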
CRLF to <BR>
This filter changes every new line into an HTML <br>-tag.
CRLF to Blanks
This filter changes every new line into a single blank char (" ").
Other Filters
RLE Encode/Decode
This is a simple run-length encoding. If a string contains
the same chars in a row, this encoding shrinks the string.
ROT13
Encrypts a string in a way that a human cannot read it. If you
use the function twice, the effect is reversed.
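Both filters are easy to reproduce. ROT13 is available as a Python codec; for the run-length encoding, the sketch below stores (char, count) pairs, which is an assumption because Kaboom's output format is not described.

```python
import codecs
from itertools import groupby

def rot13(text: str) -> str:
    """Rotate every Latin letter by 13 places; applying it twice restores the text."""
    return codecs.encode(text, "rot_13")

def rle_encode(text: str):
    """Collapse runs of equal chars into (char, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs) -> str:
    return "".join(ch * count for ch, count in pairs)
```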
Soundex
This is not a classical filter. Soundex calculates the "Soundex"
value of a text. Texts with the same Soundex value sound similar
if spoken.
Strip Tags from HTML
Removes tags from HTML and returns the plain text information.
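One way to strip tags is to run the text through an HTML parser and keep only the character data. This sketch uses `html.parser.HTMLParser` from Python's standard library; it also resolves entities, which may or may not match Kaboom's behavior.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(markup: str) -> str:
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```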
Get HTML
If Kaboom finds HTML format in the Clipboard, Kaboom shows the
complete data on the Clipboard. This contains some header information
and surrounding HTML data. Kaboom can use this header information
to extract the HTML from this data.
Get HTML-Fragment
If Kaboom finds the HTML format in the Clipboard, Kaboom shows
the complete data on the Clipboard. This contains some header
information and surrounding HTML data. Kaboom can use this header
information to extract the HTML fragment from this data.
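The header mentioned above belongs to the Windows HTML Clipboard Format: a few `Key:Value` lines whose numbers are byte offsets into the UTF-8 data. The sketch below shows how such offsets can be used; the helper names and the sample builder are hypothetical, and real data may need extra handling (for example, StartHTML can be -1).

```python
import re

def _offset(data: bytes, key: str) -> int:
    """Read one numeric header field such as StartHTML."""
    match = re.search(key.encode("ascii") + rb":(-?\d+)", data)
    if match is None:
        raise ValueError(f"header field {key} not found")
    return int(match.group(1))

def get_html(data: bytes) -> str:
    return data[_offset(data, "StartHTML"):_offset(data, "EndHTML")].decode("utf-8")

def get_html_fragment(data: bytes) -> str:
    return data[_offset(data, "StartFragment"):_offset(data, "EndFragment")].decode("utf-8")

def build_sample(fragment: str) -> bytes:
    """Build clipboard-style data around a fragment, for demonstration only."""
    template = ("Version:0.9\r\nStartHTML:{:010d}\r\nEndHTML:{:010d}\r\n"
                "StartFragment:{:010d}\r\nEndFragment:{:010d}\r\n")
    prefix = b"<html><body><!--StartFragment-->"
    suffix = b"<!--EndFragment--></body></html>"
    body = fragment.encode("utf-8")
    start_html = len(template.format(0, 0, 0, 0))   # header has a fixed length
    start_frag = start_html + len(prefix)
    end_frag = start_frag + len(body)
    end_html = end_frag + len(suffix)
    header = template.format(start_html, end_html, start_frag, end_frag)
    return header.encode("ascii") + prefix + body + suffix

sample = build_sample("<b>Hi</b>")
```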
Copy
Copy does nothing other than copy the source to the target. This
is useful if the HTML format is found on the Clipboard and you
want to get the data, including all headers. Most applications
remove the headers when pasting the data. (See also Get HTML and
Get HTML-Fragment.)
NEW in 2.5: Multi-Converter

Basically, the multi-converter works like the file converter. The difference is that you can drop multiple files onto the
list box and then convert them in a batch.
The settings are global for all files in the list, with one exception. The code page
setting of the source file is overridden by the (optional) Byte-Order-Mark
of a file or by the (optional) Content-Type of an
HTML file. If neither exists, the code page setting for the
source file is used. The code page actually used to read a specific
file can be found in parentheses after its file name.
For a description of the additional filter please read the section about the file converter.
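The order of precedence described above (BOM first, then the Content-Type of an HTML file, then the global setting) can be sketched as one function. The name `pick_source_encoding`, the regular expression, and the 2048-byte search window are assumptions for this illustration.

```python
import codecs
import re

_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
_CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)

def pick_source_encoding(data: bytes, default: str, is_html: bool = False) -> str:
    """Choose the encoding used to read one file of the batch."""
    for bom, name in _BOMS:                      # 1. a BOM wins
        if data.startswith(bom):
            return name
    if is_html:                                  # 2. charset from the Content-Type
        match = _CHARSET.search(data[:2048])
        if match:
            try:
                return codecs.lookup(match.group(1).decode("ascii")).name
            except LookupError:
                pass                             # unknown charset name: fall through
    return default                               # 3. the global source setting

html_file = b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">'
chosen = pick_source_encoding(html_file, "cp1252", is_html=True)
```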