FREE SOFTWARE FOR DOS — Text Utils

Free Software for DOS
Text Utilities – 4
Spellers, Dictionaries, Text Analysis, Characters

21 Aug 2006

This page:	ASCII TEXT SPELLCHECKERS
	WORD LISTS AND DICTIONARIES
	WORD COUNT & TEXT ANALYSIS
	ASCII CHARTS
	CHARACTER TRANSLATION AND STRIPPING

ASCII TEXT SPELLCHECKERS

International Ispell (1) — Interactive text and HTML spell checker.

unrated

[added 1998-07-03, updated 2005-12-09]

Ispell, an interactive spell checker developed for Unix platforms, can be used as a standalone program or as an external checker for many power editors. This version includes English dictionaries (UK & US), and runs in text or HTML mode. 32-bit DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other).

From the program help:

Whenever a word is found that is not in the dictionary,
it is printed on the first line of the screen. If the dictionary
contains any similar words, they are listed with a number
next to each one. You have the option of replacing the word
completely, or choosing one of the suggested words.

Commands are:

 R       Replace the misspelled word completely.
 Space   Accept the word this time only.
 A       Accept the word for the rest of this session.
 I       Accept the word, and put it in your private dictionary.
 U       Accept and add lowercase version to private dictionary.
 0-n     Replace with one of the suggested words.
 L       Look up words in system dictionary.
 X       Write the rest of this file, ignoring misspellings,
         and start next file.
 Q       Quit immediately. Asks for confirmation.
         Leaves file unchanged.
 !       Shell escape.
 ^L      Redraw screen.
 ^Z      Suspend program.
 ?       Show this help screen.

Authors: Geoff Kuenning et al. Port by Eli Zaretskii, Israel (2001).

2005-05-14: v3.3.01.

Downloads
Binaries, manual	isp3301b.zip	(978K)
Source	isp3301s.zip	(721K)

Geoff Kuenning's International Ispell Home Page.

International Ispell (2) — Interactive spell checker, supports 8-bit characters.

unrated

[added 1998-04-06, updated 2005-04-16]

This EMX/gcc-compiled DOS & OS/2 port requires at least a 386 PC, but I'd recommend a fast 486 or Pentium with at least 8MB RAM and a disk cache. The package is a very large download, containing executables, C source code, and dictionaries for several languages (Dutch, English, French and German). The compiled English dictionary needs about 4.7MB of disk space and contains at least 210,000 unique words, including many technical and scientific terms. The program supports 8-bit characters and maintains a user ("private") dictionary, which by default is stored in the root directory with the filename _english. All in all, I like the comprehensiveness and "intelligence" of this ISPELL. The program itself loads slowly on a Pentium 60 (w/ 8MB RAM), and is much too slow on a 386/20 (8MB). Requires ANSI.SYS or equivalent, and a DOS extender (included). I wouldn't download this package unless you're willing to invest a _little_ time in setup.

Core commands are same as for v3.3, above.

Authors: Geoff Kuenning et al. (1983-1997). Port by Piet Tutelaers, Netherlands (1997).

1997-08-15: v3.1.20.

Download ispellw32.zip (2.5MB).

Geoff Kuenning's International Ispell Home Page.

GNU ispell — Interactive spell checker, runs well on older PCs.

unrated

[added 1998-04-06, updated 2005-04-16]

This old but widely distributed 16-bit ispell includes only an English dictionary (38,000 words / 156K on disk). Run the program without parameters to check a single word, or pass it a filespec and it will enter a line-by-line interactive check / correction mode. It can check multiple files in sequence if you pass it a wildcarded filespec. The package lacks usage documentation (but see Downloads, below), so unless you're familiar with ispell, you could end up frustrated. To get the list of navigation commands, hit the "?" key inside the program (or start it with ispell ?). Once you know the commands, it is easy to use. I'm sure there are additional hidden features, but I haven't used it much. Runs briskly enough on a 386/20.

Commands are:

 R       Replace the misspelled word completely.
 Space   Accept the word this time only
 A       Accept the word for the rest of this file.
 I       Accept the word, and put it in your private dictionary.
 0-9     Replace with one of the suggested words.
 <NL>    Recompute near misses.  Use this if you interrupted
         the near miss generator, and you want it to
         return to this word.
 Q       Write the rest of this file, ignoring misspellings,
         and start next file.
 X       Exit immediately.  Asks for confirmation y/n.
         Leaves file unchanged.
 !       Shell escape.
 ^L      Redraw screen.

To exit single-word mode, type ^C. Package includes the Look utility.

Capabilities absent from GNU ispell compared with International Ispell: GNU ispell is not case sensitive, its suffix handling is more primitive, and it won't allow non-alphabetical characters into the dictionary.

Authors: Pace Willisson (1988). Port by Pavel Ganelin (1993).

1993-10-26: v4.0 (despite the version number, this is older than the Unix-based versions 3.x).

Downloads
Binaries	ispel40x.zip	(260K)
Source, full docs	ispell-4.0.tar.gz	(379K)

JSPELL — Excellent interactive spell checker (English dictionary).

* * * * *

[added 1998-09-17, updated 1998-10-25]

When considering both ease-of-use and versatility, you won't find a better choice than JSPELL. Note: JSPELL may not run on some faster Pentiums (divide overflow error) – use SLOWDOWN to avoid the error.

Simple to use, mouse compatible interface. Runs on pre-386 machines too.
Unlimited input file length; Max line length 512 characters.
American English dictionary included. See docs for discussion of support of other language dictionaries.
Unique:
- Multiple UNDO operations allowed (up to 400); Last action displayed at bottom of screen.
- Ability to handle file specific dictionary and multiple user dictionaries.
- Includes powerful yet friendly dictionary manager program.
- Network support.
- TeX support.
Configuration file allows:
- Choice of XT or AT keyboard.
- CGA,EGA,VGA or Hercules video cards.
- Define default dictionaries.
- Backup option.
- Define minimum word length.
- more...
Excellent built-in help and documentation.
"Added a feature that can omit lines starting with > or any other string specified by the user in the configuration file jspell.cfg. This feature is useful in spell-checking a reply to an email message."
Freeware: Older registration code file is not needed.

Author: Joohee Jeong (1998). Suggested by Robert Bull, Scott Nesbitt.

1998-10-21: v2.11.

Download jspel211.zip (209K).

SpellTest — Spell checker for plain or html text; interactive mode or file report (English dictionary).

unrated

[added 1999-04-18, updated 2006-03-14]

This speller could be particularly useful to web authors because it ignores HTML codes in documents during a spell check. SpellTest can run in two modes: 1. A simple interactive mode, which allows manual replacement of unknown terms but has no features like "ignore all" or "add to custom dictionary"; 2. A report-to-file mode, which is probably where SpellTest functions best. Reported terms are referenced by original document line numbers. No limit on text file sizes. Includes a large 2MB dictionary, and user dictionaries are supported. Requires a fast 80386 (80486-100MHz recommended), and about 2MB RAM (4MB recommended).

Usage : spelltst.exe <file> <options>
Options:
       -r:<report name> , by default report.txt
       -n Dont load addishional dictionaries.
       -o Online error fixing. (Ascii text files only).
       -nr Dont create a report file.
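
The report mode described above (skip HTML markup, list unknown terms by line number) can be sketched in Python. This is only an illustration of the idea, not SpellTest's actual method: the names WordCollector and report_unknown are hypothetical, the dictionary is a plain list of words, and text inside script or style elements would still be checked.

```python
import re
from html.parser import HTMLParser


class WordCollector(HTMLParser):
    """Collect (line_number, word) pairs from text outside HTML tags."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        # getpos() gives the line on which this run of text starts.
        start_line, _ = self.getpos()
        for offset, line in enumerate(data.split("\n")):
            for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", line):
                self.words.append((start_line + offset, word))


def report_unknown(html_text, dictionary):
    """Return (line_number, word) for every word not in `dictionary`."""
    collector = WordCollector()
    collector.feed(html_text)
    collector.close()
    known = {w.lower() for w in dictionary}
    return [(n, w) for n, w in collector.words if w.lower() not in known]


if __name__ == "__main__":
    page = "<p>Helo world</p>\n<b>good</b> wrod\n"
    for line_number, word in report_unknown(page, ["world", "good"]):
        print(f"line {line_number}: {word}")
```

Tag names and attribute values never reach handle_data, so they are left out of the check without any special handling.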

Author: Oleg Stepanyuk / Oddin Software, Russia (1999).

1999-02-21: v1.0.

Download spelltst.zip (972K).

GDSPELL — Interactive spell checker handles big files. (English dictionary)

* * *

[updated 2005-04-10]

GDSPELL is an easy to use standalone spell checker from the developers of the freeware NE editor (also included here). Both programs use the same dictionary, so you don't need to clutter your hard disk with different dictionaries. Unlike NE, GDSPELL can check big files, and create and use a custom dictionary. The spell checking dialog is similar to those found in popular word processors.

Limitations:

Although this program can handle lines that exceed 80 columns, it fails under a specific condition: if you manually correct a word and the correction is longer than the original (e.g., hzrdous to hazardous), and this increased length pushes other words off the right edge of the display, GDSPELL will not correctly analyze those words at the right edge.
English dictionary only.

EXE size: 55K; Dictionary size: 370K

(Thanks to Yves Bellefeuille's freeware list for pointing me to this one).

Author: G.D. Davis (1995); distributed by GDSoft.

1995-06-01: v3.00b.

Download gdsp300b.zip (414K).

Tschek — Spell checker outputs list of all misspelled words to screen or file.

* * *

Most of us use word processors or stand-alone dialog spell checkers to perform "on-the-fly" spell checking and correction (e.g., GDSpell). But sometimes these spell checkers can be cumbersome and time consuming because they prompt word by word. If you are spell checking an HTML or technical document with a "dumb" spell checker, this can be tedious. Of course, you could add all those strange words or tags to a user dictionary, but that's no fun either. Or, you could use a spell checker that simply outputs a list of unrecognized words to a file without any prompting or correction. You can browse the output file, quickly locate words that are obvious typos, and manually correct the original document (e.g., using a search / replace tool).

Features:

Small package: 20K exe plus 145K main dictionary.
Manually edit the ASCII dictionaries, or create new ones (dicts are plain text files)
You can create custom batch files to check documents with multiple dictionaries.
Can output word frequency stats for a document.
Simple syntax: SPELL DictionaryFile InputFile [OutputFile]
I've sometimes used Tschek to spell check these web pages (after converting to plain text first). It's been useful when dealing with unusual program names and surnames of authors. I regularly add reported terms that aren't misspellings to a plain text user dictionary (sorted according to Tschek rules) so that they don't get reported again. It's not foolproof because I have to determine which reported words are actually misspellings, but it works well enough and is relatively quick.
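
For readers who want to see the list-mode idea in miniature, here is a rough Python sketch of a checker that reports unrecognized words without prompting. It is not Tschek's code: load_dictionary and unknown_words are hypothetical names, the dictionary is assumed to be plain text with one word per line, and matching is case-insensitive with lower-case output.

```python
import re
from collections import Counter


def load_dictionary(lines):
    """Build a lookup set from plain-text dictionary lines (one word each)."""
    return {line.strip().lower() for line in lines if line.strip()}


def unknown_words(text, dictionary):
    """Return sorted (word, count) pairs for words missing from the dictionary."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in dictionary)
    return sorted(counts.items())


if __name__ == "__main__":
    known = load_dictionary(["the\n", "cat\n", "sat\n"])
    for word, count in unknown_words("The cat szat. The catt sat, catt!", known):
        print(word, count)
```

Reported terms that turn out to be valid can be appended to a second word list and passed through load_dictionary as well, which mirrors the user-dictionary routine described above.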

Limitations:

Text to be checked can have no more than 8000 unique words. (Not an issue for the vast majority of documents).
Included dictionary is not large (about 15000 words).
Not case sensitive (output is lower case).
Size of usable dictionary is limited (20,000 words max). If you happen to possess a very large compatible dictionary, you could split it into smaller dictionaries and then do sequential checks using a batch file. To use Tschek with SIL's Word List try this:
1. Extract and join the four lists into a single 1.2 MB file (e.g., use a CAT utility or the COPY /a command).
2. Split the big file into 6 dictionary files of about 190K each – manually, or using a utility like FCUT (Tschek won't use a dictionary containing more than 20,000 words – about 200K). If you use a file splitter, make sure the first and last lines of each resulting file contain whole words – not broken words. Rename the new files (e.g., 1.dic, 2.dic ... 6.dic) and place them in the same directory as SPELL.EXE.
3. Write a batch file similar to the following, which performs multiple sequential spell checks using SPELL.EXE and the new dictionary files:
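```
@echo off
rem BIGSPELL.BAT
rem USAGE: bigspell any.txt
rem Each pass rechecks only the words rejected by the previous dictionary.
spell 1.dic %1 1.tmp /b
spell 2.dic 1.tmp 2.tmp /b
spell 3.dic 2.tmp 3.tmp /b
spell 4.dic 3.tmp 4.tmp /b
spell 5.dic 4.tmp 5.tmp /b
spell 6.dic 5.tmp misspell.txt /b
rem Delete only the intermediate files created above.
for %%f in (1 2 3 4 5) do del %%f.tmp
echo Spell check complete. See MISSPELL.TXT
```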
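
If no file splitter is at hand, steps 1 and 2 can be approximated with a short Python script run on any machine. This is a sketch under assumptions: the joined list is assumed to hold one word per line, already in the order Tschek expects, and split_word_list and write_dictionaries are hypothetical helper names. Splitting by whole lines guarantees that no word is broken across two files.

```python
import os


def split_word_list(words, max_words=20000):
    """Yield consecutive chunks of at most max_words whole words each."""
    for start in range(0, len(words), max_words):
        yield words[start:start + max_words]


def write_dictionaries(source_path, out_dir=".", max_words=20000):
    """Write 1.dic, 2.dic, ... into out_dir and return the file names."""
    with open(source_path, encoding="ascii", errors="replace") as src:
        words = [line.strip() for line in src if line.strip()]
    names = []
    for number, chunk in enumerate(split_word_list(words, max_words), start=1):
        name = os.path.join(out_dir, f"{number}.dic")
        # newline="\r\n" writes DOS line endings.
        with open(name, "w", encoding="ascii", errors="replace",
                  newline="\r\n") as out:
            out.write("\n".join(chunk) + "\n")
        names.append(name)
    return names
```

With a list of about 110,000 words and the 20,000-word limit, this produces six files, matching the six passes in the batch file.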

Author: Timo Salmi, Finland (1996).

1996-03-02: v1.5.

Download tschek15.zip (68K).

More in these pages from Timo Salmi.

Look — Look up words (from a word list) to verify spelling.

unrated

[added 2001-10-21, updated 2005-04-16]

Look is not a spell checker but rather lists words from a word list file that most closely match a string (i.e., useful for looking up an uncertain spelling). Look is included in some ISPELL distributions but here it is listed separately to bring more attention to it.

Look.exe appears to work a lot like grep (in fact, it requires grep/egrep/fgrep for the -r option). However, it has certain conveniences for looking up words in a spelling list. With no options, look searches a word list file for all words that start with the first characters of the string you give it. Options allow it to ignore caps or small letters, use bona fide regular expression wildcards, and use dictionary order. Look is meant to be used within editors like vi that allow you to run external programs. It can also be used on the command line.

Look appears to be happy using any ASCII spelling list, such as SIL's Word List, or Moby Words (users can add, remove, or modify words in such lists with any text editor). By default, look uses a word list named ISPELL.WOR (included), but you can supply a different file as an option.

usage: look [-dfr] string [file]
 -d  dictionary order: consider only letters, digits, and spaces
 -f  fold upper case to lower
 -r  string is a regular expression

Note: Use of regular expression switch -r requires the programs grep/egrep/fgrep (not included) in path.
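
The default prefix lookup and the -d and -f options can be illustrated with a few lines of Python. This is a sketch of the behaviour as described in the usage text, not a port of look: the function names are hypothetical, and it scans the list linearly rather than relying on any particular file order.

```python
def fold_key(word, fold=False, dictionary=False):
    """Build the comparison key for a word under the -f and -d options."""
    if dictionary:
        # -d: consider only letters, digits, and spaces
        word = "".join(ch for ch in word if ch.isalnum() or ch == " ")
    if fold:
        # -f: fold upper case to lower
        word = word.lower()
    return word


def look(prefix, words, fold=False, dictionary=False):
    """Return every word whose key starts with the key of `prefix`."""
    key = fold_key(prefix, fold, dictionary)
    return [w for w in words
            if fold_key(w, fold, dictionary).startswith(key)]


if __name__ == "__main__":
    word_list = ["Apple", "apple-pie", "apply", "banana"]
    print(look("appl", word_list))
    print(look("appl", word_list, fold=True))
    print(look("applep", word_list, dictionary=True))
```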

Suggested by Howard Schwartz.

Look.exe is part of the GNU ispell binary and source packages, above.

WORD LISTS AND DICTIONARIES

Moby Words — English word, name, and phrase lists; 610,000+ entries (ASCII).

* * * * *

[added 2001-10-21, updated 2004-06-29]

Moby Words is part of the Moby Project, a large collection of lists of words and phrases, and works of literature (contents are now in the public domain).

Partial contents of Moby Words:

Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.
74,550 common dictionary words. A list of words in common with two or more published dictionaries. This gives the developer of a custom spelling checker a good beginning pool of relatively common words.
4,946 female names. Frequent given names of females in English speaking countries.
3,897 male names. Frequent given names of males in English speaking countries.
21,986 names. This database contains the most common names used in the United States and Great Britain. Spelling checkers may want to supplement their basic word list with this one.

Author: Grady Ward (1996).

Downloads
Moby Words	mwords.tar.z	(4MB)
Moby Project	moby.tar.z	(26MB)

Get more info at the Moby Words page.

If you don't like a 26MB download, go to the Moby Project page for smaller pieces.

SIL's Word List — ASCII English, 110,000 words, can function as dictionary for some spellers.

unrated

[updated 2004-07-02]

Four text files contain approximately 110,000 English words total. The set can be used as a large dictionary for spellers that can use ASCII-only dictionaries. See Tschek for an example.

From the doc:

This word list includes inflected forms, such as plural nouns and the -s, -ed and -ing forms of verbs. Thus the number of lexical stems represented in the list is considerably smaller than the total number of words.

Author: Evan Antworth / SIL International (1991).

Downloads
A–D	words1.zip	(95K)
E–K	words2.zip	(75K)
L–R	words3.zip	(99K)
S–Z	words4.zip	(85K)

Jorj — English dictionary program.

* * *

Jorj is a stand-alone dictionary program, with two EXE variants (compiled for different memory usage) in one package. Jorj can be run in memory resident (pop-up) or non-memory resident modes. One of the provided executables ("Omega") will use XMS memory when the program is run as a TSR.

One unique feature of Jorj is its ability to search for entries even when your spelling is incorrect. Jorj also has a "word scan" feature that will list all entries containing a given search string. The lexicon has some significant drawbacks. The word list is small but adequate (larger in registered version) and definitions are brief – and not authoritative. Words are syllabified, but parts of speech are lacking. Even with these shortcomings, JORJ still serves as a handy reference.
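
How Jorj matches misspelled input is not documented here, so the following Python sketch only shows one common way to get similar behaviour: approximate matching with the standard difflib module, plus a simple substring "word scan". The function names and the sample entries are made up for the illustration.

```python
import difflib


def suggest(query, entries, limit=3):
    """Return up to `limit` entries that resemble a possibly misspelled query."""
    return difflib.get_close_matches(query, entries, n=limit, cutoff=0.6)


def word_scan(fragment, entries):
    """Return all entries containing the given search string."""
    return [entry for entry in entries if fragment in entry]


if __name__ == "__main__":
    lexicon = ["hazard", "hazardous", "lexicon", "syllable"]
    print(suggest("hzrdous", lexicon))
    print(word_scan("lab", lexicon))
```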

EXE size: 35K (alpha) or 64K (omega). Dictionary size: 1.2MB.

Author: George Fredal / Jorj Software (1997).

1997-01-01 release.

Download jorj97.zip (652K).

WORD COUNT & TEXT ANALYSIS

Also see UXUTL or the GNU Textutils for UNIXish WC.

WCNT — Count and analyze word frequency in text and HTML documents.

* * * *

One of the more comprehensive "word count" programs I've encountered. It includes a host of options:
- Can analyze HTML documents (ignores tags in word counts).
- Counts of lines, characters, non-whitespace characters, words, distinct words and unique words.
- Average length of words, distinct words and unique words.
- Sorted word lists with frequencies.
- Word length distribution histograms.
- Configurable word sets.
- DOS code page awareness.
- Multiple filespecs with wildcards: outputs combined statistics of all files when passed a filespec with wildcards.
Donationware.
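
Several of these statistics are easy to reproduce for comparison. The Python sketch below is not WCNT's algorithm, and it rests on one assumption worth stating: "distinct" is taken to mean different words, and "unique" to mean words that occur exactly once. The function name word_stats is hypothetical.

```python
import re
from collections import Counter


def word_stats(text):
    """Compute basic word statistics for a piece of plain text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    distinct = list(counts)                               # each word once
    unique = [w for w, n in counts.items() if n == 1]     # seen only once

    def average_length(items):
        return sum(map(len, items)) / len(items) if items else 0.0

    return {
        "words": len(words),
        "distinct": len(distinct),
        "unique": len(unique),
        "avg_word_length": average_length(words),
        "avg_distinct_length": average_length(distinct),
        "length_histogram": dict(sorted(Counter(map(len, words)).items())),
        "by_frequency": counts.most_common(),
    }


if __name__ == "__main__":
    for name, value in word_stats("the cat and the hat").items():
        print(name, value)
```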

Author: Branko Radovanovic, Croatia (1997).

1997-04-23: v1.20.

Download wcnt120.zip (20K).

wc — Simple word count program, from Unix.

* * *

A DOS clone of the Unix wc utility with some added features. Unlike WCNT (above), wc: 1) lists individual file stats when passed a filespec with wildcards; 2) can read from standard input as well as from files. wc also generates error level values for use in batch files.

Author: Roman Nurilov (1997).

1997-08-06: v1.1.

Download wc_11.zip (10K).

Word Count (WC) — Word counter also counts sentences, calculates readability index.

* * *

[updated 2005-04-16]

Another word count program that can optionally count sentences and generate a rough and ready "readability" index based on a combination of word length and sentence length. Can read from standard input as well as from files.
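
The exact formula this program uses is not given here. As an example of an index built from word length and sentence length, the Python sketch below computes the Automated Readability Index, a published formula swapped in for illustration. Sentence splitting on ".", "!" and "?" is a deliberate simplification, and the function name is hypothetical.

```python
import re


def readability(text):
    """Return word count, sentence count and the Automated Readability Index."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return None
    characters = sum(len(w) for w in words)
    # ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    index = (4.71 * characters / len(words)
             + 0.5 * len(words) / len(sentences)
             - 21.43)
    return {"words": len(words), "sentences": len(sentences), "index": index}


if __name__ == "__main__":
    print(readability("The cat sat. The dog ran."))
```

Longer words and longer sentences both push the index up, which is the same general behaviour the description above attributes to the program's own index.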

Author: Bob Ferguson, Netherlands (2000).

2000-02-21: v2.4.

Download wc24.zip (15K).