Free Software for DOS
Text Utilities – 2
Text Formatters & Filters

21 Aug 2006


This page:
PROCESS, FORMAT, FILTER PLAIN TEXT

FILE SORTING

DUPLICATE-LINE FILTERS

TEXT JUSTIFY

Page 1:
GENERAL TEXT VIEWERS

SMALL / TINY TEXT VIEWERS

TSR (POPUP) TEXT VIEWERS

TEXT VIEWERS FOR PROGRAMMERS

UNIX man AND info FILE VIEWERS

COMPILE TEXT TO EXE

Page 3:
SEARCH AND REPLACE

sed – stream editor

SEARCH ONLY

grep – global regular expression print

LINE KILL / REPLACE

FILE COMPARE / DIFFERENCE

Page 4:
ASCII TEXT SPELLCHECKERS

WORD LISTS AND DICTIONARIES

WORD COUNT & TEXT ANALYSIS

ASCII CHARTS

CHARACTER TRANSLATION AND STRIPPING

Page 5:
FILE FORMAT CONVERSION

UNIX < > DOS

OTHER CONVERSIONS

POSTSCRIPT AND PDF: View, print, convert

PROCESS, FORMAT, FILTER PLAIN TEXT

Listed below are both "all-in-one" multi-filter programs and packages containing multiple, single-purpose filters.

Also see sed — Versatile text filter.


awk, gawk, mawk — Powerful text processor, ported from Unix.

* * * * *

[added 1999-03-07, updated 2005-03-28]

Reviewed by Howard Schwartz 1999-02-28

Unix comes standard with a set of programs that can do just about anything imaginable with a text (i.e., ASCII) file. In rough order of complexity they are:

By default, awk reads files one line at a time, checks each line to see if it matches a pattern, and processes each matching line according to a script of commands. The pattern can be a word, phrase, regular expression, or complex expression. Commands resemble those of the C programming language and have the typical form:

/regular expression/ {one or more commands}

By default, awk keeps track of the line number of each line, counts the number of words in each line, and numbers the words so they can be referred to, like "positional parameters" in a DOS batch file. This makes it easy for awk to rearrange columns or words in a line. For instance, the command:

/^[A-Z]/ {print $2, $1, $3}

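will swap the first and second words of each line that begins with a capital letter, printing the third word after them. Words after the third are dropped, and lines that do not begin with a capital letter are not printed at all. This might, for instance, reverse the first and last names of lines that are part of an address book.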
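For readers without awk at hand, here is a rough Python equivalent of `/^[A-Z]/ {print $2, $1, $3}`. It is only a sketch: the names `awk_swap` and `field` are hypothetical, and Python's `str.split()` only approximates awk's default field splitting (it also treats form feeds and other whitespace as separators).

```python
import re
import sys

# Same test as the awk pattern /^[A-Z]/: the line starts with an ASCII capital.
CAPITAL_START = re.compile(r"^[A-Z]")


def field(fields, n):
    """Return awk-style $n (1-based); a missing field is the empty string."""
    return fields[n - 1] if n <= len(fields) else ""


def awk_swap(lines):
    """Rough equivalent of: awk '/^[A-Z]/ {print $2, $1, $3}'."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not CAPITAL_START.match(line):
            continue  # no action runs, so nothing is printed for this line
        fields = line.split()  # approximates awk's default blank-separated fields
        # print with commas joins its arguments with a single space (awk's OFS)
        out.append(" ".join([field(fields, 2), field(fields, 1), field(fields, 3)]))
    return out


if __name__ == "__main__":
    for result in awk_swap(sys.stdin):
        print(result)
```

Note that a two-word line such as "Doe Jane" comes out as "Jane Doe " with a trailing space, because awk prints the empty third field too.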

Other features of awk that may be useful to general users:

There are quite a number of freeware versions of awk. Among the best is GNU's version, usually called "gawk". It has been under revision and development for quite some time, and comes with several very useful extensions. Another version is mawk, based on Unix and POSIX models.

Happily, GNU has put out a free awk manual, available in a number of formats, including HTML online for the latest version. This manual is both comprehensive and easy to understand. The typical "man page" that comes with awk (even gawk) appears to contain English words, but is all but a secret code to most "non-Unix fluent" human beings.

2005-03-21: GNU gawk v3.1.4, with new networking features. 32-bit DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other). Authors: Jürgen Kahrs, Arnold Robbins, David Trueman for GNU. Port by Juan Manuel Guerrero, Germany (2005).

2000-12-06: Gnuish project's gawk v3.0.6. 16-bit DOS & OS/2, 32-bit DOS, and Win32 executables in one package. Get this only if you need a 16-bit EXE for an 80286 or older CPU. Port by Conrad Kwok, Scott Garfinkle, Scott Deifik (2000).

1996-02-04: mawk v1.2.2. 16-bit EXE for DOS & OS/2. Runs fast. Author: Michael Brennan (1996).

Downloads
  gawk 3.1.4   gwk314b.zip    (946K)   EXEs, manual
               gwk314d.zip    (2MB)    Docs: DVI/HTML/PS
  gawk 3.0.6   gawk306x.zip   (440K)   EXEs
               gawk306s.zip   (1.1MB)  Source, manual
               gawk306h.zip   (414K)   Manual in WinHelp format
  mawk 1.2.2   mawk122x.zip   (103K)   EXE
               mawk122s.zip   (245K)   Source, manual

Other documentation: Try the Gawk FAQ page.


LM — Multi-purpose text file formatter, search / replace, and more.

* * * * 1/2

[added 1997, updated 2004-06-28]

LM is described as a text file "line manipulator", yet this description sorely underestimates its uses. On first inspection, the command-line syntax of this program will likely be unintelligible to most casual users, and the included documentation may be difficult to decipher. LM's syntax does not resemble the "standard" syntax of most DOS programs. I include LM in this list because it combines a small size (37K) with the versatility to perform a wide variety of general filtering and formatting chores. Thankfully, some sample commands and batch files are provided, which novice users can easily modify to suit their needs. The option list is nearly endless: it uses all 26 letters as options, upper and lower case, and then resorts to using symbols!

A short list of LM's capabilities includes:
  1. text wrapping
  2. stripping/appending lines, stripping blank or duplicated lines
  3. convert tabs to spaces
  4. file finding, renaming
  5. text search / replace
  6. change text case
  7. unusual functions:
Info for power users (from the documentation):
The main operations supported are grip/non-grip, search/replace, synchronised line appendage from other files, input/output line selection by line numbers or passwords, spaces/empty lines absorption, filewise update or renaming, line width imposition and etc. Input lines can also be taken from only the command line. Long command parameters may be taken from files.

Author: Zhuhan Jiang, Australia (1996).

2000-06-22: v2.13. Source included.

Download lm213.zip (215K) or the earlier lm206b.zip (118K).


PAGINATE — Format ASCII documents (tables / headers / footers / indent / wrap).

* * *

[updated 2002-11-15]

Paginate is one of the few programs I'll probably never use, but which I can still highly recommend to a specific audience. Paginate is best described as a comprehensive command-line ASCII document formatter. As its name suggests, it can paginate a document for printing. But Paginate can also add page headers and footers, indent paragraphs, produce tables, wrap text at defined margins, and more. To generate a formatted document, one has to insert instruction codes within the document to be processed. Frankly, a word processor requires much less work and time for most tasks, and I suspect most home users will have little need for Paginate. But others will undoubtedly love it. Well designed for its purpose.

Author: Bruce Guthrie (2002).

2002-08-10: v0208.

Download PAGN0208.ZIP (176K).

More in these pages from Bruce Guthrie.


ENDNOTE — Organizes endnotes in ASCII text files.

* * * *

[added 2005-12-09]

Suppose you're typing a formal article, or even a book, and you need numbered notes at the end of a file or chapter [not at the end of each page]. Suppose also that you are composing notes on-the-fly, as you type the main text. In typical text editors, to place the notes correctly, you would have to move back and forth, from positions inside the text body, to the end and then back into the body. This would be necessary if you type the notes first and then cut/paste them, or if you move the cursor to the end of the file and then type the notes. Either way, you still have to number the notes. Now suppose that after placing and numbering some notes, you create more of them, or move or delete some...

With the text processor ENDNOTE, you can eliminate all the moving around, and the possibility of getting the order or numbering of notes wrong. As you type the main text in your base (input) file, you create notes wherever you are, and mark them. Run ENDNOTE at any time, to create an output file, with the marked notes moved to the end and properly numbered. If you add, move or delete notes from the original file, running ENDNOTE again will adjust and renumber them.

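ENDNOTE is a script in two variants, one for awk and one for Perl. To run it, install the interpreter for the language you choose, save the matching script (the commands below assume it was saved as endnote.awk or endnote.pl), and enter: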
awk -f endnote.awk infile.txt >outfile.txt
or
perl endnote.pl infile.txt >outfile.txt

Note: Knowledge of awk or Perl not required – the scripts do all the work.
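The idea behind ENDNOTE (notes written inline, then moved to the end and numbered on each run) can be sketched in a few lines of Python. This is not Eric Pement's script: the `{{...}}` note markup, the `collect_endnotes` name and the `NOTES` heading are all hypothetical, chosen only for illustration; ENDNOTE's real markup is shown in its doc file.

```python
import re

# HYPOTHETICAL markup for this sketch only: a note is written inline as {{...}}.
# Notes may span several lines, hence DOTALL; the match is non-greedy.
NOTE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def collect_endnotes(text):
    """Replace each inline note with [n] and append the numbered notes at the end."""
    notes = []

    def replace(match):
        notes.append(match.group(1).strip())
        return "[%d]" % len(notes)  # number in order of appearance

    body = NOTE.sub(replace, text)
    if not notes:
        return body
    lines = [body.rstrip("\n"), "", "NOTES", ""]
    lines += ["[%d] %s" % (i, note) for i, note in enumerate(notes, 1)]
    return "\n".join(lines) + "\n"
```

Because the notes stay inline in the input file, running the function again after adding, moving or deleting notes renumbers them from scratch.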

Author: Eric Pement (2005).

2005-06-18: v1.3.

Downloads
  awk script             endnote13_awk.txt   (3K)
  Perl script            endnote13_pl.txt    (3K)
  Doc file with markup   endnote_v13.txt     (7K)

To see what it does, go to this page: Compare input and output of ENDNOTE script (based on the downloadable doc file, above).


FU — Multi-purpose text filter.

unrated

[added 1998-08-16]

Small (11K) yet versatile, with a simpler syntax than LM. Works as a filter, but can also read an input file and write an output file.

Usage: FU :option parameters ...
Select Lines            :CHOP/:CHOPH str [n]  Page
  :COPY                 :DECTRL [HLD]*          :PAGE [len [beg [end]]]
  :DEROFF [chs]         :NUM [width [swidth]]   :VMARG [top [bot]]
  :FIND/:FIND0 s        :PRINTF [n]             :ODD/:EVEN
  :LINES [1st [last]]   :PREFIX/:SUFFIX s       :HEADER [s [lines]]
  :NULL0              Change Spaces             :FOOTER [s [lines]]
  :UNIQUE [B]           :DETAB [n]            Misc.
  :BEGSTR s [n]         :JUST [LRC [rm]]        :COUNT [@LWC]*
  :ENDSTR s [n]         :LEFT [n]               :FILE [n1 [n2]]
  :SURSTR s1 s2 [n]     :RIGHT [n [ch]]         :{ infiles
  :OUTSTR s1 s2 [n]     :STRIP/:STRIPH          :} outfile
Remap Characters        :TRUNC [col]            :}} outfile
  :ASCII                :UNJUST [skip [col]]    :TEE [fname [AW]]
  :DEBOX [box_s]      Columns                 If
  :ENC [key]            :COL [n1 [n2]]          :BREAK
  :LOWER/:UPPER         :DELCOL [n1 [n2]]       :IFIN/:IFOUT beg [end [inc]]
  :TRANS chs [new_s]    :ADDCOL [n1 [n2 [s]]]   :IFSTR/:IFSTR0 str [inc]
Change Lines            :FILCOL [n1 [n2 [s]]]   :IFCHR/:IFCHR0 chs [inc]
  :CHANGE s [new]

Limitations: Line length restriction of 255 characters.

Notes: The string option for HEADER and FOOTER seems to require quotation mark delimiters to output the string correctly (other options with strings don't seem to need delimiters). Documentation is sparse.

Author: David Lo (1990). Suggested by Robert Bull.

1990-07-03: v3.56.

Download fu.zip (18K).


Filter — Multipurpose text filter can also remove ANSI sequences.

unrated

[updated 2004-06-28, updated 2005-04-16]

Filter is a multi-purpose command-line text filter like LM. It has fewer features but a more comprehensible syntax.

usage : FILTER [[<]in] [>out] [/option[...]] [...]] [txtopt [...]]
option: C[n,s,d]   Copy n characters from position s to d.
        D[n,p]     Delete n characters at position p.
        E[+/-][n]  Expand tabs ([+]) or replace spacegroups by tabs (-),
                   where n [8] is tab field length.
        F[n,m]     Fill nonblank lines with dots to width n [70],
                   skipping first m [0] lines. Implies /T.
        H          Send this help text to (redirected) output.
        ?          Send this help text to screen (page by page)
        I[n,p]     Insert n spaces at position p.
        J[+/-]     Add/remove Carriage Return before Line Feed [+].
        L[+/-]     Add/remove Line Feed after Carriage Return [+].
        M[n,s,d]   Move n characters from position s to d.
        N[n]       Number lines, use n [4] digits,
        O[n,s,d]   Overwrite n chars from position s to d.
        P          Reset parity bit. Implied by /W.
        R[n]       Remove n trailing lines after processing /S and /X.
        S[n,m]     Skip m lines starting at line n.
        T          Trim trailing blanks. Implied by /F.
        U[+/-]     Convert to upper/lower case [+].
        V[n,s]     Reverse n [all] characters starting at position s.
        W          Wordstar document ==> ASCII textfile. Implies /P.
        X[n,m]     Extract m lines starting at line n.
        Z[+]       Remove NULLs. Z+: also ANSI screen control sequences.
txtopt: /A[+/-][I][p][*] text   Include lines after  specified text only.
        /B[+/-][I][p][*] text   Include lines before specified text only.
        /G[I][p][*] text        Include lines with the specified text only.
        + : Include matching line.
        - : Do not include the matching line (this is the default).
        I : Ignore upper/lower case.
        p : Search for text starting at column p. Default p=1.
        * : Text may be found at any column at or after p.

Author: Bob Ferguson, Netherlands (2000).

2000-03-24: v4.0.

Downloads
  Program, source   filter40.zip   (32K)
  Description       filter40.txt   (6B)

More in these pages from Bob Ferguson.


TS Filters — Special purpose filters for text & binary files.

* * * *

[updated 2005-08-22]

These individual filters perform specialized tasks not easily accomplished with most text editors.

Package 1: TSFILT
  ASC2IBM    toasc.exe in filter format
  FLMARG     Add a left margin
  FLRMARG    Filter for a right margin
  FLSUBS     String substitution
  IBM2ASC    toibm.exe in filter format
  IBM2LAT1   IBM PC Scandinavian chars to Latin1
  LOGFILT    Filters backspaces from logfiles
  PC2UNIX    PC text EOL chars > Unix EOL chars
  QUOTE      Quotes to messages
  TOASC      8-bit IBM > Scandinavian 7-bit ASCII
  TOASCI     8-bit IBM > International 7-bit ASCII
  TOIBM      7-bit ASCII > Scandinavian 8-bit IBM
  U2PC.BAT   Unix EOL chars > DOS EOL chars
  UNIX2PC    Unix text EOL chars > PC EOL chars

Package 2: TSFLTB
  FILBIN.EXE     General filter for binary files
  FILGEN.EXE     Generalized filter for any file
  FILTXT.EXE     General filter for text files

  DEMOTXT.XLT    How to build a translation table
  HTML2IBM.XLT   Scandinavian HTML chars to IBM
  IBM2HTML.XLT   Scandinavian IBM chars to HTML
  LOWER.XLT      To lowercase, also foreign chars
  NOEOF.XLT      Enables reading text past EOF
  PC2UNIX.XLT    PC newlines to Unix newlines
  SIMUL8.XLT     8-bit to look-alike 7-bit chars
  STRIP.XLT      Strip the high bit of 8-bit chars
  TOASC.XLT      Scandinavian IBM to ASCII
  TOIBM.XLT      Scandinavian ASCII to IBM
  UNIX2PC.XLT    Unix newlines to PC newlines
  UPPER.XLT      To uppercase, also foreign chars

Package 3: TSFLTC
  CUTW            Omit/extract whole words based on their positions in lines (1st word, 2nd word)
  CUT             Omit/extract columns from files
  SLICE           Omit/extract rows from files
  COL             Convert strings to a single column
  DETAB           Convert tabs to specified number of spaces
  CONCAT          Join files side by side (columnar)
  UNIQ            Report or filter out repeated lines in a file
  ROT13           ROT13 scramble / descramble text
  LOWER / UPPER   Convert text to lower / upper case
  COLPUT          Insert a column of text into a file

Author: Timo Salmi, Finland (1997-03).

Versions:
  2003-12-11: TSFILT v2.2
  1996-02-28: TSFLTB v1.9
  2000-08-18: TSFLTC v2.5

Downloads
  TSFILT   tsfilt25.zip   (128K)
  TSFLTB   tsfltb19.zip   (69K)
  TSFLTC   tsfltc22.zip   (107K)

More in these pages from Timo Salmi.


FILE SORTING

Also see: 32-bit SORT included with the GNU Textutils.


RPSORT — Sorts large files extremely fast.

* * * * *

[added 1998-03-21, updated 2004-06-28]

A super-fast sort program which handles large files. "RPSORT supports numerous sort key types including regular text keys, C language strings, Turbo Pascal strings, signed and unsigned binary integers of any length and several types of binary floating point numbers."

From a reader:
I tested many of the sort programs in the SimtelNet repository on text files. Most are limited somehow (like DOS sort), or choke, or take a long time to sort, or plainly produce a wrong output (missing or extra records, etc.). The final two survivors were msort and rpsort. I tested both on very long text files (tens of megabytes: the collated complete works of Shakespeare, Project Gutenberg). Msort took several tens of minutes, rpsort did the same in *seconds* (I thought it hadn't run at all.) Given that, there was nothing else to say about DOS sort programs, in my opinion.

Author: Robert Pirko (1992). Suggested by João Magalhaes.

1992-12-15: v1.02.

Download rpsrt102.zip (88K).


PCSORT — Full screen text sort program, supports block, word, and multi-line sorting.

unrated

[updated 1998-03-02]

PCSORT (9K) runs as a full-screen, interactive program by default but can also function as a command-line filter. Although source file size is limited by available conventional memory, PCSORT offers an easy-to-use interface and can sort multi-line records (up to 9 lines) and blocks simultaneously. Results can be viewed before being written to disk.

                /Sn   n=size of record in lines (1-9)
                /Pn   n=sort priority (1-9)
                 /R   Sort current priority in reverse order
                 /N   Numeric sort current priority
                 /C   Case sensitive sort
              /L[n]   Line sort:
                      n=record sort line (1-9)
/[B][+] nn [xx [y]]   Block or column sort:
                      nn=start column
                      xx=width
                      y=sort line (1-9)
         /W [+|-] n   Word sort:
                      n=word count
                      minus = count from end of record
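Sorting multi-line records (/Sn) on one chosen line of each record (/L[n]) can be illustrated with the Python sketch below. It is not PCSORT's algorithm (PCSORT is assembly code); the function and parameter names are hypothetical, and treating the sort as case-insensitive unless `case_sensitive` is set is an assumption inferred from the /C switch.

```python
def sort_records(lines, record_size=1, key_line=1, reverse=False,
                 case_sensitive=False):
    """Group lines into fixed-size records and sort whole records by one line.

    Loosely mirrors PCSORT's /Sn (record size), /L[n] (sort line),
    /R (reverse) and /C (case-sensitive) switches.
    """
    if not 1 <= key_line <= record_size:
        raise ValueError("key_line must be within the record")
    # Split the input into consecutive records of record_size lines each.
    records = [lines[i:i + record_size] for i in range(0, len(lines), record_size)]

    def key(record):
        # A short final record may lack the key line; treat it as empty.
        text = record[key_line - 1] if key_line <= len(record) else ""
        return text if case_sensitive else text.lower()

    records.sort(key=key, reverse=reverse)  # records move as units
    return [line for record in records for line in record]
```

For example, with two-line name/phone records, `key_line=1` sorts by name and `key_line=2` sorts by phone number, while each phone number stays attached to its name.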

Screen menu commands:
  F1          Display all sort fields
  Alt-F1      Reset all the sort variables to their defaults
  F2          Save file
  F3          New file
  F4          Sort text
  F5          Increase lines per record (1-9)
  Shift-F5    Decrease lines per record
  F6          Select next key priority (1-9)
  Shift-F6    Select previous key priority
  F7          Sort order (descending/ascending)
  F8          Alphanumeric or numeric sort
  F9          Select next field type: line, block, word or none
  Shift-F9    Select previous field type
  F10         Mark the record line for line sort, mark the block sort field, or select the sort word count
  Shift-F10   Reverse selection of word count

The v1.1 update of PCSORT was originally published in 1991 but apparently is not widely distributed on the Net. The pcsort11.zip archive contains the ASM source code, the doc file and the COM program for PCSORT as updated 4/18/91 to fix a problem with form feeds at the ends of data files. It also contains the PCSORT article published in PC Magazine: see the included *.xyw (XyWrite) docs.

Author: Michael J. Mefford, for PC Magazine (1991). Suggested by Robert Bull.

1991-04-28: v1.1.

Download pcsort11.zip (40K).


RALPH — Sort lines of text in reverse alphabetical order.

* * * *

[added 2005-07-17]

RALPH sorts lines of text from right to left, i.e., lines are read backwards. If the input has multi-word lines, the output will be sorted by line-final words, and so on. Originally intended as a grammatical analysis tool, it has other uses.

ear
earache
earaches
eardrop
eardrops
eardrum
eardrums
eared
earflap
earflaps
earful
earfuls
elephant
elephantiases
elephantiasis
elephantine
elephants
imprecated
imprecates
imprecating
imprecation
imprecations
raindrop
raindrops
raining
Output:
        eared
   imprecated
      earache
  elephantine
      raining
  imprecating
       earful
      eardrum
  imprecation
      earflap
     raindrop
      eardrop
          ear
     earaches
elephantiases
   imprecates
elephantiasis
      earfuls
     eardrums
 imprecations
     earflaps
    raindrops
     eardrops
    elephants
     elephant

abajar
abajo
desganar
desganchar
desgano
desgarbado
desgarbilada
desgarbilado
desgarbo
desgargantar
desgargantarse
desgargolar
desgaritar
desgarrada
desgarradamente
desgarrado
Output:
   desgarbilada
     desgarrada
 desgargantarse
desgarradamente
       desgarbo
     desgarbado
   desgarbilado
     desgarrado
          abajo
        desgano
     desganchar
         abajar
    desgargolar
       desganar
     desgaritar
   desgargantar
Syntax:  ralph [-a] [-l linesize] [-p padding] [infile] > [outfile]
  -a           Extract analysis failures from an AMPLE log file.
  -l linesize  Set the maximum line length (default is no limit).
  -p padding   Specify the minimum padding for each line (default is 0).

If no infile is specified, ralph reads from the standard input.
If no outfile is specified, ralph writes to the standard output.
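The sample output above is consistent with sorting each line by its reversed text and then right-aligning the result to the longest line. A minimal Python sketch of that technique follows; it is not SIL's code, the `ralph` function name is hypothetical, the -a, -l and -p options are not modeled, and Python compares characters by code point, which may order accented letters differently from RALPH.

```python
import sys


def ralph(lines):
    """Sort lines right to left (by reversed text), then right-align them."""
    words = [line.rstrip("\r\n") for line in lines]
    ordered = sorted(words, key=lambda s: s[::-1])  # compare from the last char
    width = max((len(s) for s in ordered), default=0)
    return [s.rjust(width) for s in ordered]  # pad on the left to align endings


if __name__ == "__main__":
    for line in ralph(sys.stdin):
        print(line)
```

Right-aligning the output lines up the word endings, which is what makes shared suffixes such as -ing or -ation easy to scan.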

Author: SIL International (1998).

Versions:
  1989-01-24: v1.1, DOS16. Runs on DOS 2.0+. Handles files up to ~128K. Bug: Removes the top bit from upper ASCII characters (fixed in v1.1b). Package also contains scripts with similar function, for awk and other Unix programs.
  1998-09-01: v1.1b, DOS32. DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other).
  1998-09-01: v1.1b, Win32 console.

Downloads
  DOS16   ralph.zip         (13K)
  DOS32   ralph11b.zip      (25K)
  Win32   ralph32-11b.zip   (18K)
  Doc     ralphdoc11b.zip   (364B)

Get more programs for linguists from the SIL Software Catalog.


DUPLICATE-LINE FILTERS

RMDUP — Remove duplicate lines from a sorted file.

unrated

Comments from a user:
How do I use this util? Occasionally I download my bookmark file, combine it with a previous one, sort the combined file, and run RMDUP. The result is a compact archive of sites. Then I ruthlessly prune my bookmark file. It gathers more bookmarks. Note that the file must be sorted. That is, duplicate lines must be next to each other to be found. This can work to one's advantage. One can sort only a section of a file. Duplicate lines would be removed from that section only. Case sensitive [RMDUPS] and case insensitive [RMDUPI] versions are included. There may be a maximum line length restriction, but it handles 451 characters just fine.
Usage: RMDUPS < sortedfile [ >outputfile]
SORT < infile | RMDUPS [ > outputfile]
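RMDUP removes a line only when it matches the line immediately before it, which is why the input must be sorted. A Python sketch of that approach; the function name is hypothetical, and keeping the first line of each run in case-insensitive mode (as RMDUPI might) is an assumption.

```python
import sys


def remove_adjacent_duplicates(lines, ignore_case=False):
    """Drop a line if it equals the line just before it (input assumed sorted)."""
    out = []
    previous = None
    for line in lines:
        key = line.rstrip("\r\n")  # ignore line endings when comparing
        if ignore_case:
            key = key.lower()
        if key != previous:
            out.append(line)  # first line of each run is kept unchanged
        previous = key
    return out


if __name__ == "__main__":
    # Filter usage, e.g.: SORT < infile | python rmdup_sketch.py
    sys.stdout.writelines(remove_adjacent_duplicates(sys.stdin))
```

Non-adjacent repeats survive, so sorting only one section of a file removes duplicates from that section only, as the user comment above notes.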

Package contains ANSI C source code, docs in English & Portuguese.

Author: João Magalhaes, Portugal (1997). Suggestion and description by Marianna Van Erp.

1997-03-13 release.

Download rmdup0.zip (11K).


uniq — Remove or display duplicate lines from sorted file.

unrated

[added 1998-04-18]

This clone of a Unix program is similar to RMDUP but offers more options. Besides removing adjacent duplicate lines from output, uniq offers a "reverse" option: display a single representative of just the duplicate lines. In addition, one can designate which fields on lines to search (a field being text separated by tabs or spaces). uniq is case-sensitive only. It can be used either as a filter or with input and output filenames. Package contains ANSI C source code and a Unix-style manual.

Usage: uniq [ -cdu ] [ +|-n ] [ inputfile [ outputfile ] ]
-c  Precede each line with a count of the number of times it occurred
-d  Write one copy of duplicate lines
-u  Copy only lines not repeated in the original file
+n  Skips over the first n characters
-n  Skips over the first n fields
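The -c, -d, -u, +n and -n options amount to grouping runs of adjacent lines on a comparison key. This Python sketch follows POSIX uniq semantics (fields are runs of non-blank characters, and characters are skipped after the skipped fields); whether Jason Mathews' port behaves identically in every edge case is an assumption, the function names are hypothetical, and the seven-column count format is borrowed from GNU uniq.

```python
from itertools import groupby

BLANKS = " \t"


def compare_key(line, skip_fields=0, skip_chars=0):
    """Return the part of line that is compared: skip n fields, then n chars."""
    i = 0
    for _ in range(skip_fields):
        while i < len(line) and line[i] in BLANKS:  # leading blanks
            i += 1
        while i < len(line) and line[i] not in BLANKS:  # the field itself
            i += 1
    return line[i + skip_chars:]


def uniq(lines, count=False, only_dups=False, only_unique=False,
         skip_fields=0, skip_chars=0):
    out = []
    runs = groupby(lines, key=lambda line: compare_key(line, skip_fields, skip_chars))
    for _, run in runs:
        run = list(run)
        if only_dups and len(run) == 1:
            continue  # -d: only lines that were repeated
        if only_unique and len(run) > 1:
            continue  # -u: only lines that were not repeated
        first = run[0]
        out.append(f"{len(run):7d} {first}" if count else first)  # -c
    return out
```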

Author: Jason Mathews (1995).

1995-01-14: v1.2.

Download uniq12.zip (10K).

Other variants of uniq are in the TS Filters (up this page), the GNU Textutils, the Berkeley Utilities, and UXUTL.


TEXT JUSTIFY

Just — ASCII text justifying filter.

* * *

This 24K utility can justify paragraphs (it inserts spaces to remove ragged margins). Left, right, and center justification are supported. It can also automatically draw boxes around justified paragraphs. Package includes C source code and Unix-style manual pages. Free for personal use.

just [options] [infile] [ >outfile]

-w: Specify the desired output page width, in characters.
-m: Specify the line length below which justification should not be attempted.
-l: Specify left justification mode.
-c: Specify centre justification mode.
-r: Specify right justification mode.
-p: Specify padding justification mode.
-xC:Use character C to make a box-surround for justified paragraphs.
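Justifying a paragraph to both margins, i.e. inserting spaces to remove the ragged right edge, is commonly done by filling lines greedily and then spreading the leftover spaces across the gaps between words. A Python sketch of that common technique; it is not Peter Breuer's code, the `justify` name is hypothetical, and giving the extra spaces to the leftmost gaps is an arbitrary choice.

```python
def justify(words, width):
    """Fill lines greedily, then pad gaps so each line but the last is width long."""
    lines, current = [], []
    for word in words:
        # Start a new line if adding this word (plus one space) would overflow.
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)  # an over-long word simply gets a line to itself
    if current:
        lines.append(current)

    result = []
    for i, line_words in enumerate(lines):
        if i == len(lines) - 1 or len(line_words) == 1:
            result.append(" ".join(line_words))  # last line stays left-aligned
            continue
        gaps = len(line_words) - 1
        spaces = width - sum(len(w) for w in line_words)
        base, extra = divmod(spaces, gaps)
        parts = [w + " " * (base + (1 if j < extra else 0))
                 for j, w in enumerate(line_words[:-1])]
        result.append("".join(parts) + line_words[-1])
    return result
```

For example, `justify("the quick brown fox jumps over the lazy dog".split(), 16)` gives "the  quick brown", "fox  jumps  over" and "the lazy dog".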

Author: Peter Breuer, UK (1993).

1993-04-22: v1.04.

Download pbjst104.zip (33K).


Justify — Flexible text justifying filter.

unrated

[added 1998-11-10, updated 2006-03-14]

From the docs:
Justify will reformat already formatted text. It will ignore titles and other header information and reformat paragraphs to any desired style... The input text must be stripped of all tab characters. JUSTIFY must be able to determine what constitutes a paragraph. It is important that the input text be consistently formatted.

Also useful for formatting e-mail (see 'e' and 'q' options). Source code (C) and Linux compilation included.

justify columns [bflditohsrweq] [indent] [body] <source >dest
   b - input file paragraph is hanging indented
   f - input file paragraph is fully indented
   l - input file paragraphs are single lines
   d - delete blank line after paragraph read
   i - insert blank line after paragraph read
   t - indent first paragraph line by indent spaces
   o - indent other paragraph lines by body spaces
   h - remove hyphens across line boundaries
   s - double space after . ? ! ." ?" or !"
   m - process m-dash adjacent to words
   w - output for word processors
   r - ragged right margin (otherwise full justification)
   e - EMAIL input -- don't format quotes or headers
   q - EMAIL output -- add '>' to non-blank lines

The mandatory "columns" argument is the number of columns of text to output. The [indent] and [body] arguments are associated with the 't' and 'o' options.

Author: Tom Almy (2004). Suggested by Robert Bull.

2004-03-12: v1.5.

Download justfy15.zip (26K).




©1994-2004, Richard L. Green.
©2004-2006, Short.Stop.