Free Software for DOS
Text Utilities 2: Text Formatters & Filters
21 Aug 2006
PROCESS, FORMAT, FILTER PLAIN TEXT
Listed below are both "all-in-one" multi-filter programs and packages containing multiple, single-purpose filters.
Also see: sed, a versatile text filter.
awk, gawk, mawk Powerful text processor, ported from Unix.
* * * * *
[added 1999-03-07, updated 2005-03-28]
Reviewed by Howard Schwartz 1999-02-28
Unix comes standard with a set of programs that can do just about anything imaginable with a text (i.e., ASCII) file.
By default, awk reads files a line at a time, checks each line to see if it matches a pattern, and processes each matching line according to a script of commands. The pattern can be a word, phrase, regular expression, or complex expression. Commands are similar to those of the C programming language, and have the typical form:
/regular expression/ {one or more commands}
By default, awk keeps track of the line number of each line, counts the number of words in each line, and numbers the words so they can be referred to, like "positional parameters" in a DOS batch file. Thus, awk easily rearranges columns, or words in a line. For instance, the command:
/^[A-Z]/ {print $2, $1, $3}
will print the first three words of each line whose first letter is a capital letter, with the first and second words swapped. This might, for instance, reverse the first and last names in the lines of an address book.
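For readers who do not know awk, the same pattern-and-action idea can be sketched in Python. This is only an illustration of what the one-line awk command above does, not a replacement for awk; the function names `field` and `swap_first_two` are made up for this example.

```python
import re

def field(fields, n):
    """Return awk-style field $n, or an empty string if the line has fewer fields."""
    return fields[n - 1] if n <= len(fields) else ""

def swap_first_two(lines):
    """Rough equivalent of the awk program: /^[A-Z]/ {print $2, $1, $3}"""
    out = []
    for line in lines:
        # Pattern: only lines that begin with a capital letter are processed.
        if re.match(r"[A-Z]", line):
            # awk splits each line into whitespace-separated fields by default.
            fields = line.split()
            # Action: print fields 2, 1 and 3, separated by single spaces.
            out.append(" ".join([field(fields, 2), field(fields, 1), field(fields, 3)]))
    return out

if __name__ == "__main__":
    for row in swap_first_two(["John Smith Boston", "jane doe paris"]):
        print(row)
```

Lines that do not match the pattern (here, the lowercase one) are skipped, just as awk skips them.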
There are quite a number of freeware versions of awk. Among the best is GNU's, usually called "gawk". It has been under revision and development for quite some time, and comes with several very useful extensions. Another version is mawk, based on Unix and POSIX models.
Happily, GNU has put out a free awk manual, available in a number of formats, including HTML online for the latest version. This manual is both comprehensive and easy to understand. The typical "man page" that comes with awk (even gawk) appears to contain English words, but is all but a secret code to most "non-Unix fluent" human beings.
2005-03-21: GNU gawk v3.1.4, with new networking features. 32-bit DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other). Authors: Jürgen Kahrs, Arnold Robbins, David Trueman for GNU. Port by Juan Manuel Guerrero, Germany (2005).
2000-12-06: Gnuish project's gawk v3.0.6. 16-bit DOS & OS/2, 32-bit DOS, and Win32 executables in one package. Get this only if you need a 16-bit EXE for an 80286 or older CPU. Port by Conrad Kwok, Scott Garfinkle, Scott Deifik (2000).
1996-02-04: mawk v1.2.2. 16-bit EXE for DOS & OS/2. Runs fast. Author: Michael Brennan (1996).
Downloads:
gawk 3.1.4 | gwk314b.zip | (946K) | EXEs, manual
gawk 3.1.4 | gwk314d.zip | (2MB) | Docs: dvi/html/ps
gawk 3.0.6 | gawk306x.zip | (440K) | EXEs
gawk 3.0.6 | gawk306s.zip | (1.1M) | Source, manual
gawk 3.0.6 | gawk306h.zip | (414K) | Manual in WinHelp format
mawk 1.2.2 | mawk122x.zip | (103K) | EXE
mawk 1.2.2 | mawk122s.zip | (245K) | Source, manual
Other documentation: Try the Gawk FAQ page.
LM Multi-purpose text file formatter, search / replace, and more.
* * * * 1/2
[added 1997, updated 2004-06-28]
LM is described as a text file "line manipulator", yet this description seriously understates its uses. On first inspection, the command line syntax of this program will likely be unintelligible to most casual users, and the included documentation may be difficult to decipher. LM's syntax does not resemble the "standard" syntax of most DOS programs. I include LM in this list because it combines small size (37K) with the versatility to perform a wide variety of general filtering and formatting chores. Thankfully, some sample commands and batch files are provided which the novice user can easily modify to his or her needs. The option list is nearly endless (it uses all 26 letters as options, in both upper and lower case, and then resorts to symbols!).
The main operations supported are grip/non-grip, search/replace, synchronised line appendage from other files, input/output line selection by line numbers or passwords, spaces/empty lines absorption, filewise update or renaming, line width imposition, etc. Input lines can also be taken from the command line alone. Long command parameters may be taken from files.
Author: Zhuhan Jiang, Australia (1996).
2000-06-22: v2.13. Source included.
Download lm213.zip (215K) or the earlier lm206b.zip (118K).
PAGINATE Format ASCII documents (tables / headers / footers / indent / wrap).
* * *
[updated 2002-11-15]
Paginate is one of the few programs I'll probably never use but which I can still highly recommend to a specific audience. Paginate is best described as a comprehensive command line ASCII document formatter. As its name suggests, it can paginate a document for printing. But Paginate can also add page headers and footers, indent paragraphs, produce tables, wrap text at defined margins, and more. In order to generate a formatted document, one has to insert instruction codes within the document to be processed. Frankly, a word processor requires much less work and time for most tasks, and I suspect most home users will have little need for Paginate. But others will undoubtedly love it. Well designed for its purpose.
Author: Bruce Guthrie (2002).
2002-08-10: v0208.
Download PAGN0208.ZIP (176K).
More in these pages from Bruce Guthrie.
ENDNOTE Organizes endnotes in ASCII text files.
* * * *
[added 2005-12-09]
Suppose you're typing a formal article, or even a book, and you need numbered notes at the end of a file or chapter [not at the end of each page]. Suppose also that you are composing notes on-the-fly, as you type the main text. In typical text editors, to place the notes correctly, you would have to move back and forth, from positions inside the text body, to the end and then back into the body. This would be necessary if you type the notes first and then cut/paste them, or if you move the cursor to the end of the file and then type the notes. Either way, you still have to number the notes. Now suppose that after placing and numbering some notes, you create more of them, or move or delete some...
With the text processor ENDNOTE, you can eliminate all the moving around, and the possibility of getting the order or numbering of notes wrong. As you type the main text in your base (input) file, you create notes wherever you are, and mark them. Run ENDNOTE at any time, to create an output file, with the marked notes moved to the end and properly numbered. If you add, move or delete notes from the original file, running ENDNOTE again will adjust and renumber them.
ENDNOTE is a script in two variants, for the awk and for the Perl languages. To run, install a language/script combination, and enter:
awk -f endnote.awk infile.txt >outfile.txt
or
perl endnote.pl infile.txt >outfile.txt
Note: Knowledge of awk or Perl is not required; the scripts do all the work.
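The idea behind ENDNOTE (collect marked notes, replace each with a number, and list them at the end) can be sketched in a few lines of Python. This is not the ENDNOTE script and does not use its markup: the `[[ ... ]]` marker, the `NOTES` heading and the function name are all assumptions chosen for this illustration. See the downloadable doc file for the real syntax.

```python
import re

# Hypothetical note marker for this sketch only: [[note text]]
NOTE = re.compile(r"\[\[(.*?)\]\]", re.DOTALL)

def move_notes_to_end(text):
    """Replace each marked note with [n] and append the numbered notes at the end."""
    notes = []

    def number_note(match):
        # Notes are numbered in the order they appear in the text.
        notes.append(match.group(1).strip())
        return "[%d]" % len(notes)

    body = NOTE.sub(number_note, text)
    if not notes:
        return body
    lines = [body.rstrip("\n"), "", "NOTES"]
    lines += ["[%d] %s" % (i, note) for i, note in enumerate(notes, 1)]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(move_notes_to_end("A claim[[Source one.]] and another[[Source two.]].\n"))
```

Because the numbering is rebuilt from scratch on every run, adding, moving or deleting a note in the input never leaves the numbers out of order.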
Author: Eric Pement (2005).
2005-06-18: v1.3.
Downloads:
awk script | endnote13_awk.txt | (3K)
Perl script | endnote13_pl.txt | (3K)
Doc file with markup | endnote_v13.txt | (7K)
To see what it does, go to this page: Compare input and output of ENDNOTE script (based on the downloadable doc file, above).
FU Multi-purpose text filter.
unrated
[added 1998-08-16]
Small (11K) yet versatile, with a simpler syntax than LM. Works as a filter, but can also use infile and outfile.
Usage: FU :option parameters ...

Select Lines
  :COPY
  :DEROFF [chs]
  :FIND/:FIND0 s
  :LINES [1st [last]]
  :NULL0
  :UNIQUE [B]
  :BEGSTR s [n]
  :ENDSTR s [n]
  :SURSTR s1 s2 [n]
  :OUTSTR s1 s2 [n]
Remap Characters
  :ASCII
  :DEBOX [box_s]
  :ENC [key]
  :LOWER/:UPPER
  :TRANS chs [new_s]
Change Lines
  :CHANGE s [new]
  :CHOP/:CHOPH str [n]
  :DECTRL [HLD]*
  :NUM [width [swidth]]
  :PRINTF [n]
  :PREFIX/:SUFFIX s
Change Spaces
  :DETAB [n]
  :JUST [LRC [rm]]
  :LEFT [n]
  :RIGHT [n [ch]]
  :STRIP/:STRIPH
  :TRUNC [col]
  :UNJUST [skip [col]]
Columns
  :COL [n1 [n2]]
  :DELCOL [n1 [n2]]
  :ADDCOL [n1 [n2 [s]]]
  :FILCOL [n1 [n2 [s]]]
Page
  :PAGE [len [beg [end]]]
  :VMARG [top [bot]]
  :ODD/:EVEN
  :HEADER [s [lines]]
  :FOOTER [s [lines]]
Misc.
  :COUNT [@LWC]*
  :FILE [n1 [n2]]
  :{ infiles
  :} outfile
  :}} outfile
  :TEE [fname [AW]]
If
  :BREAK
  :IFIN/:IFOUT beg [end [inc]]
  :IFSTR/:IFSTR0 str [inc]
  :IFCHR/:IFCHR0 chs [inc]
Limitations: Line length restriction of 255 characters.
Notes: The string option for HEADER and FOOTER seems to require quotation mark delimiters to output the string correctly (other options with strings don't seem to need delimiters). Documentation is sparse.
Author: David Lo (1990). Suggested by Robert Bull.
1990-07-03: v3.56.
Download fu.zip (18K).
Filter Multipurpose text filter; can also remove ANSI sequences.
unrated
[updated 2004-06-28, updated 2005-04-16]
Filter is a multi-purpose command-line text filter like LM. It has fewer features but a more comprehensible syntax.
usage : FILTER [[<]in] [>out] [/option[...]] [...]] [txtopt [...]]
option:
  C[n,s,d]  Copy n characters from position s to d.
  D[n,p]    Delete n characters at position p.
  E[+/-][n] Expand tabs ([+]) or replace spacegroups by tabs (-), where n [8] is tab field length.
  F[n,m]    Fill nonblank lines with dots to width n [70], skipping first m [0] lines. Implies /T.
  H         Send this help text to (redirected) output.
  ?         Send this help text to screen (page by page)
  I[n,p]    Insert n spaces at position p.
  J[+/-]    Add/remove Carriage Return before Line Feed [+].
  L[+/-]    Add/remove Line Feed after Carriage Return [+].
  M[n,s,d]  Move n characters from position s to d.
  N[n]      Number lines, use n [4] digits,
  O[n,s,d]  Overwrite n chars from position s to d.
  P         Reset parity bit. Implied by /W.
  R[n]      Remove n trailing lines after processing /S and /X.
  S[n,m]    Skip m lines starting at line n.
  T         Trim trailing blanks. Implied by /F.
  U[+/-]    Convert to upper/lower case [+].
  V[n,s]    Reverse n [all] characters starting at position s.
  W         Wordstar document ==> ASCII textfile. Implies /P.
  X[n,m]    Extract m lines starting at line n.
  Z[+]      Remove NULLs. Z+: also ANSI screen control sequences.
txtopt:
  /A[+/-][I][p][*] text  Include lines after specified text only.
  /B[+/-][I][p][*] text  Include lines before specified text only.
  /G[I][p][*] text       Include lines with the specified text only.
    + : Include matching line.
    - : Do not include the matching line (this is the default).
    I : Ignore upper/lower case.
    p : Search for text starting at column p. Default p=1.
    * : Text may be found at any column at or after p.
Author: Bob Ferguson, Netherlands (2000).
2000-03-24: v4.0.
Downloads:
Program, source | filter40.zip | (32K)
Description | filter40.txt | (6B)
More in these pages from Bob Ferguson.
TS Filters Special purpose filters for text & binary files.
* * * *
[updated 2005-08-22]
These individual filters perform specialized tasks not easily accomplished with most text editors.
Author: Timo Salmi, Finland (1997-03).
Versions:
2003-12-11 | TSFILT | 2.2
1996-02-28 | TSFLTB | 1.9
2000-08-18 | TSFLTC | 2.5
Downloads:
TSFILT | tsfilt25.zip | (128K)
TSFLTB | tsfltb19.zip | (69K)
TSFLTC | tsfltc22.zip | (107K)
More in these pages from Timo Salmi.
FILE SORTING
Also see: 32-bit SORT included with the GNU Textutils.
RPSORT Sorts large files extremely fast.
* * * * *
[added 1998-03-21, updated 2004-06-28]
A super-fast sort program which handles large files. "RPSORT supports numerous sort key types including regular text keys, C language strings, Turbo Pascal strings, signed and unsigned binary integers of any length and several types of binary floating point numbers."
From a reader: I tested many of the sort programs in the SimtelNet repository on text files. Most are limited somehow (like DOS sort), or choke, or take a long time to sort, or plainly produce a wrong output (missing or extra records, etc.). The final two survivors were msort and rpsort. I tested both on very long text files (tens of megabytes: the collated complete works of Shakespeare, Project Gutenberg). Msort took several tens of minutes, rpsort did the same in *seconds* (I thought it hadn't run at all.) Given that, there was nothing else to say about DOS sort programs, in my opinion.
Author: Robert Pirko (1992). Suggested by João Magalhaes.
1992-12-15: v1.02.
Download rpsrt102.zip (88K).
PCSORT Full screen text sort program, supports block, word, and multi-line sorting.
unrated
[updated 1998-03-02]
PCSORT (9K) runs as a full screen, interactive program by default but can also function in the role of command line filter. Although source file size is limited by available conventional memory, PCSORT offers an easy-to-use interface and can sort multiline records (up to 9 lines) and blocks simultaneously. Results can be viewed before being written to disk.
Command line options:
  /Sn                  n=size of record in lines (1-9)
  /Pn                  n=sort priority (1-9)
  /R                   Sort current priority in reverse order
  /N                   Numeric sort current priority
  /C                   Case sensitive sort
  /L[n]                Line sort: n=record sort line (1-9)
  /[B][+] nn [xx [y]]  Block or column sort: nn=start column xx=width y=sort line (1-9)
  /W [+|-] n           Word sort: n=word count; minus = count from end of record
Screen menu commands:
  F1         Displays all sort fields
  Alt-F1     Resets all the sort variables to their defaults
  F2         Save file
  F3         New file
  F4         Sort text
  F5         Increase lines per record (1-9)
  Shift F5   Decrease lines per record
  F6         Select next key priority (1-9)
  Shift F6   Select previous key priority
  F7         Sort order (de/ascending)
  F8         Alphanumeric or Numeric sort
  F9         Select next Field type: Line, block, word or none
  Shift F9   Select previous Field type
  F10        Mark the record line for line sort or mark block sort field or select sort word count
  Shift F10  Reverse selection of word count
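Multi-line record sorting may be unfamiliar, so here is a minimal Python sketch of the concept: lines are grouped into fixed-size records, and whole records are reordered according to one chosen line. This only illustrates the idea behind the /S, /L, /R and /C options; it is not PCSORT's algorithm, and the function and parameter names are invented for the example.

```python
def sort_records(lines, record_size=1, key_line=1, reverse=False, case_sensitive=False):
    """Sort fixed-size multi-line records by one line of each record."""
    # Group the input into records of record_size lines each.
    records = [lines[i:i + record_size] for i in range(0, len(lines), record_size)]

    def sort_key(record):
        # A short final record may lack the key line; treat it as empty.
        text = record[key_line - 1] if key_line <= len(record) else ""
        return text if case_sensitive else text.lower()

    records.sort(key=sort_key, reverse=reverse)
    # Flatten the sorted records back into a list of lines.
    return [line for record in records for line in record]

if __name__ == "__main__":
    address_book = ["Zed", "1 Main St", "amy", "2 Oak Ave"]
    for line in sort_records(address_book, record_size=2, key_line=1):
        print(line)
```

Each name stays attached to its address line because records, not individual lines, are what get moved.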
The v. 1.1 update of PCSORT was originally published in 1991 but apparently is not widely distributed on the Net. The pcsort11.zip archive contains the asm source code, the doc file and the com program for PCSORT as updated 4/18/91 to fix a problem with form feeds at ends of data files. Also contains PCSORT article published in PC Mag: see the included *.xyw (XyWrite) docs.
Author: Michael J. Mefford, for PC Magazine (1991). Suggested by Robert Bull.
1991-04-28: v1.1.
Download pcsort11.zip (40K).
RALPH Sort lines of text in reverse alphabetical order.
* * * *
[added 2005-07-17]
RALPH sorts lines of text from right to left, i.e., lines are read backwards. If input has multi-word lines, then output will be sorted by line-final words, etc. Originally intended as a grammatical analysis tool, it has other uses.
Example 1:
Input: ear earache earaches eardrop eardrops eardrum eardrums eared earflap earflaps earful earfuls elephant elephantiases elephantiasis elephantine elephants imprecated imprecates imprecating imprecation imprecations raindrop raindrops raining
Output: eared imprecated earache elephantine raining imprecating earful eardrum imprecation earflap raindrop eardrop ear earaches elephantiases imprecates elephantiasis earfuls eardrums imprecations earflaps raindrops eardrops elephants elephant
Example 2:
Input: abajar abajo desganar desganchar desgano desgarbado desgarbilada desgarbilado desgarbo desgargantar desgargantarse desgargolar desgaritar desgarrada desgarradamente desgarrado
Output: desgarbilada desgarrada desgargantarse desgarradamente desgarbo desgarbado desgarbilado desgarrado abajo desgano desganchar abajar desgargolar desganar desgaritar desgargantar
Syntax: ralph [-a] [-p padding] [infile] > [outfile]
  -a           Extract analysis failures from an AMPLE log file.
  -l linesize  Set the maximum line length (default is no limit).
  -p padding   Specify the minimum padding for each line (default is 0).
If no infile is specified, ralph reads from the standard input.
If no outfile is specified, ralph writes to the standard output.
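The right-to-left ordering can be reproduced in Python by comparing each line's characters in reverse. The sketch below only illustrates the ordering seen in the sample lists on this page; it ignores RALPH's -a, -l and -p options, and the function name is made up for the example.

```python
def reverse_alpha_sort(lines):
    """Sort lines as if each one were read from its last character to its first."""
    # The sort key is the reversed string, so words with the same ending group together.
    return sorted(lines, key=lambda line: line[::-1])

if __name__ == "__main__":
    words = ["ear", "earache", "eared", "elephant", "raining"]
    print(reverse_alpha_sort(words))
```

Sorting this way places words with shared suffixes (such as plurals in -s or participles in -ing) next to each other, which is what makes it useful for grammatical analysis.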
Author: SIL International (1998).
Versions:
1989-01-24 | 1.1 | DOS16 | Runs on DOS 2.0+. Handles files up to ~128K. Bug: removes the top bit from upper ASCII characters (fixed in v1.1b). Package also contains scripts with similar function, for awk and other Unix programs.
1998-09-01 | 1.1b | DOS32 | DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other).
1998-09-01 | 1.1b | Win32 console
Downloads:
DOS16 | ralph.zip | (13K)
DOS32 | ralph11b.zip | (25K)
Win32 | ralph32-11b.zip | (18K)
Doc | ralphdoc11b.zip | (364B)
Get more programs for linguists from the SIL Software Catalog.
DUPLICATE LINE FILTERS
RMDUP Remove duplicate lines from a sorted file.
unrated
Comments from a user: How do I use this util? Occasionally I download my bookmark file, combine it with a previous one, sort the combined file, and run RMDUP. The result is a compact archive of sites. Then I ruthlessly prune my bookmark file. It gathers more bookmarks. Note that the file must be sorted. That is, duplicate lines must be next to each other to be found. This can work to one's advantage. One can sort only a section of a file. Duplicate lines would be removed from that section only. Case sensitive [RMDUPS] and case insensitive [RMDUPI] versions are included. There may be a maximum line length restriction, but it handles 451 characters just fine.
Usage:
  RMDUPS < sortedfile [ >outputfile]
  SORT < infile | RMDUPS [ > outputfile]
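Why the input must be sorted becomes clear from a short Python sketch of adjacent-duplicate removal. This illustrates the general technique described above, not RMDUP's actual source; the function name and the `ignore_case` parameter (standing in for the RMDUPS / RMDUPI split) are assumptions made for the example.

```python
def remove_adjacent_duplicates(lines, ignore_case=False):
    """Drop every line that is identical to the line immediately before it."""
    out = []
    previous = None
    for line in lines:
        key = line.lower() if ignore_case else line
        # Only the immediately preceding line is remembered, so duplicates
        # that are not next to each other are not detected.
        if key != previous:
            out.append(line)
        previous = key
    return out

if __name__ == "__main__":
    unsorted = ["b.com", "a.com", "b.com"]
    print(remove_adjacent_duplicates(unsorted))          # both b.com lines survive
    print(remove_adjacent_duplicates(sorted(unsorted)))  # the duplicate is removed
```

Remembering only one previous line keeps memory use tiny, which is why this approach suits very large files once they have been sorted.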
Package contains ANSI C source code, docs in English & Portuguese.
Author: João Magalhaes, Portugal (1997). Suggestion and description by Marianna Van Erp.
1997-03-13 release.
Download rmdup0.zip (11K).
uniq Remove or display duplicate lines from sorted file.
unrated
[added 1998-04-18]
This clone of a Unix program is similar to RMDUP but offers more options. Besides removing adjacent duplicate lines from output, uniq offers a "reverse" option: display a single representative of just the duplicate lines. In addition, one can designate which fields on lines to search (a field being text separated by tabs or spaces). uniq is case-sensitive only. Can be used either with filter or with input-output filename syntax. Package contains ANSI C source code and Unix-style manual.
Usage: uniq [ -cdu ] [ +|-n ] [ inputfile [ outputfile ] ]
  -c  Precede each line with a count of the number of times it occurred
  -d  Write one copy of duplicate lines
  -u  Copy only lines not repeated in the original file
  +n  Skips over the first n characters
  -n  Skips over the first n fields
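The effect of the -c, -d and -u switches can be sketched in Python with `itertools.groupby`, which groups runs of adjacent equal lines. This is an illustration only: the count format shown for -c is an assumption, the +n / -n skipping options are not modelled, and the function and parameter names are invented for the example.

```python
from itertools import groupby

def uniq(lines, count=False, only_duplicates=False, only_unique=False):
    """Collapse runs of adjacent identical lines, roughly like uniq -c / -d / -u."""
    out = []
    for line, run in groupby(lines):
        n = len(list(run))  # how many times the line repeats in a row
        if only_duplicates and n < 2:
            continue  # like -d: keep one copy of repeated lines only
        if only_unique and n > 1:
            continue  # like -u: keep only lines that are not repeated
        out.append("%d %s" % (n, line) if count else line)
    return out

if __name__ == "__main__":
    data = ["a", "a", "b", "c", "c", "c"]
    print(uniq(data))
    print(uniq(data, only_duplicates=True))
    print(uniq(data, only_unique=True))
```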
Author: Jason Mathews (1995).
1995-01-14: v1.2.
Download uniq12.zip (10K).
Other variants of uniq are in the TS Filters (up this page), the GNU Textutils, the Berkeley Utilities, and UXUTL.
TEXT JUSTIFY
Just ASCII text justifying filter.
* * *
This 24K utility can justify paragraphs (introduces spaces to remove ragged margins). Left, right, and center justification supported. It can also automatically draw boxes around justified paragraphs. Package includes C source code and Unix-style manual pages. Free for personal use.
Usage: just [options] [infile] [ >outfile]
  -w:  Specify the desired output page width, in characters.
  -m:  Specify the line length below which justification should not be attempted.
  -l:  Specify left justification mode.
  -c:  Specify centre justification mode.
  -r:  Specify right justification mode.
  -p:  Specify padding justification mode.
  -xC: Use character C to make a box-surround for justified paragraphs.
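Padding justification (inserting extra spaces between words until the line reaches the page width) can be sketched in Python as follows. How Just itself distributes the extra spaces is not documented on this page, so the left-to-right distribution used here is an assumption, as are the function and parameter names.

```python
def pad_justify(line, width):
    """Stretch a line to exactly `width` characters by widening the gaps between words."""
    words = line.split()
    if len(words) < 2:
        return line.strip()  # nothing to pad between
    gaps = len(words) - 1
    spaces = width - sum(len(word) for word in words)
    if spaces < gaps:
        return " ".join(words)  # line is already too long for this width
    # Every gap gets `base` spaces; the first `extra` gaps get one more.
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)

if __name__ == "__main__":
    print("[" + pad_justify("the quick brown fox", 23) + "]")
```

In a full paragraph formatter the last line of each paragraph is usually left unpadded; this sketch handles single lines only.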
Author: Peter Breuer, UK (1993).
1993-04-22: v1.04.
Download pbjst104.zip (33K).
Justify Flexible text justifying filter.
unrated
[added 1998-11-10, updated 2006-03-14]
From the docs: Justify will reformat already formatted text. It will ignore titles and other header information and reformat paragraphs to any desired style... The input text must be stripped of all tab characters. JUSTIFY must be able to determine what constitutes a paragraph. It is important that the input text be consistently formatted.
Also useful for formatting e-mail (see 'e' and 'q' options). Source code (C) and Linux compilation included.
Usage: justify columns [bflditohsrweq] [indent] [body] <source >dest
  b - input file paragraph is hanging indented
  f - input file paragraph is fully indented
  l - input file paragraphs are single lines
  d - delete blank line after paragraph read
  i - insert blank line after paragraph read
  t - indent first paragraph line by indent spaces
  o - indent other paragraph lines by body spaces
  h - remove hyphens across line boundaries
  s - double space after . ? ! ." ?" or !"
  m - process m-dash adjacent to words
  w - output for word processors
  r - ragged right margin (otherwise full justification)
  e - EMAIL input -- don't format quotes or headers
  q - EMAIL output -- add '>' to non-blank lines
The mandatory "columns" argument is the number of columns of text to output. The [indent] and [body] arguments are associated with the 't' and 'o' options.
Author: Tom Almy (2004). Suggested by Robert Bull.
2004-03-12: v1.5.
Download justfy15.zip (26K).
©1994-2004, Richard L. Green.
©2004-2006, Short.Stop.