DOS ain't dead

bencollver

11.02.2026, 16:18
 

Ctrl-Z was never actually an EOF character in MS-DOS (Miscellaneous)

MS-DOS didn't have an End-Of-File character of any sort. The MS-DOS system API from version 2.0 onwards treated files as simple octet streams, with no particular octet values having any special meanings. The end-of-file position of a file was recorded in file metadata. (It's the length field in the file's directory entry.) No special meaning was ascribed to character 26 (or indeed to any other character) within file data.

All of the so-called "text file" semantics that are conventionally, but erroneously, ascribed to MS-DOS were in fact artifacts of the C libraries for C compilers targeting DOS. The conversion of CR+LF sequences into just LF was done by the C libraries. The handling of character 26 was done by the C libraries. None of this was actually behaviour inherent in DOS itself.
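
As a rough illustration of the library-level translation described above, here is a minimal sketch in C of a text-mode filter layered over raw bytes. It is not the code of any real C library: the function name text_mode_filter is hypothetical, and the exact rules (drop the CR of a CR+LF pair, stop at character 26) are assumptions based on the description in this post.

```c
#include <stddef.h>

#define DOS_EOF_CHAR 0x1A   /* ASCII SUB, i.e. character 26 */

/* Hypothetical helper: copies raw bytes from in to out while applying
   text-mode rules in the library layer. out must have room for n bytes.
   Returns the number of bytes written to out. */
size_t text_mode_filter(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t w = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];

        if (c == DOS_EOF_CHAR)
            break;              /* the library pretends the data ends here */
        if (c == '\r' && i + 1 < n && in[i + 1] == '\n')
            continue;           /* drop the CR of a CR+LF pair */
        out[w++] = c;
    }
    return w;
}
```

Fed the six raw bytes 'a', CR, LF, 'b', 26, 'c', this sketch hands back only "a", LF, "b". The raw bytes underneath are untouched, which is the point being made: the filtering sits above whatever delivered the bytes.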

One can even see this for oneself:

In the source code for the FreeDOS COPY command, explicit application-level code that performs all of the special handling for character 26 and the CR+LF translation is clearly present. Similar application-level code can be found in many other utilities, such as the FreeDOS TYPE command.

In OpenWatcom C/C++'s fgetc() library function there is the following code:

    if( c == DOS_EOF_CHAR ) {
        fp->_flag |= _EOF;
        c = EOF;
    }


There's identical code in OpenWatcom C/C++'s fread() library function. In OpenWatcom C/C++'s read() library function one finds the following code, which ensures that character 26 (which, as you can see, it erroneously calls "EOF") terminates all reads, resetting the position of the next read to re-read the character and resulting in a zero-byte read if character 26 is the first character read:

    if( buffer[ reduce_idx ] == 0x1a ) {    /* EOF */
        __lseek( handle,
               ((long)reduce_idx - (long)amount_read)+1L,
               SEEK_CUR );
        total_len += finish_idx;
        _ReleaseFileH( handle );
        return( total_len );
    }


Similar code can be found in the OpenWatcom C++ streams functions, and in the run-time libraries of Borland C/C++ for DOS and of DJGPP (the latter in its _filbuf() and read() functions).

* * *

DOS makes no distinction between "text" files and "binary" files in its system API. Files are, to DOS, simple octet streams, with no such division. The DOS API function for reading a file is INT 0x21 with AH=0x3f, which does not treat any characters in a file specially, nor perform any translation of the characters in a file. DOS itself is actually a lot more like Unix in this regard than many people think.
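
The octet-stream view can be checked from portable C, without calling the DOS API directly. The following is a minimal sketch, assuming only the standard library: the helper name bytes_read_back is hypothetical. It writes some bytes to a temporary binary stream and counts how many come back before end-of-file is reported.

```c
#include <stdio.h>

/* Hypothetical helper: writes n bytes to a temporary binary stream,
   rewinds it, and counts the bytes read back before the stream reports
   end-of-file. Returns -1 on I/O failure. */
long bytes_read_back(const unsigned char *data, size_t n)
{
    FILE *fp = tmpfile();       /* standard C opens this in "wb+" mode */
    long count = 0;

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, n, fp) != n) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    while (fgetc(fp) != EOF)    /* EOF here is the library's int sentinel */
        count++;
    fclose(fp);
    return count;
}
```

With the five bytes 'A', 'B', 26, 'C', 'D' as input, a binary stream returns all five. Going by the fgetc() excerpt above, a text-mode stream in a DOS C library would be expected to report end-of-file at the third byte instead, and that difference would come from the library, not from the system call underneath.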

Ironically, this greater similarity to Unix was hidden by language libraries, even though several of those language implementations attempted to give Unix-like semantics to DOS as much as they could. This is particularly ironic for DJGPP, for example.

The treatment of character 26 and the handling of "text" files was a shared delusion, common to the C libraries and the code of many programs that ran on top of DOS, from the aforementioned COPY command to text editors. It was wholly layered above DOS itself.

From:

https://web.archive.org/web/20250618015843/http://jdebp.info/FGA/dos-character-26-is-not-special.html

tkchia

11.02.2026, 16:43
(edited by tkchia, 11.02.2026, 17:07)

@ bencollver

Was: Ctrl-Z was never actually an EOF character in MS-DOS

Hello bencollver,

It turns out that Michal Necasek had written a reply to precisely this article, where he basically said, "well actually"... :-)

https://www.os2museum.com/wp/misconceptions-on-top-of-misconceptions/

> What CP/M versions 1.x/2.x as well as 86-DOS 0.x had in common is that file sizes were not stored with byte granularity. Instead, file sizes were only tracked in terms of 128-byte "records", which typically happened to correspond to 128-byte floppy disk sectors.
>
> ... for text files, or possibly other data files, this was a problem. No one wanted up to 127 bytes of junk displayed on the screen or sent to the printer. CP/M, like old DEC operating systems, adopted the ASCII SUB (substitute) character in order to solve the problem.

So Ctrl-Z in text files was really a weird holdover from the days of MS-DOS 1.x FCBs. (Incidentally, DOS 1.x lossage also explains why uncompressed .exe files tend to have their MZ headers padded up to a multiple of 512 bytes, even though MZ header lengths are given in 16-byte paragraphs...)
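
To make the record-granularity problem in the quote concrete, here is a minimal sketch in C. It assumes that a writer fills the unused tail of the last 128-byte record with SUB; real programs may have written only a single SUB and left junk after it. The names pad_to_records and logical_length are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_SIZE 128     /* record granularity mentioned in the quote */
#define SUB_CHAR    0x1A    /* ASCII SUB, i.e. character 26 */

/* Pads len bytes of text in buf (capacity cap) up to a whole number of
   records by filling the tail with SUB. Returns the padded length, or
   SIZE_MAX if buf is too small. */
size_t pad_to_records(unsigned char *buf, size_t len, size_t cap)
{
    size_t padded = (len + RECORD_SIZE - 1) / RECORD_SIZE * RECORD_SIZE;

    if (padded > cap)
        return SIZE_MAX;
    memset(buf + len, SUB_CHAR, padded - len);
    return padded;
}

/* Recovers the logical text length from a record-padded buffer: the
   offset of the first SUB, or the whole stored length if none is found. */
size_t logical_length(const unsigned char *buf, size_t stored_len)
{
    const unsigned char *p = memchr(buf, SUB_CHAR, stored_len);

    return p ? (size_t)(p - buf) : stored_len;
}
```

A 5-byte text is stored as one 128-byte record, and a reader that only knows the record count has to scan for the first SUB to get the 5 back. Once the directory entry records an exact byte length, that scan is no longer needed.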

Thank you!

---
https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI"

bencollver

11.02.2026, 19:26

@ tkchia

Was: Ctrl-Z was never actually an EOF character in MS-DOS

> Hello bencollver,
>
> It turns out that Michal Necasek had written a reply to precisely this
> article, where he basically said, "well actually"... :-)
>
> https://www.os2museum.com/wp/misconceptions-on-top-of-misconceptions/

What a nice follow-up. Thanks for the link! The comments section is also good.

bretjohn

Rio Rancho, NM,
12.02.2026, 02:45

@ bencollver

Ctrl-Z was never actually an EOF character in MS-DOS

You seem to be conflating some things here. Ctrl-Z is not even a character -- it is a (but not the only) keyboard method of entering ASCII code 26. ASCII code 26 IS the End-Of-File character _in ASCII_. If you're dealing with ASCII text (which you usually are at a DOS command prompt) it simply IS the EOF character. But if you're not dealing with plain/pure ASCII text, code 26 could mean anything. And even in an ASCII text editor (including the editor DOS uses at the command-line), ASCII code 26 may not really mean "End-Of-File" or "End-Of-Input" in the sense you seem to be implying.

For example, if you create a file using the COPY CON FileName method, the way you tell the CON device that you are done entering lines of text (CON uses a line editor, not a text editor) is by entering ASCII code 26 (using any method you want, including Ctrl-Z from the keyboard). You must then also enter an End-Of-Line (usually by hitting Enter on the keyboard, but there are other ways to do that also). The End-Of-Line is required to tell the line editor that you're done editing that line, and the EOF code 26 is simply one of the characters on that line. The text that was entered up to and including the EOF code 26 is stored in the file, but the final End-Of-Line (and any characters entered after the EOF code 26) is not stored in the file.

Technically, the DOS command-line and CON are not "DOS" (not part of the kernel), but to say they are not part of DOS is, at best, misleading and incomplete, sort of like saying bash is not Unix. It may technically be correct, but it is nowhere near enough of the story to figure out what's really going on.

bencollver

12.02.2026, 15:40

@ bretjohn

Ctrl-Z was never actually an EOF character in MS-DOS

> ASCII code 26 IS the End-Of-File character _in ASCII_. If you're dealing
> with ASCII text (which you usually are at a DOS command prompt) it simply
> IS the EOF character.

Here's what Wikipedia has to say about ASCII character 26:

Caret notation: ^Z
Decimal: 26
Hexadecimal: 1A
Abbreviations: SUB
Control Pictures: ␚
Name: Substitute
Description: Replaces a character that was found to be invalid or in error. Should be ignored.

https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_controls

bretjohn

Rio Rancho, NM,
14.02.2026, 02:34

@ bencollver

Ctrl-Z was never actually an EOF character in MS-DOS

Let me just say you can't necessarily trust Wikipedia. To claim that DOS essentially ignores (or at least sort of ignores) the EOF character (or what you're calling SUB which means it should be totally ignored as if it doesn't even exist) is simply not true.

For example, if you TYPE a file, DOS assumes the file is ASCII and only types until it finds an EOF (ASCII code 26). Anything after the EOF is ignored, no matter how big the file is. I take advantage of this in some of my programs. As part of the header at the beginning of the programs, I put an EOF character in the header so if somebody TYPEs one of my executable programs (e.g., "TYPE CLOCK.COM") they see a little blurb about the program and then the output stops. Most programs don't do this, and if you TYPE the executable program you usually see a bunch of funky characters and get a few beeps out of the speaker (the ASCII BEL character) before it finally (accidentally) reaches an ASCII code 26 and stops.

Similarly, when using the COPY command, the _default_ is to use binary mode when copying files between disks, but when using COPY to concatenate files, or when either of the two locations is a (character) device instead of a file (block device), the default is to assume ASCII and copy only until the first EOF is found. This is particularly important when concatenating ASCII files using COPY, since if the last character of a file is an EOF, that EOF character is not copied to the concatenation and a single EOF is automatically added to the end of the concatenation. You can override the default settings of COPY with command-line switches (such as /B to force binary mode), but by default the EOF character matters VERY much to DOS (depending on the specific context).
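
The concatenation rule described above can be modelled in a few lines of C. This is only a sketch of the behaviour as stated in this post, working on memory buffers, not the source of any actual COPY implementation. The names append_until_sub and concat_ascii are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SUB_CHAR 0x1A   /* ASCII code 26 */

/* Appends src to dst at offset pos, stopping before the first code 26.
   Returns the new write offset. */
static size_t append_until_sub(unsigned char *dst, size_t pos,
                               const unsigned char *src, size_t n)
{
    const unsigned char *p = memchr(src, SUB_CHAR, n);
    size_t keep = p ? (size_t)(p - src) : n;

    memcpy(dst + pos, src, keep);
    return pos + keep;
}

/* Hypothetical model of an ASCII-mode "COPY A+B DEST": each input is cut
   at its first code 26, and a single code 26 is added at the very end.
   dst must have room for na + nb + 1 bytes. Returns the output length. */
size_t concat_ascii(unsigned char *dst,
                    const unsigned char *a, size_t na,
                    const unsigned char *b, size_t nb)
{
    size_t pos = 0;

    pos = append_until_sub(dst, pos, a, na);
    pos = append_until_sub(dst, pos, b, nb);
    dst[pos++] = SUB_CHAR;
    return pos;
}
```

With a first input of "one" CR LF followed by code 26 and a second input of "two" CR LF, the result is 11 bytes: both lines back to back and one trailing code 26. Note that all of this logic sits in the model of the utility; nothing in it relies on the read call treating code 26 specially.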

You can only say DOS "doesn't care" about EOF if you think utilities like TYPE and COPY aren't part of DOS.
