DOS ain't dead

mht

Wroclaw, Poland,
02.11.2008, 14:02
 

Question about some x86 opcodes (Developers)

Hello,

I have a question about specific x86 opcodes that I hope someone here can answer.

The following instructions:
  ADD reg/mem16, immediate
  ADC reg/mem16, immediate
  SUB reg/mem16, immediate
  SBB reg/mem16, immediate
  CMP reg/mem16, immediate

have two forms: one with a 16-bit immediate value, and the other with an 8-bit immediate value that is sign-extended to 16 bits, for example:
  ADD DI, 1234h  ->  81 C7 34 12
  ADD DI, 12h    ->  83 C7 12

(in general, bits 5..3 in the second byte tell the operation: 000=ADD, 010=ADC, 011=SBB, 101=SUB, 111=CMP). However, it is unclear whether the remaining three arithmetic instructions in this group (001=OR, 100=AND, 110=XOR) also accept "short" immediates, i.e., whether
  OR DI, 12h  ->  83 CF 12     "short"
is valid, or
  OR DI, 12h  ->  81 CF 12 00  "long"
must be used instead.
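
The two encodings above can be reproduced with a small sketch. This is only an illustration: the names `GRP1`, `REG16`, `fits_short` and `encode` are hypothetical, only register-direct operands are handled, and the special accumulator forms that assemblers normally pick for AX are ignored.

```python
# Sketch of the 81/83 "group 1" immediate encodings for 16-bit registers.
# All names here are illustrative assumptions, not taken from any assembler.
GRP1 = {"ADD": 0, "OR": 1, "ADC": 2, "SBB": 3,
        "AND": 4, "SUB": 5, "XOR": 6, "CMP": 7}   # bits 5..3 of the second byte
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3,
         "SP": 4, "BP": 5, "SI": 6, "DI": 7}      # bits 2..0 of the second byte


def fits_short(imm16):
    """True if the 16-bit value equals the sign extension of its low byte."""
    imm16 &= 0xFFFF
    return imm16 <= 0x007F or imm16 >= 0xFF80


def encode(op, reg, imm16, allow_short=True):
    """Return the machine code bytes for 'op reg, imm16' (register-direct only)."""
    modrm = 0xC0 | (GRP1[op] << 3) | REG16[reg]   # mod=11 selects a register operand
    imm16 &= 0xFFFF
    if allow_short and fits_short(imm16):
        return bytes([0x83, modrm, imm16 & 0xFF])             # "short" form
    return bytes([0x81, modrm, imm16 & 0xFF, imm16 >> 8])     # "long" form, little-endian


print(encode("ADD", "DI", 0x1234).hex(" "))
print(encode("OR", "DI", 0x12).hex(" "))
print(encode("OR", "DI", 0x12, allow_short=False).hex(" "))
```

Passing `allow_short=False` mimics an assembler that always emits the "long" form, which is roughly what MASM's NOSIGNEXTEND option is described as doing for AND, OR and XOR.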

The "short" form for OR/AND/XOR is certainly documented for the 386 and later, but seems to be undocumented for earlier x86 processors (I even remember seeing an 8086 opcode chart with empty spaces in the corresponding places, but I do not have it anymore).

However, many assemblers and compilers actually do use "short" forms even for 8086 target! These are results of my testing:

JWASM (recent version)          "short"
NASM with "-O2" (0.98.38)       "short"
TASM 3.1                        "short"
TD 3.1   integrated asm         "short"
BC++ 3.1 code generator         "short"
   "     inline asm             "long"
TP 7.0   code generator         "long"
   "     inline asm             "long"
   "     (in runtime library)   "short"
MS-DOS DEBUG (WinXP version)    "long"
DR DOS 6.00 SID (R3.2) debugger "long"
DR-DOS 7.03 DEBUG (R1.51)       "short"


Any additions (particularly, Microsoft and Intel tools) and/or corrections to the above list are of course welcome, but even more welcome is an explanation!

Do assemblers and compilers from major software vendors produce code that does not run on some processors? Rather incredible...

My hypothesis is that the undocumented "short" opcodes for AND/OR/XOR actually do work on all Intel chips and clones (such as the NEC V20). Older compilers/assemblers (and parts of their old code still remaining in newer versions) rely on the old official Intel specs and always use the "long" form. Newer versions know that the short forms are always safe and use them. But this is only a hypothesis.

Even in the DOS world, many software writers do not care about anything below 386 nowadays. But I like to maintain compatibility whenever possible. And, last but not least, the ability to save one byte of code is sometimes critical ;-)

Can anyone help?

Michal
mht@bttr-software.de

Japheth

Germany (South),
02.11.2008, 17:51

@ mht
 

Question about some x86 opcodes

> Even in the DOS world, many software writers do not care about anything below
> 386 nowadays. But I like to maintain compatibility whenever possible. And,
> last but not least, the ability to save one byte of code is sometimes
> critical ;-)
>
> Can anyone help?

There is an option in Masm:

OPTION NOSIGNEXTEND

and the documentation says:

NOSIGNEXTEND Overrides the default sign-extended opcodes for the AND, OR, and XOR instructions and generates the larger non-sign-extended forms of these instructions. Provided for compatibility with NEC V25 and NEC V35 controllers.

---
MS-DOS forever!

DOS386

03.11.2008, 10:04

@ mht
 

Question about some x86 opcodes

> it is unclear whether the remaining three arithmetic
> instructions in this group (001=OR, 100=AND, 110=XOR) also
> accept "short" immediates, i.e., whether
>   OR DI, 12h  ->  83 CF 12     "short"
> is valid, or
>   OR DI, 12h  ->  81 CF 12 00  "long"
> must be used instead.

> There is an option in Masm:
> OPTION NOSIGNEXTEND
> and the documentation says:
> > NOSIGNEXTEND Overrides the default sign-extended opcodes for
> > the AND, OR, and XOR instructions and generates the larger
> > non-sign-extended forms of these instructions. Provided for
> > compatibility with NEC V25 and NEC V35 controllers.

Funny point :-|

> However, many assemblers and compilers actually do use "short" forms even
> for 8086 target! These are results of my testing:

My test:

FASM always brews the short 3-byte variant. There is no "only 8086" option in FASM, however there is an include file for this purpose http://board.flatassembler.net/download.php?id=2832 , and it doesn't affect those instructions. OTOH

OR DI, WORD $12

brews the 4-byte variant :clap:

---
This is a LOGITECH mouse driver, but some software expect here
the following string:*** This is Copyright 1983 Microsoft ***

mht

Wroclaw, Poland,
05.11.2008, 19:54

@ DOS386
 

Question about some x86 opcodes

Things are not that clear... Microsoft people say:

http://support.microsoft.com/kb/69987
For optimization reasons, MASM may generate the opcode 83 for logical AND, OR, and XOR instructions in some cases, rather than opcode 81. Unfortunately, opcode 83 was not documented by Intel for 80x86/8088 processors prior to the 80386. Therefore, some processors (such as the NEC V25 and V35 controllers) and some in-circuit emulators for the 80x86 family do not support this opcode.

and NEC Electronics people say (in V25 and V35 documentation):

http://www.necel.com/cgi-bin/nesdis/o006_e.cgi?article=UPD70330
Even in this case, instructions are executed normally. Take precautions, however, since some emulators do not support the disassembly function or line assembly function for this instruction.

(many thanks to Lucho for both links).

Then I looked into the PC-DOS 5.00, MS-DOS 6.22, PC-DOS 7.10 and DR-DOS 7.03 kernel binaries. Microsoft's and IBM's kernels contain mainly the "short" forms (the few "long" forms might be explained by the assembler not knowing the value in the first pass, particularly since the non-problematic instructions also happen to be "longer than needed"). The DR-DOS kernel contains both forms, probably because two different assemblers were used for different source files (MASM and Digital Research's RASM86) -- at least both are needed to build OpenDOS 7.01.
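
A rough way to repeat this kind of survey on a binary is sketched below. It is not how the kernels above were actually inspected (the post does not say). The function name `count_forms` is hypothetical, and the scan is deliberately naive: it does not track instruction boundaries, so data bytes and operand bytes that happen to look like 81/83 opcodes are counted too.

```python
# Naive byte scan for "short" (83) and "long" (81) forms of OR/AND/XOR.
# Assumption: false positives are acceptable; this is not a disassembler.
LOGICAL = {1: "OR", 4: "AND", 6: "XOR"}   # bits 5..3 of the second byte


def count_forms(code):
    """Count candidate short/long logical-immediate encodings in a bytes object."""
    counts = {"short": 0, "long": 0}
    for i in range(len(code) - 1):
        opcode, modrm = code[i], code[i + 1]
        if opcode in (0x81, 0x83) and ((modrm >> 3) & 7) in LOGICAL:
            counts["short" if opcode == 0x83 else "long"] += 1
    return counts


# OR DI,12h (short), OR DI,12h (long), ADD DI,12h (short, not a logical op)
sample = bytes.fromhex("83cf12" "81cf1200" "83c712")
print(count_forms(sample))
```

For a real kernel image the counts are only a hint; confirming individual hits in a disassembler would still be needed.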

NEC V20 chips (the V25 is a microcontroller version of it) were often used in "Turbo-XT" machines as a faster replacement for the Intel 8088. Did anyone ever complain about DOS not running correctly on those? I doubt it. Much other software would also experience problems there. So I think that NOSIGNEXTEND is not necessary on PCs, unless some emulators or debuggers are considered.

Rugxulo

Usono,
06.11.2008, 21:06

@ mht
 

Question about some x86 opcodes

> NEC V20 chips (V25 is a microcontroller version of it) were often used in
> "Turbo-XT" machines as a faster replacement of Intel 8088. Did anyone ever
> complain about DOS not running correctly on those? I doubt it. Also much of
> other software would experience problems there. So I think that
> NOSIGNEXTEND is not necessary on PCs, unless some emulators or debuggers
> are considered.

I would doubt it's an issue too, but I'd have to ask Jim Leonard (8088-obsessed dude). ;-) Anyways, most popular assemblers I tried seemed to use the short form, which means that they would all be buggy if it was an issue, so I highly doubt they weren't tested "back in the day". Anyways, these days NEC chips are very very very rare, as even a 286 outruns it. The only "bug" commonly known for them is that "aam 16" etc. (i.e. with any operand other than default 10, since it was undocumented by Intel) don't work.

mht

Wroclaw, Poland,
08.11.2008, 08:46

@ Rugxulo
 

Question about some x86 opcodes

> Anyways, these days NEC chips are very very very rare, as even a 286
> outruns it.

Today XT machines are rare, in general. But some of them may still work ;-)

> The only "bug" commonly known for them is that "aam 16" etc.
> (i.e. with any operand other than default 10, since it was undocumented by
> Intel) don't work.

Too bad, "aam 16" and "aad 16" are handy for dec-hex conversions. I remember I had to revise some of my old code when I found this information. BTW, it is strange that they do not support it -- for me, the purpose of the second byte was obvious as soon as I noticed that this is a two-byte opcode ;-) (I had no "undocumented" documentation at that time). If NEC really reverse-engineered Intel chips, this is even more strange...
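
For readers who have not used the trick, here is a small model of what "aam 16" and "aad 16" compute on Intel chips, as commonly documented for AAM (D4 ib) and AAD (D5 ib) with an explicit immediate. The function names `aam` and `aad` are just illustrative, and flags are not modelled.

```python
# Model of AAM/AAD with an arbitrary base (the second opcode byte).
# Assumption: Intel behaviour; per the thread, NEC chips ignore the operand.

def aam(al, base=10):
    """AAM imm8: AH = AL // base, AL = AL % base. Returns (AH, AL)."""
    al &= 0xFF
    return al // base, al % base


def aad(ah, al, base=10):
    """AAD imm8: AL = (AH * base + AL) & 0xFF, AH = 0. Returns (AH, AL)."""
    return 0, (ah * base + al) & 0xFF


ah, al = aam(0xA7, 16)       # split a byte into its two hex digits
print(hex(ah), hex(al))
print(aad(ah, al, 16))       # and join them again
```

With base 16, `aam` separates the high and low nibble in one instruction, which is why it is handy for hex output. A base of 0 raises a divide error on the real CPU (and a `ZeroDivisionError` in this model).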

Rugxulo

Usono,
12.11.2008, 00:43

@ mht
 

Question about some x86 opcodes

> > Anyways, these days NEC chips are very very very rare, as even a 286
> > outruns it.
>
> Today XT machines are rare, in general. But some of them may still work
> ;-)

I do believe in miracles! :-D

> > The only "bug" commonly known for them is that "aam 16" etc.
> > (i.e. with any operand other than default 10, since it was undocumented
> > by Intel) don't work.
>
> Too bad, "aam 16" and "aad 16" are handy for dec-hex conversions. I
> remember I had to revise some of my old code when I found this
> information. BTW, it is strange that they do not support it -- for me, the
> purpose of the second byte was obvious as soon as I noticed that this is a
> two-byte opcode ;-) (I had no "undocumented" documentation at that time).
> If NEC really reverse-engineered Intel chips, this is even more strange...

Jim Leonard didn't have anything extra to add, so I think you're safe. :-)

("Didn't know anything about it, but it looks like you've posed the question, researched the question, and answered the question all in the below email!" -- JL)

mht

Wroclaw, Poland,
12.11.2008, 16:54

@ Rugxulo
 

Question about some x86 opcodes

Good to hear this! :-)
Thank you for contacting Jim Leonard and letting me know!

mht

Wroclaw, Poland,
22.11.2008, 13:59

@ mht
 

Question about some x86 opcodes

By the way:

Am186(TM) and Am188(TM) Family Instruction Set Manual from February 1997 (i.e., after Intel documented the "short" opcodes for 80386) explicitly documents the "short" opcodes on pages 57, 209 and 291:

AND r/m16,imm8   83 /4 ib   AND sign-extended immediate byte with r/m word
OR  r/m16,imm8   83 /1 ib   OR immediate byte with r/m word
XOR r/m16,imm8   83 /6 ib   XOR sign-extended immediate byte with r/m word


Note that the phrase "sign-extended" is missing from the description of the OR instruction -- just an omission?
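
Whether the wording matters can be checked with a little arithmetic. The sketch below assumes the behaviour Intel documents for the 386 and later, where 83 /1 ib sign-extends the byte just like AND and XOR; it says nothing about what the Am186 manual intended. The helper names `sign_extend8`, `or_short` and `or_zero_extended` are hypothetical.

```python
# Where would a sign-extended and a zero-extended "OR r/m16, imm8" differ?
# Assumption: sign extension is the documented 386+ behaviour for 83 /1 ib.

def sign_extend8(imm8):
    """Extend an 8-bit value to 16 bits by copying bit 7 into bits 15..8."""
    imm8 &= 0xFF
    return imm8 | 0xFF00 if imm8 & 0x80 else imm8


def or_short(dest16, imm8):
    """Result of OR with a sign-extended immediate byte."""
    return (dest16 | sign_extend8(imm8)) & 0xFFFF


def or_zero_extended(dest16, imm8):
    """Hypothetical alternative reading: the byte is zero-extended."""
    return (dest16 | (imm8 & 0xFF)) & 0xFFFF


print(hex(or_short(0x0001, 0x12)), hex(or_zero_extended(0x0001, 0x12)))
print(hex(or_short(0x0001, 0x80)), hex(or_zero_extended(0x0001, 0x80)))
```

The two readings agree for immediates 00h..7Fh and differ only for 80h..FFh, so a test with a single instruction such as "OR DI, 0FF80h" encoded as 83 CF 80 would be enough to tell them apart on real hardware.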
