DOS ain't dead

x86 code optimization (Announce)

posted by Rugxulo, Usono, 22.12.2009, 19:16

> FYI, the CMOVNTQ's are not gone at all :-(
>
> http://pastebin.com/m2b5d7a67

He should avoid -march=i686 or higher and just use -mtune=generic instead.
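
If anyone wants to check this on their own build, here's a minimal sketch. The function name max_int and the file name cmov_test.c are made up for illustration, they're not from the pastebin. A simple select like this is the kind of code where GCC may emit CMOVcc once -march allows it. Whether it actually does depends on the GCC version and optimization level, so compare the -S output yourself.

```c
/* Hypothetical test case (not taken from the pastebin).
 *
 * On a 32-bit target such as DJGPP, compare the assembly from e.g.:
 *   gcc -O2 -march=i686 -S cmov_test.c
 *       (the compiler is allowed to use CMOVcc)
 *   gcc -O2 -march=i386 -mtune=generic -S cmov_test.c
 *       (CMOVcc is not part of the i386 instruction set, so it can't be used)
 */
int max_int(int a, int b)
{
    /* A conditional select: a candidate for CMOVcc or for a plain branch. */
    return (a > b) ? a : b;
}
```

Either way the result is the same, only the generated instructions differ.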

> > INC/DEC are slow on Pentium 4 on up
>
> And Pentium 4 always had the biggest performance problems, why don't you
> just get an 80386, 8086, or 4004 not causing such trouble ??? :confused:

P4 isn't that bad, speed-wise. At least the high clock speeds and SSE2 compensated. But it did break some common optimizations. That doesn't mean Intel hasn't improved things since, though. Atom is in-order and thus uses a lot less power while being as fast as a high-end PIII or low-end P4. There are even dual-core / 64-bit Atoms now, cheap! (See Darek's blog.)

> > GCC actually generates this a lot when using -mtune=generic or similar.
>
> And what mtune disables such absurdity ???

I know at least -march=pentium won't use it, but that can penalize newer chips, so I wouldn't recommend it without a good reason.

> > (EDIT: That reminds me, I just found out [a year late, heh] that Darek
> > http://www.emulators.com/download.htm open-sourced the DOS
> > version of PC Xformer 3.80, but it's hidden inside the GEMCE900.ZIP
> > sources under \atari8\ folder, apparently needs MASM and VC6 or better.
>
> You love such unrelated side notes :lol3:

Well, it's not THAT unrelated. He's a big pro regarding optimization, e.g. his work on speeding up Bochs and his very fast Atari 800 emulator (it runs fast on a 486, no small feat!).

> I'm sure GCC optimization experts will be able to explain me:
>
> - What's the goal of REP RET
> - What's the goal of O16 NOP

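"REP RET" is an AMD optimization, just like "O16 NOP", IIRC. The REP prefix doesn't change what RET does, it only turns it into a two-byte instruction, which AMD's branch predictor handles better when the RET is itself a jump target or directly follows a conditional jump. "O16 NOP" is simply a two-byte NOP for padding. Prefixes get reused like this elsewhere too: P4 has jump hints (ds jz), and SSE2 has "pause" (rep nop). Yeah, I know it's weird.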
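
To make the prefix reuse concrete, here's a small lookup table of the byte encodings in 32-bit code, written as C data. The struct and function names (prefix_trick, find_trick) are made up for this sketch. The bytes are the standard opcodes: F3 is the REP prefix, 66 the operand-size prefix, 3E the DS segment prefix.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative table only; the names here are hypothetical. */
struct prefix_trick {
    const char *mnemonic;      /* NASM-style spelling */
    unsigned char bytes[2];    /* prefix byte, then opcode byte */
};

static const struct prefix_trick prefix_tricks[] = {
    { "rep ret", { 0xF3, 0xC3 } },  /* REP prefix + near RET */
    { "o16 nop", { 0x66, 0x90 } },  /* operand-size prefix + NOP */
    { "pause",   { 0xF3, 0x90 } },  /* REP prefix + NOP */
    { "ds jz",   { 0x3E, 0x74 } },  /* DS prefix + JZ rel8 opcode (rel8 byte follows) */
};

/* Return the table entry for a mnemonic, or NULL if it isn't listed. */
static const struct prefix_trick *find_trick(const char *mnemonic)
{
    size_t i;
    for (i = 0; i < sizeof prefix_tricks / sizeof prefix_tricks[0]; i++) {
        if (strcmp(prefix_tricks[i].mnemonic, mnemonic) == 0)
            return &prefix_tricks[i];
    }
    return NULL;
}
```

Note that "rep ret" and "pause" start with the same F3 byte, and that "pause" and "o16 nop" are both just a prefixed 90 (NOP).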

> - What's the goal of avoiding PUSH and POP and using MOV [ESP+blah]
> instead
>>
>> mov [esp+8],edi ; Why not PUSH ???

Using PUSH would take more instructions, and it's not necessarily better.

>> mov edi,[esp+8] ; Restore - why not POP ???

This way ESP stays unmodified, so every other [esp+N] offset in the function stays the same.
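
Here's a toy model in C of why that matters. It's only a sketch: the struct and helper names (toy_stack, reserve, store, load, push) are made up, and it models nothing but ESP and a few stack slots. With MOV [ESP+off] the offsets are fixed after one SUB ESP. With PUSH, ESP moves and every existing slot ends up at a different offset.

```c
#include <stdint.h>

/* Toy 32-bit stack: esp is a byte offset into mem and grows downward. */
struct toy_stack {
    uint32_t mem[16];
    uint32_t esp;
};

/* Start with an empty stack: esp points just past the end of mem. */
static void toy_init(struct toy_stack *s)
{
    s->esp = (uint32_t)sizeof s->mem;
}

/* Models "sub esp, n": reserve n bytes once. */
static void reserve(struct toy_stack *s, uint32_t n)
{
    s->esp -= n;
}

/* Models "mov [esp+off], value": ESP is left alone. */
static void store(struct toy_stack *s, uint32_t off, uint32_t value)
{
    s->mem[(s->esp + off) / 4] = value;
}

/* Models "mov reg, [esp+off]": ESP is left alone. */
static uint32_t load(const struct toy_stack *s, uint32_t off)
{
    return s->mem[(s->esp + off) / 4];
}

/* Models "push value": ESP drops by 4, then the value is stored. */
static void push(struct toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}
```

After reserve(&s, 12) and store(&s, 8, x), load(&s, 8) gives x back and ESP hasn't moved. After one push, the same slot has to be addressed as load(&s, 12).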

> - How bloat (redundant repeated encoding of "boring" or same numbers as
> 32-bit values) improves performance

CPU design, e.g. the one-byte LODSB is slower than a bigger, manual MOV (plus INC) because of the more RISC-y internal structure since the 486. XCHG (atomic with a memory operand, i.e. implicitly locked) is also slower than PUSH / POP (which are pairable on the original Pentium).

> - Why preserve EBP when you don't need it at all :clap:

Dunno, probably a compiler shortcoming.

 
