x86 code optimization (Announce)
> FYI, the CMOVNTQ's are not gone at all 
>
> http://pastebin.com/m2b5d7a67
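He should avoid -march=i686 or higher, which lets GCC emit instructions that older CPUs lack, and just use -mtune=generic instead, which only changes tuning and not the instruction set.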
> > INC/DEC are slow on Pentium 4 on up
>
> And Pentium 4 always had the biggest performance problems, why don't you
> just get an 80386, 8086, or 4004 not causing such trouble ??? 
P4 isn't that bad, speed-wise; at least the high clock speeds and SSE2 compensated. But it did break some common optimizations. That doesn't mean Intel hasn't improved things since, though. Atom is in-order and thus uses far less power while being about as fast as a high-end PIII or low-end P4. There are even cheap dual-core / 64-bit Atoms now. (See Darek's blog.)
> > GCC actually generates this a lot when using -mtune=generic or similar.
>
> And what mtune disables such absurdity ???
I know at least -march=pentium won't use it, but that can penalize newer chips, so I wouldn't recommend it without a good reason.
> > (EDIT: That reminds me, I just found out [a year late, heh] that Darek
> > http://www.emulators.com/download.htm open-sourced the DOS
> > version of PC Xformer 3.80, but it's hidden inside the GEMCE900.ZIP
> > sources under \atari8\ folder, apparently needs MASM and VC6 or better.
>
> You love such unrelated side notes
Well, it's not THAT unrelated. He's a big pro regarding optimization, e.g. his work on speeding up BOCHS, his very fast Atari800 emulator (runs fast on 486, no small feat!).
> I'm sure GCC optimization experts will be able to explain me:
>
> - What's the goal of REP RET
> - What's the goal of O16 NOP
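Both are AMD optimizations, IIRC: REP RET is a two-byte return that AMD's branch predictor handles better than a bare one-byte RET right after a jump, and "O16 NOP" is a two-byte NOP used as padding. Prefixes get reused like this elsewhere too: P4 has jump hints (ds jz), and SSE2 has "pause" (encoded as rep nop). Yeah, I know it's weird.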
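To make the prefix tricks concrete, here is a small Python table of the byte encodings involved. The bytes are the standard IA-32 encodings as far as I know; the table layout and the `describe` helper are just my own illustration, not anything GCC or an assembler provides.

```python
# Byte encodings of the "odd" instructions discussed above.
# ENCODINGS and describe() are hypothetical names used only for this sketch.
ENCODINGS = {
    "ret":      bytes.fromhex("C3"),        # plain one-byte near return
    "rep ret":  bytes.fromhex("F3 C3"),     # REP prefix + RET: two-byte return
    "nop":      bytes.fromhex("90"),        # plain one-byte NOP
    "o16 nop":  bytes.fromhex("66 90"),     # operand-size prefix + NOP: two-byte padding
    "pause":    bytes.fromhex("F3 90"),     # same bytes as "rep nop"
    "jz +0":    bytes.fromhex("74 00"),     # short conditional jump, no hint
    "ds jz +0": bytes.fromhex("3E 74 00"),  # DS prefix reused as "taken" hint on P4
    "cs jz +0": bytes.fromhex("2E 74 00"),  # CS prefix reused as "not taken" hint on P4
}

def describe(name):
    """Return the mnemonic followed by its encoding as upper-case hex bytes."""
    code = ENCODINGS[name]
    return f"{name:10s} " + " ".join(f"{b:02X}" for b in code)

for name in ENCODINGS:
    print(describe(name))
```

In each case the "new" instruction is just an old one-byte instruction with a prefix that older CPUs ignore, which is why such code still runs on them.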
> - What's the goal of avoiding PUSH and POP and using MOV [ESP+blah]
> instead
>>
>> mov [esp+8],edi ; Why not PUSH ???
PUSH would take more instructions here, and it's not necessarily better.
>> mov edi,[esp+8] ; Restore - why not POP ???
This way keeps ESP unmodified.
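A rough way to see the trade-off is to compare the two save/restore pairs by size and by whether they move ESP. This is only a sketch in Python: the encodings are the usual 32-bit ones for EDI, while the tuple layout and the `summarize` helper are made up for the example. It says nothing about actual cycle counts.

```python
# Each entry: (mnemonic, 32-bit encoding, change to ESP in bytes).
# PUSH_POP, MOV_SLOT and summarize() are hypothetical names for this sketch.
PUSH_POP = [
    ("push edi", bytes.fromhex("57"), -4),
    ("pop edi",  bytes.fromhex("5F"), +4),
]
MOV_SLOT = [
    ("mov [esp+8], edi", bytes.fromhex("89 7C 24 08"), 0),
    ("mov edi, [esp+8]", bytes.fromhex("8B 7C 24 08"), 0),
]

def summarize(seq):
    """Return (total code size in bytes, True if any instruction moves ESP)."""
    size = sum(len(code) for _, code, _ in seq)
    moves_esp = any(delta != 0 for _, _, delta in seq)
    return size, moves_esp

print("push/pop:", summarize(PUSH_POP))
print("mov slot:", summarize(MOV_SLOT))
```

So the MOV form costs more bytes, but because ESP never moves, every other [esp+N] offset in the function stays valid between the save and the restore.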
> - How bloat (redundant repeated encoding of "boring" or same numbers as
> 32-bit values) improves performance
CPU design: e.g. the one-byte LODSB is slower than a longer, manual MOV sequence because of the more RISC-like internal structure since the 486. XCHG with memory (implicitly atomic) is also slower than push / pop (which are pairable on the original Pentium).
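For the LODSB case, the size difference looks like this. Again just a Python sketch with standard 32-bit encodings; the variant names and the `code_size` helper are my own, the direction flag is assumed clear, and the code only measures bytes, not speed.

```python
# Three ways to load a byte from [ESI] into AL and advance ESI (DF assumed clear).
# VARIANTS and code_size() are hypothetical names for this sketch.
VARIANTS = {
    "lodsb": [
        ("lodsb", bytes.fromhex("AC")),
    ],
    "mov + inc": [
        ("mov al, [esi]", bytes.fromhex("8A 06")),
        ("inc esi",       bytes.fromhex("46")),
    ],
    "mov + add": [                      # avoids INC, which was called slow on P4 above
        ("mov al, [esi]", bytes.fromhex("8A 06")),
        ("add esi, 1",    bytes.fromhex("83 C6 01")),
    ],
}

def code_size(variant):
    """Total encoded size in bytes of the named variant."""
    return sum(len(code) for _, code in VARIANTS[variant])

for name in VARIANTS:
    print(f"{name:10s} {code_size(name)} bytes")
```

The smallest form is the one claimed to be slowest on post-486 CPUs, which is the whole point: byte count and speed stopped going together.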
> - Why preserve EBP when you don't need it at all 
Dunno, probably a compiler shortcoming.
Complete thread:
- MPLAYER update (Win32, tests needed) - DOS386, 20.12.2009, 08:28
- MPLAYER update (Win32, tests needed) - Rugxulo, 20.12.2009, 20:07
- MPLAYER update (Win32, tests needed) - Laaca, 21.12.2009, 01:26
- MPLAYER update (Win32, tests needed) - ron, 21.12.2009, 03:08
- !!! IT WORKS AGAIN !!! - DOS386, 21.12.2009, 07:55
- !!! IT WORKS AGAIN !!! - Khusraw, 21.12.2009, 09:56
- MPLAYER update (Win32, tests needed) - Khusraw, 21.12.2009, 09:34
- MPLAYER update (Win32, tests needed) - Deniska, 23.12.2009, 00:23
- MPLAYER update (Win32, tests needed) - Rugxulo, 23.12.2009, 05:52
- MPLAYER update (Win32, tests needed) - Khusraw, 23.12.2009, 07:52
- MPLAYER update (Win32, tests needed) - RayeR, 25.12.2009, 01:08
- MPLAYER update (Win32, tests needed) - Khusraw, 25.12.2009, 10:48
- MPLAYER update (Win32, tests needed) - Rugxulo, 22.12.2009, 10:19
- MPLAYER update (Win32, tests needed) - mht, 22.12.2009, 16:58
- MPLAYER update (Win32, NOT fixed) - DOS386, 22.12.2009, 17:19
- x86 code optimization - Rugxulo, 22.12.2009, 19:16
- x86 code optimization - DOS386, 23.12.2009, 09:05
- x86 code optimization - marcov, 23.12.2009, 22:38
- MPLAYER update (Win32, NOW IS fixed) - DOS386, 19.03.2010, 06:26
- MPLAYER 31139 - DOS386, 13.05.2010, 03:49
- MPLAYER 31170 - DOS386, 06.06.2010, 15:48
- MPLAYER update (Win32, tests needed) - Khusraw, 22.12.2009, 18:41
- INC/DEC speed - Rugxulo, 22.12.2009, 19:01
- INC/DEC speed - Khusraw, 24.12.2009, 13:37
- INC/DEC speed - Rugxulo, 25.12.2009, 01:00