DOS ain't dead

Heatshrink compressed drives? - Tamp ISO compression test (Announce)

posted by mceric, Germany, 06.02.2025, 02:42

> Data news, everybody :-)
>
> I made a little histogram counter for CDH files and ran it on some ISOs I
> had around, after "tampisoing" them. Enjoy, or something ;-)

In case you were wondering about the RLE outcomes: I computed some ESTIMATES for how much you could save using a cleverly chosen set of command bytes each of which can express "repeat previous byte/word/dword N times, then copy next M bytes as-is, then read next command byte" for popular values of N and M.

I assumed that the command bytes have to be command words if N is large, but that they would always be either bytes or words. Just RLE, no copying of data from further back etc.
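For illustration, here is a rough sketch of how such an optimistic estimate could be computed. It is not the histogram tool mentioned above; the function name `rle_estimate` and the `min_run` threshold are made up for this example. It assumes a run is stored as the unit once plus a command byte (or a command word if the repeat count exceeds 255), and that data copied as-is costs no extra command bytes.

```python
def rle_estimate(data: bytes, unit: int = 1, min_run: int = 4) -> int:
    """Optimistic size estimate (in bytes) for a simple run-length scheme.

    unit    -- size of the repeated item: 1 = byte, 2 = word, 4 = dword
    min_run -- hypothetical threshold: shorter runs are stored as-is

    Runs are only detected at offsets that are multiples of `unit`.
    """
    size = 0
    i = 0
    n = len(data)
    while i < n:
        cur = data[i:i + unit]
        j = min(i + unit, n)
        # Extend the run while the next unit equals the current one.
        while data[j:j + unit] == cur:
            j += unit
        repeats = (j - i) // unit - 1
        if repeats >= min_run:
            size += len(cur)                     # the unit itself, stored once
            size += 1 if repeats <= 255 else 2   # command byte or command word
        else:
            size += j - i                        # not worth a command: copy as-is
        i = j
    return size
```

With these assumptions, an all-zero 2048-byte sector is estimated at 3 bytes in byte mode, while a sector filled with an alternating two-byte pattern only shrinks when `unit=2` is used.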

Classic compressors typically used Lempel-Ziv and similar algorithms with command bit strings of variable length, where frequently used values get shorter codes (Huffman coding). The command bits are drawn from a pipeline which gets refilled 1 byte at a time. Commands usually meant something like "copy N upcoming bytes as-is", "repeat N bytes M times", "copy N bytes from M bytes ago (and possibly: then copy X bytes as-is)".
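To make the command types a bit more concrete, here is a minimal decoder sketch for a generic LZ77-style command stream. The tuple format (`"lit"` and `"copy"`) and the name `lz_decode` are invented for this example and do not match the bitstream of TAMP, Heatshrink or any other real compressor. It also shows that "repeat N bytes M times" falls out of "copy from M bytes ago" when the copy overlaps its own output.

```python
def lz_decode(commands):
    """Decode a list of hypothetical LZ77-style commands.

    ("lit", data)              -- copy the given bytes as-is
    ("copy", distance, length) -- copy `length` bytes starting `distance`
                                  bytes back in the output so far
    """
    out = bytearray()
    for cmd in commands:
        if cmd[0] == "lit":
            out += cmd[1]
        elif cmd[0] == "copy":
            _, distance, length = cmd
            if not 0 < distance <= len(out):
                raise ValueError("copy distance reaches before start of output")
            # Byte-by-byte copy, so the source may overlap the bytes
            # being written (length > distance repeats a pattern).
            for _ in range(length):
                out.append(out[-distance])
        else:
            raise ValueError(f"unknown command {cmd[0]!r}")
    return bytes(out)
```

For example, `lz_decode([("lit", b"ab"), ("copy", 2, 6)])` returns `b"abababab"`: the two literal bytes are repeated three more times by a single copy command.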

>
> File fdbasecd_2007-09-06.cdh compressed 8333312 to 6699576 bytes
> 4069 sectors, 19.6% saved, 80.4% remaining

This had 5% empty sectors and 70% non-compressible ones. Byte-RLE: min. 85%

> File fdbootcd_0.9.BETA.cdh compressed 10291200 to 8758719 bytes
> 5025 sectors, 14.9% saved, 85.1% remaining

Circa 4% empty and 77% non-compressible sectors. Byte-RLE: min. 88%

> File fdbootcd_0.9rc5.BETA.cdh compressed 11599872 to 9563353 bytes
> 5664 sectors, 17.6% saved, 82.4% remaining

Circa 4% empty and 71% non-compressible sectors. Byte-RLE: min. 86%

> File fdoslite_0.9pre.cdh compressed 36026368 to 22294142 bytes
> 17591 sectors, 38.1% saved, 61.9% remaining

Circa 2% empty and 18% non-compressible sectors. Byte-RLE: min. 82%

> File freedos_1.0_fdfullcd.cdh compressed 160184320 to 147371512 bytes
> 78215 sectors, 8.0% saved, 92.0% remaining

Circa 1% empty and 84% non-compressible sectors. Byte-RLE: min. 95%

> File kramers_nederlandse_taal.cdh compressed 147781632 to 73337302 bytes
> 72159 sectors, 50.4% saved, 49.6% remaining

Circa 1% empty, but only 7% non-compressible sectors. Byte-RLE: min. 92%
Many sectors TAMP-compress a lot. RLE works surprisingly badly for this.

> File hp_windows_print_drivers.cdh compressed 478834688 to 430721071 bytes
> 233806 sectors, 10.0% saved, 90.0% remaining

Circa 1% empty and 76% non-compressible sectors. Byte-RLE: min. 95%

> File logox_3.5.cdh compressed 514738176 to 476102193 bytes
> 251337 sectors, 7.5% saved, 92.5% remaining

Circa 1% empty and 70% non-compressible sectors. Byte-RLE: min. 89%, which is clearly too optimistic. See above.

> File encyclopedie.cdh compressed 580093952 to 517457387 bytes
> 283249 sectors, 10.8% saved, 89.2% remaining

Circa 1% empty and 77% non-compressible sectors. WORD (!) RLE: min. 93%
Here, word-RLE is predicted to save 20% more space than byte-RLE.

> File thesis.cdh compressed 735221760 to 513439278 bytes
> 358995 sectors, 30.2% saved, 69.8% remaining

Circa 1% empty and 31% non-compressible sectors, many in between. WORD-RLE: min. 82%, predicted to save twice as much space as byte-RLE.

>


Conclusion: In some cases, where ISOs contain mostly already compressed data mixed with some empty areas such as empty or only partially filled sectors, RLE might be useful. In specific cases, RLE compression might work better when using words instead of bytes as the repeatable units.
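The empty / non-compressible sector counts quoted above could be reproduced along the following lines. This is only a sketch: it uses zlib as a stand-in compressor instead of TAMP, treats a sector as "incompressible" when the compressed form is not smaller than the original (the exact criterion used for the numbers above is an assumption here), and `classify_sectors` and `sample_image` are made-up names. The 2048-byte sector size matches the figures above (8333312 bytes / 4069 sectors).

```python
import hashlib
import zlib

SECTOR = 2048  # bytes per ISO sector


def classify_sectors(image: bytes) -> dict:
    """Count empty, incompressible and compressible sectors of an image."""
    counts = {"empty": 0, "incompressible": 0, "compressible": 0}
    for off in range(0, len(image), SECTOR):
        sector = image[off:off + SECTOR]
        if not any(sector):
            counts["empty"] += 1            # all bytes are zero
        elif len(zlib.compress(sector)) >= len(sector):
            counts["incompressible"] += 1   # stand-in compressor gains nothing
        else:
            counts["compressible"] += 1
    return counts


def sample_image() -> bytes:
    """Made-up 3-sector test image: zeros, hash noise, repetitive text."""
    noise = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
    return bytes(SECTOR) + noise + b"abcd" * 512
```

Running `classify_sectors(sample_image())` puts one sector into each of the three classes. On a real ISO you would read the file contents and pass them in the same way.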

Of course, that is only a very rough estimate, because I optimistically assumed that the repeat-postfix bytes magically double as copy-as-is prefix bytes. In reality, you often need more than 1 byte for that.

TAMP always works a lot better than simple RLE schemes, so it is worth the extra computation and complexity. Still, most test candidates above only TAMP-compress to 80-93% of their original size, so (sector-wise) compressed ISOs are of limited use for those. In particular, installer ISOs do not compress well.

PS: https://en.wikipedia.org/wiki/842_(compression_algorithm) is yet another LZ variant - for fast RAM compression, while https://en.wikipedia.org/wiki/Snappy_(compression) is a fast LZ style algorithm without bitstrings.

---
FreeDOS / DOSEMU2 / ...

 
