bencollver 27.06.2024, 00:59 |
webdump 2024-05-23 (Announce) |
webdump is (yet another) HTML to plain-text converter tool. |
mbbrutman Washington, USA, 27.06.2024, 16:56 @ bencollver |
webdump 2024-05-23 |
Just curious - why does this need so much memory? |
Rugxulo Usono, 27.06.2024, 20:01 @ mbbrutman |
UnHTML |
> UnHTML (24k) removes HTML and some SGML from text files, leaving the file |
bencollver 28.06.2024, 04:34 @ Rugxulo |
UnHTML |
> > UnHTML (24k) removes HTML |
jadoxa Queensland, Australia, 28.06.2024, 08:43 (edited by jadoxa, 28.06.2024, 09:49) @ bencollver |
UnHTML |
> unhtml gave the following error. |
bencollver 30.06.2024, 16:07 @ jadoxa |
UnHTML |
> Logic error when it tests entities ( |
jadoxa Queensland, Australia, 01.07.2024, 02:05 @ bencollver |
UnHTML |
> > Logic error when it tests entities ( |
bencollver 14.07.2024, 01:30 @ jadoxa |
UnHTML |
> > > Logic error when it tests entities ( |
jadoxa Queensland, Australia, 14.07.2024, 03:22 @ bencollver |
UnHTML |
> I built unhtml with DJGPP and it ran without any error messages. However |
bencollver 14.07.2024, 05:16 (edited by bencollver, 14.07.2024, 05:32) @ jadoxa |
UnHTML |
> > I built unhtml with DJGPP and it ran without any error messages. |
Rugxulo Usono, 14.07.2024, 08:11 @ jadoxa |
DJGPP 2.03p2 (June 2002) |
> I built it with djgpp 2.03 (June 2002 refresh) |
bencollver 15.07.2024, 02:14 (edited by bencollver, 15.07.2024, 02:26) @ jadoxa |
UnHTML |
I found that the unhtml problem was the result of a buffer overflow. With more recent versions of GCC, the stack corruption changes the intitle variable to a garbage value, omitting all output after the overflow. |
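For illustration only, here is a minimal C sketch of the kind of bounds check that avoids this class of bug. It is not the actual unhtml source or fix: the struct layout, the 16-byte buffer size, and the names parser_state, tag_putc and tag_puts are all hypothetical. Only the idea of a fixed buffer sitting next to a flag like intitle comes from the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state: a fixed-size buffer and a flag such as
   intitle living side by side (not the real unhtml layout). */
struct parser_state {
    char tag[16];   /* fixed-size buffer for the current tag name */
    size_t len;     /* bytes used in tag, excluding the terminating NUL */
    int intitle;    /* nonzero while inside <title> */
};

/* Append one character, refusing to write past the end of the buffer.
   Returns 1 on success, 0 if the character was dropped. */
static int tag_putc(struct parser_state *st, char c)
{
    if (st->len + 1 >= sizeof st->tag)
        return 0;               /* no room left for c plus the NUL */
    st->tag[st->len++] = c;
    st->tag[st->len] = '\0';
    return 1;
}

/* Append a string, stopping at the first character that does not fit.
   Returns the number of characters actually stored. */
static size_t tag_puts(struct parser_state *st, const char *s)
{
    size_t n = 0;
    while (*s && tag_putc(st, *s++))
        n++;
    return n;
}
```

With an unchecked copy, an over-long tag name could run past tag and overwrite whatever the compiler placed next to it, which is how a flag like intitle can end up holding garbage. With the check, excess input is dropped and the neighbouring fields are left alone.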
jadoxa Queensland, Australia, 16.07.2024, 04:54 @ bencollver |
UnHTML |
> I found that the unhtml problem was the result of a buffer overflow. |
bencollver 16.07.2024, 16:12 @ jadoxa |
UnHTML |
> > I found that the unhtml problem was the result of a buffer overflow. |
bencollver 28.06.2024, 04:26 @ mbbrutman |
webdump 2024-05-23 |
Webdump is aware of the document structure. It parses and crawls the document tree. During this process it allocates a bunch of memory and uses a lot of stack space. It might be more comparable to Mozilla readability than to unhtml. |
mbbrutman Washington, USA, 28.06.2024, 05:39 (edited by mbbrutman, 28.06.2024, 06:07) @ bencollver |
webdump 2024-05-23 |
> Webdump is aware of the document structure. It parses and crawls the |
bencollver 28.06.2024, 17:09 @ mbbrutman |
webdump 2024-05-23 |
> Sorry, it just seems shocking that this can't be compiled for 16 bit |
bocke 30.06.2024, 00:13 @ bencollver |
webdump 2024-05-23 |
Just for reference, you can also use the Links web browser to dump a formatted text version of the site. |
jadoxa Queensland, Australia, 30.06.2024, 03:00 @ bencollver |
webdump 2024-05-23 |
It's been updated to fix the split close tag issue. |
bencollver 30.06.2024, 16:08 @ jadoxa |
webdump 2024-05-23 |
> It's been updated to fix the split close tag issue. |