mbbrutman
Washington, USA, 27.01.2023, 02:46 |
mTCP: A Unicode enabled IRCjr is available for testing (Announce) |
http://www.brutman.com/mTCP/download/ircjr_2023-01-26.zip
There is a detailed readme.txt file in the zip. Here is the short version:
* This version supports UTF-8 encoding and decoding.
* The mapping from Unicode to your local character set is defined in a file. I've included a mapping for CP437.
* Enable it by setting an environment variable.
If it works correctly, you should be able to use Unicode in messages and on channel topics.
If you are not using CP437 or are using a non-US layout keyboard you need to set your codepage and keyboard layout as you usually do, but also create your own mapping file from Unicode to your code page. I'd like to include more mapping files but I wanted to get some early feedback first.
Thanks,
Mike |
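For readers unfamiliar with the decoding step mentioned in the announcement, here is a minimal C sketch of turning one UTF-8 byte sequence into a Unicode code point. It is not taken from IRCjr (the thread does not show mTCP's source); the function name `utf8_decode` and its interface are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s, with n bytes available.
 * On success stores the code point in *cp and returns the number of
 * bytes consumed (1 to 4). Returns 0 if the input is truncated,
 * malformed, overlong, a surrogate, or above U+10FFFF.
 * Hypothetical helper, not part of mTCP. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0) return 0;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }            /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000UL; }
    else return 0;                 /* stray continuation or invalid lead byte */

    if (n < len) return 0;         /* sequence cut off */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and out-of-range values. */
    if (c < min || c > 0x10FFFFUL || (c >= 0xD800 && c <= 0xDFFF)) return 0;

    *cp = c;
    return len;
}
```

A caller would loop over the incoming buffer, advance by the returned length, and hand each code point to the mapping lookup; a return of 0 means "skip a byte and show a placeholder".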
SuperIlu
Berlin, Germany, 27.01.2023, 06:18
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
> If you are not using CP437 or are using a non-US layout keyboard you need
> to set your codepage and keyboard layout as you usually do, but also create
> your own mapping file from Unicode to your code page. I'd like to include
> more mapping files but I wanted to get some early feedback first.
Damn, nice idea. I'd love to have something similar for DOStodon... --- Javascript on MS-DOS? Try DOjS https://github.com/SuperIlu/DOjS
Fediverse: @dec_hl@mastodon.social |
boeckmann
Aachen, Germany, 27.01.2023, 23:15
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
> http://www.brutman.com/mTCP/download/ircjr_2023-01-26.zip
>
> There is a detailed readme.txt file in the zip. Here is the short
> version:
>
> * This version supports UTF-8 encoding and decoding.
> * The mapping from Unicode to your local character set is defined in a
> file. I've included a mapping for CP437.
> * Enable it by setting an environment variable.
>
> If it works correctly, you should be able to use Unicode in messages and on
> channel topics.
>
> If you are not using CP437 or are using a non-US layout keyboard you need
> to set your codepage and keyboard layout as you usually do, but also create
> your own mapping file from Unicode to your code page. I'd like to include
> more mapping files but I wanted to get some early feedback first.
>
> Thanks,
> Mike
Very nice. One remark: did you consider mapping Unicode characters to multiple code page characters, for something like, don't hit me, emojis? |
mbbrutman
Washington, USA, 28.01.2023, 02:03
@ SuperIlu
|
mTCP: A Unicode enabled IRCjr is available for testing |
> Damn, nice idea. I'd love to have something similar for DOStodon...
It has some nice advantages. Best of all, it allows the user to customize the table and decide how strict or relaxed they want to be with the substitutions.
The mappings are loaded into a hash table. I put a bit of effort into designing a reasonably good hash function so that the lookup time is fairly well bounded. I've seen people do this poorly with a linear table scan, and it showed at runtime. |
mbbrutman
Washington, USA, 28.01.2023, 02:05
@ boeckmann
|
mTCP: A Unicode enabled IRCjr is available for testing |
> Very nice. One remark: did you consider mapping Unicode characters to
> multiple code page characters, for something like, don't hit me, emojis
I honestly didn't think about it. It's possible - just expand the mapping table a little bit.
I think I'll wait on that feature. ;-0 --- mTCP - TCP/IP apps for vintage DOS machines!
http://www.brutman.com/mTCP |
SuperIlu
Berlin, Germany, 28.01.2023, 10:11
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
> It has some nice advantages. Best of all, it allows the user to customize
> the table and decide how strict or relaxed they want to be with the
> substitutions.
>
> The mappings are loaded into a hash table. I put a bit of effort into
> designing a reasonably good hash function so that the lookup time is fairly
> well bounded. I've seen people do this poorly with a linear table scan,
> and it showed at runtime.
Would you be willing to put the unicode-mapping into a dedicated library under an open source license?
I guess this would get contributions if you put that on GitHub... --- Javascript on MS-DOS? Try DOjS https://github.com/SuperIlu/DOjS
Fediverse: @dec_hl@mastodon.social |
mbbrutman
Washington, USA, 28.01.2023, 17:28
@ SuperIlu
|
mTCP: A Unicode enabled IRCjr is available for testing |
Almost everything I write gets released as open source. The Unicode library is not a big deal .. it's not sophisticated code. It will be part of the next mTCP, assuming I get feedback that it works for other people.
I'm not a big fan of Github and I've never posted code there. I'm really not a fan now that they are using everybody's code to train an algorithm to spit it back without attribution. |
mbbrutman
Washington, USA, 29.01.2023, 22:20 (edited by mbbrutman, 30.01.2023, 05:53)
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
I've added a Unicode enabled Telnet to the same file. Just set the environment variable to the mapping file and it will interpret Unicode sent via UTF-8. Sending Unicode also works but I have not implemented a "compose" sequence for arbitrary Unicode code points so you are limited to what your keyboard can produce.
[Edit] I've added a "Compose" mode for Unicode to Telnet. Press Alt-Minus (the '-' key), then type a four-digit hex code for the Unicode code point you want to send. Detailed instructions are in the readme text file. |
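As an illustration of what a compose sequence like this has to do internally, here is a minimal C sketch that converts four hex digits into a code point and encodes it as UTF-8 for sending. It is not the Telnet code; the helper names `parse_hex4` and `utf8_encode_bmp` are hypothetical, and the sketch assumes only code points up to U+FFFF, which is all four hex digits can express.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert exactly four hex digits to a code point.
 * Returns 1 on success, 0 if any character is not a hex digit. */
int parse_hex4(const char *digits, uint16_t *cp)
{
    uint16_t v = 0;
    int i;
    for (i = 0; i < 4; i++) {
        char c = digits[i];
        unsigned d;
        if (c >= '0' && c <= '9')      d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (unsigned)(c - 'a') + 10u;
        else if (c >= 'A' && c <= 'F') d = (unsigned)(c - 'A') + 10u;
        else return 0;                 /* also stops safely at a NUL */
        v = (uint16_t)((v << 4) | d);
    }
    *cp = v;
    return 1;
}

/* Encode a BMP code point as UTF-8.
 * Returns the number of bytes written (1 to 3), or 0 for a surrogate. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* not a valid scalar value */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, typing the digits 25C6 would yield the code point U+25C6, which goes out on the wire as the three bytes E2 97 86.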
SuperIlu
Berlin, Germany, 30.01.2023, 18:35
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
> I've added a Unicode enabled Telnet to the same file. Just set the
> [...]
Nice!
I can totally understand that you are skeptical regarding GitHub. I was sucked into their infrastructure years ago and if I had to look for a home for my projects today I most probably would choose differently.
When do you want to release the source? Your ZIP seems to contain only EXEs so far.
Also: If I decide to use/include your work in DOjS, would you mind if I put the source on GitHub?
Cheers
Ilu --- Javascript on MS-DOS? Try DOjS https://github.com/SuperIlu/DOjS
Fediverse: @dec_hl@mastodon.social |
mbbrutman
Washington, USA, 30.01.2023, 21:24
@ SuperIlu
|
mTCP: A Unicode enabled IRCjr is available for testing |
These are test programs. When (if?) I get some feedback about how well they work I'll decide about making the changes formal. Source code will be included with the next mTCP release, assuming people find these features useful.
I can't stop people from posting my code on GitHub. It's been done already. I don't really have an opinion on it, as long as people know the source of the code and where to get it. |
tom
Germany (West), 31.01.2023, 15:48
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
> If you are not using CP437 or are using a non-US layout keyboard you need
> to set your codepage and keyboard layout as you usually do, but also create
> your own mapping file from Unicode to your code page. I'd like to include
> more mapping files but I wanted to get some early feedback first.
you may want to include the mapping files from DOSLFN (http://adoxa.altervista.org/doslfn/), which has about 25 cpxyzuni.tbl mapping files. |
mbbrutman
Washington, USA, 04.02.2023, 20:25
@ tom
|
mTCP: A Unicode enabled IRCjr is available for testing |
> > If you are not using CP437 or are using a non-US layout keyboard you need
> > to set your codepage and keyboard layout as you usually do, but also create
> > your own mapping file from Unicode to your code page. I'd like to include
> > more mapping files but I wanted to get some early feedback first.
>
> you may want to include the mapping files from DOSLFN
> http://adoxa.altervista.org/doslfn/
> which has about 25 cpxyzuni.tbl mapping files
Tom - thanks for the tip. I'll have a look and see if I can borrow (and credit!) some data to generate a few of the more common mappings. CP850 is my next target. |
Laaca
Czech republic, 05.02.2023, 14:19
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
Yes, please! I also very much advocate using the DOSLFN mapping files. The format is simple and very usable, and it is not good for anyone to introduce a new format for Unicode mappings. --- DOS-u-akbar! |
mbbrutman
Washington, USA, 05.02.2023, 18:10
@ Laaca
|
mTCP: A Unicode enabled IRCjr is available for testing |
> Yes,please! I also very much advocate using the DOSLFN mapping files. It is
> simple, very usable and it not good for anyone to introduce a new formats
> for unicode mappings.
I took a look and I read the code that generates the files to figure out the file format.
Unfortunately, I'm looking for something very different with Telnet and IRC. Strict mappings from the code page characters to Unicode are published, and those table files require a strict mapping. I need something more relaxed ... a lot of Unicode code points have reasonably close substitutes available. For example, there are at least two Unicode "black diamond" characters that I know of, with just slightly different shapes. I map both of those to the character 0x04, which is a diamond, and that's close enough for display purposes. There are a lot of variations of line drawing characters with different line weights that can be represented by the standard line drawing characters, so I map those too. I think for 128 different possible code points I have over 300 Unicode characters mapping to them.
Technically what I'm doing is not correct, but I'd rather see a black diamond or a line drawing character of some sort rather than the standard "I can't display this glyph" tofu character. Which is also why I went with a text file to allow users to define their own mappings; they can be as strict or as sloppy as they want.
I can use the published tables as a starting point, but I suspect for display purposes people will want to see the additional mappings. --- mTCP - TCP/IP apps for vintage DOS machines!
http://www.brutman.com/mTCP |
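To illustrate the relaxed, many-to-one style of mapping described above, here is a minimal C sketch that parses one line of a user-editable text file. The real mTCP file format is documented in the readme and is not shown in this thread, so the format assumed here (a hex code point, whitespace, a hex code page byte, `#` for comments) and the function name `parse_mapping_line` are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

/* Parse one line of a HYPOTHETICAL mapping file of the form
 *     <hex code point> <hex code page byte>   # optional comment
 * Several code points may map to the same byte; that is the whole point
 * of a relaxed mapping.
 * Returns 1 if a mapping was parsed, 0 for a blank or comment line,
 * and -1 if the line is malformed or out of range. */
int parse_mapping_line(const char *line, uint16_t *cp, uint8_t *local)
{
    unsigned u, b;

    while (*line == ' ' || *line == '\t') line++;
    if (*line == '\0' || *line == '\n' || *line == '\r' || *line == '#')
        return 0;

    if (sscanf(line, "%x %x", &u, &b) != 2) return -1;
    if (u > 0xFFFFu || b > 0xFFu) return -1;

    *cp = (uint16_t)u;
    *local = (uint8_t)b;
    return 1;
}
```

Under this assumed format, a user who wants both black diamonds and the heavy horizontal line to show up as something reasonable in CP437 could write:

    25C6 04   # BLACK DIAMOND        -> diamond
    2666 04   # BLACK DIAMOND SUIT   -> diamond
    2500 C4   # light horizontal     -> single line
    2501 C4   # heavy horizontal     -> single line (close enough)

Someone who prefers a strict table simply leaves the approximate entries out.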
tom
Germany (West), 06.02.2023, 17:01
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
>
> Technically what I'm doing is not correct, but I'd rather see a black
> diamond or a line drawing character of some sort rather than the standard
> "I can't display this glyph" tofu character.
I absolutely agree with that.
>
> I can use the published tables as a starting point, but I suspect for
> display purposes people will want to see the additional mappings.
take the tables as a starting point. additionally, the line drawing character groups should be the same for all codepages; not necessarily the byte value, but the group is the same.
actually, I would have thought that these tables could be found in many places all over the internet, as the problem isn't so original. unfortunately, my google-fu failed me completely. |
bretjohn
Rio Rancho, NM, 06.02.2023, 17:23
@ mbbrutman
|
mTCP: A Unicode enabled IRCjr is available for testing |
> Unfortunately, I'm looking for something very different with Telnet and
> IRC. Strict mappings from the code page characters to Unicode are
> published, and those table files require a strict mapping. I need
> something more relaxed ... a lot of Unicode code points have reasonably
> close substitutes available. For example, there are at least two Unicode
> "black diamond" characters that I know of, with just slightly different
> shapes. I map both of those to the character 0x04, which is a diamond, and
> that's close enough for display purposes. There are a lot of variations of
> line drawing characters with different line weights that can be represented
> by the standard line drawing characters, so I map those too. I think for
> 128 different possible code points I have over 300 Unicode characters
> mapping to them.
>
> Technically what I'm doing is not correct, but I'd rather see a black
> diamond or a line drawing character of some sort rather than the standard
> "I can't display this glyph" tofu character. Which is also why I went with
> a text file to allow users to define their own mappings; they can be as
> strict or as sloppy as they want.
>
> I can use the published tables as a starting point, but I suspect for
> display purposes people will want to see the additional mappings.
FWIW, I have a similar philosophy. I created a program called UNI2ASCI and it's included as part of my USB drivers. The strings downloaded from USB devices are stored as Unicode and I wanted a way to display the strings even if they weren't "legitimate" ASCII. It took me quite a while to do, but I scanned through all the Unicode characters and mapped as many as I could reasonably do into _something_ that could be displayed on a "normal" DOS screen. For example, I found 20 different Unicode characters that (in my opinion) looked enough like a "2" that I mapped them that way. There are also some Unicode characters that I map as multiple ASCII characters (e.g., I map the Copyright symbol as "(C)"). I treat UNI2ASCI sort of as a "subroutine" that the other USB programs call when they want to display a Unicode string.
I see two major differences between what I did with UNI2ASCI and what I think you're trying to do, though. The first is that UNI2ASCI currently only supports Code Page 437. It takes a LOT of work to do a fairly "complete" Unicode mapping of the upper half of a Code Page, so I only did one. The other major difference I see is that I did not map any of the control characters (ASCII < 32) like the diamond character you mention. While those characters _can_ be displayed on the screen, when you try to copy them from the screen into a file, print them, or send them across a serial link you can have all kinds of problems.
You can download the source code for UNI2ASCI (it's included in the USB Source Code) from my web site:
http://brejohnson.us
It's in A86 format. Eventually, in my copious spare time I hope to convert all my programs to NASM format in addition to updating them with various items. |
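For anyone curious what a one-to-many mapping such as Copyright to "(C)" can look like in practice, here is a minimal C sketch using a sorted table and a binary search. It is not derived from UNI2ASCI (which is written in A86 assembly); the table contents, the `Fallback` type, and the function name `ascii_fallback` are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t    codepoint;   /* Unicode code point (BMP only in this sketch) */
    const char *ascii;       /* one or more plain ASCII characters */
} Fallback;

/* Illustrative entries only. The table must stay sorted by codepoint
 * because the lookup below uses a binary search. */
static const Fallback fallbacks[] = {
    { 0x00A9, "(C)"  },   /* COPYRIGHT SIGN */
    { 0x00AE, "(R)"  },   /* REGISTERED SIGN */
    { 0x00B2, "2"    },   /* SUPERSCRIPT TWO */
    { 0x2026, "..."  },   /* HORIZONTAL ELLIPSIS */
    { 0x2082, "2"    },   /* SUBSCRIPT TWO */
    { 0x2122, "(TM)" }    /* TRADE MARK SIGN */
};

/* Return the ASCII replacement string for cp, or "?" if none is defined. */
const char *ascii_fallback(uint16_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof fallbacks / sizeof fallbacks[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fallbacks[mid].codepoint == cp) return fallbacks[mid].ascii;
        if (fallbacks[mid].codepoint < cp)  lo = mid + 1;
        else                                hi = mid;
    }
    return "?";
}
```

Because every replacement is printable ASCII, the output avoids the control-character range (below 32) and so survives being redirected to a file, a printer, or a serial link.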