DOS ain't dead

International keyboard support (Developers)

posted by bretjohn Homepage E-mail, Rio Rancho, NM, 14.02.2023, 05:05

> Hello bretjohn,
>
> > Creating a custom keyboard layout is not a big deal, but having it
> > automatically be able to notify other programs that may need to know
> > what it's doing can be a very big deal.
>
> I think this is not quite relevant for mbbrutman's particular problem(s).
> Namely, we want to know how to map, say, some 8-bit ASCII character value
> such as 0x8a to, say, "á". We do not need to know which
> particular key positions the 0x8a came from, but we do need to
> somehow figure out that 0x8a maps to Unicode
> U+00e1 "á". Then the program can display a "á" and
> save it as U+00e1 (maybe in UTF-8 form) in a text file.

The general problem with this is that there is not a one-to-one map between a DOS Code Page and UniCode.

For example, if your input is UniCode and you're trying to display what you receive in DOS, there are multiple issues. One is that there may be some characters that simply can't be displayed because the UniCode characters don't exist in the current Code Page. In that case, what do you put on the screen since you can't display a "legitimate" character? What I did in my UNI2ASCI program is try to display _something_ on the screen that somewhat resembles the UniCode character (received from a USB device) even though it may not be the "correct" character to display. In UNI2ASCI, if I can't display it I just write the UniCode number (something like "{U+092C}"). Based on what Michael wrote, I think he's trying to do basically the same thing (but I could be wrong).
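To make that fallback idea concrete, here is a minimal Python sketch. It is not the UNI2ASCI code; the function names `try_encode` and `to_code_page` are made up for illustration, and using Python's built-in `cp437` codec plus NFKD decomposition to find a "look-alike" is my own assumption about one reasonable way to approximate a character. Anything that still can't be shown is written out as its code point number.

```python
import unicodedata

def try_encode(text, codec):
    """Return the encoded bytes, or None if the code page lacks a character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

def to_code_page(text, codec="cp437"):
    """Hypothetical helper: convert Unicode text to a DOS code page.

    1. Use the exact character if the code page has it.
    2. Otherwise try a look-alike (compatibility decomposition with
       combining marks dropped), which is an assumption, not a standard.
    3. Otherwise emit the code point number, e.g. {U+092C}.
    """
    out = bytearray()
    for ch in text:
        data = try_encode(ch, codec)
        if data is None:
            approx = "".join(
                c for c in unicodedata.normalize("NFKD", ch)
                if not unicodedata.combining(c)
            )
            data = try_encode(approx, codec) if approx else None
        if data is None:
            data = ("{U+%04X}" % ord(ch)).encode("ascii")
        out += data
    return bytes(out)

print(to_code_page("\u00e1"))  # exists in CP437
print(to_code_page("\u00e3"))  # not in CP437, approximated by its base letter
print(to_code_page("\u092c"))  # no approximation, shown as a number
```

Note that step 2 is exactly where two programmers are likely to make different choices, which is the consistency problem in a nutshell.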

The other problem when going from UniCode to DOS is that there are several UniCode characters that are effectively duplicates of each other (even though that's not supposed to happen in UniCode). For example, there are more than 20 UniCode characters that are classified as "spaces", with official names such as "No-Break Space", "Zero-Width Space", and "Three-Per-Em Space". I think those can all be _displayed_ as a "regular" space in DOS, though technically they probably shouldn't be because they have additional characteristics besides the fact that they "look like a space" (they have some "metacharacteristics").

You have similar problems when going the other direction: converting a character from a DOS Code Page to UniCode. Again, we can talk about spaces. There are three DOS characters that "look like" spaces (ASCII 0 or NUL, ASCII 32 or a "normal" space, and ASCII 255 which is normally translated to UniCode as a No-Break Space or NBSP). Some DOS Code Pages also have additional characters that are displayed on the screen as a "space" (e.g., Code Page 869, which is used for Greek, has several "space" characters). On a DOS Code Page, they all look exactly the same, but if you were to save them to UniCode, which of the UniCode "spaces" should you use?

Now, imagine trying to go back-and-forth between a DOS Code Page and UniCode multiple times, and just try to foresee how screwed up the characters can get if you're not 100% consistent in how you do the mapping in both directions, or if you do it differently than the next programmer. Or imagine trying to import/export something you saved as "UniCode" into another program that natively uses UniCode and actually understands the metacharacteristics of the different spaces.
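Here is a small Python sketch of that round-trip problem. The helper names `to_dos` and `from_dos` are hypothetical, and folding unknown spaces to a plain space via NFKD is just one possible convention (an assumption on my part, not something any DOS program is known to do). The point is only that once several UniCode spaces collapse into one Code Page byte, the original text can't be recovered.

```python
import unicodedata

def to_dos(text, codec="cp437"):
    """Hypothetical UniCode -> Code Page step that folds look-alikes."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            # Assumed convention: use the compatibility decomposition,
            # and "?" for anything the code page still can't represent.
            folded = unicodedata.normalize("NFKD", ch)
            out += folded.encode(codec, errors="replace")
    return bytes(out)

def from_dos(data, codec="cp437"):
    """Code Page -> UniCode step using the codec's fixed table."""
    return data.decode(codec)

# No-Break Space, Em Space, and Three-Per-Em Space between the letters.
original = "a\u00a0b\u2003c\u2004d"
restored = from_dos(to_dos(original))

print([hex(ord(c)) for c in original])
print([hex(ord(c)) for c in restored])
print(restored == original)  # the two wide spaces came back as U+0020
```

The No-Break Space survives here only because CP437 happens to have a byte (255) for it; the other two spaces are silently turned into a regular space, so a program that cares about the difference gets different text back than it wrote.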

How you enter something even as simple as a space can make a difference in what UniCode character to save it as. For example, you probably should differentiate between a No-Break Space and a "regular" Space in a word-processor since it affects the output formatting.

I realize it seems like it should be pretty simple to do, but it's not, at least if you want to do it correctly and interact with other programs.

 
