Ticket #43841

add svarlang.lib support

Open Date: 2022-02-12 18:41 Last Update: 2022-02-18 02:27

Reporter:
Owner:
Status:
Closed
Component:
MileStone:
(None)
Priority:
5 - Medium
Severity:
5 - Medium
Resolution:
Fixed
File:
None

Details

I think, the ticket title says it already.

Ticket History (3/12 Histories)

2022-02-12 18:41 Updated by: bttr
  • New Ticket "add svarlang.lib support" created
2022-02-12 18:56 Updated by: mateuszviste
Comment

pkgnet's user-facing messages are only a help screen and a couple of error messages. Most messages come from the repository server, meaning that the PHP repo server would also have to support multiple languages and be told by pkgnet what the current user language is, so it can output messages in the proper language.

And then there are package descriptions - these would need to be translated into multiple languages and kept in the lsm files, like:


version: 20220208

description: pulls packages and updates from the internet SvarDOS repository

description.pl: ściąga pakiety i aktualizacje z internetowego repozytorium SvarDOS

description.ru: извлекает пакеты и обновления из интернет-репозитория SvarDOS

description.fr: télécharge les paquets et actualisations depuis le dépôt internet SvarDOS


not to say it is impossible of course, just that making pkgnet multi-lang friendly is a much wider scope than just svarlang.lib support (although svarlang.lib would be a nice first "baby step").
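
For illustration, a minimal Python sketch of how a client could pick a description from such a file. The `description.<lang>` keys come from the example above; the function names and the fallback to the untagged (English) line are assumptions, not pkgnet's actual behaviour.

```python
def parse_lsm(text):
    """Parse 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip().lower()] = value.strip()
    return fields


def pick_description(fields, lang):
    """Return the description for lang, else the untagged (English) one (assumed fallback)."""
    return fields.get("description." + lang.lower(), fields.get("description", ""))


SAMPLE = """\
version: 20220208
description: pulls packages and updates from the internet SvarDOS repository
description.pl: ściąga pakiety i aktualizacje z internetowego repozytorium SvarDOS
"""

fields = parse_lsm(SAMPLE)
print(pick_description(fields, "pl"))
print(pick_description(fields, "de"))  # no German line, so the English one is used
```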

2022-02-12 19:26 Updated by: bttr
Comment

Reply To mateuszviste

pkgnet's user-facing messages are only a help screen and a couple of error messages. Most messages come from the repository server, meaning that the PHP repo server would also have to support multiple languages and be told by pkgnet what the current user language is, so it can output messages in the proper language.

I see.

And then there are package descriptions - these would need to be translated into multiple languages and kept in the lsm files, like:

version: 20220208

description: pulls packages and updates from the internet SvarDOS repository

description.pl: ściąga pakiety i aktualizacje z internetowego repozytorium SvarDOS

description.ru: извлекает пакеты и обновления из интернет-репозитория SvarDOS

description.fr: télécharge les paquets et actualisations depuis le dépôt internet SvarDOS

And that lsm file would be saved in UTF-8 encoding?

Don't you like the FreeDOS approach to have a separate lsm file for every language?

not to say it is impossible of course, just that making pkgnet multi-lang friendly is a much wider scope than just svarlang.lib support (although svarlang.lib would be a nice first "baby step").

I bet, you never heard before, but even the longest journey starts with a first step. ;-)

2022-02-12 21:52 Updated by: mateuszviste
Comment

Reply To bttr

And that lsm file would be saved in UTF-8 encoding?

I was rather thinking about a messy "multi-codepage" file, where each description line would potentially be in a different codepage.

Don't you like the FreeDOS approach to have a separate lsm file for every language?

To be honest I didn't know they do that now. If that's so, then indeed I don't like it very much, as it means:

- more room for inconsistencies (you need to make sure to keep the version string synced across all lsms)

- more space taken on user's disk

the multiple-descriptions-in-one-file approach has the advantage that it is completely free for the user. It won't take more disk space since the user already has a cluster allocated for the lsm file.

I bet, you never heard before, but even the longest journey starts with a first step. ;-)

I'm fine with that. I just don't like going blindly into a journey without knowing where it ends and what it entails, which is why I felt it appropriate to mention the non-svarlang aspects of translating the pkgnet interface. svarlang.lib is trivial to add, but it won't make much difference to the user, who will keep seeing en-only messages when querying pkgnet.

2022-02-12 22:52 Updated by: bttr
Comment

Reply To mateuszviste

Reply To bttr

And that lsm file would be saved in UTF-8 encoding?

I was rather thinking about a messy "multi-codepage" file, where each description line would potentially be in a different codepage.

Then hopefully no translator or semi-intelligent text editor will mangle the content and render it useless. When we get to the point of creating the first multi-codepage lsm files, it's probably a good idea to give translators some advice on what (type of) text editors work fine. Translators will probably use OSes other than DOS.

Don't you like the FreeDOS approach to have a separate lsm file for every language?

To be honest I didn't know they do that now. If that's so, then indeed I don't like it very much, as it means: - more room for inconsistencies (you need to make sure to keep the version string synced across all lsms)

That won't happen, because the version string is only present in the "main" (English) lsm file. Look at, e.g., https://gitlab.com/FreeDOS/base/append/-/tree/master/APPINFO or https://gitlab.com/FreeDOS/base/ambhelp/-/tree/master/APPINFO

- more space taken on user's disk

Indeed.

the multiple-descriptions-in-one-file approach has the advantage that it is completely free for the user. It won't take more disk space since the user already has a cluster allocated for the lsm file.

Don't forget about the different code pages for a single language. In FreeDOS there's a line like, e.g., "Language: FR, 850" in French lsm files. For German it doesn't really matter, if you use 437, 850, or 858, because all letters a-zA-ZäöüÄÖÜß are at the same position for these three code pages. Not sure for other languages, but I guess, you can answer that at least for Polish and French.

If that's required, your example would change to "description.fr.858: télécharge les paquets et actualisations depuis le dépôt internet SvarDOS"
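
The code page question can be checked mechanically. Below is a small Python sketch using Python's built-in `cp437`, `cp850` and `cp858` codecs; the helper name `same_bytes` is made up for illustration. It reports whether a string encodes to identical bytes in every listed code page.

```python
def same_bytes(text, codepages):
    """True if text encodes to identical bytes in every listed code page."""
    encoded = set()
    for cp in codepages:
        try:
            encoded.add(text.encode(cp))
        except UnicodeEncodeError:
            return False  # at least one code page cannot represent the text
    return len(encoded) == 1


WESTERN = ["cp437", "cp850", "cp858"]

print(same_bytes("äöüÄÖÜß", WESTERN))      # German letters
print(same_bytes("télécharge", WESTERN))   # lowercase French accents
print(same_bytes("È", WESTERN))            # uppercase E grave is missing from CP437
print(same_bytes("ściąga", WESTERN))       # Polish letters are in none of the three
```

So the French example line is safe in all three code pages, but French text with certain accented capitals is not, and Polish needs a different code page altogether.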

I bet, you never heard before, but even the longest journey starts with a first step. ;-)

I'm fine with that. I just don't like going blindly into a journey without knowing where it ends and what it entails, which is why I felt it appropriate to mention the non-svarlang aspects of translating the pkgnet interface. svarlang.lib is trivial to add, but it won't make much difference to the user, who will keep seeing en-only messages when querying pkgnet.

Thanks for sharing your insights. :-)

2022-02-12 23:20 Updated by: mateuszviste
Comment

Reply To bttr

Then hopefully no translator or semi-intelligent text editor will mangle the content and render it useless. When we get to the point of creating the first multi-codepage lsm files, it's probably a good idea to give translators some advice on what (type of) text editors work fine. Translators will probably use OSes other than DOS.

Is it really a risk? I might be wrong, but I think that any 8-bit editor should be fine, since it does not care about the codepage anyway; it just displays bytes. Of course, this means that only one or two lines will be rendered properly (i.e. the ones that match the codepage currently in use) and the others will be displayed with weird chars inside, but the editor should not break anything as long as the translator does not try "fixing" other languages...

If someone tries editing such an lsm file with a UTF-8 editor, then it's obviously a different story and everything will break.
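
A rough Python sketch of both cases. It models an 8-bit editor as a Latin-1 round trip (every byte maps to one character and back) and a UTF-8 editor as one that replaces undecodable bytes when loading. Both models, and the choice of CP850 for French and CP852 for Polish, are assumptions made for illustration.

```python
# Hypothetical two-line multi-codepage file (code page choices are assumptions).
raw = (
    "description.fr: dépôt\r\n".encode("cp850")
    + "description.pl: ściąga\r\n".encode("cp852")
)


def roundtrip_8bit(data):
    """Model an 8-bit editor: load and save without interpreting the bytes."""
    return data.decode("latin-1").encode("latin-1")


def roundtrip_utf8(data):
    """Model a UTF-8 editor that substitutes U+FFFD for invalid bytes, then saves."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


print(roundtrip_8bit(raw) == raw)   # bytes survive untouched
print(roundtrip_utf8(raw) == raw)   # accented bytes are replaced, file is damaged
```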

That won't happen, because the version string is only present in the "main" (English) lsm file. Look at, e.g., https://gitlab.com/FreeDOS/base/append/-/tree/master/APPINFO or https://gitlab.com/FreeDOS/base/ambhelp/-/tree/master/APPINFO

I see... It becomes more and more complex. :-/

- more space taken on user's disk

Indeed.

What I find worrying is that the space will increase with the number of supported languages: if we support 15 languages one day, then the cluster waste starts to be truly significant. It's actually one of the reasons that pushed me into developing SvarLANG, so all translations are kept in a single file for each package (and the file format is also much easier/faster to parse, and there are a couple of other advantages, too).
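
To put rough numbers on the cluster waste, here is a back-of-the-envelope Python sketch. The 4096-byte cluster size and the 200-byte per-language file size are assumed values, not measurements from SvarDOS or FreeDOS.

```python
import math


def disk_bytes(file_sizes, cluster_size):
    """Bytes occupied on disk when each non-empty file is rounded up to whole clusters."""
    return sum(math.ceil(size / cluster_size) * cluster_size for size in file_sizes)


CLUSTER = 4096    # assumed cluster size in bytes
PER_LANG = 200    # assumed size of one per-language lsm file in bytes
LANGUAGES = 15

separate = disk_bytes([PER_LANG] * LANGUAGES, CLUSTER)  # one lsm file per language
combined = disk_bytes([PER_LANG * LANGUAGES], CLUSTER)  # all descriptions in one lsm
print(separate, combined)
```

With these made-up numbers, fifteen separate files occupy 61440 bytes on disk, while one combined file still fits in a single 4096-byte cluster.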

2022-02-12 23:44 Updated by: bttr
Comment

Reply To mateuszviste

Reply To bttr

Then hopefully no translator or semi-intelligent text editor will mangle the content and render it useless. When we get to the point of creating the first multi-codepage lsm files, it's probably a good idea to give translators some advice on what (type of) text editors work fine. Translators will probably use OSes other than DOS.

Is it really a risk? I might be wrong, but I think that any 8-bit editor should be fine, since it does not care about the codepage anyway; it just displays bytes. Of course, this means that only one or two lines will be rendered properly (i.e. the ones that match the codepage currently in use) and the others will be displayed with weird chars inside, but the editor should not break anything as long as the translator does not try "fixing" other languages...

I agree that this should work, but people new to SvarDOS or SvarDOS translations might feel there could be some risk. So take that feeling away by explicitly saying "It is recommended to edit translations directly on a running SvarDOS instance. You can use editor foo, bar, or foo bar, but not Blocek or Mined.", or something similar.

If someone tries editing such an lsm file with a UTF-8 editor, then it's obviously a different story and everything will break.

UTF-8 editors are very common on newer OSes, you know? ;-)

That won't happen, because the version string is only present in the "main" (English) lsm file. Look at, e.g., https://gitlab.com/FreeDOS/base/append/-/tree/master/APPINFO or https://gitlab.com/FreeDOS/base/ambhelp/-/tree/master/APPINFO

I see... It becomes more and more complex. :-/

What did you expect? i8n IS complex.

- more space taken on user's disk

Indeed.

What I find worrying is that the space will increase with the number of supported languages: if we support 15 languages one day, then the cluster waste starts to be truly significant. It's actually one of the reasons that pushed me into developing SvarLANG, so all translations are kept in a single file for each package (and the file format is also much easier/faster to parse, and there are a couple of other advantages, too).

I see.

2022-02-17 01:19 Updated by: mateuszviste
Comment

Reply To bttr

If someone tries editing such an lsm file with a UTF-8 editor, then it's obviously a different story and everything will break.

UTF-8 editors are very common on newer OSes, you know? ;-)

I am aware. Yet there are people that still find a way to commit 8-bit ANSI stuff to svn, so it would appear the 8-bit age is not entirely gone yet.

What did you expect? i8n IS complex.

You mean i18n probably. We do not do i18n, nor even l10n... it's all just about translating strings. The little hoop to jump over is the codepage stuff.

Anyway, status for today:

- pkgnet messages can be translated now (available currently in EN, PL and DE).

- the repo server is also able to produce localized messages (EN and PL only for now).

What's left is the description of packages. I don't like the FreeDOS approach with multiple LSM files, one file per language plus one extra "master file" with version and stuff. I'd definitely rather see the LSM file containing all descriptions in the available languages. This comes for free to the user, as all the LSM content will fit in a single cluster. I think there are three possible ways to approach this:

1. utf-8 encoding of the LSM file

2. multi-codepage LSM file (each description potentially in a different codepage)

3. some hybrid approach, e.g. "all stored in utf-8 in svn, then mass-converted to multi-codepage files in distributable packages"

and then an alternative "in-between" approach:

4. LSM is en-only, but online descriptions (displayed by pkgnet) are multi-lang because the repo API has some source of translated descriptions

Option 3 is quite messy and requires fiddling with all zip packages all the time, so not an option I am very fond of.

Option 2 is nice only because it allows presenting a local description to the user on the installed system. I am not sure this is a valuable advantage. Other than that, this approach is complex, as it requires processing non-standard multi-encoding files.

Option 1 might be the most reasonable in the long term, especially since the descriptions are not used after installation anyway, so it does not matter that they are encoded in a DOS-incompatible way. But it does mean repackaging all packages, and repackaging them again and again each time a translation is added or modified.

Option 4 has the advantage that we don't touch existing packages, only provide a little flat file with translated (utf-8) descriptions that would be displayed through pkgnet queries. I think this option is a very good start to explore how it works out, and then maybe in the future integrate it somehow more tightly right into the packages.
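
For reference, the conversion step behind option 3 could look roughly like the Python sketch below: it takes UTF-8 LSM text and encodes each `description.<lang>` line in a per-language code page. The language-to-codepage table and the CP437 default are assumptions chosen for illustration; nothing in this ticket fixes them.

```python
# Assumed mapping from language tag to DOS code page (illustrative only).
CODEPAGES = {"pl": "cp852", "ru": "cp866", "fr": "cp850"}
DEFAULT_CODEPAGE = "cp437"  # assumed for untagged lines such as version/description


def to_multi_codepage(utf8_text):
    """Encode each line of an LSM text in the code page implied by its key."""
    out = []
    for line in utf8_text.splitlines():
        key = line.partition(":")[0].strip().lower()
        # "description.pl" -> "pl"; any other key gets the default code page
        lang = key.rpartition(".")[2] if key.startswith("description.") else ""
        # Raises UnicodeEncodeError if a line does not fit its code page.
        out.append(line.encode(CODEPAGES.get(lang, DEFAULT_CODEPAGE)))
    return b"\r\n".join(out) + b"\r\n"


SOURCE = "version: 20220208\ndescription.pl: ściąga pakiety\ndescription.ru: пакеты\n"
converted = to_multi_codepage(SOURCE)
print(converted.split(b"\r\n")[1].decode("cp852"))
```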

2022-02-17 02:21 Updated by: mateuszviste
Comment

Package descriptions can be translated now, and put into a json file in website/repo. I added Polish translations for bsum, dosmid and pkgnet.
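
The layout of that json file is not shown in this ticket. Purely as an assumption, the Python sketch below uses a package -> language -> text mapping and falls back to the English LSM description when no translation exists; the structure and function name are hypothetical.

```python
import json

# Hypothetical layout; the real file in website/repo may be organised differently.
DESCRIPTIONS_JSON = """
{
  "pkgnet": {
    "pl": "ściąga pakiety i aktualizacje z internetowego repozytorium SvarDOS"
  }
}
"""


def localized_description(translations, package, lang, default):
    """Return the translated description, or default (the en-only LSM text)."""
    return translations.get(package, {}).get(lang, default)


translations = json.loads(DESCRIPTIONS_JSON)
english = "pulls packages and updates from the internet SvarDOS repository"
print(localized_description(translations, "pkgnet", "pl", english))
print(localized_description(translations, "pkgnet", "fr", english))  # falls back
```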

2022-02-17 02:53 Updated by: bttr
Comment

Reply To mateuszviste

Reply To bttr

If someone tries editing such an lsm file with a UTF-8 editor, then it's obviously a different story and everything will break.

UTF-8 editors are very common on newer OSes, you know? ;-)

I am aware. Yet there are people that still find a way to commit 8-bit ANSI stuff to svn, so it would appear the 8-bit age is not entirely gone yet.

"to commit 8-bit ANSI stuff to svn" -> Yeah, that's me. ;-)

What did you expect? i8n IS complex.

You mean i18n probably. We do not do i18n, nor even l10n... it's all just about translating strings. The little hoop to jump over is the codepage stuff.

Yes, i18n. But okay, let's call it just NLS again.

- the repo server is also able to produce localized messages (EN and PL only for now).

Nice.

What's left is the description of packages. I don't like the FreeDOS approach with multiple LSM files, one file per language plus one extra "master file" with version and stuff. I'd definitely rather see the LSM file containing all descriptions in the available languages. This comes for free to the user, as all the LSM content will fit in a single cluster. I think there are three possible ways to approach this:

1. utf-8 encoding of the LSM file

2. multi-codepage LSM file (each description potentially in a different codepage)

3. some hybrid approach, e.g. "all stored in utf-8 in svn, then mass-converted to multi-codepage files in distributable packages"

and then an alternative "in-between" approach:

4. LSM is en-only, but online descriptions (displayed by pkgnet) are multi-lang because the repo API has some source of translated descriptions

Option 3 is quite messy and requires fiddling with all zip packages all the time, so not an option I am very fond of.

Option 2 is nice only because it allows presenting a local description to the user on the installed system. I am not sure this is a valuable advantage. Other than that, this approach is complex, as it requires processing non-standard multi-encoding files.

Option 1 might be the most reasonable in the long term, especially since the descriptions are not used after installation anyway, so it does not matter that they are encoded in a DOS-incompatible way. But it does mean repackaging all packages, and repackaging them again and again each time a translation is added or modified.

Option 4 has the advantage that we don't touch existing packages, only provide a little flat file with translated (utf-8) descriptions that would be displayed through pkgnet queries. I think this option is a very good start to explore how it works out, and then maybe in the future integrate it somehow more tightly right into the packages.

Yes, option 4 seems to be a good start. (OSDN's quoting function is a little awkward, I think.)

2022-02-17 19:34 Updated by: mateuszviste
  • Status Update from Open to Closed
2022-02-18 02:27 Updated by: bttr
  • Resolution Update from None to Fixed

Attachment File List

No attachments
