Ticket #36003

floating point parsing is locale-dependent

Open Date: 2016-01-30 23:44 Last Update: 2018-10-06 18:16

Status: Open [Owner assigned]
MileStone: (None)
Priority: 3
Severity: 5 - Medium
Resolution: None

Details

Parsing of floating-point numbers in yash arithmetic depends on the user's locale: only the current locale's decimal point character is accepted. As a result, the arithmetic grammar of any script that uses floating point depends on the end user's locale settings, which breaks portability and seems clearly undesirable. The only way to make such a script portable is to unset LC_ALL and set LC_NUMERIC to POSIX.

unset -v LC_ALL
x=1.24 y=1.74
LC_NUMERIC=POSIX        # decimal point is '.'
z=$((x+y))
echo "OK1: $z"
LC_NUMERIC=nl_NL.UTF-8  # decimal point is ','
z=$((x+y))
echo "OK2: $z"

Output:

OK1: 2.98
yash: arithmetic: `1.24' is not a valid number
yash: arithmetic: `1.74' is not a valid number
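
The workaround described above can be wrapped in a small preamble at the top of a script. This is only a sketch: the function name pin_numeric_locale is made up for illustration, and unsetting LC_ALL also exposes every other locale category that LC_ALL was overriding.

```shell
# Hypothetical helper (the name is not part of yash): force '.' as the
# decimal point for the rest of the script.
pin_numeric_locale() {
    # LC_ALL overrides every LC_* variable, so it has to be removed first.
    unset -v LC_ALL
    LC_NUMERIC=POSIX
    export LC_NUMERIC
}

pin_numeric_locale
printf '%.2f\n' 2.98
```

Calling the helper once before any floating-point arithmetic keeps the rest of the script independent of the invoking user's numeric locale.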


Ticket History (3/8 Histories)

2016-01-30 23:44 Updated by: mcdutchie
  • New Ticket "floating point parsing is locale-dependent" created
2016-01-31 17:10 Updated by: magicant
Comment

I did some research to compare behavior with other shells.

The table below shows whether string conversion is locale-dependent (D) or locale-independent (I) in different situations:

Situation                                         POSIX.1-2008  yash 2.40  zsh 5.1.1  ksh 93u+ 2012-08-01
parsing literals in arithmetic expansions         N/A           I          I          D
parsing variable values in arithmetic expansions  N/A           D          I          D
printing results of arithmetic expansions         N/A           D          I          D
parsing operands of the printf built-in           D             D          both       D
printing results of the printf built-in           D             D          D          D
2016-02-03 22:28 Updated by: magicant
Comment

Locale-dependency of floating-point conversion was once considered in #19731. Since then, interpretation of floating-point literals which directly appear in arithmetic expansion has been locale-independent. However, interpretation of floating-point values in a variable was kept locale-dependent to maintain interoperability with the printf built-in.

Consider passing the result of arithmetic expansion as an operand to the printf built-in. As required by POSIX.1-2008, the built-in interprets the operand locale-dependently, so the result of arithmetic expansion must have been formatted locale-dependently. Also consider assigning the result of arithmetic expansion to a variable and using the variable in another arithmetic expansion. Since a locale-dependently formatted value is assigned to the variable, the shell needs to interpret the variable value locale-dependently when evaluating the arithmetic expression.
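
The consistency requirement can be seen with printf alone, in any POSIX shell: a value that printf formatted under the current locale is accepted again by printf under the same locale. A small sketch follows; the integer operand 3 is used so that the input itself contains no decimal point.

```shell
# Format a number under the current locale, then feed the result back to printf.
# In a locale whose decimal point is ',', the intermediate value would be "3,00".
formatted=$(printf '%.2f' 3)
printf '%.1f\n' "$formatted"
```

The round trip only holds as long as the locale does not change between the two printf calls.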

That said, I have to admit that this does not work if you change the locale during execution of a shell script. One possible solution might be to provide a safe way to convert floating-point values between locale-independent and locale-dependent formats.
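
To make the idea of such a conversion concrete, here is a deliberately naive sketch in POSIX sh. The function to_c_format is hypothetical and not a yash feature. It assumes the value contains at most one decimal point character and no digit-grouping characters, and it takes the decimal point from the POSIX locale utility unless one is passed explicitly.

```shell
# Hypothetical helper, not a yash feature: rewrite the first occurrence of the
# locale's decimal point in $1 as '.'.  The decimal point is taken from $2 if
# given, otherwise from the POSIX `locale` utility.
to_c_format() {
    value=$1
    radix=${2:-$(locale decimal_point)}
    # Fall back to '.' if the locale utility reported nothing.
    [ -n "$radix" ] || radix=.
    case $value in
        *"$radix"*)
            # Text before the first decimal point, a '.', then the text after it.
            printf '%s.%s\n' "${value%%"$radix"*}" "${value#*"$radix"}" ;;
        *)
            # No decimal point present: pass the value through unchanged.
            printf '%s\n' "$value" ;;
    esac
}

to_c_format 1,24 ,
```

Because the sketch ignores grouping characters, it is only safe for values the script produced itself.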

2016-02-12 00:09 Updated by: magicant
Comment

I have not yet come up with a good way to offer conversion from locale-dependent format to locale-independent format. AFAIK no other shell supports such conversion. Should yash have a special conversion method that is incompatible with other shells?

2016-02-13 08:39 Updated by: None
Comment

Here are my thoughts on the matter.

I think conversion from locale-dependent to locale-independent format is fraught with problems and I don't think you should try it.

I understand your argument about interoperability with the printf built-in, but I don't think that's worth the price either. Currently, if a yash script executed in France or the Netherlands produces floating point output for parsing by another instance of that script run in Japan or America, it will fail to parse unless locale settings are altered.

Instead, I think yash should act like zsh -- in my opinion, it strikes the best balance. Internally, everything float is done as if the POSIX/C locale were active, including producing results of arithmetic expansion. Only printf produces locale-dependent output. If printf output needs to be interoperable with the shell, that's easy: it should simply be invoked as LC_ALL=C printf. Locale-dependent output should not be used as input.
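
The LC_ALL=C printf suggestion can be sketched as a one-line wrapper that works in any POSIX shell. The name c_printf is an assumption made for this example.

```shell
# Hypothetical wrapper (the name is an assumption): run printf with the C locale
# so that operands are parsed and results are printed with '.' as the decimal point.
c_printf() {
    LC_ALL=C printf "$@"
}

c_printf '%.2f\n' 2.98
```

The assignment applies only to that single printf invocation, so the locale seen by the rest of the script is left alone.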

2016-02-13 08:41 Updated by: mcdutchie
Comment

I wrote the preceding anonymous comment. Forgot to log in, sorry.

2016-02-13 23:19 Updated by: magicant
Comment

> Instead, I think yash should act like zsh

Under that plan, we need to consider the possibility that the printf built-in misinterprets floating-point arguments. Zsh's printf built-in first interprets the argument locale-dependently, and only if that fails does it try to interpret the argument as an arithmetic expansion. If the C-locale representation of a number could be interpreted as a different number in the current locale, the printf built-in would print the wrong number.

Specifically, in some locales like da_DK, periods are used to group digits into 3-digit components. For example, the number 12.000 is considered not as twelve but as twelve thousand. This does not work well with the C-locale representation of numbers, where the period is a decimal point.
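
The ambiguity can be illustrated with plain string manipulation in POSIX sh. This is an illustration only: read_as_grouped is a made-up name and is not how yash or zsh parses numbers. It simply applies the convention in which '.' groups digits and ',' is the decimal point.

```shell
# Illustration only: reinterpret a string under the convention where '.' groups
# digits and ',' is the decimal point, and print it in C-locale form.
read_as_grouped() {
    # Drop the grouping dots, then turn the decimal comma into a dot.
    printf '%s\n' "$1" | tr -d '.' | tr ',' '.'
}

# Under the C-locale reading, the same string is already twelve.
printf 'C-locale reading: %s\n' 12.000
printf 'grouped reading:  %s\n' "$(read_as_grouped 12.000)"
```

The same input string yields two different numbers depending on the convention, which is why a locale-dependent first attempt cannot safely fall back to C-locale parsing.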

2018-10-06 18:16 Updated by: magicant
  • Priority Update from 5 - Medium to 3
  • Details Updated
