Ticket #47772

variable values depends on LC_CTYPE

Open Date: 2023-04-06 01:37
Last Update: 2023-09-11 00:06

Reporter: ncopa
Owner: magicant
Type:
Status: Closed
Component: shell-main
MileStone: (None)
Priority: 5 - Medium
Severity: 5 - Medium
Resolution: Invalid
File: None

Details

It seems that variables cannot hold arbitrary byte values unless LC_CTYPE=C, and it does not seem possible to change that from within a script.

Consider this script test.sh:

  a="$(printf '\037\213\010')"
  b="$(printf '\037\213\020')"
  if [ "$a" = "$b" ]; then
    echo "shouldnt happen..."
  fi
  if [ "$(printf '\213')" = "$(printf '\214')" ]; then
    echo "this too shouldn't happen..."
  fi
  awk -v a="$(printf '\213')" -v b="$(printf '\214')" 'END { if (a==b) print "awk shouldnt happen"}' /dev/null

When I run it with dash, bash, or BusyBox ash, it prints nothing. With yash, however, it prints:

ncopa-desktop:~$ yash test.sh
shouldnt happen...
this too shouldn't happen...
awk shouldnt happen
ncopa-desktop:~$ 

It seems that variables can only hold 7-bit ASCII values.

Now, if I set LC_CTYPE=C when starting yash, the test passes:

ncopa-desktop:~$ LC_CTYPE=C yash -c ". ./test.sh"
ncopa-desktop:~$ 

But if I set LC_CTYPE=C within the spawned shell, it still fails:

ncopa-desktop:~$ yash -c "LC_CTYPE=C ; . ./test.sh"
shouldnt happen...
this too shouldn't happen...
awk shouldnt happen
ncopa-desktop:~$ 

This effectively means that it is impossible to write or run portable shell scripts under yash that use 8-bit bytes in variables. A portable script has no way to control which locale the user has set in their yashrc.

This happens on Alpine Linux, which uses musl libc; I believe the default locale in musl is C.utf8.

This was discovered while debugging yash with Alpine Linux's tiny-cloud script: https://gitlab.alpinelinux.org/alpine/cloud/tiny-cloud/-/blob/fda9a350a1dfb4a33e9e4bf9e5272d5b4f74f541/lib/tiny-cloud/init-main#L69
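One way around the problem, sketched here as a suggestion and not something proposed in the ticket, is to keep raw bytes out of shell variables altogether and compare their hexadecimal rendering instead. `od -An -tx1` prints each byte as two hex digits, and hex digits belong to the portable character set that every locale must support. The helper name `hexbytes` is hypothetical.

```shell
#!/bin/sh
# Sketch: compare byte strings by their hex dump instead of storing raw
# bytes in variables. Only ASCII hex digits end up in the variables, so the
# comparison should not depend on LC_CTYPE.
# "hexbytes" is a hypothetical helper name.
hexbytes() {
  # -An: no address column; -tx1: one hex value per byte.
  # tr strips od's spaces and newlines, e.g. " 1f 8b 08" -> "1f8b08".
  od -An -tx1 | tr -d ' \n'
}

a=$(printf '\037\213\010' | hexbytes)
b=$(printf '\037\213\020' | hexbytes)

if [ "$a" = "$b" ]; then
  echo "same bytes"
else
  echo "different bytes: $a vs $b"
fi
```

To check the first three bytes of a file instead, `od -An -tx1 -N3 "$file" | tr -d ' \n'` (with `$file` as a placeholder) yields the same kind of string; `1f8b08` is the start of a gzip header (magic bytes 1f 8b, compression method 08 for deflate).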

Ticket History (2/2 Histories)

2023-04-06 01:37 Updated by: ncopa
  • New Ticket "variable values depends on LC_CTYPE" created
2023-09-11 00:06 Updated by: magicant
  • Owner Update from (None) to magicant
  • Status Update from Open to Closed
  • Resolution Update from None to Invalid
  • Component Update from (None) to shell-main
Comment

The byte sequence \037\213\010 is not valid text when decoded as UTF-8: \213 (0x8B) is a continuation byte that does not follow a lead byte. If your locale uses UTF-8, you can only use text that is a valid UTF-8 encoding of a Unicode string.
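The validity claim can be checked outside the shell. The sketch below assumes an `iconv` utility that rejects malformed input with a non-zero exit status (glibc's iconv behaves this way; other implementations may differ), and `valid_utf8` is a hypothetical helper name. For comparison it also tests `\303\251`, the UTF-8 encoding of é.

```shell
#!/bin/sh
# Sketch: report whether byte strings are valid UTF-8 by converting them
# from UTF-8 to UTF-8 with iconv, which fails on malformed input.
# "valid_utf8" is a hypothetical helper; it reads the bytes from stdin.
valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

for bytes in '\037\213\010' '\037\213\020' '\303\251'; do
  # printf expands the octal escapes into raw bytes; the labels are printed
  # with %s so the backslashes are shown literally.
  if printf "$bytes" | valid_utf8; then
    printf '%s: valid UTF-8\n' "$bytes"
  else
    printf '%s: invalid UTF-8\n' "$bytes"
  fi
done
```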

If you set LC_CTYPE to plain C (with no encoding suffix), you can only use ASCII characters.

You cannot change the encoding while a shell script is running. POSIX requires this:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_03

Changing the value of LC_CTYPE after the shell has started shall not affect the lexical processing of shell commands in the current shell execution environment or its subshells. Invoking a shell script or performing exec sh subjects the new shell to the changes in LC_CTYPE.
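The last sentence of the POSIX excerpt points to one possible workaround: set the locale before a new shell is invoked, for example by having the script re-execute itself once. This is a minimal sketch under stated assumptions: the script is started by path (so `$0` names the script file, which is not the case when it is sourced with `.`), `sh` on PATH is an acceptable interpreter, and `REEXEC_C_LOCALE` is a hypothetical guard variable.

```shell
#!/bin/sh
# Sketch: re-execute this script once in a new shell started with LC_ALL=C.
# Per the POSIX text quoted above, a changed LC_CTYPE only takes effect in a
# newly invoked shell, so assigning it in the running shell is not enough.
# Assumptions (not from the ticket):
#   - REEXEC_C_LOCALE is a hypothetical guard variable preventing a loop;
#   - "$0" names this script file, i.e. it was run by path, not sourced;
#   - "sh" on PATH is an acceptable interpreter for the rest of the script.
if [ "${REEXEC_C_LOCALE:-}" != 1 ] && [ -f "$0" ]; then
  # env places the new values in the environment of the replacement shell.
  exec env REEXEC_C_LOCALE=1 LC_ALL=C sh "$0" "$@"
fi

# Below this point the shell was normally started with LC_ALL=C.
a="$(printf '\037\213\010')"
b="$(printf '\037\213\020')"
if [ "$a" = "$b" ]; then
  echo "unexpected: the byte strings compare equal"
fi
```

LC_ALL is used rather than LC_CTYPE because a user-set LC_ALL would otherwise take precedence over LC_CTYPE. Whether non-ASCII bytes then behave as expected is up to the shell: the reporter's test passed when yash was started with LC_CTYPE=C, while the maintainer's comment states that only ASCII characters are supported in the C locale.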
