Ticket #35283

Extra empty field after IFS non-whitespace terminated fields

Open Date: 2015-06-25 12:00 Last Update: 2016-03-14 22:36

Reporter:
Owner:
Type:
Status:
Closed
Component:
MileStone:
(None)
Priority:
5 - Medium
Severity:
5 - Medium
Resolution:
Fixed
File:
None

Details

$ IFS=':'
$ x='a:b::'
$ set -- $x
$ echo $#
4

yash counts 4 fields, whereas most other shells count 3. In other words, yash does not treat the non-whitespace IFS character as a field terminator.

POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05 "The shell shall treat each character of the IFS as a delimiter and use the delimiters as field TERMINATORS to split the results of parameter expansion and command substitution into fields." (emphasis mine)

bash, ash, dash, ksh93 and mksh act according to POSIX.

pdksh (which is obsolete) and zsh act like yash.
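
The count above can be reproduced in any shell with a small helper. This is only a minimal sketch: the function name count_fields is hypothetical and not part of yash or any other shell. It relies on nothing beyond unquoted parameter expansion and "set --".

```shell
# count_fields is a hypothetical helper for illustration only.
# $1 = value to use as IFS, $2 = string to split.
# The body is a subshell, so the IFS and option changes do not leak
# into the calling shell.
count_fields() (
    IFS=$1
    set -f        # disable pathname expansion so only field splitting applies
    set -- $2     # deliberately unquoted: this is where field splitting happens
    echo "$#"     # number of fields the current shell produced
)

count_fields ':' 'a:b::'
count_fields ':' 'a:b'
```

A shell that treats the trailing ':' as a field terminator prints 3 for the first call, while a shell that treats it as a separator prints 4. Both kinds of shell should print 2 for the second call, since that input has no trailing delimiter.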

Ticket History

2015-06-25 12:00 Updated by: mcdutchie
  • New Ticket "Extra empty field after IFS non-whitespace terminated fields" created
2015-06-25 12:01 Updated by: mcdutchie
  • Details Updated
2015-06-26 08:37 Updated by: magicant
Comment

What do you mean by the emphasis on the word "terminators"? Yash is simply obeying the algorithm required by POSIX.

"Each occurrence in the input of an IFS character that is not IFS white space, along with any adjacent IFS white space, shall delimit a field"

If you have n non-whitespace IFS characters (and no whitespace IFS characters) in the input word, the result is (n + 1) fields. How is any other interpretation possible?

2015-06-26 09:13 Updated by: mcdutchie
Comment

"If you have n non-whitespace IFS characters (and no whitespace IFS characters) in the input word, the result is (n + 1) fields. How is any other interpretation possible?"

By interpreting those non-whitespace characters as field terminators, not separators. Terminating means ending, not separating. If each character terminates (ends) a field, then in the given example there are n fields for n characters. The reason I emphasized the word "terminators" in the POSIX text was to draw your attention to this.

For whitespace IFS characters, there is no difference between separator and terminator, because it is specified that all IFS whitespace at the beginning and end of the string is supposed to be ignored. But for non-whitespace IFS characters, the difference is important.
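
The two readings can be modelled in a few lines of shell that do not depend on the running shell's own field splitting. This is a sketch under stated assumptions: the function names are hypothetical, the delimiter is a single non-whitespace character, and the input is non-empty.

```shell
# All three helpers are hypothetical and exist only to model the two readings.
# $1 = a single non-whitespace delimiter character, $2 = non-empty input.
# Each body is a subshell so the working variables stay local.

# Count how many times the delimiter occurs in the input.
count_delims() (
    rest=$2
    n=0
    while :; do
        case $rest in
            *"$1"*) rest=${rest#*"$1"}; n=$((n + 1)) ;;  # drop text up to the first delimiter
            *) break ;;
        esac
    done
    echo "$n"
)

# Separator reading: n delimiters always yield n + 1 fields.
fields_as_separator() (
    echo $(( $(count_delims "$1" "$2") + 1 ))
)

# Terminator reading: a trailing delimiter ends the last field
# instead of starting a new, empty one.
fields_as_terminator() (
    n=$(count_delims "$1" "$2")
    case $2 in
        *"$1") echo "$n" ;;
        *) echo $((n + 1)) ;;
    esac
)

fields_as_separator ':' 'a:b::'    # prints 4
fields_as_terminator ':' 'a:b::'   # prints 3
```

For input without a trailing delimiter, such as 'a:b', both models give the same count, which is why the disagreement only shows up when the string ends in a non-whitespace IFS character.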

(Just FYI, zsh will likely have fixed this in POSIX mode for the next release -- a preliminary patch has gone out to the zsh-workers list after I posted a similar bug report there.)

2015-06-26 13:31 Updated by: magicant
Comment

The meaning of the word "terminate" is not explicitly defined in POSIX, so your interpretation may be possible, but it obviously contradicts the sentence I previously quoted, which states that each non-whitespace IFS character is a "delimiter". If POSIX should require that the last empty field be omitted, the wording of POSIX should be changed to state it clearly.

See also the rationale for field splitting: http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html#tag_23_02_06_05 It describes the field splitting algorithm in a different manner and neither states nor implies that the last empty field should be omitted.

2015-07-19 02:03 Updated by: magicant
  • Status Update from Open to Closed
  • Resolution Update from None to Invalid
  • Ticket Close date is changed to 2015-07-19 02:03
2015-12-04 09:05 Updated by: None
Comment

Though the current spec is not very clear, the intent has been clarified several times on the austin group mailing list, in standard interpretations and elsewhere. See the current discussion on the dash mailing list: http://thread.gmane.org/gmane.comp.shells.dash/1184/focus=1187

2015-12-04 09:29 Updated by: None
Comment

See this thread: http://web.archive.org/web/20050510193928/http://www.opengroup.org/austin/mailarchives/ag/msg08044.html

That led to https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html and the current wording in the spec.

The interpretation was written on the basis that one shell should pass this test script: http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh which yash, posh and dash currently fail on (for different reasons). yash fails for the same reason as zsh.

2015-12-04 19:06 Updated by: magicant
  • Resolution Update from Invalid to None
  • Status Update from Closed to Open
Comment

All right, this looks worth reconsideration. Thank you for following up.

2016-03-08 23:35 Updated by: magicant
Comment

Although the language of the standard still seems ambiguous to me, the intended interpretation of the standard has been clarified in http://www.open-std.org/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html. So, I'm going to change the behavior of yash to match the intention. I'll also add an option so that users can restore the current "unintended" behavior if they wish.

2016-03-12 22:27 Updated by: magicant
Comment

"The interpretation was written on the basis that one shell should pass this test script: http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh which yash, posh and dash currently fail on (for different reasons). yash fails for the same reason as zsh."

I have committed a fix in my local repository, but yash still fails this script. Examples of the failures are:

IFS=": "; x=":"; set x $x; shift; echo "[$#]($1)" # expected "[1]()" got "[0]"
echo ": ::" | ( IFS=": " read x y; echo "($x)($y)" ) # expected "()(::)" got "()( ::)"

I doubt these are actually the wrong behavior, however. The first example expects one empty field, but such a field is subject to empty field removal. The second expects no leading whitespace, but POSIX is ambiguous about whether the whitespace must be removed in this case.

These errors are not field splitting issues, so I don't address them in this ticket anyway.

2016-03-14 22:36 Updated by: magicant
  • Resolution Update from None to Fixed
  • Ticket Close date is changed to 2016-03-14 22:36
  • Status Update from Open to Closed
Comment

r3608 is the fix.
