Ticket #39094

quoting issues with [[ string =~ re ]]

Open Date: 2019-03-30 20:53 Last Update: 2019-04-29 23:24

Reporter:
Owner:
Type:
Status: Open [Owner assigned]
Component:
MileStone: (None)
Priority: 5 - Medium
Severity: 5 - Medium
Resolution: None
File: None

Details

yash 2.48 has introduced a Korn-style [[...]] construct. For the =~ operator, I see that the bash32+ approach, as opposed to the bash31/zsh one, was chosen with regard to quoting.

[[ a =~ . ]] does match, as it does in zsh.

But:

[[ a =~ '.' ]] doesn't match, because quoting removes the special meaning of regex operators.
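
For comparison, the same quoting rule can be observed in bash itself. The following is a minimal sketch, assuming bash 3.2 or later is installed and running without the compat31 option; check is a hypothetical helper name, not part of any shell.

```shell
# Hypothetical helper: evaluate a [[ ... ]] expression in a fresh bash
# and report whether it succeeded.
check() {
  if bash -c "$1"; then
    echo "match"
  else
    echo "no match"
  fi
}

# Unquoted: . is the "any character" regex operator.
check "[[ a =~ . ]]"

# Quoted: . is a literal dot, so the string "a" is not matched.
check "[[ a =~ '.' ]]"
```

With the bash32+ semantics described above, the first call should report a match and the second should not.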

Now, a problem with that is that (, ), and | are regex operators but cannot appear in a normal shell word. At the moment in yash:

yash -c '[[ a =~ a||a =~ b ]]'

works (as in zsh), with || being the "OR" token inside [[...]].

But you can't use the | ERE operator:

$ ./yash -c '[[ a =~ a|b ]]'
yash -c:1: syntax error: invalid word `|' between `[[' and `]]'

That is the same as in zsh, but in zsh, as in bash 3.1, you'd write [[ a =~ 'a|b' ]] instead. That doesn't work in yash, because the quotes remove the special meaning of |.

In zsh, [[ a =~ (a|b) ]] works because (a|b) is the same syntax as the (a|b) glob operator (specific to zsh, ksh has @(a|b) instead).

There's a similar problem with ( and ):

$ ./yash -c '[[ x =~ (aa)* ]]'
yash -c:1: syntax error: `(' is not a valid operand in the conditional expression
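
For reference, the usual idiom in bash 3.2 and later for patterns containing |, ( or ) is to store the regex in a variable and expand it unquoted, since the result of an unquoted expansion is treated as a regex. The sketch below assumes bash is installed; ere_match is a hypothetical helper name, and whether yash 2.48 accepts the same idiom has not been verified here.

```shell
# Hypothetical helper: test string $1 against ERE $2 with bash's [[ =~ ]].
# The regex is expanded from an unquoted variable, so |, ( and ) keep
# their regex meaning without having to appear literally in the command.
ere_match() {
  if bash -c 're=$2; [[ $1 =~ $re ]]' bash "$1" "$2"; then
    echo "match"
  else
    echo "no match"
  fi
}

ere_match a 'a|b'          # alternation
ere_match aaaa '^(aa)*$'   # grouping: an even number of "a"
ere_match aaa '^(aa)*$'    # an odd number of "a" is rejected
```

Quoting the expansion (as in "$re") would turn the pattern back into a literal string under the bash32+ rules, so the variable has to stay unquoted.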

yash also has the same bug as bash originally had (actually a worse one): to remove the special meaning of RE operators, it escapes them with \ before calling regcomp.

But it inserts that backslash even when it should not: inside bracket expressions (as bash originally did), and also before characters that are not regexp operators (bash didn't have that bug).

That means that [[ '\' =~ ["."] ]] matches (as in old bash versions), and so does [[ x =~ "<" ]] on systems where \< is the word-boundary operator, for instance.
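
The bracket-expression half of this can be reproduced outside any shell's [[...]], because in a POSIX bracket expression a backslash is an ordinary character. The sketch below uses grep -E to show what the regex [\.] (the result of blindly escaping the quoted dot in ["."]) matches; bracket_matches is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the ERE [\.] matches the string $1.
# Inside [...] the backslash is literal, so the set contains both
# a backslash and a dot.
bracket_matches() {
  if printf '%s\n' "$1" | grep -Eq '[\.]'; then
    echo "match"
  else
    echo "no match"
  fi
}

bracket_matches '\'   # the backslash is a member of the set
bracket_matches '.'   # so is the dot
bracket_matches 'x'   # anything else is not
```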

yash should insert that \ only where needed (where [...] is a special case, also beware of [^]")"]).
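
As an illustration of "only where needed", here is a minimal sketch of a quoting routine written in portable shell. It is not yash's actual code: ere_quote is a hypothetical name, and the sketch only covers text outside bracket expressions, where the POSIX ERE operators are . [ \ ( ) * + ? { | ^ and $. Text inside [...] would have to be copied without adding any backslash.

```shell
# Hypothetical helper: quote a literal string for use in a POSIX ERE by
# escaping only the characters that are operators outside a bracket
# expression. The body runs in a subshell so its variables stay local.
ere_quote() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character
    case $c in
      '.'|'['|'\'|'('|')'|'*'|'+'|'?'|'{'|'|'|'^'|'$')
        out=$out\\$c ;;   # regex operator: prefix with a backslash
      *)
        out=$out$c ;;     # anything else (such as <) is copied as-is
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

ere_quote 'a.b|c'
ere_quote '<'
```

Because < is not in the list, it is never turned into \<, which avoids the accidental word-boundary operator mentioned above.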

There's also the question of whether [[ b =~ [a"-"c] ]] should work the same as [[ b = [a"-"c] ]]

Ticket History (3/5 Histories)

2019-03-30 20:53 Updated by: stephane-c
  • New Ticket "quoting issues with [[ string =~ re ]]" created
2019-03-31 04:50 Updated by: stephane-c
  • Details Updated
2019-03-31 05:04 Updated by: stephane-c
Comment

Obviously, the easiest (and I'd argue cleanest) resolution is to adopt the bash31/zsh approach instead.

You may also want to consider adding support for PCREs instead of EREs in the future (as zsh does with the rematchpcre option; PCREs are the de facto regex standard these days). With the bash32+ approach, doing the escaping correctly could become tricky.

ksh93 behaves a bit like bash32+, but there quoting works differently with quotes than with backslashes, and quotes only disable some RE operators ([[ a =~ ".+" ]] matches there, but neither [[ a =~ \.\+ ]] nor [[ a = "a*" ]] does).

2019-04-29 17:56 Updated by: magicant
Comment

Since yash introduced the double-bracket command only for compatibility reasons, I'm not willing to intentionally diverge from the original ksh behaviors. To support ksh-like handling of | and parentheses, however, I need to implement the quirky syntax parser that treats them as normal word characters. *sigh*

2019-04-29 23:24 Updated by: stephane-c
Comment

Reply To magicant

Since yash introduced the double-bracket command only for compatibility reasons, I'm not willing to intentionally diverge from the original ksh behaviors.

Note that [[ =~ ]] comes from bash, not ksh. ksh93 added it later, but it's unfinished and pretty bogus there as mentioned above. ksh88, pdksh and all its derivatives don't have it.

Attachment File List

No attachments
