author Tuomas Tynkkynen <tuomas@tuxera.com> 2018-02-25T21:51+0200
committer Tuomas Tynkkynen <tuomas@tuxera.com> 2018-03-02T15:30+0200
commit a0e38c16bc7773d0fc62771a9935e079c00899ef
tree 9f30fc3be4c4be6742d220a1b0363be8f053d383 /src
parent 939cf4ccebb199e61da648873fb078ae8833263f

libexpr: Recognize newline in more places in lexer
Flex's regexes have an annoying feature: the dot matches everything
except a newline. This causes problems for expressions like:

"${0}\
"

where the backslash-newline combination matches this rule instead of the
intended one mentioned in the comment:

    <STRING>\$|\\|\$\\ {
                    /* This can only occur when we reach EOF, otherwise the above
                    (...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered.
                    This is technically invalid, but we leave the problem to the
                    parser who fails with exact location. */
                    return STR;
                }

However, the parser actually accepts the resulting token sequence
('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer
rule didn't assign anything to yylval. Ultimately this leads to a crash
when dereferencing a NULL pointer in ExprConcatStrings::bindVars().
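
The dot-versus-newline behaviour described above can be reproduced outside flex. The following is a minimal sketch, not part of this commit: it uses C++ std::regex with its default ECMAScript grammar as a stand-in for flex, on the assumption that the two agree on this one point (the dot does not match a newline). The function names matchesOldEscape and matchesNewEscape are hypothetical.

```cpp
#include <regex>
#include <string>

// Stand-in for the old flex fragment `\\.`: a backslash followed by "any"
// character. As in flex, `.` here does not match a newline.
bool matchesOldEscape(const std::string & s) {
    static const std::regex oldEscape(R"(\\.)");
    return std::regex_match(s, oldEscape);
}

// Stand-in for the new fragment `\\{ANY}`, with ANY spelled out as `.|\n`.
bool matchesNewEscape(const std::string & s) {
    static const std::regex newEscape(R"(\\(.|\n))");
    return std::regex_match(s, newEscape);
}
```

With these definitions, a backslash followed by the letter n is accepted by both functions, while a backslash followed by a real newline is accepted only by matchesNewEscape. In the lexer, that unmatched backslash-newline is what fell through to the EOF-only rule quoted above.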

The fix does change the syntax of the language in some corner cases,
but I think it only turns previously invalid (or crashing) syntax
into valid syntax. E.g.

"a\
b"

and

''a''\
b''

were previously syntax errors but now both result in "a\nb".
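
Why backslash-newline ends up as a plain newline can be illustrated with a simplified unescape routine. This is a hedged sketch only: unescapeSketch is a hypothetical helper, not the unescapeStr() from lexer.l, and it assumes that an escape other than \n, \r or \t yields the escaped character itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified unescape routine for illustration only.
std::string unescapeSketch(const std::string & s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        char c = s[i];
        if (c == '\\') {
            // The lexer rule `\\{ANY}` guarantees a character after the backslash.
            assert(i + 1 < s.size());
            c = s[++i];
            if (c == 'n') out += '\n';
            else if (c == 'r') out += '\r';
            else if (c == 't') out += '\t';
            else out += c;  // any other escape, including backslash-newline, keeps the character
        } else {
            out += c;
        }
    }
    return out;
}
```

Under that assumption, the five characters a, backslash, newline, b unescape to "a\nb", the same string that the usual escape sequence a\nb produces. For the indented-string case, the action passes yytext + 2 (the backslash and the newline) to the unescape step, which likewise yields a single newline.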

Found by afl-fuzz.
Diffstat (limited to 'src')
-rw-r--r-- src/libexpr/lexer.l 9
1 file changed, 5 insertions, 4 deletions
diff --git a/src/libexpr/lexer.l b/src/libexpr/lexer.l
index e5e01fb5831a..1e9c29afa133 100644
--- a/src/libexpr/lexer.l
+++ b/src/libexpr/lexer.l
@@ -85,6 +85,7 @@ static Expr * unescapeStr(SymbolTable & symbols, const char * s, size_t length)
 %}
 
 
+ANY         .|\n
 ID          [a-zA-Z\_][a-zA-Z0-9\_\'\-]*
 INT         [0-9]+
 FLOAT       (([1-9][0-9]*\.[0-9]*)|(0?\.[0-9]+))([Ee][+-]?[0-9]+)?
@@ -146,8 +147,8 @@ or          { return OR_KW; }
 <INITIAL,INSIDE_DOLLAR_CURLY>\" {
                 PUSH_STATE(STRING); return '"';
               }
-<STRING>([^\$\"\\]|\$[^\{\"\\]|\\.|\$\\.)*\$/\" |
-<STRING>([^\$\"\\]|\$[^\{\"\\]|\\.|\$\\.)+ {
+<STRING>([^\$\"\\]|\$[^\{\"\\]|\\{ANY}|\$\\{ANY})*\$/\" |
+<STRING>([^\$\"\\]|\$[^\{\"\\]|\\{ANY}|\$\\{ANY})+ {
                 /* It is impossible to match strings ending with '$' with one
                    regex because trailing contexts are only valid at the end
                    of a rule. (A sane but undocumented limitation.) */
@@ -178,7 +179,7 @@ or          { return OR_KW; }
                    yylval->e = new ExprIndStr("''");
                    return IND_STR;
                  }
-<IND_STRING>\'\'\\. {
+<IND_STRING>\'\'\\{ANY} {
                    yylval->e = unescapeStr(data->symbols, yytext + 2, yyleng - 2);
                    return IND_STR;
                  }
@@ -208,7 +209,7 @@ or          { return OR_KW; }
 \#[^\r\n]*    /* single-line comments */
 \/\*([^*]|\*+[^*/])*\*+\/  /* long comments */
 
-.           return yytext[0];
+{ANY}           return yytext[0];
 
 }