Index: [thread] [date] [author] [stats]
  From: markus schnalke <meillo@marmaro.de>
  To  : <masqmail@marmaro.de>
  Date: Wed, 22 Feb 2012 16:28:49 +0100

Re: [masqmail] Config file parser: rval

[2012-02-19 11:29] Jan Hauke Rahm <jhr@debian.org>
> On Sat, Feb 18, 2012 at 07:28:45PM +0100, markus schnalke wrote:
>> I'm working in the src/conf.c code currently. Seems as if I'd like to
>> convert it into a nice recursive-descent parser, which it not truly
>> is.
>>
>> Well, independent of that is this sentence about r-values, quoted from
>> masqmail.conf(5):
>>
>>	If the expression is on multiple lines or contains characters
>>	other than  letters, digits or the characters `.', `‐', ` ',
>>	`/', `;', `@', `:', it must be quoted.
>>
>> The code matches to the description. But I don't understand the
>> reasons for this limitation.
>>
>> Let me explain. We have two kinds of rvals: quoted ones and not quoted
>> ones. If it's a quoted one, then it's easy: The value is everything
>> until the next (not escaped) quote. However, if it's not quoted, then
>> the current parser stops at symbols other than the above. But why? To
>> be able to put multiple statements on one line? No, as the remainder
>> of the line will get eaten anyways. I can't find a reason.
>>
>> Does anyone of you see it?
>>
>> Maybe it's just old cruft or had been a bad idea.
>
> Not sure, rather a wild guess, but maybe this was an attempt of caution?
> Whoever wrote this code may have feared characters in the config file
> that might break the code.

Thanks for this explanation. I haven't thought of that yet.

> In other words: if you don't know what you'll get, allow characters from
> a white list rather than proceeding blindly.
>
> Although, I must say, I don't see how this is protecting much. In quoted
> rvals you'd have to fear the same effects, don't you?

The limitation affects only not-quoted rvals. Independet of the
usefulnes of such a white-set, I see no reason to split between quoted
and not-quoted.

These kinds of protection are useful if you process user input, but
are they useful here? The code parses the configuration files, which
are under the admin's control, most likely that's root. If he likes to
break his own system, then he may just do so.

The question is: Why do we require quotes at all? Postfix for instance
simply takes the remainder of the line (everything after the equal
sign) as the rval. That seems natural to me.

For backward compatibility I could keep the quoted form, although it
does add corner-cases. Starting with the next release the non-quoted
form could be the default. (We only would need to take quoted hash
symbols literally, instead of taking them as a comment start.)


> Maybe, given that read lval basically has the same code (and it seems to
> make more sense there, although I'm not sure there either), this might
> have been a copy&paste + a few more allowed chars + quoted strings.

I plan to rework the conf parser more extensively. This is related to
my preparation to a compiler construction exam currently. ;-)


>> If there are no objections, then I'll change the code to take
>> everything after the equal sign as the r-value, except a possible
>> comment. Quotes would then only be neccessary if one would like to
>> assign a multiple line value. (Additionally, I'd like to make a
>> backslash as last character before a newline to conceal the line
>> break.)
>
> Personally, I'd play around a bit too see if I could break anything by
> putting weird chars into the input string, multibyte or whatever.

If masqmail breaks in these cases, then checks in the inner parts
should be added, IMO. The conf file parser should:
- ignore empty lines and comments
- split lines into lval and rval

Anything beyond should be done by the parts that deal with the actual
kind of the value. This means: Yes, we surely can limit the possible
input characters, but this should not be done by the parser itself.


> I'd expect some  weird behavior but maybe I'm wrong.

If you find examples, then report them as bugs. ;-)


> If it actually works,
> maybe read lval() and read rval() could make use of a common new
> function that simply parses a string. It wouldn't even hurt if one could
> quote before the '=' sign anyways, would it?

If it wouldn't hurt, what use are the quotes then anyway?


meillo


Index: [thread] [date] [author] [stats]