
Re: policy around 'wontfix' bug tag



On Sun, Feb 04, 2018 at 02:27:00PM +0100, Nicolas George wrote:
> Michael Stone (2018-02-04):
>> But a better parser would allow the same functionality, without being
>> confusing, inconsistent, and hard to maintain. So yes, I'll stand by
>> "complete misfeature".
>
> Can you describe what you mean by "better parser" in more details?
>
> Beware that the "same functionality" includes "same convenience".
> Convenience is hard to achieve.

Well, it's not particularly convenient for people to have to constantly wonder why the parser isn't doing what they think it should do. I've been getting the questions and bug reports for 20 years, so trust me when I say that people have trouble predicting the output of a given input.

As far as "better parser" goes, that means something that requires the input to be fully specified and does not try to guess based on natural language parsing. For example, what does "last month" mean? What does it mean when you're on the 31st and the previous month didn't have a 31st? What date is 1/2? What time zone is "EST"? Making guesses seems "convenient", but when you hit corner cases and things break horribly, that's not convenient after all.

Most date parsers address this by requiring a format specifier along with the input, so you can say something like "parse '1/2' assuming the input is numericday/numericmonth". Is it less "convenient" to have to specify the format? Maybe, but it's also a heck of a lot more reliable. Someone else pointed out PostgreSQL's date parser, which lets you do things like specify a date and then add something like "interval '1 day'". Specifying the fact that a particular string is an interval makes the parsing much more regular than trying to pull the interval out of natural language.

At one point date would appear to properly parse ISO 8601 input (YYYY-mm-ddTHH:MM:SS), but it interpreted the "T" as a timezone specifier instead of the ISO 8601 delimiter. (Compare output with YYYY-mm-ddUHH:MM:SS or YYYY-mm-ddSHH:MM:SS.) Why would it ever have been "convenient" to put an alphabetic timezone specifier after the date and before the time? Who knows, but the natural language parser was doing its best to guess a meaning for the input. That particular issue was fixed, but how can you tell whether you're using a version that works the old way or the new way? (Answer: you can't easily do so. If you had to specify a format, it would be easier to hard fail when given a format that wasn't understood, rather than soft fail and produce random output.)

Is it "convenient" that there's a natural language parser that only understands English? Maybe, if you speak English?
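
To make the format-specifier idea concrete, here is a rough sketch using Python's datetime.strptime. Python is picked only for illustration (it is not what date uses), and the helper names parse_strict and is_valid are made up for this example. It shows the same string parsing to two different dates depending on the declared format, an interval stated explicitly rather than guessed from prose, and a hard failure when the input does not match.

```python
from datetime import datetime, timedelta


def parse_strict(text: str, fmt: str) -> datetime:
    """Parse text against exactly one declared format.

    Hypothetical helper for this sketch: it raises ValueError on any
    mismatch instead of guessing at a meaning.
    """
    return datetime.strptime(text, fmt)


def is_valid(text: str, fmt: str) -> bool:
    """Return True only if text matches fmt exactly (hypothetical helper)."""
    try:
        parse_strict(text, fmt)
        return True
    except ValueError:
        return False


# "1/2" is only ambiguous until the caller states the field order.
day_first = parse_strict("1/2/2018", "%d/%m/%Y")    # 1 February 2018
month_first = parse_strict("1/2/2018", "%m/%d/%Y")  # 2 January 2018

# The interval is an explicit typed value, not a phrase to be interpreted.
next_day = day_first + timedelta(days=1)

# The "T" is a literal delimiter in the format, so any other letter is rejected.
ISO_FMT = "%Y-%m-%dT%H:%M:%S"
iso_ok = is_valid("2018-02-04T14:27:00", ISO_FMT)
iso_bad = is_valid("2018-02-04U14:27:00", ISO_FMT)

# Natural-language input simply fails instead of producing a guess.
phrase_ok = is_valid("last month", "%Y-%m-%d")
```

The caller has to type a little more, but every input either parses the way the format says or is rejected outright.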

Mike Stone

