
Re: codesearch across lines



On 2017-11-12, The Wanderer <wanderer@fastmail.fm> wrote:
>
>> (?m)(\W|^)panda.*str(\W|$)
>
> That would be expected to find only documents containing 'panda'
> followed by 'str'. To also find ones which contain 'str' followed by
> 'pandas' (and add the missing 's' back in), you'd probably want:
>
> (?m)(\W|^)(pandas.*str|str.*pandas)(\W|$)
>
> I have not tested this, but I use similar '(a.*b|b.*a)' regexes on a
> semi-regular basis for searching one of my own text archives.

I tried that, actually, following the same logic, or thought I did
(there might have been a typo somewhere), yet it produced *fewer* results.
Trying the formula again now, it seems to "work", although the regex
is pretty useless because it matches reams and reams of stuff:
'anda', 'panda', 'pandas', 'expandable', 'str', 'struct', 'instruction',
'castration', etc. are all matched.
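
To illustrate the over-matching, here is a rough sketch using Python's re
module. That choice is an assumption on my part: codesearch may use a
different regex engine, so results there could differ. The sample lines
are invented. The pattern is the grouped one quoted above; only its outer
edges are anchored, so longer words that merely contain 'str' or 'pandas'
still get through.

```python
import re

# The grouped pattern quoted above, unchanged.
LOOSE = re.compile(r"(?m)(\W|^)(pandas.*str|str.*pandas)(\W|$)")

# Hypothetical sample lines, invented for illustration.
SAMPLES = [
    "import pandas; s = str(x)",   # both appear as whole words
    "struct node *mypandas",       # 'str' inside 'struct', 'pandas' inside 'mypandas'
    "pandas_util.substr",          # 'pandas' inside 'pandas_util', 'str' inside 'substr'
    "import numpy",                # neither word present
]

def loose_hits(lines):
    """Return the lines on which LOOSE finds a match."""
    return [line for line in lines if LOOSE.search(line)]

if __name__ == "__main__":
    # Prints the first three sample lines; only the last one is rejected.
    for line in loose_hits(SAMPLES):
        print(line)
```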

This produces two hits (from the same file) only:

 (?m)(\W|^)\bpanda\b.*\bstr\b|\bstr\b.*\bpanda\b(\W|$)

I'm not certain how you're supposed to construct the formula to only match
the literal strings (if literal is indeed the term) "panda" and "str".
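
One possible way to do it, as far as I can tell, is to wrap each word in
'\b' (word boundary) on both sides and allow either order. A minimal
sketch in Python's re module follows; the helper name both_words_pattern
is made up for illustration, and codesearch may not behave identically.
Note that '\b' treats '_' as part of a word, so 'my_str' would not count
as the word 'str'.

```python
import re

def both_words_pattern(a, b):
    """Build a regex matching lines that contain both whole words, in either order."""
    # re.escape makes sure the words are treated as literal text.
    a, b = re.escape(a), re.escape(b)
    return re.compile(rf"\b{a}\b.*\b{b}\b|\b{b}\b.*\b{a}\b")

if __name__ == "__main__":
    pattern = both_words_pattern("panda", "str")
    # Hypothetical sample lines, invented for illustration.
    for line in ["str(panda)", "pandas.read_csv(str)", "struct panda"]:
        print(line, "->", bool(pattern.search(line)))
```

Because '.' does not match a newline by default, this only finds both
words when they sit on the same line.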

I also have no idea what the expected results might look like. To add
insult to injury, I know nothing about regexes either.

;-)


> (Also, I'm not sure the '\W' bits are needed, but I don't know the field
> of what's-being-searched-for well enough to be certain about why those
> may have been added.)
>

