Extended regular expressions (ERE-s) are used in several places in the LDM system -- primarily as a pattern for matching
Consequently, it's important to know how to use ERE-s.
See the official description of ERE-s.
There is an excellent tool for determining a matching ERE from example text.
Here is an incomplete summary of ERE syntax:
Text:
    .              Any single character
    [chars]        Character class: one of chars
    [^chars]       Character class: none of chars
    text1|text2    Alternative: text1 or text2

Quantifiers:
    ?              0 or 1 of the preceding text
    *              0 or N of the preceding text (N > 0)
    +              1 or N of the preceding text (N > 1)
    {M,N}          M through N of the preceding text (either may be missing)

Grouping:
    (text)         Grouping of text (either to set the borders of an
                   alternative or for making backreferences, where the Nth
                   group can be used in a pqact PIPE action with, for
                   example, \N)

Anchors:
    ^              Start of string anchor
    $              End of string anchor

Escaping:
    \char          Escape that particular character (for instance, to
                   specify the characters ".[]()" etc.)
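The summary above can be tried out with standard tools. The following is a minimal sketch, not part of the LDM: it assumes a POSIX shell, uses grep -E as a stand-in ERE matcher, and assumes a sed that accepts -E (GNU and BSD sed do). The identifier string and the helper names ere_matches and station_and_number are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Report whether a string matches an ERE (grep -E interprets the pattern as an ERE).
ere_matches() {
    # $1 = ERE, $2 = string to test
    if printf '%s\n' "$2" | grep -E -q -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Use two groups and refer to them as \1 and \2 in the replacement text.
station_and_number() {
    # $1 = string to rewrite; prints nothing if the ERE does not match
    printf '%s\n' "$1" | sed -E -n 's/^SAUS([0-9]{2}) (K[A-Z]{3}) .*/\2-\1/p'
}

# Hypothetical identifier, used only for illustration.
id='SAUS80 KWBC 121200'

ere_matches '^SA'        "$id"    # start anchor
ere_matches '^S[AX]US'   "$id"    # character class
ere_matches 'KWBC|KNHC'  "$id"    # alternative
ere_matches '[0-9]{6}$'  "$id"    # {M,N}-style quantifier and end anchor
ere_matches 'a\.b'       'axb'    # escaped "." matches only a literal dot
station_and_number "$id"          # groups reused via \2 and \1
```

The first four calls report a match, the escaped-dot test reports no match, and the last call prints the two captured groups in swapped order.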
Don't use ERE-s with a ".*" prefix: such ERE-s are pathological and are far less efficient to match, as the timings below show.
This only applies to ERE-s with a ".*" prefix: the ERE ".*", by itself, is perfectly OK.
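As a rough illustration, an ERE without a start anchor can already match anywhere in a string, so a leading ".*" does not change whether a string matches. The sketch below assumes a POSIX shell and uses grep -E rather than the LDM itself; the helper name ere_matches and the first test string are hypothetical.

```shell
#!/bin/sh
# Report whether a string matches an ERE (grep -E interprets the pattern as an ERE).
ere_matches() {
    # $1 = ERE, $2 = string to test
    if printf '%s\n' "$2" | grep -E -q -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

pattern='some-sort-of-pattern'

# For each string, print the verdict without and with the ".*" prefix.
for s in \
    'prefix some-sort-of-pattern suffix' \
    'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd'
do
    printf '%s / %s\n' \
        "$(ere_matches "$pattern" "$s")" \
        "$(ere_matches ".*$pattern" "$s")"
done
```

Both verdicts agree on each line of output; only the cost of reaching them can differ, depending on the regular-expression implementation.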
The inefficiency of pathological ERE-s can be seen by using the UNIX time utility with the LDM's regex utility. First the non-pathological case:
    $ time regex -n 10000 \
        -s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \
        "some-sort-of-pattern"
    no match

    real    0m0.044s
    user    0m0.040s
    sys     0m0.000s
The above indicates that ten thousand comparisons of the given string against the ERE took 0.04 seconds of user time.
The timing of the corresponding pathological ERE is much different:
    $ time regex -n 10000 \
        -s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \
        ".*some-sort-of-pattern"
    no match

    real    0m18.424s
    user    0m17.720s
    sys     0m0.020s
The above took 17.72 seconds of user time, meaning that the non-pathological ERE is about 443 times faster than the pathological one (17.72 / 0.040 = 443). More complex pathological ERE-s have even worse results.
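The factor quoted above is simply the ratio of the two user times. A small sketch of that arithmetic, assuming a POSIX shell with awk available (the helper name speedup is hypothetical):

```shell
#!/bin/sh
# Print how many times slower the first timing is than the second,
# rounded to the nearest whole number.
speedup() {
    # $1 = slower user time in seconds, $2 = faster user time in seconds
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", slow / fast }'
}

# User times taken from the two "time regex" runs shown above.
speedup 17.720 0.040
```

The same helper can be applied to timings measured on another system, where the absolute numbers will differ.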