Extended regular expressions (ERE-s) are used in several places in the LDM system -- primarily as a pattern for matching
Consequently, it's important to know how to use ERE-s.
See the official description of ERE-s.
There is an excellent tool for determining a matching ERE from example text.
Here is an incomplete summary of ERE syntax:
Text:
. Any single character
[chars] Character class: One of chars
[^chars] Character class: None of chars
text1|text2 Alternative: text1 or text2
Quantifiers:
? 0 or 1 of the preceding text
* 0 or more of the preceding text
+ 1 or more of the preceding text
{M,N} M through N of the preceding text (either may be missing)
Grouping:
(text) Grouping of text
(either to set the borders of an alternative or
for making backreferences where the Nth group can
be used in a pqact PIPE action with (for example) \N)
Anchors:
^ Start of string anchor
$ End of string anchor
Escaping:
\char escape that particular character
(for instance to specify the characters ".[]()"
etc.)
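The constructs summarized above can be tried out quickly. The following is a minimal sketch that uses Python's re module purely as a stand-in: it is not the POSIX ERE implementation that the LDM uses, although it behaves the same way for the constructs shown here. The identifier string and the matches() helper are hypothetical and exist only for illustration.

```python
import re

# Hypothetical identifier, used only for illustration.
IDENT = "SAUS41 KOKX 121651"

def matches(ere, text):
    """Return True if the pattern matches anywhere in text."""
    return re.search(ere, text) is not None

# Text: any character, character classes, alternatives.
assert matches(r"SA..41", IDENT)            # "." is any single character
assert matches(r"K[A-Z]KX", IDENT)          # one of A through Z
assert matches(r"^[^0-9]", IDENT)           # first character is not a digit
assert matches(r"^(SA|SP)US", IDENT)        # SA or SP, bounded by the group
assert not matches(r"^(FT|FO)US", IDENT)

# Quantifiers.
assert matches(r"SAUS4?1", IDENT)           # 0 or 1 "4"
assert matches(r"41 *KOKX", IDENT)          # 0 or more spaces
assert matches(r"[0-9]+$", IDENT)           # 1 or more digits at the end
assert matches(r" [0-9]{6}$", IDENT)        # exactly 6 digits
assert not matches(r" [0-9]{7,}$", IDENT)   # 7 or more digits

# Escaping: "\." is a literal period, not "any character".
assert matches(r"a\.b", "a.b")
assert not matches(r"a\.b", "axb")

# Grouping: each parenthesized group is numbered from 1.
m = re.search(r"^(..)US([0-9]+) (K[A-Z]{3}) ([0-9]{2})([0-9]{4})$", IDENT)
groups = m.groups()
print(groups)  # ('SA', '41', 'KOKX', '12', '1651')
```

In this sketch the third group holds "KOKX", which is the kind of text a \3 reference would stand for.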
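Don't use ERE-s with a ".*" prefix: the prefix is unnecessary, because an unanchored ERE already matches anywhere in the string, and it can make matching far slower.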
This only applies to ERE-s with a ".*" prefix: the ERE ".*", by itself, is perfectly OK.
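A short check can make the redundancy of the prefix concrete. The sketch below again assumes Python's re module as a stand-in for the system's ERE implementation. The same_verdict() helper and the sample strings are hypothetical. It shows that searching with and without the ".*" prefix gives the same match/no-match answer, and that only the reported start of the match changes.

```python
import re

# Hypothetical sample strings, used only for illustration.
SAMPLES = ["SAUS41 KOKX 121651", "xxKOKXxx", "no station here", ""]

def same_verdict(pattern, text):
    """True if the plain and ".*"-prefixed patterns agree on match/no match."""
    plain = re.search(pattern, text) is not None
    prefixed = re.search(".*" + pattern, text) is not None
    return plain == prefixed

# Both forms accept and reject exactly the same strings.
for sample in SAMPLES:
    assert same_verdict("KOKX", sample)

# The only difference is where the reported match begins.
plain_start = re.search("KOKX", "xxKOKXxx").start()
prefixed_start = re.search(".*KOKX", "xxKOKXxx").start()
print(plain_start, prefixed_start)  # 2 0
```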
The inefficiency of pathological ERE-s can be seen by using the UNIX time utility with the LDM's regex utility. First the non-pathological case:
$ time regex -n 10000 \
-s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \
"some-sort-of-pattern"
no match
real 0m0.044s
user 0m0.040s
sys 0m0.000s
The above indicates that ten-thousand comparisons of the given string against the ERE took 0.04 seconds of user-time.
The timing of the corresponding pathological ERE is very different:
$ time regex -n 10000 \
-s 'lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd' \
".*some-sort-of-pattern"
no match
real 0m18.424s
user 0m17.720s
sys 0m0.020s
The above took 17.72 seconds of user-time, meaning that the non-pathological ERE is about 443 times faster than the pathological one (17.72 / 0.040 = 443). More complex pathological ERE-s have even worse results.
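The same comparison can be sketched without the regex utility. The following assumes Python's re and timeit modules, so it exercises a different regular-expression engine than the one the LDM uses. The absolute times and the ratio will therefore differ from the figures above and may be far less dramatic. The time_search() helper is hypothetical. The sketch reuses the string and patterns from the runs above and also rechecks the quoted ratio.

```python
import re
import timeit

# Subject string and repeat count taken from the regex runs above.
SUBJECT = "lksjdfklsdjfkljsdfkljsdljfsdlkjfdlskjfldjflkjsdflkjsdflkjsd"

def time_search(pattern, subject=SUBJECT, repeats=10000):
    """Return (elapsed seconds for `repeats` searches, whether it matched)."""
    compiled = re.compile(pattern)
    elapsed = timeit.timeit(lambda: compiled.search(subject), number=repeats)
    return elapsed, compiled.search(subject) is not None

plain_time, plain_hit = time_search("some-sort-of-pattern")
prefixed_time, prefixed_hit = time_search(".*some-sort-of-pattern")

print("plain:    %.4f s, match=%s" % (plain_time, plain_hit))
print("prefixed: %.4f s, match=%s" % (prefixed_time, prefixed_hit))

# Ratio of the user-times quoted in the text: 17.72 s versus 0.040 s.
quoted_ratio = round(17.72 / 0.040)
print("quoted ratio:", quoted_ratio)
```

Timing with a precompiled pattern keeps compilation cost out of the measurement, so only the repeated searches are compared.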