Regular Expressions Syntax

Oxygen XML Author uses the Java regular expression syntax, which is similar to that of Perl 5, with several exceptions. Oxygen XML Author does not support the following constructs:

  • The conditional constructs (?(condition)X) and (?(condition)X|Y).
  • The embedded code constructs (?{code}) and (??{code}).
  • The embedded comment syntax (?#comment).
  • The preprocessing operations \l, \u, \L, and \U.
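
As a quick check of the list above, the following minimal sketch asks java.util.regex (the engine behind the Java syntax) whether it accepts a few sample patterns. The class name UnsupportedConstructs, the helper compiles, and the sample patterns are illustrative assumptions, not part of Oxygen XML Author.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class UnsupportedConstructs {

    /** Returns true if java.util.regex accepts the pattern, false if it rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical samples: one per unsupported family, plus a plain literal.
        String[] samples = {
            "(?#comment)abc", // embedded comment
            "(?(1)a|b)",      // conditional
            "(?{code})",      // embedded code
            "\\Uabc",         // preprocessing operation \U
            "abc"             // ordinary literal, for comparison
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (compiles(s) ? "accepted" : "rejected"));
        }
    }
}
```

In this sketch only the plain literal abc compiles; the other samples raise a PatternSyntaxException, which the helper reports as a rejection.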

When using regular expressions, note that some character sets defined by XPath, XML Schema, and Schematron differ slightly from the ones that Oxygen XML Author (through Java) uses for text searches in the Find/Replace dialog boxes. The most common example is the \w and \W character classes. To get consistent results between the two, it is recommended that you use the following constructs in the Find/Replace dialog boxes:

  • [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] instead of \w
  • [\p{P}\p{Z}\p{C}] instead of \W
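
To illustrate why the two definitions diverge, the sketch below compares Java's default \w (ASCII letters, digits, and underscore) with the XML Schema set. The constructs above are written in XML Schema notation; the Java spelling [^\p{P}\p{Z}\p{C}] used here is an assumed equivalent chosen for this example, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class WordCharSets {

    // Java's default \w is ASCII-only: [a-zA-Z_0-9].
    static final Pattern JAVA_W = Pattern.compile("\\w");

    // Assumption: a Java spelling of the XML Schema set
    // [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}], i.e. every character that is
    // not punctuation, a separator, or an "other" (control, format, ...) character.
    static final Pattern XSD_W = Pattern.compile("[^\\p{P}\\p{Z}\\p{C}]");

    static boolean javaWord(String ch) {
        return JAVA_W.matcher(ch).matches();
    }

    static boolean xsdWord(String ch) {
        return XSD_W.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent; "_" is connector punctuation.
        String[] samples = { "a", "\u00e9", "_", " " };
        for (String s : samples) {
            System.out.println("'" + s + "' java \\w=" + javaWord(s)
                    + " xsd-style=" + xsdWord(s));
        }
    }
}
```

The two sets disagree in both directions: the accented letter is a word character only under the XML Schema definition, while the underscore is a word character only under Java's default \w.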

There are some other notable differences that may cause unexpected results, including the following:

  • In Perl, \1 through \9 are always interpreted as back references. A backslash-escaped number greater than 9 is treated as a back reference if at least that many sub-expressions exist; otherwise, it is interpreted, if possible, as an octal escape. In Java, octal escapes must always begin with a zero. Java also always interprets \1 through \9 as back references, and accepts a larger number as a back reference if at least that many sub-expressions exist at that point in the regular expression. Otherwise, the parser drops digits until the number is less than or equal to the existing number of groups or it is a single digit.
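  • Perl uses the g flag to request a match that resumes where the last match left off. In Java, this behavior is provided implicitly by the Matcher class: repeated invocations of the find method resume where the last match left off, unless the matcher is reset.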
  • In Perl, embedded flags at the top level of an expression affect the whole expression. In Java, embedded flags always take effect at the point where they appear, whether they are at the top level or within a group. In the latter case, flags are restored at the end of the group just as in Perl.
  • Perl is forgiving about malformed matching constructs, such as the expression *a, and about dangling brackets, such as the expression abc], and treats them as literals. Java also accepts dangling brackets but is strict about dangling meta-characters such as +, ?, and *.
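
The differences listed above can be observed directly with java.util.regex. The following is a minimal sketch; the class name PerlJavaDifferences, its helper methods, and the sample patterns and inputs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PerlJavaDifferences {

    /** Returns true if java.util.regex accepts the pattern. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Collects every match; each find() call resumes after the previous match. */
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only one group exists, so \11 is read as back reference \1
        // followed by the literal character '1'.
        System.out.println("aa1".matches("(a)\\11"));

        // Octal escapes need a leading zero: \011 is the tab character.
        System.out.println("\t".matches("\\011"));

        // No g flag is needed: repeated find() calls walk through the input.
        System.out.println(findAll("\\d+", "a1b22c333"));

        // An embedded flag applies only from the point where it appears.
        System.out.println("aB".matches("a(?i)b"));
        System.out.println("AB".matches("a(?i)b"));

        // A dangling bracket is a literal; a dangling meta-character is an error.
        System.out.println(compiles("abc]"));
        System.out.println(compiles("*a"));
    }
}
```

In this sketch, (a)\11 matches aa1 because the parser drops the second digit, a(?i)b matches aB but not AB because the flag takes effect only after the a, and *a is rejected while abc] is accepted.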
