A regular expression is a special text string that describes a search pattern according to certain syntax rules. For example, l[0-9]+ matches "l" followed by one or more digits.
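As a quick illustration, the example pattern can be tried in any engine that supports this syntax. The sketch below uses Python's re module, which is an assumption made for illustration only; Host Integrator's own engine may differ in details. The function name find_tokens is hypothetical.

```python
import re

# The example pattern from the text: "l" followed by one or more digits.
PATTERN = re.compile(r"l[0-9]+")


def find_tokens(text):
    """Return every substring of text that the pattern matches."""
    return PATTERN.findall(text)


print(find_tokens("l1 l42 lx l007"))  # ['l1', 'l42', 'l007']
```

Note that "lx" is skipped because the "l" is not followed by a digit.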
Solve a genuinely simple problem in a simple way, and consider regular expressions only when the problem goes beyond that. Even then, use regular expressions judiciously and only when necessary: using them indiscriminately can result in significant performance overhead.
Host Integrator uses regular expressions in two places:
Conditional expressions, which can contain regular expressions.
Read and write substitutions, which are lists of regular expressions, executed in the order they are listed, that perform replacements within an attribute or recordset field string.
These are the basic elements of regular expressions.
Character | Description | Example |
---|---|---|
\ (backslash) | Escape character used to match a character that would otherwise have a special meaning in a regular expression. | "\." = the period character. |
[abc] | Match any one character listed within the square brackets. | [abc] matches a, b, or c |
\d, \w, and \s | Shorthand character classes matching digits 0-9, word characters (letters, digits, and underscore), and whitespace, respectively. Can be used inside and outside character classes. | [\d\s] matches a character that is a digit or whitespace |
\D, \W, and \S | Negated versions of the above. Should be used only outside character classes. | \D matches a character that is not a digit |
\b | Word boundary. Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W), and at the start or end of the string if the first or last character is a word character. Use \bword\b to perform a "whole words only" search. | \b4\b matches 4 that is not part of a larger number. |
\B | Non-word boundary. \B is the negated version of \b and matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters. | \B.\B matches b in abc |
. (period) | Match any single character. | "." matches x or any other character. |
x (regular character) | Match an instance of character "x". | x matches x |
[^x] | Match any character except the characters listed after the caret, here "x". | [^a-d] matches any character except a, b, c, or d |
\| (pipe) | Match either the part on the left side or the part on the right side. Can be strung together into a series of options. The pipe has the lowest precedence of all operators. Use grouping to alternate only part of the regular expression. | abc\|def\|xyz matches abc, def, or xyz |
(abc) (parentheses) | Used to group sequences of characters or expressions. | (Larry\|Moe\|Curly) Howard matches Larry Howard, Moe Howard, or Curly Howard |
{ } (curly braces) | Used to define numeric quantifiers. {N} means the match must occur exactly "N" times. | a{3} matches aaa |
{N,} | Match must occur at least "N" times. | Z{1,} matches when "Z" occurs at least once |
{N,M} | Match must occur at least "N" times, but no more than "M" times. | a{2,4} matches aa, aaa, or aaaa |
? (question mark) | Makes the preceding item optional (matched zero times or once). The optional item is included in the match if possible. | abc? matches ab or abc |
* (star) | Repeats the previous item zero or more times. As many items as possible are matched before trying alternatives with fewer matches of the preceding item, down to the point where the preceding item is not matched at all. | "go*gle" matches ggle, gogle, google, gooogle, and so on. |
+ (plus) | Repeats the previous item one or more times. As many items as possible are matched before trying alternatives with fewer matches of the preceding item, down to the point where the preceding item is matched only once. | "go+gle" matches gogle, google, gooogle, and so on (but not ggle). |
^ (caret) | Match the beginning of a string. Matches a position rather than a character. | ^. matches a in abc\ndef. Also matches d in "multi-line" mode. |
$ (dollar) | Match the end of a string. Matches a position rather than a character. Also matches before the very last line break if the string ends with a line break. | .$ matches f in abc\ndef. Also matches c in "multi-line" mode. |
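Several of the table entries are easier to follow when run against sample text. The sketch below checks the word boundary, greedy quantifier, and anchor examples using Python's re module. Python is an assumption made for illustration; Host Integrator's engine may treat some constructs differently, and all variable names are hypothetical.

```python
import re

# \b: whole-word search. The 4 inside 42 and 14 is not matched.
whole_fours = re.findall(r"\b4\b", "4 42 14 4")

# \B.\B: a character that has word characters on both sides.
inner = re.findall(r"\B.\B", "abc")

# Quantifiers: * allows zero o's, + requires at least one.
words = ["ggle", "gogle", "google"]
star = [w for w in words if re.fullmatch(r"go*gle", w)]
plus = [w for w in words if re.fullmatch(r"go+gle", w)]

# ^ and $ anchor to the whole string, or to each line in multi-line mode.
text = "abc\ndef"
first_chars = re.findall(r"^.", text)
first_chars_ml = re.findall(r"^.", text, flags=re.MULTILINE)
last_chars = re.findall(r".$", text)
last_chars_ml = re.findall(r".$", text, flags=re.MULTILINE)

print(whole_fours)                  # ['4', '4']
print(inner)                        # ['b']
print(star, plus)                   # all three words, then only gogle and google
print(first_chars, first_chars_ml)  # ['a'] ['a', 'd']
print(last_chars, last_chars_ml)    # ['f'] ['c', 'f']
```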
VHI conditional expression matching "Page N of M" when N = M.
(PageStatus=~s/Page ([0-9]+) of ([0-9]+)/$1/) = (PageStatus=~s/Page ([0-9]+) of ([0-9]+)/$2/)
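The intent of this condition is to capture N and M from the "Page N of M" text and compare them. A minimal sketch of the same check in Python follows. It is not Host Integrator syntax: the function name is_last_page is hypothetical, and comparing the captured values as integers is an assumption about how the comparison should behave.

```python
import re

# Capture N as group 1 and M as group 2.
PAGE_STATUS = re.compile(r"Page ([0-9]+) of ([0-9]+)")


def is_last_page(page_status):
    """Return True when the text contains 'Page N of M' with N equal to M."""
    match = PAGE_STATUS.search(page_status)
    if match is None:
        return False
    current, total = match.groups()
    # Assumption: compare numerically, so "05" and "5" count as equal.
    return int(current) == int(total)


print(is_last_page("Page 5 of 5"))  # True
print(is_last_page("Page 2 of 5"))  # False
```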
Matches when an error message is displayed on the status line.
ERROR [0-9]{1,4}: .*
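To see which status lines this pattern accepts, it can be run against a few sample strings. The sketch below uses Python's re module as a stand-in for illustration; the function name has_error and the sample messages are hypothetical.

```python
import re

# "ERROR ", one to four digits, a colon and a space, then the rest of the line.
ERROR_LINE = re.compile(r"ERROR [0-9]{1,4}: .*")


def has_error(status_line):
    """Return True when the status line contains an error message."""
    return ERROR_LINE.search(status_line) is not None


print(has_error("ERROR 42: Disk full"))     # True
print(has_error("ERROR 12345: Too long"))   # False, five digits exceed {1,4}
print(has_error("Ready"))                   # False
```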
Match 3 instances of a string.
"/(John){3}/"
(Matches JohnJohnJohn. The group repeats back to back, so no spaces are allowed between the names.)
Match any of several first names, followed by a common last name.
"(Homer|Marge|Bart|Lisa|Maggie) Simpson"
(Matches any member of the Simpson family)
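The two grouping examples above can be checked with a short script. This is a sketch using Python's re module, chosen only for illustration; the variable names are hypothetical. Note that a repeated group matches its contents back to back, with nothing in between.

```python
import re

triple = re.compile(r"(John){3}")
family = re.compile(r"(Homer|Marge|Bart|Lisa|Maggie) Simpson")

# The group must repeat three times with nothing between the repetitions.
joined = triple.search("JohnJohnJohn") is not None
spaced = triple.search("John John John") is not None

# Alternation inside the group picks one first name; group(1) reports which.
bart = family.search("Bart Simpson")
ned = family.search("Ned Simpson")

print(joined, spaced)   # True False
print(bart.group(1))    # Bart
print(ned)              # None
```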
When doing read or write substitutions with attributes or recordset fields, the expression is equivalent to:
s/<Search for>/<Replace with>/g
This is, essentially, a global search and replace.
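For readers more familiar with a general-purpose language, a global search and replace of this kind can be sketched in Python with re.sub, which replaces every match by default. The function name substitute is hypothetical, and Python writes group references in the replacement as \1 rather than $1, so this is an analogy rather than Host Integrator's implementation.

```python
import re


def substitute(search_for, replace_with, value):
    """Replace every match of search_for in value, like s/.../.../g."""
    return re.sub(search_for, replace_with, value)


print(substitute(r"-", "", "555-12-3456"))  # 555123456
print(substitute(r"[0-9]", "#", "a1b22"))   # a#b##
```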
Host Integrator attributes and recordset fields accept a list of string substitutions that are applied in the order listed in the user interface. By performing multiple replacements, it is possible to isolate words or substrings within an attribute.
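The effect of an ordered list can be sketched as a loop that feeds each result into the next substitution. The Python below is an illustration under assumptions: apply_substitutions and the sample list are hypothetical names, and the two-step list is just one way to isolate the second word of a string.

```python
import re


def apply_substitutions(value, substitutions):
    """Apply (search, replace) pairs in order, each as a global replace."""
    for search_for, replace_with in substitutions:
        value = re.sub(search_for, replace_with, value)
    return value


# Isolate the second word: drop the first word, then drop everything
# from the next space to the end of the string.
isolate_second_word = [
    (r"^[A-Za-z]+ ", ""),
    (r" .*$", ""),
]

print(apply_substitutions("Account Savings Balance", isolate_second_word))  # Savings
```

Order matters here: running the second substitution first would leave only "Account", and the first substitution would then find nothing to remove.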
Remove the first word of a string
"^[A-Za-z]+ ", ""
Remove the last word of a string
" [A-Za-z]+$", ""
Remove the first 2 words of a string
"^[A-Za-z]+ [A-Za-z]+ ", ""
Remove all but the last word of a string
"([A-Za-z]+) ", ""
Remove leading blanks
"^\s+", ""
Remove trailing blanks
"\s+$", ""
Remove symbols
"<symbols>", ""
Normalize spaces (Replace 2 or more spaces with a single space.)
"\s{2,}", " "
See Substituting a Regular Expression for a Recordset Field to walk through a step-by-step procedure using the CCSDemo sample.
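The substitution pairs listed in this section can also be tried outside Host Integrator. The sketch below runs them as global replacements with Python's re module, which is an assumption for illustration; the dictionary keys and the run helper are hypothetical names. The normalize entry uses a single space as the replacement, which is what its description calls for, and the "Remove symbols" entry is left out because its pattern is a placeholder.

```python
import re

# Each entry is a (search, replace) pair taken from the list above.
SUBSTITUTIONS = {
    "remove first word": (r"^[A-Za-z]+ ", ""),
    "remove last word": (r" [A-Za-z]+$", ""),
    "remove first two words": (r"^[A-Za-z]+ [A-Za-z]+ ", ""),
    "keep last word": (r"([A-Za-z]+) ", ""),
    "remove leading blanks": (r"^\s+", ""),
    "remove trailing blanks": (r"\s+$", ""),
    "normalize spaces": (r"\s{2,}", " "),
}


def run(name, value):
    """Apply the named substitution to value as a global replace."""
    search_for, replace_with = SUBSTITUTIONS[name]
    return re.sub(search_for, replace_with, value)


print(run("remove first word", "Alpha Beta Gamma"))       # Beta Gamma
print(run("remove last word", "Alpha Beta Gamma"))        # Alpha Beta
print(run("remove first two words", "Alpha Beta Gamma"))  # Gamma
print(run("keep last word", "Alpha Beta Gamma"))          # Gamma
print(run("normalize spaces", "Alpha   Beta  Gamma"))     # Alpha Beta Gamma
```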
There are also a considerable number of resources available on the Internet to help you understand regular expressions.
© 1999-2007 Attachmate Corporation. All rights reserved.