Regular-expressions Module

Make-Foo vs. Foo Functions

There are quite a few make-foo functions hanging around. Now that regexp-position does caching, these are basically useless, but we've kept them around for backwards compatibility. Unfortunately, most of the other functions are implemented internally in terms of make-regexp-positioner. To minimize the amount of rewriting, I've liberally applied seals and inline declarations so that make-regexp-positioner won't clobber all type information. The downside, of course, is that everything's sealed, but hey, no one ever specialized regexp-position anyway.

Caching

Parsing a regexp is not cheap, so we cache the parsed regexps and only parse a string if we haven't seen it before. Because in practice almost all regexp strings are string literals, we're free to compare cached strings with \== or \=, whichever is faster. However, a string is parsed differently depending on whether the search is case sensitive, so the cache also has to keep track of that. (The case-dependent parse boils down to the parse creating a <character-set>, which must be either case sensitive or case insensitive.)
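
The idea can be sketched roughly as follows. This is not the module's actual code: it keeps one <string-table> per case-sensitivity setting, so the same string can have two cached parses. The names cached-parse, $sensitive-cache, and $insensitive-cache are hypothetical, the parser is passed in as a function argument, and <string-table> is assumed to be available (for instance from table-extensions).

```dylan
// Hypothetical sketch: one table per case-sensitivity setting.
define constant $sensitive-cache :: <string-table> = make(<string-table>);
define constant $insensitive-cache :: <string-table> = make(<string-table>);

// parse is any function of (regexp, case-sensitive?) that returns the
// parsed form; it stands in for the module's real parser.
define function cached-parse
    (regexp :: <string>, case-sensitive? :: <boolean>, parse :: <function>)
 => (parsed :: <object>);
  let cache = if (case-sensitive?) $sensitive-cache else $insensitive-cache end;
  let found = element(cache, regexp, default: #f);
  // Parse only on a miss, and remember the result for next time.
  found | (cache[regexp] := parse(regexp, case-sensitive?))
end function cached-parse;
```

A <string-table> compares keys by string contents; a table keyed on object identity would correspond to the \== choice mentioned above.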

Note: Currently, only regexp-position uses this cache, because the other functions are still using make-regexp-positioner. With caching, that make-regexp-whatever stuff should probably go.

Exported Names


regexp-position [Function]

Finds the index of a regexp in a string.

Synopsis

regexp-position (big, regexp, #key start, end, case-sensitive) => (regexp-start, #rest marks)

Parameters

big - An instance of <string>. The string to parse.
regexp - An instance of <string>.
start: - An instance of <object>. Where to start parsing the string. Defaults to 0.
end: - An instance of <object>. If defined, where to stop parsing the string. Defaults to #f.
case-sensitive: - An instance of <object>. Match case in regexp while parsing. Defaults to #f.

Return Values

regexp-start - An instance of false-or(<integer>). If defined, the index of the start of the match.
marks - Instances of false-or(<integer>). The index of the end of the match, followed by a start and end mark for each group (see below).

Description

Find the position of a regular expression inside a string. If the regexp is not found, return #f, otherwise return a variable number of marks.

This function returns the index of the start of the regular expression in the big-string, or #f if the regular expression is not found. As a second value, it returns the index of the end of the regular expression in the big-string (assuming it was found; otherwise there is no second value). These values are called marks, and they come in pairs, a start-mark and an end-mark. If there are groups in the regular expression, regexp-position will return an additional pair of marks (a start and an end) for each group. If the group is matched, these marks will be integers; if the group is not matched, the marks will be #f. So

regexp-position("This is a string", "is");

returns values(2, 4) and

regexp-position("This is a string", "(is)(.*)ing");

returns values(2, 16, 2, 4, 4, 13), while

regexp-position("This is a string", "(not found)(.*)ing");

returns #f. Marks are always given relative to the start of big-string, not relative to the start: keyword.
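
Because the marks come back as multiple values, a caller typically binds them with let and then slices the string. The following is a sketch, assuming the enclosing module uses regular-expressions; the helper name first-group is hypothetical.

```dylan
// Return the text matched by the first group of regexp, or #f.
define function first-group (text :: <string>, regexp :: <string>)
 => (group :: false-or(<string>));
  // Bind the whole-match marks and the marks of the first group. Any
  // further values are discarded, and missing values are bound to #f.
  let (match-start, match-end, group-start, group-end)
    = regexp-position(text, regexp);
  // match-start is #f when the regexp is not found; group-start is #f
  // when the group did not take part in the match (or does not exist).
  if (match-start & group-start)
    copy-sequence(text, start: group-start, end: group-end)
  end if
end function first-group;

first-group("This is a string", "(is)(.*)ing");  // "is", by the marks above
```

By the same rule about marks, regexp-position("This is a string", "is", start: 4) should report the second "is" as values(5, 7), counted from the start of the string rather than from index 4.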


regexp-replace [Function]

Replace information in a string.

Synopsis

regexp-replace (input, regexp, new-substring, #key count, case-sensitive, start, end) => (changed-string)

Parameters

input - An instance of <string>. The string to parse and replace pieces of.
regexp - An instance of <string>.
new-substring - An instance of <string>. The replacement string.
count: - An instance of <object>. If supplied, number of substitutions to make. Defaults to #f.
case-sensitive: - An instance of <object>. Match case in regexp while parsing. Defaults to #f.
start: - An instance of <object>. Where to start parsing the string. Defaults to 0.
end: - An instance of <object>. If defined, where to stop parsing the string. Defaults to #f.

Return Values

changed-string - An instance of <string>.

Description

This replaces all occurrences of regexp in input with new-substring. If count: is specified, it replaces only the first count occurrences of regexp. (This is different from Perl, which replaces only the first occurrence unless /g is specified.) New-substring can contain backreferences to the regexp. For instance,

regexp-replace("The rain in Spain and some other text",
               "the (.*) in (\\w*\\b)", "\\2 has its \\1")

returns "Spain has its rain and some other text". If the subgroup referred to by the backreference was not matched, the reference is interpreted as the null string. For instance,

regexp-replace("Hi there", "Hi there(, Bert)?",
               "What do you think\\1?")

returns "What do you think?" because ", Bert" wasn't found.
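
As a small illustration of count:, the following sketch limits the number of substitutions. The results in the comments follow from the description above rather than from a recorded run.

```dylan
// No count: every occurrence of the regexp is replaced.
regexp-replace("a-b-c-d", "-", "+");            // "a+b+c+d"

// count: 2 replaces only the first two occurrences.
regexp-replace("a-b-c-d", "-", "+", count: 2);  // "a+b+c-d"
```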


translate [Method]

Equivalent to Perl's tr. Does a character by character translation.

Synopsis

translate (input, from-set, to-set, #key delete, start, end) => (output)

Parameters

input - An instance of <string>. The string to translate.
from-set - An instance of <string>. String specification of a character set.
to-set - An instance of <string>. Another character set.
delete: - An instance of <object>. If #t, any characters in from-set that don't have matching characters in to-set are deleted. Defaults to #f.
start: - An instance of <object>. Where to start parsing the string. Defaults to 0.
end: - An instance of <object>. If defined, where to stop parsing the string. Defaults to #f.

Return Values

output - An instance of <string>.

Description

This is equivalent to Perl's tr/// construct. From-set is a string specification of a character set, and to-set is another character set. Translate converts input character by character, according to the sets. For instance,

translate("any string", "a-z", "A-Z")

will convert "any string" to all uppercase: "ANY STRING".

As in Perl, character ranges are not allowed to be "backwards". The following is not legal:

translate("any string", "a-z", "z-a")

(This restriction may be removed in future releases.) Unlike Perl's tr///, translate doesn't return the number of characters translated.

If delete: is #t, any characters in from-set that don't have matching characters in to-set are deleted. The following will remove all vowels from a string and convert periods to commas:

translate("any string", ".aeiou", ",", delete: #t)

Delete: is #f by default. If delete: is #f and there aren't enough characters in to-set, the last character in to-set is reused as many times as necessary. The following converts several punctuation characters into spaces:

translate("any string", ",./:;[]{}()", " ");

Start: and end: indicate which part of input to translate. They default to the entire string.

Note: Translate is always case sensitive.
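
To make the difference between the two delete: settings concrete, here is a short sketch on inputs that actually contain the affected characters. The results in the comments are worked out from the rules above, not taken from a recorded run.

```dylan
// "," maps to " "; "." has no partner in to-set, so the last
// to-set character (the space) is reused for it.
translate("a,b.c", ",.", " ");                          // "a b c"

// With delete: #t, "." still maps to ",", but the vowels have no
// partner and are dropped instead.
translate("hello. world.", ".aeiou", ",", delete: #t);  // "hll, wrld,"
```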


split [Function]

Breaks up a string along boundary characters.

Synopsis

split (pattern, input, #key count, remove-empty-items, start, end) => (#rest whole-bunch-of-strings)

Parameters

pattern - An instance of <string>. The regexp to split on.
input - An instance of <string>. The string to split.
count: - An instance of <object>. If supplied, maximum number of strings to return. Defaults to #f.
remove-empty-items: - An instance of <object>. If #t, empty items are skipped. Defaults to #t.
start: - An instance of <object>. Where to start parsing the string. Defaults to 0.
end: - An instance of <object>. If defined, where to stop parsing the string. Defaults to #f.

Return Values

whole-bunch-of-strings - Instances of <string>.

Description

This is like Perl's split function. It searches input for occurrences of pattern, and returns the substrings that were delimited by that regexp. For instance,

split("-", "long-dylan-identifier")

returns values("long", "dylan", "identifier"). Note that what matched the regexp is left out. Remove-empty-items, which defaults to true, magically skips over empty items, so that

split("-", "long--with--multiple-dashes")

returns values("long", "with", "multiple", "dashes"). Count is the maximum number of strings to return. If there are n strings and count is specified, the first count - 1 strings are returned as usual, and the count-th string is the remainder, unsplit. So

split("-", "really-long-dylan-identifier", count: 3)

returns values("really", "long", "dylan-identifier"). If remove-empty-items is #t, empty items aren't counted.

Start: and end: indicate what part of input should be looked at for delimiters. They default to the entire string. For instance,

split("-", "really-long-dylan-identifier", start: 8)

returns values("really-long", "dylan", "identifier").

Note: Unlike Perl, empty regular expressions are never legal regular expressions, so there is no way to split a string into a bunch of single character strings. Of course, in Dylan this is not a useful thing to do (as one can get each character of the string by iteration or by indexing), so this is not really a problem.
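
Since split returns its pieces as multiple values, the caller decides how to receive them. The sketch below shows two common ways, using only standard let syntax; the variable names are illustrative.

```dylan
// #rest collects every returned string into one sequence.
let (#rest fields) = split(":", "root:x:0:0");
// fields now holds "root", "x", "0", "0"

// Or bind only the leading pieces; count: 2 leaves the remainder unsplit.
let (key, remainder) = split("=", "path=/usr/bin=x", count: 2);
// key is "path", remainder is "/usr/bin=x"
```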


join [Function]

Does the opposite of split.

Synopsis

join (delimiter, #rest strings) => (big-string)

Parameters

delimiter - An instance of <string>.
strings - Instances of <object>.

Return Values

big-string - An instance of <string>.

Description

This is like Perl's join function. It's more convenient than calling concatenate-as or concatenate yourself. For example,

join(":", word1, word2, word3)

is equivalent to

concatenate(word1, ":", word2, ":", word3)

and is no more efficient.
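
When the strings are already collected in a sequence, for example the pieces returned by split, apply can spread them out as join's #rest arguments. A brief sketch, with illustrative variable names:

```dylan
// Collect the pieces of a dashed identifier into one sequence.
let (#rest parts) = split("-", "long-dylan-identifier");

// apply passes the elements of parts as join's #rest arguments.
let rejoined = apply(join, "_", parts);
// rejoined is "long_dylan_identifier"
```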


<illegal-regexp> [sealed Class]

Signaled when a function receives an illegal regular expression.

Superclasses

<error>

Initialization Keywords

regexp: - An instance of <string>. The regexp that caused the error.

Description

Signaled when a function receives an illegal regular expression.
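
Because <illegal-regexp> is an <error>, it can be caught with an ordinary block/exception handler. The wrapper below is a hypothetical helper (safe-position is not part of the module) that treats an unparsable pattern the same as "no match"; which patterns the parser actually rejects is not specified here.

```dylan
// Like regexp-position, but returns #f instead of signaling when
// the pattern is not a legal regular expression.
define function safe-position (big :: <string>, regexp :: <string>)
 => (found-at :: false-or(<integer>));
  block ()
    // Only the first returned value (the start of the match) is kept.
    let match-start = regexp-position(big, regexp);
    match-start
  exception (<illegal-regexp>)
    // The pattern could not be parsed; report "no match" instead.
    #f
  end block
end function safe-position;
```

Swallowing the error is only appropriate when the pattern comes from untrusted input; for string literals it is usually better to let the error surface.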

Deprecated Functions

These functions still work, but are deprecated. Use the foo functions (described above) instead of these make-foo functions.


make-regexp-positioner [Function]

[Deprecated] Creates a function that finds the index of a regexp in a string.

Synopsis

make-regexp-positioner (regexp, #key byte-character-only, needs-marks, maximum-compile, case-sensitive) => (regexp-positioner)

Parameters

regexp - An instance of <string>.
byte-character-only: - An instance of <object>. Ignored. Defaults to #f.
needs-marks: - An instance of <object>. Ignored. Defaults to #f.
maximum-compile: - An instance of <object>. Ignored. Defaults to #f.
case-sensitive: - An instance of <object>. Match case in regexp while parsing. Defaults to #f.

Return Values

regexp-positioner - An instance of <function>. The function to execute a match on a string.

Description

Once upon a time, this was how you interfaced to the NFA stuff (maximum-compile: #t). That's gone. Now it's just here for backwards compatibility. All keywords except case-sensitive are now ignored.


make-regexp-replacer [Function]

[Deprecated] Creates a function that replaces information in a string.

Synopsis

make-regexp-replacer (regexp, #key replace-with, case-sensitive) => (replacer)

Parameters

regexp - An instance of <string>.
replace-with: - An instance of <object>. The replacement string.
case-sensitive: - An instance of <object>. Match case in regexp while parsing. Defaults to #f.

Return Values

replacer - An instance of <function>. The function that does the replacement.

Description

This returns an anonymous replacer function that is either

method (big-string, #key count, start, end)

or

method (big-string, replace-string, #key count, start, end)

The first form is returned if the replace-with: keyword is supplied, since the replacement string is then already known; otherwise the second form is returned, and the replacement string is passed on each call.


make-translator [Method]

[Deprecated] Creates a function that translates a string.

Synopsis

make-translator (from-set, to-set, #key delete) => (translator)

Parameters

from-set - An instance of <string>. String specification of a character set.
to-set - An instance of <string>. Another character set.
delete: - An instance of <object>. If #t, delete from-set characters that have no matching character in to-set. Defaults to #f.

Return Values

translator - An instance of <function>. The function that does the translation.

Description

This returns an anonymous translation function.


make-splitter [Function]

[Deprecated] Creates a function that splits a string.

Synopsis

make-splitter (pattern) => (splitter)

Parameters

pattern - An instance of <string>. The regexp to split on.

Return Values

splitter - An instance of <function>. (If you're a Brit, don't smile.) The function that does the split on the string.

Description

This returns an anonymous splitter function.