class Regex
Overview
A Regex represents a regular expression, a pattern that describes the contents of strings. A Regex can determine whether or not a string matches its description, and extract the parts of the string that match.
A Regex can be created using the literal syntax, in which it is delimited by forward slashes (/):
/hay/ =~ "haystack" # => 0
/y/.match("haystack") # => #<Regex::MatchData "y">
Interpolation works in regular expression literals just as it does in string literals. Be aware that an exception is raised at runtime if the resulting string is not a valid regular expression.
x = "a"
/#{x}/.match("asdf") # => #<Regex::MatchData "a">
x = "("
/#{x}/ # => ArgumentError
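Because an interpolated literal is only compiled at runtime, code that interpolates untrusted input may want to handle the ArgumentError. The following is a minimal sketch; the helper name `compile_fragment` is hypothetical and not part of the Regex API.

```crystal
# Hypothetical helper: builds a Regex from an arbitrary fragment and
# returns nil instead of raising when the fragment is not a valid pattern.
def compile_fragment(fragment : String) : Regex?
  /#{fragment}/
rescue ArgumentError
  nil
end

valid = compile_fragment("a")   # a Regex that matches "a"
invalid = compile_fragment("(") # nil, because "(" alone is not a valid pattern

puts valid.inspect
puts invalid.inspect
```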
When we check to see if a particular regular expression describes a string, we can say that we are performing a match or matching one against the other. If we find that a regular expression does describe a string, we say that it matches, and we can refer to a part of the string that was described as a match.
Here "haystack" does not contain the pattern /needle/, so it doesn't match:
/needle/.match("haystack") # => nil
Here "haystack" contains the pattern /hay/, so it matches:
/hay/.match("haystack") # => #<Regex::MatchData "hay">
Regex methods that perform a match usually return a truthy value if there was a match and nil if there was no match. After performing a match, the special variable $~ will be an instance of Regex::MatchData if it matched, nil otherwise.
When matching a regular expression using #=~ (either String#=~ or Regex#=~), the returned value will be the index of the first match in the string if the expression matched, nil otherwise.
/stack/ =~ "haystack" # => 3
"haystack" =~ /stack/ # => 3
$~ # => #<Regex::MatchData "stack">
/needle/ =~ "haystack" # => nil
"haystack" =~ /needle/ # => nil
$~ # => nil
When matching a regular expression using #match (either String#match or Regex#match), the returned value will be a Regex::MatchData if the expression matched, nil otherwise.
/hay/.match("haystack") # => #<Regex::MatchData "hay">
"haystack".match(/hay/) # => #<Regex::MatchData "hay">
$~ # => #<Regex::MatchData "hay">
/needle/.match("haystack") # => nil
"haystack".match(/needle/) # => nil
$~ # => nil
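Since #match returns either a Regex::MatchData or nil, calling code typically checks the result before using it. A minimal sketch of that pattern follows; the method name `first_word` is a hypothetical example, not part of the API.

```crystal
# Hypothetical helper: returns the first run of word characters, or nil.
def first_word(text : String) : String?
  # The assignment inside `if` narrows md to Regex::MatchData in the body.
  if md = /\w+/.match(text)
    md[0] # index 0 refers to the full match
  end
end

puts first_word("  hello world").inspect # "hello"
puts first_word("   ").inspect           # nil
```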
Regular expressions have their own language for describing strings.
Many programming languages and tools implement their own regular expression language, but Crystal uses PCRE, a popular C library for providing regular expressions. Here we give a brief summary of the most basic features of regular expressions (grouping, repetition, and alternation), but the feature set of PCRE extends far beyond these, and we don't attempt to describe it in full here. For more information, refer to the PCRE documentation, especially the full pattern syntax or syntax quick reference.
The regular expression language can be used to match much more than just the static substrings in the above examples. Certain characters, called metacharacters, are given special treatment in regular expressions, and can be used to describe more complex patterns. To match metacharacters literally in a regular expression, they must be escaped by being preceded with a backslash (\). Regex.escape will do this automatically for a given String.
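The difference between interpolating a String directly and escaping it first can be seen in a short sketch. The input string below is an arbitrary example chosen because it contains the metacharacter "+".

```crystal
# Example input that happens to contain the metacharacter "+".
needle = "1+1=2"

# Interpolated as-is, "+" acts as a quantifier: one or more "1", then "1=2".
unescaped = /#{needle}/
# Escaped first, every character of the input is matched literally.
escaped = /#{Regex.escape(needle)}/

puts unescaped.match("1+1=2").inspect # nil: the text contains a literal "+"
puts unescaped.match("11=2").inspect  # matches "11=2"
puts escaped.match("1+1=2").inspect   # matches "1+1=2"
```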
A group of characters (often called a capture group or subpattern) can be identified by enclosing it in parentheses (()). The contents of each capture group can be extracted on a successful match:
/a(sd)f/.match("_asdf_") # => #<Regex::MatchData "asdf" 1:"sd">
/a(sd)f/.match("_asdf_").try &.[1] # => "sd"
/a(?<grp>sd)f/.match("_asdf_") # => #<Regex::MatchData "asdf" grp:"sd">
/a(?<grp>sd)f/.match("_asdf_").try &.["grp"] # => "sd"
Capture groups are indexed starting from 1. Methods that accept a capture group index will usually also accept 0 to refer to the full match. Capture groups can also be given names, using the (?<name>...) syntax, as in the previous example.
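Named groups are convenient when several parts of a string are extracted at once. The sketch below assumes a hypothetical `key=value` input format and a hypothetical helper named `parse_pair`.

```crystal
# Hypothetical helper: splits "key=value" into a tuple, or returns nil.
def parse_pair(line : String) : Tuple(String, String)?
  if md = /^(?<key>\w+)=(?<value>\w+)$/.match(line)
    # Named groups are looked up by String, numbered groups by Int32.
    {md["key"], md["value"]}
  end
end

puts parse_pair("color=blue").inspect   # {"color", "blue"}
puts parse_pair("no pair here").inspect # nil
```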
A character or group can be repeated or made optional using an asterisk (*, zero or more), a plus sign (+, one or more), integer bounds in curly braces ({n,m}, at least n and no more than m), or a question mark (?, zero or one).
/fo*/.match("_f_") # => #<Regex::MatchData "f">
/fo+/.match("_f_") # => nil
/fo*/.match("_foo_") # => #<Regex::MatchData "foo">
/fo{3,}/.match("_foo_") # => nil
/fo{1,3}/.match("_foo_") # => #<Regex::MatchData "foo">
/fo*/.match("_foooooooo_") # => #<Regex::MatchData "foooooooo">
/fo{,3}/.match("_foooo_") # => nil
/f(op)*/.match("fopopo") # => #<Regex::MatchData "fopop" 1: "op">
/foo?bar/.match("foobar") # => #<Regex::MatchData "foobar">
/foo?bar/.match("fobar") # => #<Regex::MatchData "fobar">
Alternatives can be separated using a vertical bar (|). Any single character can be represented by dot (.). When matching only one character, specific alternatives can be expressed as a character class, enclosed in square brackets ([]):
/foo|bar/.match("foo") # => #<Regex::MatchData "foo">
/foo|bar/.match("bar") # => #<Regex::MatchData "bar">
/_(x|y)_/.match("_x_") # => #<Regex::MatchData "_x_" 1: "x">
/_(x|y)_/.match("_y_") # => #<Regex::MatchData "_y_" 1: "y">
/_(x|y)_/.match("_(x|y)_") # => nil
/_._/.match("_x_") # => #<Regex::MatchData "_x_">
/_[xyz]_/.match("_x_") # => #<Regex::MatchData "_x_">
/_[a-z]_/.match("_x_") # => #<Regex::MatchData "_x_">
/_[^a-z]_/.match("_x_") # => nil
/_[^a-wy-z]_/.match("_x_") # => #<Regex::MatchData "_x_">
Regular expressions can be defined with these 3 optional flags:
- i: ignore case (PCRE_CASELESS)
- m: multiline (PCRE_MULTILINE and PCRE_DOTALL)
- x: extended (PCRE_EXTENDED)
/asdf/ =~ "ASDF" # => nil
/asdf/i =~ "ASDF" # => 0
/^z/i =~ "ASDF\nZ" # => nil
/^z/im =~ "ASDF\nZ" # => 5
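The x flag is not exercised by the examples above. As a rough sketch of its effect, the following builds the same source string with and without Regex::Options::EXTENDED; under PCRE_EXTENDED, whitespace in the pattern is ignored, so it can be used purely for readability. The pattern and sample strings are arbitrary illustrations.

```crystal
# The same source, compiled with and without the extended flag.
spaced = "(\\d{4}) - (\\d{2})"
extended = Regex.new(spaced, Regex::Options::EXTENDED)
plain = Regex.new(spaced)

# With EXTENDED, the spaces around "-" are not part of the pattern.
md = extended.match("2016-11")
year = md ? md[1] : nil
month = md ? md[2] : nil
puts year.inspect  # "2016"
puts month.inspect # "11"

# Without it, the spaces must be matched literally.
puts plain.match("2016-11").inspect   # nil
puts plain.match("2016 - 11").inspect # matches "2016 - 11"
```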
PCRE supports other encodings, but Crystal strings are UTF-8 only, so Crystal regular expressions are also UTF-8 only (by default).
PCRE optionally permits named capture groups (named subpatterns) to not be unique. Crystal exposes the name table of a Regex as a Hash of String => Int32, and therefore requires named capture groups to have unique names within a single Regex.
Defined in:
regex/match_data.cr
regex/regex.cr
json/any.cr
yaml/any.cr
Class Method Summary
- .error?(source)
  Determines Regex's source validity.
- .escape(str) : String
  Returns a String constructed by escaping any metacharacters in str.
- .union(*patterns : Regex | String) : self
  Union.
- .union(patterns : Enumerable(Regex | String)) : self
  Union.
- .new(source, options : Options = Options::None)
  Creates a new Regex out of the given source String.
Instance Method Summary
- #+(other)
  Union.
- #==(other : Regex)
  Equality.
- #===(other : JSON::Any)
- #===(other : YAML::Any)
- #===(other : String)
  Case equality.
- #=~(other : String)
  Match.
- #=~(other)
  Match.
- #inspect(io : IO)
  Convert to String in literal format.
- #match(str, pos = 0, options = Regex::Options::None)
  Match at character index.
- #match_at_byte_index(str, byte_index = 0, options = Regex::Options::None)
  Match at byte index.
- #name_table
  Returns a Hash where the values are the names of capture groups and the keys are their indexes.
- #options : Regex::Options
  Returns a Regex::Options representing the optional flags applied to this Regex.
- #source : String
  Returns the original String representation of the Regex pattern.
- #to_s(io : IO)
  Convert to String in subpattern format.
Instance methods inherited from class Reference
==(other), ==(other : self), hash, inspect(io : IO) : Nil, object_id : UInt64, same?(other : Reference), same?(other : Nil), to_s(io : IO) : Nil
Instance methods inherited from class Object
!=(other), !~(other), ==(other), ===(other), ===(other : YAML::Any), ===(other : JSON::Any), =~(other), class, clone, crystal_type_id, dup, hash, inspect, inspect(io : IO), itself, not_nil!, tap(&block), to_json, to_pretty_json(io : IO), to_pretty_json, to_s, to_s(io : IO), to_yaml(io : IO), to_yaml, try(&block)
Class methods inherited from class Object
==(other : Class), ===(other), cast(other) : self, from_json(string_or_io) : self, from_yaml(string : String) : self, hash, inspect(io), name : String, to_s(io), |(other : U.class)
Class Method Detail
Determines whether source is a valid regular expression. If it is valid, nil is returned. If it is not, a String containing the error message is returned.
Regex.error?("(foo|bar)") # => nil
Regex.error?("(foo|bar") # => "missing ) at 8"
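One possible use is validating a pattern before compiling it, so that no exception has to be rescued. The sketch below assumes a hypothetical helper named `describe`; the exact wording of the error message comes from the underlying regular expression library, so the example does not rely on it.

```crystal
# Hypothetical helper: reports whether a pattern source would compile.
def describe(source : String) : String
  if message = Regex.error?(source)
    "invalid: #{message}"
  else
    "valid"
  end
end

puts describe("(foo|bar)") # valid
puts describe("(foo|bar")  # invalid: followed by the library's error message
```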
Returns a String constructed by escaping any metacharacters in str.
string = Regex.escape("*?{}.") # => "\\*\\?\\{\\}\\."
/#{string}/ # => /\*\?\{\}\./
Union. Returns a Regex that matches any of patterns. If any pattern contains a named capture group using the same name as a named capture group in any other pattern, an ArgumentError will be raised at runtime. All capture groups in the patterns after the first one will have their indexes offset.
re = Regex.union(/skiing/i, "sledding")
re.match("Skiing") # => #<Regex::MatchData "Skiing">
re.match("sledding") # => #<Regex::MatchData "sledding">
Union. Returns a Regex that matches any of patterns. If any pattern contains a named capture group using the same name as a named capture group in any other pattern, an ArgumentError will be raised at runtime. All capture groups in the patterns after the first one will have their indexes offset.
re = Regex.union([/skiing/i, "sledding"])
re.match("Skiing") # => #<Regex::MatchData "Skiing">
re.match("sledding") # => #<Regex::MatchData "sledding">
re = Regex.union({/skiing/i, "sledding"})
re.match("Skiing") # => #<Regex::MatchData "Skiing">
re.match("sledding") # => #<Regex::MatchData "sledding">
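The note about offset indexes can be illustrated with two patterns that each contain one capture group. This is a small sketch with arbitrary patterns; in the combined Regex, the group from the second pattern becomes group 2.

```crystal
# Each operand has exactly one capture group.
re = Regex.union(/(a+)/, /(b+)/)
md = re.match("xbbx")

whole = md.try &.[0]   # "bb"
first = md.try &.[1]?  # nil: group 1 belongs to the first pattern, which did not match
second = md.try &.[2]? # "bb": the second pattern's group, offset by one

puts whole.inspect
puts first.inspect
puts second.inspect
```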
Creates a new Regex out of the given source String.
Regex.new("^a-z+:\\s+\\w+") # => /^a-z+:\s+\w+/
Regex.new("cat", Regex::Options::IGNORE_CASE) # => /cat/i
options = Regex::Options::IGNORE_CASE | Regex::Options::EXTENDED
Regex.new("dog", options) # => /dog/ix
Instance Method Detail
Union. Returns a Regex that matches either of the operands. If either operand contains a named capture group using the same name as a named capture group in the other operand, an ArgumentError will be raised at runtime. All capture groups in the second operand will have their indexes offset.
re = /skiing/i + /sledding/
re.match("Skiing") # => #<Regex::MatchData "Skiing">
re.match("sledding") # => #<Regex::MatchData "sledding">
Equality. Two regexes are equal if their sources and options are the same.
/abc/ == /abc/i # => false
/abc/i == /ABC/i # => false
/abc/i == /abc/i # => true
Case equality. This is equivalent to #match or #=~ but only returns true or false. Used in case expressions. The special variable $~ will contain a Regex::MatchData if there was a match, nil otherwise.
a = "HELLO"
b = case a
when /^[a-z]*$/
"Lower case"
when /^[A-Z]*$/
"Upper case"
else
"Mixed case"
end
b # => "Upper case"
Match. Matches a regular expression against other and returns the starting position of the match if other is a matching String, otherwise nil. $~ will contain a Regex::MatchData if there was a match, nil otherwise.
/at/ =~ "input data" # => 7
/ax/ =~ "input data" # => nil
Match. When the argument is not a String, always returns nil.
/at/ =~ "input data" # => 7
/ax/ =~ "input data" # => nil
Convert to String in literal format. Returns the source as a String in Regex literal format, delimited by forward slashes (/), with any optional flags included.
/ab+c/ix.inspect # => "/ab+c/ix"
Match at character index. Matches a regular expression against String str. Starts at the character index given by pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match("abc").not_nil![2] # => "b"
/(.)(.)/.match("abc", 1).not_nil![2] # => "c"
/(.)(.)/.match("クリスタル", 3).not_nil![2] # => "ル"
Match at byte index. Matches a regular expression against String str. Starts at the byte index given by byte_index if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.
/(.)(.)(.)/.match_at_byte_index("abc").not_nil![2] # => "b"
/(.)(.)/.match_at_byte_index("abc", 1).not_nil![2] # => "c"
/(.)(.)/.match_at_byte_index("クリスタル", 3).not_nil![2] # => "ス"
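For strings containing multibyte characters, a character index and a byte index refer to different positions. The sketch below relates the two using String#char_index_to_byte_index, on the assumption that this method is available in the Crystal version in use; each character in the sample string occupies three bytes in UTF-8.

```crystal
str = "クリスタル"
re = /(.)(.)/

char_index = 3
# Converts the character index to the equivalent byte index (9 for this
# string); the result is nil when the index is out of range.
byte_index = str.char_index_to_byte_index(char_index)

by_char = re.match(str, char_index)
by_byte = byte_index ? re.match_at_byte_index(str, byte_index) : nil

puts by_char.try(&.[0]).inspect # "タル"
puts by_byte.try(&.[0]).inspect # "タル"
```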
Returns a Hash where the values are the names of capture groups and the keys are their indexes. Non-named capture groups will not have entries in the Hash. Capture groups are indexed starting from 1.
/(.)/.name_table # => {}
/(?<foo>.)/.name_table # => {1 => "foo"}
/(?<foo>.)(?<bar>.)/.name_table # => {2 => "bar", 1 => "foo"}
/(.)(?<foo>.)(.)(?<bar>.)(.)/.name_table # => {4 => "bar", 2 => "foo"}
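As an illustration of how the table might be used, the following sketch walks the name table to collect every named capture of a match into a Hash keyed by name. The pattern and input are arbitrary examples.

```crystal
re = /(?<year>\d{4})-(?<month>\d{2})/
captures = {} of String => String

if md = re.match("released 2016-11")
  # Keys are group indexes, values are group names.
  re.name_table.each do |index, name|
    captures[name] = md[index]
  end
end

puts captures["year"]  # 2016
puts captures["month"] # 11
```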
Returns a Regex::Options representing the optional flags applied to this Regex.
/ab+c/ix.options # => IGNORE_CASE, EXTENDED
Returns the original String representation of the Regex pattern.
/ab+c/x.source # => "ab+c"
Convert to String in subpattern format. Produces a String which can be embedded in another Regex via interpolation, where it will be interpreted as a non-capturing subexpression.
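Because the flags travel with the embedded subpattern, they apply only to that part of the enclosing Regex. A small sketch with arbitrary example patterns:

```crystal
word = /crystal/i
# Interpolation embeds word in subpattern format, keeping its i flag local.
sentence = /^#{word} rocks$/

puts sentence.match("CRYSTAL rocks").inspect # matches: the embedded part ignores case
puts sentence.match("CRYSTAL ROCKS").inspect # nil: the outer part is still case-sensitive
```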
re = /A*/i # => /A*/i
re.to_s # => "(?i-msx:A*)"
"Crystal".match(/t#{re}l/) # => #<Regex::MatchData "tal">
re = /A*/ # => /A*/
re.to_s # => "(?-imsx:A*)"
"Crystal".match(/t#{re}l/) # => nil