Coding with Honour

The personal blog of Sam Stokes.

Regex style in Ruby


Reading Patrick McKenzie’s excellent practical example of metaprogramming, I came across a line of code I didn’t understand:

caller[0][/`([^']*)'/, 1]

That line taught me three new things about Ruby:

  1. The syntax for the subscript operator [] allows multiple arguments. (It turns out I already knew this in another context: [1,1,2,3,5,7][2,3] => [2,3,5])
  2. You can subscript a String with a Regexp, returning the first match: "goal"[/[aeiou]/] => "o" (nil is returned if there is no match).
  3. If you throw in an index n, then you get the nth capturing group of the first match: "xaabb"[/(.)\1/, 1] => "a" (or nil again if no match).
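The three behaviours above can be checked in a few lines. This is a minimal sketch: the variable names are made up for illustration, and the `frame` string is a hypothetical backtrace line in the backtick-and-quote format that the original regex expects (newer Ruby releases may format `caller` entries differently).

```ruby
# 1. Array#[] with two arguments: start index and length.
numbers = [1, 1, 2, 3, 5, 7]
slice = numbers[2, 3]                 # => [2, 3, 5]

# 2. String#[] with a Regexp: the first match, or nil if there is none.
first_vowel = "goal"[/[aeiou]/]       # => "o"
no_vowel    = "rhythm"[/[aeiou]/]     # => nil

# 3. String#[] with a Regexp and an index n: the nth capturing group
#    of the first match, or nil if there is no match.
doubled    = "xaabb"[/(.)\1/, 1]      # => "a"
no_doubled = "xyz"[/(.)\1/, 1]        # => nil

# The line that started all this, applied to a sample backtrace entry
# (hypothetical string, standing in for caller[0]).
frame = "app.rb:12:in `greet'"
method_name = frame[/`([^']*)'/, 1]   # => "greet"
```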

That last one is interesting, because it gives a concise way, new to me, to do a common regex task: checking whether an input string matches a given format and, if so, extracting part of the match. Say we want to pull the domain out of an email address, but complain if we can’t find it:

"foo@example.com"[/@(.*)/, 1] or raise "bad email"
# => "example.com"
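To see the failure path as well as the success path, the one-liner can be wrapped in a small method. This is only a sketch: `email_domain` is a hypothetical helper name, and raising `ArgumentError` rather than a plain `RuntimeError` is my own assumption, not something the original line does.

```ruby
# Hypothetical helper: returns the part after the "@",
# or raises if the address contains no "@" at all.
def email_domain(address)
  address[/@(.*)/, 1] or raise ArgumentError, "bad email: #{address.inspect}"
end

domain = email_domain("foo@example.com")   # => "example.com"

# An address without an "@" takes the `or raise` branch.
error_message =
  begin
    email_domain("not-an-email")
  rescue ArgumentError => e
    e.message
  end
```

The `or` works here because the subscript returns `nil` when the regex does not match, and `nil` is falsy in Ruby.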

Before learning this trick I would have either used a temporary match object à la Java, or gritted my teeth and used a global variable, Perl-style:

match = /@(.*)/.match("foo@example.com")
if match
  match[1]
else
  raise "bad email"
end
# => "example.com"

if "foo@example.com" =~ /@(.*)/
  $1
else
  raise "bad email"
end
# => "example.com"

Both of those seem rather verbose. They can be golfed into one-liners, but the readability starts to suffer:

$1 if "foo@example.com" =~ /@(.*)/ or raise "bad email" # => "example.com"
 
require 'andand'
/@(.*)/.match("foo@example.com").andand[1] # => "example.com"
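For comparison, here is a sketch that runs the styles side by side on the same input. The variable names are illustrative. The last variant uses Ruby's built-in safe navigation operator `&.` (available from Ruby 2.3) in place of the andand gem; that substitution is my assumption about a reasonable equivalent. Like the andand version, it returns `nil` on a failed match instead of raising.

```ruby
domain_pattern = /@(.*)/
address = "foo@example.com"

# Subscripting the string with a regex and a group index.
via_subscript = address[domain_pattern, 1]

# Temporary match object.
match = domain_pattern.match(address)
via_match_object = match && match[1]

# Perl-style global set by =~.
via_global = (address =~ domain_pattern) ? $1 : nil

# Safe navigation instead of andand: [] is only called if match succeeded.
via_safe_navigation = domain_pattern.match(address)&.[](1)

# On input with no "@", the nil-returning styles agree with each other.
missing_subscript = "nope"[domain_pattern, 1]
missing_safe_navigation = domain_pattern.match("nope")&.[](1)
```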

So I’m left wondering what the most readable and/or idiomatic style for regexes in Ruby is. TMTOWTDI indeed! Even now that I know what it means, "xaabb"[/(.)\1/, 1] makes me double-take slightly (it’s an unusual way to use []), but I guess it’s just another Ruby idiosyncrasy I’ll come to know and love.
