define_method is an odd creature. Paul Gross’s post about its unusual behaviour inspired me to do some research. I even browsed through Ruby’s source in an attempt to understand it, but that turned out to be unnecessary: I reimplemented it in pure[1] Ruby. In this post, I’m going to try to explain why define_method behaves the way it does.
Methods vs. blocks
Methods and blocks in Ruby are similar things; they are both containers for code, and both of them can accept a number of arguments. On the other hand, the scoping rules and argument semantics for blocks and methods are very, very different.
Let’s start with scope. Blocks have access to the local variables of the scope they are created in; this is called static scoping. This implies that they can be used to create closures. Closures are a very useful construct for callbacks, for example[2]. Methods do not have this property.
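To make the difference concrete, here is a small sketch. The names make_counter and shout are made up for illustration; they don't come from Paul Gross's post.

```ruby
# make_counter is a hypothetical helper: it returns a lambda that
# closes over the local variable `count`.
def make_counter
  count = 0
  lambda { count += 1 }   # the block can still see `count` later on
end

counter = make_counter
counter.call   # => 1
counter.call   # => 2

# A method body, by contrast, cannot see locals of the enclosing scope.
greeting = "hello"

def shout
  greeting.upcase   # raises NameError when called: `greeting` is not visible here
end
```

Each call to make_counter creates a fresh `count`, so two counters never share state.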
Another difference is the way they handle arguments. Methods are pretty anal about them; if they expect you to give them three, you’d better give them three or they will complain by raising an ArgumentError.
    # relevant output is in the comments
    def foo(arg)
      arg
    end

    foo("bar")        # => "bar"
    foo               # ArgumentError: wrong number of arguments (0 for 1)
    foo("bar", "baz") # ArgumentError: wrong number of arguments (2 for 1)
Blocks are a different story. In general, they are a bit looser about accepting arguments they don’t expect. It depends, however. This is an aspect of Ruby that I’ve always found very confusing: there are two kinds of argument semantics for blocks. To understand how this is possible, a couple of things have to be explained first.
Procs vs. lambdas
There are two ways to create an object that encapsulates a block in Ruby: lambda[3] or Proc.new. Let’s call the former type of object a lambda, and the latter a proc[4]. Procs and lambdas can be executed by using their call method. Procs and lambdas have different argument semantics, amongst a few other differences. Let’s compare them:
    p2 = Proc.new { |foo, bar| [foo, bar] }
    l2 = lambda { |foo, bar| [foo, bar] }

    p2.call("foo", "bar")        # => ["foo", "bar"]
    p2.call("foo")               # => ["foo", nil]
    p2.call                      # => [nil, nil]
    p2.call("foo", "bar", "baz") # => ["foo", "bar"]

    l2.call("foo", "bar")        # => ["foo", "bar"]
    l2.call("foo")               # ArgumentError: wrong number of arguments (1 for 2)
    l2.call                      # ArgumentError: wrong number of arguments (0 for 2)
    l2.call("foo", "bar", "baz") # ArgumentError: wrong number of arguments (3 for 2)
As you can see, procs really don’t care about what they’re given. Missing arguments are replaced with nil, and surplus arguments are just ignored. Lambdas, on the other hand, seem to be as strict as methods. However, this is only true for lambdas with two or more arguments. Wha?
Yeah, I know.
It gets worse
When a proc or lambda is created from a block with 0 or 1 arguments, different rules apply. Let’s look at a block with no arguments first:
    p0 = Proc.new { puts "nothing here" }
    l0 = lambda { puts "nothing here" }

    p0.call        # => nil
    p0.call("foo") # => nil

    l0.call        # => nil
    l0.call("foo") # => nil
Looks like both of them don’t really care: all arguments are ignored. This was to be expected for procs, but for lambdas this behaviour is a bit surprising. When our block has a single argument, other mechanisms come into play:
    p1 = Proc.new { |foo| foo }
    l1 = lambda { |foo| foo }

    p1.call("foo")        # => "foo"
    p1.call("foo", "bar") # => ["foo", "bar"]
                          # warning: multiple values for a block parameter (2 for 1)
    p1.call               # => nil
                          # warning: multiple values for a block parameter (0 for 1)

    l1.call("foo")        # => "foo"
    l1.call("foo", "bar") # => ["foo", "bar"]
                          # warning: multiple values for a block parameter (2 for 1)
    l1.call               # => nil
                          # warning: multiple values for a block parameter (0 for 1)
Again, lambdas and procs behave identically. When they are called with too many arguments, these are interpreted as an array, and a warning is issued. It’s almost like the splat operator is implicitly present. What’s a bit curious is that something similar also happens when no arguments are supplied; except nil is assigned to the argument, rather than [], which I would have found a more logical choice.
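For comparison, here is what an explicit splat does. This is only a sketch; the constant name SPLAT is made up for illustration. With a real splat, an empty argument list becomes an empty array rather than nil, and a single argument is wrapped in an array as well.

```ruby
# SPLAT is a hypothetical name; the block declares an explicit splat parameter.
SPLAT = Proc.new { |*args| args }

SPLAT.call               # => []
SPLAT.call("foo")        # => ["foo"]
SPLAT.call("foo", "bar") # => ["foo", "bar"]
```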
Strict vs. loose
Let’s bring some structure into all of this; we could say that there are two kinds of argument semantics in Ruby: strict and loose. Methods always have strict semantics. Procs always have loose semantics. Lambdas have loose semantics when they have 0 or 1 arguments, and strict semantics otherwise.
Of course, turning blocks into objects isn’t something we do very often. At least I don’t — yield is so much handier! It looks like using yield and implicit block parameters gives you loose semantics:
    def run(*args)
      yield(*args)
    end

    run("bar", "baz") { |bar, baz| [bar, baz] }            # => ["bar", "baz"]
    run("bar") { |bar, baz| [bar, baz] }                   # => ["bar", nil]
    run("bar", "baz", "bananas") { |bar, baz| [bar, baz] } # => ["bar", "baz"]
I can’t think of any good reasons why these things are the way they are. It’s quite complicated and not very well documented. If anyone knows any, please don’t hesitate to let me know.
Enter define_method
define_method lets you define methods dynamically. It turns a block into a method, pretty much. As if things weren’t confusing enough! It exhibits some pretty strange behaviour, as mentioned in Paul Gross’s post. But, knowing what we know now about argument semantics, it looks a lot like define_method turns blocks into lambdas and lets you call them as methods.
First, I just want to illustrate how define_method can indeed be useful because it allows you to make use of static scoping:
    class Foo
      bar = "bar"

      def get_bar
        bar
      end
    end

    Foo.new.get_bar # NameError: undefined local variable or method `bar'

    class Foo
      bar = "bar"

      define_method :get_bar do
        bar
      end
    end

    Foo.new.get_bar # => "bar"
I am aware that as examples go, this one is pretty contrived. I’m sorry, I can’t think of anything better.
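A slightly less contrived case might be generating several similar methods in a loop, where each block closes over the loop variable. The sketch below uses a made-up Status class and made-up state names, purely for illustration.

```ruby
# Status and its state names are hypothetical, not from the original post.
class Status
  STATES = %w[draft published archived]

  attr_reader :state

  def initialize(state)
    @state = state
  end

  STATES.each do |name|
    # The block closes over `name`, so every generated predicate
    # remembers which state it was created for.
    define_method("#{name}?") do
      state == name
    end
  end
end

Status.new("draft").draft?     # => true
Status.new("draft").published? # => false
```

Writing the same thing with def would not work, because the method body could not see `name`.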
I figured out all of this by trying to reimplement define_method in pure Ruby. The whole proc vs. lambda thing was the missing link. Once I realised this, it was pretty straightforward. I’ll show you the end result, but I’m warning you, it isn’t pretty:
    class Foo
      @@blocks = {}

      def self.my_def(name, &blk)
        @@blocks[name] = lambda(&blk) # IMPORTANT!
        eval("
          def #{name}(*args)
            @@blocks[:#{name}].call(*args)
          end
        ")
      end

      my_def :no_args do
        p "no args"
      end

      my_def :one_arg do |one|
        p one
      end

      my_def :two_args do |one, two|
        p one
        p two
      end
    end
This is basically Paul Gross’s example, but I replaced define_method with my own implementation. I used a class variable hash to store the blocks, because I had no idea how to make them available inside the created methods otherwise. Remember that method definitions are not statically scoped.
There is hope yet
It is said that a lot of this has changed in Ruby 1.9. I haven’t gotten round to testing it yet, but I think I will. I hope they have considerably simplified these things. They should, if they haven’t.
Notes
- ↑1 Well, pure… it isn’t pretty, but it is Ruby
- ↑2 I’m not going to discuss the virtues of closures and static scoping here; enough people have already done that. I’m sure Google will be glad to help you out if you want to know more!
- ↑3 Note that proc is an alias for lambda, to make things even more confusing. In addition, the created object’s class is Proc in both cases!
- ↑4 This is my own convention; I’ve seen other people use these terms in the same fashion, but nobody can stop you from calling a lambda a proc, really. I mean, even Ruby thinks it’s a Proc.
2 Comments
I just ran most of your examples in Ruby 1.9, where things have been made less confusing. The result is:
* Methods created with define_method work the same as normal methods.
* Lambdas have the same argument semantics as methods regardless of arity.
* Non-lambda procs of arity 1 no longer pack multiple arguments into an array: like multiple-argument procs they just ignore the extras.
Also:
* In Ruby 1.9, proc is an alias for Proc.new rather than lambda.
* It bears mentioning that blocks (not bundled into any flavor of Proc) behave like Proc.new (in both 1.8 and 1.9).
* In Ruby 1.8, you can get a one-argument block/Proc to take the first arg and ignore the rest by declaring your argument with a trailing comma.
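The points above can be checked with a short script. This is a sketch that assumes Ruby 1.9 or later; the class Bar and the helper rejects_arguments? are made-up names for illustration.

```ruby
# Assumes Ruby 1.9 or later. Bar and rejects_arguments? are hypothetical names.
class Bar
  define_method(:no_args) { "no args" }
  define_method(:one_arg) { |one| one }
end

# Returns true if running the given block raises ArgumentError.
def rejects_arguments?
  yield
  false
rescue ArgumentError
  true
end

# Methods created with define_method are as strict as normal methods.
rejects_arguments? { Bar.new.no_args("foo") }              # => true
rejects_arguments? { Bar.new.one_arg }                     # => true

# Lambdas are strict regardless of arity.
rejects_arguments? { lambda { }.call("foo") }              # => true
rejects_arguments? { lambda { |a| a }.call("foo", "bar") } # => true

# A one-argument proc ignores the extras instead of packing them into an array.
Proc.new { |foo| foo }.call("foo", "bar")                  # => "foo"

# proc now creates a non-lambda Proc, like Proc.new.
proc { |foo| foo }.lambda?                                 # => false

# The trailing comma idiom: take the first argument, ignore the rest.
Proc.new { |first,| first }.call("foo", "bar")             # => "foo"
```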
Well, that’s good news! Sounds like they pretty much “solved” it.
The only use case for the array packing behaviour that I can think of is each_slice, pretty much. I won’t miss it, at any rate.
I hadn’t heard of the trailing comma idiom before. Thanks for pointing that out! Oh, I did mention the default behaviour of unbundled blocks (i.e. loose semantics), by the way.