YAML in a cake? No, thank you!

While implementing fixture support in the test suite, the question arose which format should be used for the fixtures. The first answer was: YAML, of course. It is used in Ruby on Rails, so it cannot be bad ;-) Hm. Let’s have a look at a simple YAML example:

# urls.yml
cakephp:
  id: 1
  name: CakePHP website
  url: http://www.cakephp.org

manual:
  id: 2
  name: CakePHP manual
  url: http://manual.cakephp.org

It looks nice. But there is one “problem”. It violates the DRY (Don’t repeat yourself) principle: the column names are repeated in each record.

So I decided to use a different approach: plain PHP. It is simple: each fixture is a class, and each record is a function. The YAML example from above rewritten as a class looks like:

class Urls
{
    function cakephp()
    {
        return array(1, 'CakePHP website', 'http://www.cakephp.org');
    }

    function manual()
    {
        return array(2, 'CakePHP manual', 'http://manual.cakephp.org');
    }
}

Simple, isn’t it?
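
To make the idea concrete, here is a minimal sketch of how a test suite might read such a class. The helper `loadFixtureRecords()` is a hypothetical name used only for illustration, not part of CakePHP; it simply calls every method of the fixture class and collects what each one returns.

```php
<?php
// Fixture class as in the post: one method per record.
class Urls
{
    function cakephp()
    {
        return array(1, 'CakePHP website', 'http://www.cakephp.org');
    }

    function manual()
    {
        return array(2, 'CakePHP manual', 'http://manual.cakephp.org');
    }
}

// Hypothetical helper (an assumption, not a CakePHP function):
// calls each method of the fixture class and collects the returned
// records, keyed by method name.
function loadFixtureRecords($className)
{
    $fixture = new $className();
    $records = array();
    foreach (get_class_methods($fixture) as $method) {
        $records[$method] = $fixture->$method();
    }
    return $records;
}

$records = loadFixtureRecords('Urls');
echo $records['manual'][1], "\n"; // prints "CakePHP manual"
```

Because each record is an ordinary method, it can also contain loops or function calls when the data is not static.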

10 Comments

  1. Posted August 20, 2006 at 6:14 pm | Permalink

    And where do you define the order of the columns? Does it take the order in the database?

    Not sure I like your approach because there is NO inline documentation. And a change in the table layout will probably require a re-write of every single fixture for that table. Unless you always add columns at the end – which I don’t.

  2. Posted August 20, 2006 at 6:52 pm | Permalink

    Agreed with Patrice. I know that apparently the Cake Overlords decided they didn’t want YAML in Cake, but I think that repeating column names is much better than having no column names at all.

  3. Posted August 21, 2006 at 2:11 am | Permalink

    “No Magic Numbers” is a far, far more important rule than DRY. Your Cake implementation has the magic numbers 0, 1 and 2. To eliminate them, we need to use an associative array and define meaningful names for the keys. Which would eliminate the one ‘advantage’ of the PHP method and leave us with a more verbose syntax than the YAML option.

    Premature optimization is the root of all evil. — Donald Knuth

  4. Posted August 21, 2006 at 3:27 am | Permalink

    I also agree with the comments above. The PHP solution uses more syntax and it seems less flexible. Plus the repetition of the PHP syntax itself seems less DRY than simply repeating column names.

  5. Posted August 21, 2006 at 5:40 pm | Permalink

    @all: Thanks for your feedback. You are right, column names are useful, so I added an array to the class:

    var $columns = array('id', 'name', 'url');

    DRY was not the only reason I chose plain PHP over YAML:
    – a PHP solution is more consistent with cake
    – there is IDE support for PHP, but not for YAML (thanks to the templates of my IDE I am faster writing a method than a YAML record)
    – YAML looks ugly if you mix it with code
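
    As a rough sketch of how the column array could be put to use, the names can be paired with a record's values so that tests refer to fields by name instead of by position. The helper `namedRecord()` below is a hypothetical name for illustration only, not an existing Cake function.

```php
<?php
class Urls
{
    // Column names declared once, in the same order as the record values.
    var $columns = array('id', 'name', 'url');

    function cakephp()
    {
        return array(1, 'CakePHP website', 'http://www.cakephp.org');
    }
}

// Hypothetical helper (an assumption): pairs the column names with the
// values returned by one record method.
function namedRecord($fixture, $method)
{
    return array_combine($fixture->columns, $fixture->$method());
}

$row = namedRecord(new Urls(), 'cakephp');
echo $row['name'], "\n"; // prints "CakePHP website"
```

    If a record has more or fewer values than there are columns, `array_combine()` fails instead of silently misaligning the data.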

  6. Posted August 22, 2006 at 3:34 am | Permalink

    The PHP solution is more consistent at the cost of being more verbose:

    :id:name:url:

    versus

    function(){returnarray(,'','');}

    (excluding whitespace). Also note that the PHP version requires the class wrapper and the columns array, neither of which is required for the YAML version.

    Lack of IDE support just means you need a better IDE. :)

    And I think code tends to look ugly next to YAML, but that is just an aesthetic choice.

    Trade-offs, trade-offs, trade-offs. Everything is trade-offs. :)

  7. Posted August 24, 2006 at 7:14 am | Permalink

    @scott lewis: Yeah, you are right, in software engineering there are always trade-offs you have to make.

  8. Posted August 26, 2006 at 6:30 pm | Permalink

    I think I like YAML a lot, I’m using it for my unit testing data right now and I’m very pleased with the efficiency.

    Regarding your problem with its DRYness, you could do something like this:

    posts_table_definition:
    - id
    - name
    - url

    cakephp:
    - 1
    - CakePHP website
    - http://www.cakephp.org

    manual:
    - 2
    - CakePHP manual
    - http://manual.cakephp.org

    But I don’t even see why you’d want to do that. When you need to change table fields, you can run a quick search & replace, which your IDE should handle, even for YAML, and you’re done. The entire point of YAML is that it’s not about DRYness but about readability for humans, which is exactly what you want when dealing with test data.

    Regarding your other issues with it:
    – Yes, PHP is more consistent with Cake. But only because Cake goes with Convention over Configuration; otherwise PHP would maybe never have become the language used for configuration files. But we are not talking about configuration files here, we are talking about domain data. And data is usually stored in the database when working with CakePHP. Since the database isn’t the first choice for test fixtures, I think YAML wins over PHP as a form of data storage.

    – Another concept of YAML is that you don’t *need* an IDE to be reasonably able to edit it. But you do need one for PHP, so again, in my opinion YAML wins.

    – “YAML looks ugly if you mix it with code”: Your sql database looks ugly when you mix it with code too? Or do you mean putting business logic in your yaml files? I wouldn’t see why that would be needed. Different fixtures, different files.

  9. Posted August 30, 2006 at 9:37 am | Permalink

    @Felix: The point with DRYness is not the updating of the field names, but the creation of the records. I am lazy, so I don’t like to repeat the column names again and again (which is not necessary with your posted example). Readability is IMHO not that important, as the audience for the test data are developers. So the test data must be readable for developers, but not necessarily for everyone else.

    The “problem” with fixtures is that they sometimes do not store plain data. Fixtures (as in Railsland) can contain loops or function calls, so in my opinion it is wrong to store fixtures in a data format. Maybe the format shown in the post is not perfect and can be improved, but I think it is the direction to go.

    I think the concept of YAML that you do not need an IDE to reasonably edit the files is not really important for developers. I don’t know how you work, but if I write tests, I am always in my IDE.

    Well, in Rails you can mix your YAML files with Ruby snippets to allow for dynamic fixtures, so you can do something like (the example is in PHP):

    password: <?php echo md5('thepassword'); ?>

  10. Posted September 7, 2006 at 6:45 pm | Permalink

    Ok, I think I can see your point as well. Yes, I write my tests with an IDE as well and I think this shouldn’t be the decision making factor for fixtures. However, I think readability of your fixtures *does* matter:

    I am a developer, I can read insane arrays containing 6 or 7 nested other arrays (thanks to the parenthesis matching of my IDE), but I don’t want to. When writing tests (and apps too), I try to apply KISS as much as possible, because if your tests are complicated they might as well be faulty, which is not acceptable for TDD. Now I think managing nested arrays for Cake Model data can work alright, but recently I wrote a GIF file decoder & encoder in PHP which required tons of nested arrays and fixture data. YAML was *very* helpful throughout the entire process of cataloging a set of images to see if the decoder can decode them successfully. So regarding the test suite you could leave the choice to the developer on what fixture type they want to use. Inline PHP can be done with a little regex and eval.
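
    A minimal sketch of that regex-and-eval idea, assuming each inline snippet has the form of a single `echo` expression. The function name `renderInlinePhp()` is hypothetical, and since `eval()` runs arbitrary code this is only reasonable for fixture files you trust.

```php
<?php
// Hypothetical helper (an assumption, for illustration): finds inline PHP
// echo snippets in a line of fixture text and replaces each one with the
// value of its expression. The YAML itself is not parsed here.
function renderInlinePhp($text)
{
    return preg_replace_callback(
        '/<\?php\s+echo\s+(.+?);\s*\?>/s',
        function ($match) {
            // Evaluate the captured expression and use its value as text.
            return (string) eval('return ' . $match[1] . ';');
        },
        $text
    );
}

$line = "password: <?php echo md5('thepassword'); ?>";
echo renderInlinePhp($line), "\n";
```

    The result would then be handed to whatever YAML parser the test suite uses.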

