Development/Databases

Background
The Growstuff site stores its data in a database system called PostgreSQL; in development and testing we use a lighter-weight system called SQLite. These are both relational database management systems (RDBMSes). The general idea of relational databases is


 * data is stored in tables;
 * you should never store the same piece of information more than once (or else the two copies will get out of sync and then you'll never know which one is correct);
 * to achieve this, obsessively add cross-references between the tables.

So instead of storing information about the owner of a garden along with the rest of the garden, we split the information into a "members" table and a "gardens" table, and to each entry in the "gardens" table we add a cross-reference to the entry in the "members" table describing that garden's owner. Then, if we need to change the information we hold about the owner of the garden, we only have to change it in one place: the "members" table. Every time we look up the owner of a garden we follow the cross-reference to the "members" table and get up-to-date information on the owner.

Let's look at how that works concretely. The following code is taken from db/schema.rb:

create_table "gardens", :force => true do |t| t.string  "name",       :null => false t.integer "owner_id" t.string  "slug",       :null => false t.datetime "created_at", :null => false t.datetime "updated_at", :null => false end

add_index "gardens", ["slug"], :name => "index_gardens_on_slug", :unique => true add_index "gardens", ["owner_id"], :name => "index_gardens_on_user_id"

Each member has a unique integer called id associated with them. The id column is called a primary key, and Rails generates them for us. The code above generates the Gardens table: note that it has a column called owner_id, which contains the id of the owner of each garden. This is called a foreign key. The combination of primary and foreign keys allows us to do cross-referencing queries (called joins) reliably.

The add_index lines add indices to the Gardens table (unsurprisingly) - these make it faster to look up gardens by their owner_id or slug (the friendly text in the URL). This means we can quickly find all of Skud's gardens, or the data required to display the page http://growstuff.org/gardens/kew.

Relational databases are good for ensuring data integrity and for complex, ad-hoc queries. They are very difficult to scale to datasets that are too large to fit on one machine, but this is unlikely to be a problem for us for a while (if ever).

For a longer worked example, see this blog post. For more on relational databases, see Wikibooks. Once you've got your head around joins and keys, read the essay Gay Marriage: The Database Engineering Perspective, an entertaining look at some of the issues that arise when designing relational database schemas.

Why two RDBMSes?
Using two database systems like this exposes us to problems arising from differences between PostgreSQL and SQLite. Such problems would be difficult to debug because we'd be using the wrong database on our development boxes. We're doing it this way because Postgres can be complex to install and set up, and we want to make it as easy as possible to get started coding on Growstuff. If this proves to be a problem we can revisit this decision later.

ActiveRecord
Rails talks to the database using an object-relational mapping (ORM) system called ActiveRecord. This does what it says on the tin: every row in a database table is represented as a Ruby object. Fields in the row become member variables of the object; foreign keys turn into accessor methods, with the relevant join queries performed for you (so in Ruby code, you could write garden.owner and get a Ruby object representing garden</tt>'s owner). Look in the Ruby files under app/models</tt>: you'll see that every class inherits from ActiveRecord::Base.

Migrations
Rails also makes it comparatively easy to update the database's schema (the layout of the tables and indices). Look under db/migrate</tt>, and you'll see lots of Ruby scripts with names like 20121027035231_add_slug_to_crops.rb</tt>. Each script (called a migration) represents one change to the database's schema. When you run rake db:migrate</tt>, Rake figures out which migrations have been applied to your database and which haven't, and runs the new migrations to bring your database schema up to date. It also updates db/schema.rb</tt> so it accurately describes the state of your database - don't edit this file by hand! Instead, write migrations to make the changes you want. You can learn more about migrations at Ruby on Rails Guides: Migrations.

There are some non-obvious migration antipatterns, such as putting model code in your migrations; see this blog post for an explanation of some of the most common.

Seeds
The file db/seeds.rb</tt> contains Ruby code for initialising the database with seed data - this is the best way to ensure that your database contains some standard data. Run this file using rake db:seed</tt>.