Monday, June 8, 2015

Another thought on diversity

A couple months ago, I was both amused and encouraged by the example of camaraderie among people who can be notorious for their religious wars factionalism. 

Today I had occasion to muse upon diversity again, albeit from a different angle.

I'm working on the database of an application that will--if all pans out--eventually be available to professionals in Canada and the United States (at a minimum).  The superficial view is that we can get by with just an English-language version.  At least during an invitation-only beta.  In the long run, however, French and Spanish will have to be added.

If my Gentle Reader is not a programmer, the only thing that s/he needs to understand is that there are three significant classes of numbers in data design (and programming for it):
  • One
  • Zero
  • Any number greater than one
As Three Dog Night famously pointed out, one is the loneliest number.  But it's also (by far) the simplest to wrap our brains (and code) around.  For example:  Currently, we each only have one height, one weight, one age.  Easy-peasy, right?

If a programmer is creating a table,  s/he can get away with a spreadsheet-like model.  For instance, each customer would have their data on a single row:  Account ID, First Name, Last Name, Middle Initial, Current Balance, and so on. 

Zero complicates matters.  For instance, I had a chequing account years before my driver's license.  Which sometimes made it awkward at the counter, at least in the U.S., because that magic rectangle of plastic was somehow a talisman against the cheque bouncing.  (In essence, one driver's license number was expected, but it was just plausible that zero was an acceptable number.)  Usually the clerk and I worked something out that involved my University ID and my Social Security card.  To some extent, that prepared me for the limbo between the time Dennis & I pulled up stakes for Canada and the time we obtained our Social Insurance Numbers (which things like utilities, banks, etc. expect to set up accounts).  Again, it was just possible that zero was an acceptable number, but extra hoop-jumping (and sometimes expense) was involved.

To the programmer, it means some extra brain-work.  One has to remember to set database tables to accept no (a.k.a. NULL) values.  One also has to remember to allow "None of the above"-type options in drop-down boxes.  Also to give "none" or NULL a pass when validating data passing in or out of the database.  Extra brain-work, naturally, translates to extra time, and of course extra expense.

For numbers over one, though?  Hoo-boy...that's a whole different ball-game.  Let's circle back (in management-speak) to the language thing.  For this particular application, we can set up each client to have a preferred language.  But--particularly among those who speak French and/or Spanish--it's more than likely that a client will be fluent in more than one language.

If we were still storing client info. in a spreadsheet, we'd simply have a "Preferred Language" column and another titled something like "Second Language."  In most cases, that column would just be wasted space.  And wait a minute:  What about people who are fluent in more than two languages?  Case in point:  One of my step-in-laws speaks six languages.  (He's originally from the Netherlands and now lives in the U.S., which pretty much guarantees four languages.) 

That's the point where a programmer trades in the spreadsheet for something called a relational database.  The database replaces the single table of clients with three:  One for basic client info.  (all that "one" stuff like name, age, balance, etc.), one small one that merely lists the supported languages, and another table that links clients and languages.  The beauty of that arrangement is threefold:
  1. There's no wasted space from that unused "Second Language" column.  Which may not seem like a big deal for a few hundred records, but when you scale into the tens of thousands and millions really adds up.
  2. The solution is infinitely scalable--it can accommodate the linguistic xenophobe as well as the Babel Fish with no hacks required.
  3. (Bonus) The data stays cleaner.  If the "Preferred Language" and "Second Language" are text fields, then someone will inevitably misspell language names.  That just corrupts the data.  In the relational model, the third (i.e. "association") table links clients and languages by their ID numbers.  Which not only takes up less hard drive space (numbers are cheaper to store than sets of letters), but makes it much, muuuuuch harder to screw things up.
Of course, it adds a level of complexity to the way the code interacts with the database.  But here's the thing:  In my dozen-plus years of coding, I've found that 1-to-1 relationships between data are the exception rather than the norm.  In the end, it's easier (meaning less painful and expensive) to bake in diversity from the get-go than try to hack it in after the fact.

And, as it turns out, that's not the worst metaphor for coping with the world, either.   Just sayin'.