Things I wish I learned about databases

I loved Database Design. It’s an elective class in a lot of CS curriculums, but I think it should absolutely be required in today’s data-driven ecosystem. Understanding how to structure data goes way beyond SQL, goes way beyond relational algebra. It’s a skill that falls more under liberal arts and general knowledge than ‘strict engineering’.

I’m not saying that American Studies majors should learn about Boyce-Codd Normal Form, but tools like Django and Rails are putting journalists, designers, and mathematicians in a position to create data structures, who (bless them) often don’t know what they’re doing.

I wish we covered legacy databases

I’m working at a small branch of a national foreign aid tracker. As a dozen person dev team, we’re in charge of a database that’s comfortably non-trivial; millions of rows, hundreds of tables. It’s outside the scope of one person’s ability to wrangle it or even parse it; the spawn of multiple dev teams, multiple ORMs (Django’s insistence on prefacing tables with an app name has led to, amongst other travesties, survey_trip_survey_response_option_related_response_option), coders and non-coders, people hungry for information.

This thing is the Whomping Willow of databases, and I don’t know the secret switch to calm it down long enough to normalize it. In class, we barely covered the ‘formal process of normalization’, but that was either with dummy databases or a zero-stake scenario: there’s no chance I can spend a week sifting through each entity and create an entirely new database. My work is triage, and I’ve learned to piecemeal my progress by supplementing my short-term progress with long-term efforts to work on the underlying schemas.

I wish we covered how to communicate good practices

I wrote a one-page memo to everyone on our team who had administrative access on the database, outlining some basic steps to manage the growing maelstrom of information. In short, they were:

  • Data shouldn’t be redundant. If you’ve got a list of publications that you need to use somewhere else, it’s better to reference that list than recreate it.
  • Looking through a million rows takes time, but it might be quicker — and easier to read — than joining four tables that are each ten thousand rows and then reading through those.
  • The data you’re looking at is the same as the data everybody else is looking at. The data you modify is the same, too.
  • Django doesn’t know everything, and neither does anyone else. Before creating or deleting anything, ask someone else.

I’ve been able to find metaphors for foreign keys and primary keys, but we spent so much time jumping straight to the code aspect of things that learning database design outside of an engineering context fell by the wayside.

I wish we covered ORMs

By the time someone takes a database design course, they’ve taken at minimum software development, data structures, and probably algorithms. They’re well versed in class syntax, in storage and inheritence.

ORMs are really easy. They distill SQL to the important parts and allow for practical understanding of the fundamentals needed to write great SQL and craft great schemas — unlike relational algebra. Pretend you know nothing in the world except Python. Which is nicer?:

Respondents.objects.filter(title_contains=‘Professor’)

or

SELECT * FROM ARCANE_TABLE_NAME_RESPONDENTS
WHERE ARCANE_TABLE_NAME.ANTIQUATED_TITLE_ATTRIBUTE LIKE ‘%Professor%’

I’m not saying SQL isn’t important — a Databases class without SQL is a Algorithms class without sorting, but in my mind there’s a progression of steps in an ideal Databases syllabus:

  1. What a database looks like
  2. How to interact (create, remove, update, delete) from a database
  3. Why you use databases
  4. How to create a database
  5. How best to interact with a database
  6. How best to create a database

You need SQL — and the attunement with a database it offers — for the last two. But by teaching SQL immediately alongside the fundamentals of database access, you take the spotlight off of the latter. I graduated the class knowing how to perfectly chain OUTER JOINs but unable to create extensible databases in the real world; and to be honest, it’s not that tough to Google SQL syntax.

  1. voigtlander reblogged this from justyduke
  2. justyduke posted this