Database storage

Keep Me Forever

So, databases, huh? Not so interesting… WRONG. Databases are a key component of almost any real system that actually ‘does’ anything. Granted, some real systems that ‘do’ something neither have nor require a database, so we’ll leave them to one side for the time being and focus on the more common kind, where a database forms some part of the system.

Databases are interesting not just because of their importance in a system, but also because there are so many flavours and different paradigms out there that making the right choice is A) critical to producing a great system and B) hard.

So, in this post we’re not going to talk about a specific database technology per se. There’s no real point, because each project will be suited to a particular technology, and drawing up a pros and cons table for a general scenario isn’t much use. What we’re going to discuss instead is RDBMS versus NoSQL / non-relational databases.

Until about 10–15 years ago, the answer to the question “Which database technology?” was invariably “SQL, of course!”. Since then, slowly but surely, relational databases have shifted into the background in favour of less structured, non-relational databases.

So, why has this been happening, you ask? Well, we believe it’s down to the huge increase in the number of connected devices and, therefore, in the amount and variety of data being produced. The amount of data now flowing from device to device and application to application is huge. We map every facet of modern life to understand who our users are and what they want (or what they need…?). This data needs to be stored somewhere, doesn’t it! And this new wave of data is, mostly, heterogeneous. Previously, data was far more uniform and structured; now it’s much more fluid, so storing it in a structured, schema-driven store, when we might not know what the data looks like at the time we design the schema, is going to be, well, a little tricky, viewers.

Why has data tended to become more heterogeneous? Well, when you think about the world, data is just data, and the way we describe things doesn’t always conveniently fit into the same definition. Take human beings: imagine two people who differ in age, skin tone, even number of limbs. There’s so much possible variation that highly structured data formats don’t make sense. If we had to define a schema to accommodate every potential variation, things would quickly get out of hand and we’d end up with hugely complicated schemas or tables just to accommodate the variability of life.

Enter the NoSQL database: far less schema-driven, allowing a more flexible and fluid approach to storing data. The result is somewhere we can store a bunch of stuff that describes some other stuff, without having to fix its exact shape up front. This lets us capture the complicated nature of the data that describes the real world we are collecting it about.

“That’s it? Well, 23Squared, why don’t you just shut your silly little mouths, because we can cope with, and don’t actually mind, complicated database schemas…” Well, there’s another problem. Most of the technical world is agile (or at least aims, read ‘pretends’, to be). The problem with schemas is that they aren’t agile. In a world where everyone is all about incremental, non-breaking changes, an incremental update to a schema is often a breaking change. This is bad during development, as it requires developers to update their databases with every new release. It’s far, far worse in production: how do you manage the switch-over to the new schema? How do you migrate all the old data? What about new fields and tables for which no legacy data exists?
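For a flavour of how that last question is often handled in a document store, here’s a minimal sketch of “migrate on read” (sometimes called lazy migration): each document records a schema version, and the application fills in defaults for fields that older documents never had, instead of rewriting everything in one big-bang migration. It’s plain Python with dicts standing in for stored documents; the field names, `schema_version` key and `upgrade` helper are all hypothetical, not any particular product’s API.

```python
import copy
from typing import Any

# Hypothetical "user" documents as they might sit in a document store.
# Version 1 was written before "email" and "preferences" existed.
stored_documents: list[dict[str, Any]] = [
    {"_id": 1, "schema_version": 1, "name": "Ada"},
    {"_id": 2, "schema_version": 2, "name": "Grace",
     "email": "grace@example.com", "preferences": {"theme": "dark"}},
]

# Values to assume for fields that version-1 documents never had.
V2_DEFAULTS: dict[str, Any] = {"email": None, "preferences": {}}

def upgrade(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a version-2 view of doc; the stored document is not modified."""
    if doc.get("schema_version", 1) >= 2:
        return doc  # already current, nothing to do
    upgraded = dict(doc)  # shallow copy so the original stays untouched
    for field, default in V2_DEFAULTS.items():
        # deepcopy so upgraded documents never share one mutable default
        upgraded.setdefault(field, copy.deepcopy(default))
    upgraded["schema_version"] = 2
    return upgraded

for doc in stored_documents:
    print(upgrade(doc))
```

The application still has to know about old shapes, but old and new documents can live side by side, and each one can be rewritten in its new shape whenever it is next saved.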

Another benefit of NoSQL DBs is that they can store multiple types of data at the same time. “So can a relational SQL database!” we hear you screaming from the cheap seats at the back. Well, yes, they can, but not in a way that allows flexibility (yep, we really are gonna harp on about inflexible schemas!). For example, if a database stores information about a woodland that contains trees, other plants and animals, a document store would be more appropriate, because there’s lots of data that follows completely different formats and inheritance trees (think about a plant versus an animal: they only share ‘living thing’ in their hierarchy). An RDBMS would struggle with this, as normalisation would make it very complicated, whereas a docstore would cope with it relatively easily.
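To make the woodland example a bit more concrete, here’s a minimal sketch in Python. A plain list of parsed JSON documents stands in for a document store’s collection; the documents, field names and the `find` helper are hypothetical, made up for illustration rather than taken from any real database’s API. The point is that a tree, a flower and two quite different animals sit in the same collection, each with only the fields that make sense for it.

```python
import json

# A hypothetical "woodland" collection: each entry is a self-describing
# JSON document, and different kinds of living thing carry different fields.
woodland_json = """
[
  {"kind": "tree", "species": "oak", "height_m": 21.5, "age_years": 120},
  {"kind": "plant", "species": "bluebell", "flowering_months": ["April", "May"]},
  {"kind": "animal", "species": "badger", "diet": "omnivore", "legs": 4,
   "sett": {"entrances": 7}},
  {"kind": "animal", "species": "tawny owl", "diet": "carnivore", "legs": 2,
   "wingspan_cm": 95}
]
"""

woodland = json.loads(woodland_json)

def find(collection, **criteria):
    """Return documents whose fields match every key/value in criteria.
    Documents lacking a field simply do not match; no schema is consulted."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]

animals = find(woodland, kind="animal")
print([a["species"] for a in animals])  # ['badger', 'tawny owl']

# Fields that only some documents have can still be queried safely.
tall = [d["species"] for d in woodland if d.get("height_m", 0) > 20]
print(tall)  # ['oak']
```

A normalised relational design for the same data would typically need a shared ‘living thing’ table plus a table per kind (or one wide table full of nullable columns), and every new kind of creature would mean another schema change.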

Another thing that puts us off relational, and in particular SQL, databases is how they are most commonly used. Yeah, sure, Hibernate & JPA provide great interfaces to query the database and get the data out into a nice Java object without having to worry about any of that nasty ol’ SQL, but wouldn’t it be nice to just get data without cluttering our codebase with extra framework code? Wouldn’t it just be cool to make a REST call to get our data and have it returned as JSON? Yes, this is perfectly feasible with SQL, but it just seems to be a much less common approach. The problem with tightly coupling your database to your implementation language is that you’re kinda stuck if, after writing the whole system, someone suddenly says you can’t use that particular implementation language anymore. Wouldn’t it have been better to represent the data as just data, in a format like JSON or XML that doesn’t force a particular implementation technology?
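As a rough sketch of “data as just data”, here’s a tiny read-only REST-style endpoint using Python’s standard WSGI interface: `GET /trees/<id>` returns the stored document as JSON. The `TREES` dict and the `/trees/` route are hypothetical stand-ins for a real document store and API, and there’s no auth, validation or persistence; it only shows that the consumer gets plain JSON and never needs our implementation language’s objects or framework.

```python
import json

# Hypothetical in-memory documents, standing in for a document store.
TREES = {
    "oak-1": {"species": "oak", "height_m": 21.5},
    "ash-7": {"species": "ash", "height_m": 18.0},
}

def app(environ, start_response):
    """Minimal WSGI app: GET /trees/<id> returns that document as JSON."""
    path = environ.get("PATH_INFO", "")
    prefix = "/trees/"
    doc = TREES.get(path[len(prefix):]) if path.startswith(prefix) else None
    if doc is None:
        status = "404 Not Found"
        body = json.dumps({"error": "not found"}).encode("utf-8")
    else:
        status = "200 OK"
        body = json.dumps(doc).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, one option is the standard library’s `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`; any client in any language (say, `curl http://localhost:8000/trees/oak-1`) then gets the same plain JSON back.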

So, are we saying that nobody should use SQL / RDBMS databases? No, we aren’t. They certainly have their place, but it’s a much smaller one than people believe. Because lots of people are comfortable with the technology, or because it fits their company structure or ethos, they simply assume that it’s the correct answer, and so it just happens!

Hopefully the fear around using NoSQL databases will subside, because NoSQL databases have proven themselves again and again. With systems built on document stores, like Elasticsearch, becoming near industry standard, hopefully we will see an end to the blind and ubiquitous use of RDBMS databases.

Thanks for listening, viewers. We hope that next time you think about databases, you’ll remember: you have the strength to say no to SQL!