Data storage solutions
By Shreeram Geedh
August 22, 2020
Datasets can become quite huge, and they can also have many different schemas. To generate meaningful insights, you might also have to store data for multiple months or even years. Therefore, I cover the different storage options available, ranging from SQL and NoSQL databases to ObjectStorage, a kind of dynamically scalable, fault-tolerant remote file system.
1] SQL :
The most common type of data storage system is the SQL database, and it might be a valid storage option.
Advantages of SQL databases:
SQL databases are well known and understood. SQL skills are readily available in the developer community, and there are many resources to help developers learn and use the technology. They enforce a high level of data integrity and support data normalization. Subsets of data can be accessed quickly through indexes. Finally, SQL is an open standard, so, at least in theory, switching to a different database system doesn't require changing any source code.
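To make the integrity and index points concrete, here is a minimal sketch using Python's built-in `sqlite3` module, assuming SQLite as a stand-in for any SQL database. The `device` and `reading` tables, their columns, and the index name are hypothetical examples.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for any SQL database.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Normalized schema (hypothetical): device metadata is stored once,
# and each reading refers to its device by id.
conn.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, location TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE reading (
        device_id   INTEGER NOT NULL REFERENCES device(id),
        ts          TEXT    NOT NULL,
        temperature REAL    NOT NULL
    )
    """
)
# Composite index so per-device time ranges can be located without a full table scan.
conn.execute("CREATE INDEX idx_reading_device_ts ON reading (device_id, ts)")

conn.execute("INSERT INTO device VALUES (1, 'boiler room')")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [(1, "2020-08-01T10:00", 21.5), (1, "2020-08-01T11:00", 22.0)],
)

# Integrity: the database itself rejects a reading for a device that does not exist.
try:
    conn.execute("INSERT INTO reading VALUES (99, '2020-08-01T12:00', 20.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Subset access: a predicate on the indexed columns.
rows = conn.execute(
    "SELECT ts, temperature FROM reading WHERE device_id = ? AND ts >= ? ORDER BY ts",
    (1, "2020-08-01T11:00"),
).fetchall()

print(rejected, rows)  # True [('2020-08-01T11:00', 22.0)]
```

Running `EXPLAIN QUERY PLAN` in front of the `SELECT` is a quick way to check whether SQLite actually chose the index for a given query.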
Disadvantages of SQL databases:
On the other hand, IoT data coming from a variety of sensors and devices can involve a broad spectrum of schemas which, depending on the use case, change rapidly. In an SQL database, every such change means writing data definition language (DDL) statements to migrate the schema, and often migrating the existing data as well. SQL databases are also hard to scale, and storage costs are high because SQL databases need very reliable storage systems; they generally do not tolerate storage that is temporarily inaccessible.
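A minimal sketch of what a single schema change costs in an SQL database, using Python's `sqlite3` as a stand-in. The scenario is hypothetical: a new kind of sensor starts reporting a `humidity` field that the existing `reading` table does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (device_id INTEGER, ts TEXT, temperature REAL)")
conn.execute("INSERT INTO reading VALUES (1, '2020-08-01T10:00', 21.5)")

# A new device type sends an extra field (hypothetical payload).
new_reading = {"device_id": 2, "ts": "2020-08-01T10:05", "temperature": 19.0, "humidity": 0.41}
insert_sql = (
    "INSERT INTO reading (device_id, ts, temperature, humidity) "
    "VALUES (:device_id, :ts, :temperature, :humidity)"
)

# Before the migration, the database refuses the unknown column.
try:
    conn.execute(insert_sql, new_reading)
    failed_before_migration = False
except sqlite3.OperationalError:
    failed_before_migration = True

# Schema migration via DDL; existing rows get NULL in the new column.
conn.execute("ALTER TABLE reading ADD COLUMN humidity REAL")
# A real migration might also backfill or transform old rows at this point.

conn.execute(insert_sql, new_reading)
rows = conn.execute("SELECT device_id, humidity FROM reading ORDER BY device_id").fetchall()
print(failed_before_migration, rows)  # True [(1, None), (2, 0.41)]
```

Every new field repeats this cycle, which is why rapidly changing schemas weigh against SQL.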
2] NoSQL :
NoSQL database systems, in contrast, don't force data to comply with any predefined schema.
Advantages of NoSQL databases:
Each entry can have its own schema, and a table can hold a mixture of entries with different schemas. Changing the schema requires no further steps. NoSQL databases can cope with disk failures, so cheap disks can be used, pushing storage costs down. Finally, NoSQL databases are linearly scalable: if you need to double the amount of storage, just double the number of disks, and if you need to double the processing power, just double the number of servers.
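The following is a toy illustration, not how any particular NoSQL product works internally: a hypothetical `ToyDocumentStore` that accepts documents with different fields and spreads them over shards by hashing the key, where each shard stands in for one disk or server.

```python
import json
import zlib


class ToyDocumentStore:
    """Hypothetical in-memory document store partitioned into shards."""

    def __init__(self, num_shards):
        # Each list stands in for one disk/server in a real cluster.
        self.shards = [[] for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash partitioning: the same key always lands on the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def insert(self, key, doc):
        # No schema check and no DDL: any JSON-serializable dict is stored as-is.
        self.shards[self.shard_for(key)].append((key, json.dumps(doc)))

    def all_documents(self):
        for shard in self.shards:
            for key, raw in shard:
                yield key, json.loads(raw)

    def max_shard_size(self):
        return max(len(shard) for shard in self.shards)


store = ToyDocumentStore(num_shards=4)
# Three entries with three different schemas live side by side.
store.insert("r1", {"device": "thermo-1", "temperature": 21.5})
store.insert("r2", {"device": "hygro-7", "humidity": 0.41, "battery": 0.9})
store.insert("r3", {"device": "cam-2", "image_uri": "frames/0001.jpg"})
docs = dict(store.all_documents())

# Scaling out: with twice the shards, the busiest shard should hold roughly half as many documents.
small, large = ToyDocumentStore(4), ToyDocumentStore(8)
for i in range(10_000):
    small.insert(f"key-{i}", {"n": i})
    large.insert(f"key-{i}", {"n": i})
print(small.max_shard_size(), large.max_shard_size())
```

Plain modulo hashing reassigns most keys whenever the shard count changes; real systems typically rely on schemes such as consistent hashing or range partitioning with rebalancing to add servers without moving everything.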
Disadvantages of NoSQL databases:
This flexibility comes at a price. Generally, NoSQL databases support neither normalization nor data integrity, so the programs interacting with them have to take care of those properties if needed. You will also find fewer developers skilled in these database systems on the market, although the number of NoSQL developers is steadily increasing. Access to data is also generally slower than in SQL databases. The main reason is that data is stored as JSON documents, which means the database must read and parse every document to respond to a search query. Custom indexes can address that problem, but they generally do not reach the query performance of an SQL database.
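The difference between scanning and using a custom index can be sketched in plain Python, assuming documents are held as JSON strings. `full_scan` must parse every document, while the hypothetical `build_index`/`indexed_lookup` pair mimics a secondary index that only touches matching documents. Nothing here checks that `temperature` is actually a number: that kind of integrity is left to the application.

```python
import json
from collections import defaultdict

# Documents stored as raw JSON text, as a document database would hold them.
raw_docs = [
    json.dumps({"device": "thermo-1", "temperature": 21.5}),
    json.dumps({"device": "hygro-7", "humidity": 0.41}),
    json.dumps({"device": "thermo-1", "temperature": 22.0}),
    json.dumps({"device": "cam-2", "image_uri": "frames/0001.jpg"}),
]


def full_scan(docs, field, value):
    """Without an index, every document is read and parsed to answer the query."""
    matches, parsed = [], 0
    for raw in docs:
        doc = json.loads(raw)
        parsed += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, parsed


def build_index(docs, field):
    """Hypothetical secondary index: field value -> positions of documents having it."""
    index = defaultdict(list)
    for pos, raw in enumerate(docs):
        doc = json.loads(raw)
        if field in doc:
            index[doc[field]].append(pos)
    return index


def indexed_lookup(docs, index, value):
    """Only the documents listed in the index are parsed."""
    positions = index.get(value, [])
    return [json.loads(docs[p]) for p in positions], len(positions)


scan_result, scan_parsed = full_scan(raw_docs, "device", "thermo-1")
device_index = build_index(raw_docs, "device")
idx_result, idx_parsed = indexed_lookup(raw_docs, device_index, "thermo-1")
print(scan_parsed, idx_parsed)  # 4 2
```

The index has to be updated on every write, which is part of the extra work the application or database takes on in exchange for faster reads.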
3] ObjectStorage :
ObjectStorage behaves like a remote file system with a virtually unlimited amount of storage capacity and built-in high availability.
Advantages of ObjectStorage:
Storage costs are very low; it is hard to find online storage cheaper than ObjectStorage. As with NoSQL databases, ObjectStorage is linearly scalable: you simply store objects or files without worrying about the amount of data, and you are billed for every gigabyte you consume each month. Since ObjectStorage behaves more like a file system, there is no explicit schema definition in place. Therefore, schema migration is as smooth as in a NoSQL database, since the application itself has to take care of it.
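Because the object store does not interpret what is inside an object, schema handling moves into the reading application ("schema-on-read"). A minimal sketch, assuming a plain dict as a stand-in for a bucket; `store_object`, `load_object`, `read_temperature_c`, the key names, and both payload layouts are hypothetical.

```python
import json

# Stand-in for an object-storage bucket: key -> opaque bytes.
bucket = {}


def store_object(key, data):
    bucket[key] = data


def load_object(key):
    return bucket[key]


# Two generations of a sensor payload, written without any schema migration.
store_object("readings/0001.json", json.dumps({"temp_c": 21.5}).encode("utf-8"))
store_object(
    "readings/0002.json",
    json.dumps({"temperature": {"value": 50, "unit": "F"}}).encode("utf-8"),
)


def read_temperature_c(key):
    """Schema-on-read: the application, not the store, reconciles both layouts."""
    doc = json.loads(load_object(key))
    if "temp_c" in doc:  # old layout
        return doc["temp_c"]
    reading = doc["temperature"]  # new layout with an explicit unit
    if reading["unit"] == "F":
        return (reading["value"] - 32) * 5 / 9
    return reading["value"]


print([read_temperature_c(k) for k in sorted(bucket)])  # [21.5, 10.0]
```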
Decide according to this :
So this leads us to the following decision matrix.
- If you have a small amount of data and a very stable schema, you can go for SQL.
- As the amount of data increases, or if you have to cope with continuously changing schemas, you should go for NoSQL.
- Finally, for very large amounts of data or highly varied schemas, even including audio, image, and video data, the primary choice is ObjectStorage.
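The decision matrix can be written down as a small function. This is only a sketch of the three rules: the category labels (`"low"`, `"high"`, `"very high"`, and so on) are hypothetical, and since the article gives no numeric thresholds, none are assumed here.

```python
VOLUMES = {"low", "high", "very high"}
SCHEMAS = {"stable", "changing", "highly varied"}


def choose_storage(volume, schema, has_media=False):
    """Encode the decision matrix; inputs are coarse, hypothetical categories."""
    if volume not in VOLUMES or schema not in SCHEMAS:
        raise ValueError(f"unknown category: {volume!r}, {schema!r}")
    # Very large data, highly varied schemas, or audio/image/video.
    if has_media or volume == "very high" or schema == "highly varied":
        return "ObjectStorage"
    # Growing data or continuously changing schemas.
    if volume == "high" or schema == "changing":
        return "NoSQL"
    # Small data with a stable schema.
    return "SQL"


print(choose_storage("low", "stable"))  # SQL
print(choose_storage("low", "changing"))  # NoSQL
print(choose_storage("high", "stable", has_media=True))  # ObjectStorage
```

The rules are checked from most to least demanding, so a dataset that meets several conditions ends up with the more scalable option; that ordering is an interpretation of the list, not something it states explicitly.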
You have learned that there are three main options: SQL, NoSQL, and ObjectStorage.
The choice depends on the amount of data you need to store and process, the different schema types you have to cope with, and especially the frequency of schema changes in your dataset, since for SQL databases every schema change implies generating DDL statements and migrating data. Accessing subsets of your data based on query predicates is a strength of relational databases, but NoSQL databases are catching up. IoT data is mostly time series data, so you can even enable subsetting on ObjectStorage by defining folder hierarchies that reflect, for example, years, months, days, and hours. If you have to store images, audio, or video, the primary choice is ObjectStorage, since all other options would imply unnecessary overhead.
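Time-based subsetting on ObjectStorage can be sketched with key prefixes. In object stores such as Amazon S3, "folders" are usually just prefixes of the object key, and objects can be listed by prefix (for example, the `Prefix` parameter of the S3 `ListObjectsV2` API). The `year/month/day/hour` layout and the `object_key`/`keys_for_day` helpers are hypothetical, and the Python list stands in for a real bucket listing.

```python
from datetime import datetime, timedelta


def object_key(device_id, ts):
    """Hypothetical layout: year/month/day/hour folders, one object per device and hour."""
    return f"{ts:%Y/%m/%d/%H}/{device_id}.json"


def keys_for_day(keys, day):
    """Select one day's objects by key prefix instead of opening their contents."""
    prefix = f"{day:%Y/%m/%d}/"
    return [k for k in keys if k.startswith(prefix)]


start = datetime(2020, 8, 21, 23)
# Four hourly objects that straddle midnight.
keys = [object_key("sensor-1", start + timedelta(hours=h)) for h in range(4)]

for key in keys_for_day(keys, datetime(2020, 8, 22)):
    print(key)
# 2020/08/22/00/sensor-1.json
# 2020/08/22/01/sensor-1.json
# 2020/08/22/02/sensor-1.json
```

The coarser the prefix you query (year, month, day, hour), the more objects a single listing returns, so the folder granularity is best chosen to match the time windows you typically analyze.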
Now you know what to weigh when choosing a suitable data storage solution. Every option has its own characteristics, and the choice is yours;
after all, “In any moment of decision, the best thing you can do is the right thing. The worst thing you can do is nothing.”