Data, Big or Not

The contents page of the book Database Management by Esen Ozkarahan: databases were never merely relational.

Nowadays, technology vendors of various sizes are knocking on the doors of enterprises and offering their "big data" solutions. These solutions are mostly appliance systems comprising a NoSQL database, a data integration tool, a data analysis application and a data presentation application. People are bringing you boxes that are supposed to help you understand your business better so that you can react more appropriately.

Beyond that aim, I think the industry has a strong tendency to position the big data concept as something that rises on piles of unstructured data such as voice, images, letters, e-mail, video, social media content, etc. It's becoming a sort of fetishism. I know technology leaders and managers who release such a considerable amount of adrenaline that you can smell it while they talk about big data solutions and the opportunities these solutions can bring.

However, what's the theory?
What do you mean by saying "unstructured"?
What questions would you like to answer for your institution by using big data?
What are the theoretical limits?

The list of such questions can easily get longer, and most of the questions are open-ended.

My thoughts are...

I think, for the time being, it is not possible to extract a supreme meaning from stored data that flows in from different sources at an insane speed. It's like trying to reach the meaning of life. It's like uploading all of Franz Kafka's works into a NoSQL database and asking: "What is he talking about?" It does not make any sense to me.

I have a few words to say about the supposedly nonexistent "structure" of the data being mentioned. By naming data "unstructured", we're referring to the fact that the data groups have no known attributes that can be stored in classified database entities, and because of that, we cannot navigate and process the data easily. I think, in reality, all those data groups have structures of their own, but we cannot apply a "one-size-fits-all" data pattern to the whole set. For instance, think of an e-mail you sent to your bank. It has, of course, a structure that you yourself gave it implicitly. Anyone reading your e-mail can point out which paragraph is the introduction, in which section you expressed your intention, etc. On the other hand, it can be interpreted differently by various readers. The nature and essence of the text is a very deep phenomenon studied by a large number of thinkers such as Roland Barthes, Jacques Derrida, Claude Lévi-Strauss, Louis Althusser and Michel Foucault. There is no easy equation in this realm.

As distilling meaning from unstructured data sets is a complex process, most of the tools on the market offer to extract not the meanings but the sentiments from the large chunks of data gathered. It sounds good, but there is a question here for enterprises to answer: "Do you care about the sentiments of the people interacting with your institution?" If the answer is "yes", your corporation would be expected to be recording already the sentiments of the people it communicates with thousands of times an hour through various channels, and to be enriching its structured databases with their emotional state. Moreover, your organization would be expected to analyze the sentiments collected, generate reports about them, and act on the results of the sentiment analysis in practice. If your corporation is not taking these steps and expects to learn the feelings of its customers only after installing a big data solution, there is an inconsistency there. In addition, we have to keep in mind that sentiments are usually temporary oscillations. Therefore, if you are interested in sensing and processing them, you've got to do it in a timely manner, which is another serious problem to solve.

The methods and traditions of taking care of data, defining data, and converting them into information and then into knowledge are transforming. We are not living in a world of black and white any more. Actually, the world has never been a black and white arena, but we were living in an illusion after we had reduced the complexity of life by sacrificing some of its flavors. But the illusion has ended. It's no longer sufficient. The concept of data has a weak point: data are very static when you store them in your databases. In contrast, data usually flow in their natural habitat. The reflection of this natural flow in your database is the "model" you develop while you're defining the data. You can bring the dynamics of the real world into your database by using creative data modelling techniques. Models may be used to close the gap between your database and real life, where data generation happens constantly, but unfortunately, the models are static too. I'm sure that most of you have heard the term "slowly changing dimensions". In a similar approach, we have to devise slowly changing data models that change the form of the data in our databases, in an automated way, to reflect life better. To make a long story short, storing the raw data in a NoSQL database while omitting the model is a very weak approach, because the model is a very powerful way to inject dynamism into your database. You can derive nothing from this pile.
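For readers who have not met the term, a slowly changing dimension keeps the history of an attribute instead of overwriting it. Below is a minimal Python sketch of the common "Type 2" variant. The names in it (`DimensionRow`, `apply_change`, the customer `segment` attribute) are hypothetical and chosen only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimensionRow:
    """One version of a customer record (hypothetical schema)."""
    customer_id: int
    segment: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


def apply_change(rows: List[DimensionRow], customer_id: int,
                 new_segment: str, change_date: date) -> None:
    """Close the current version and append a new one (Type 2 behaviour)."""
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.segment == new_segment:
                return  # nothing changed, keep the current version open
            row.valid_to = change_date  # close the old version
    rows.append(DimensionRow(customer_id, new_segment, change_date))


history = [DimensionRow(1, "student", date(2010, 1, 1))]
apply_change(history, 1, "professional", date(2013, 6, 1))
# history now holds the closed "student" version and the open "professional" one
```

A slowly changing data model would, by analogy, apply the same close-and-append idea to the model definition itself and not only to the rows; how to do that well is the open problem.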

What would be the proper approach?

To me, gathering the facts from a million data sources and putting them in a box to analyze them later is already a dead method. You have to make your move at the right time, for the right people, and you have to come with a meaningful message which is capable of showing the intellectual level of your company.

Simplicity is the key.

Sense the simple events properly and analyze them in real time. Use easy-to-understand, "to the point" scenarios for evaluation. The complexity of life is composed of simple but divergent transactions. Therefore, try to define the key transactions of your domain first. Then, try to add value to those transactions by assessing the institutional processes that distinguish you from your rivals.
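To make the idea concrete, here is a small Python sketch of evaluating one simple scenario as events arrive. The scenario itself (three failed payments by the same customer within ten minutes), the event fields and the function name are illustrative assumptions of mine, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

WINDOW_SECONDS = 600  # hypothetical: ten-minute window
THRESHOLD = 3         # hypothetical: three failures trigger a reaction

# Recent failure timestamps per customer, oldest first.
recent_failures: Dict[str, Deque[float]] = defaultdict(deque)


def on_payment_event(customer_id: str, timestamp: float, succeeded: bool) -> bool:
    """Return True when the scenario fires for this event."""
    if succeeded:
        return False
    window = recent_failures[customer_id]
    window.append(timestamp)
    # Drop failures that have fallen out of the time window.
    while window and timestamp - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) >= THRESHOLD


events = [("c1", 0.0, False), ("c1", 120.0, False),
          ("c2", 130.0, False), ("c1", 300.0, False)]
fired = [on_payment_event(*event) for event in events]
# fired is [False, False, False, True]: only the third failure of "c1" triggers
```

What should happen when the rule fires is the part that belongs to your own process, and the code cannot decide that for you.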

Your process is your identity.

Data and the tools used for collecting and processing data are mostly the same, but the processes of organizations differ significantly. Let your customers be the actors of your authentic scenarios. People are irrational, yet you always try to develop rational and mathematically consistent models for assessing your customers' behaviours, data, etc. Try to be humane. Use human beings (your personnel) for communication, for sales, and for giving meaning to the transactions. Some scalability problems may occur; use crowdsourcing or some other creative solution when that happens.

No magic box can bring your organization money.

