My Data, Your Algorithm


Vacheron Constantin Calibre 3750, the most complicated watch in the world. 

I was skimming the Future of Jobs Report by the WEF, released in October 2020. Like most WEF reports, it is lucid and comprehensive. I strongly recommend having a look if you have not done so yet. 

My key takeaways from the report are: 
  1. Data & AI jobs are increasing
  2. Cloud computing jobs are rising as well
  3. The information technology super cluster is being split into more specialized sub-clusters such as Data & AI, Cloud Computing, Engineering etc. 
  4. For popular jobs in the Data & AI family, there is a very big skill gap
  5. Critical thinking, problem solving and self-management are the top desired professional skills 
If your professional activity revolves around software, cloud technology, data and AI, and if you are a good problem solver, the news is good for you at least until 2025, even under pandemic measures... However, notice the huge skill gaps emphasized in the report. Nothing is going to be easy: you need to learn, develop and adopt new professional skills faster and faster, in a continuous manner.

As someone who started programming on a Commodore 64 in the early '90s and has been living on computers for more than 20 years, I am lucky enough to have watched how the software industry has been evolving. So in this post, I plan to touch on a topic I find very important in terms of what data & AI professionals should be aware of.

In the good old days, when a software engineer was required to build a system, the steps we all followed could be roughly listed as follows:
  1. Define the required outputs of the system
  2. Design the interaction model and process flow
  3. Design the data flow
  4. Design data structures
  5. Build the algorithms
  6. Integrate
When we followed such a discipline, the systems developed and all their sub-components were, in most cases, authentic entities. The main difference between software development then and the data business now is that nowadays, almost no one develops authentic algorithms any more. The tendency is to use the algorithms of others as much as possible. Software reuse, remote procedure calls and frameworks were always popular topics, even in ancient times, but today, through shared libraries and APIs, literally no one is developing algorithms. Therefore, no one is designing decent data structures. Flat data in, flat data out. And I find this strange.

Especially in the analytical model development process, companies are merely configuring the algorithms of others with native company data. At the end of the process, the model developed is just another instance of a foreign function that originated in the algorithms of some other company. It is like implanting your memories into somebody else's brain.
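This "foreign function" idea can be sketched in a few lines of Python. The names and numbers here are hypothetical, purely for illustration: the algorithmic form comes from outside, and company data only fixes its parameters.

```python
# A generic "foreign" algorithm: a linear scoring function whose form
# was designed by someone else. Only the weights come from your data.
def foreign_predict(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# "Model development" then amounts to choosing parameters that fit
# company data -- configuring the foreign function, not authoring it.
company_model = {"weights": [0.5, -1.2], "bias": 0.3}

score = foreign_predict(company_model["weights"], company_model["bias"], [2.0, 1.0])
print(round(score, 2))  # 0.5*2.0 + (-1.2)*1.0 + 0.3 = 0.1
```

Two different companies training "their own" model would end up with two instances of the same `foreign_predict`, differing only in the numbers plugged into it.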

In the traditional software development process, the meta equation is roughly as below:

Equation 1: INPUT + INTERACTION + ALGORITHM = OUTPUT

In analytical model development, more or less, the equation becomes:

Equation 2: INPUT + OUTPUT = ALGORITHM

Please remember that the ALGORITHM of Equation 2 is just a re-configured form of the algorithms of others.

Let's analyze this a bit more. Chronologically speaking, all the company data we use to develop machine learning models was generated, over many years, by software systems that were developed traditionally. That means the INPUT + OUTPUT part of Equation 2 comes from Equation 1. Therefore, the data has been shaped by authentic algorithms and interaction models for many, many years. Today, you look at that company data to extract a version of someone else's algorithmic function. Isn't that strange too? I think it is.
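The Equation 1 to Equation 2 loop can be made concrete with a minimal, self-contained Python sketch (illustrative numbers only): an "authentic" algorithm first generates the historical input/output pairs, and then a least-squares fit reconstructs a linear stand-in for it from those pairs alone.

```python
# Equation 1: an authentic algorithm produces OUTPUT from INPUT.
def authentic_algorithm(x):
    return 2.0 * x + 1.0  # the original, hand-designed logic

inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [authentic_algorithm(x) for x in inputs]  # "years" of system data

# Equation 2: INPUT + OUTPUT are used to configure an ALGORITHM.
# Here: closed-form least squares for a line y = a*x + b.
n = len(inputs)
mean_x = sum(inputs) / n
mean_y = sum(outputs) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs)) \
    / sum((x - mean_x) ** 2 for x in inputs)
b = mean_y - a * mean_x

print(a, b)  # recovers a = 2.0, b = 1.0 -- a re-configured copy
```

The fitted (a, b) is exactly the slope and intercept the authentic algorithm used to shape the data in the first place: the "learned" model is a reflection of the algorithm that generated its training data.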

Moreover, the notion of an algorithm is itself not sufficient to model the real world, because an algorithm is a closed symbolic system: it takes inputs, processes a finite number of steps and produces outputs. Life, however, is composed of interactions. Many interactive computer systems, containing different algorithms, run simultaneously, generating events, receiving feedback, triggering other systems and so on. This bigger interactive picture is not reducible to algorithms.

On the other hand, in most cases, analytical models are developed using non-interactive, low-dimensional, batch, historic data. With that form of historic data, we try to configure someone else's algorithm to handle real-life situations. Where is the effect of interactions there? Of course, some analytical models are built to analyze streaming data. That comes closer, but even such models are trained following batch data-loading practices. The scope of the problems to be solved with analytical models can be narrowed down to fit the nature of real-life situations, which may handle the shortcomings of the non-interactive, purely algorithmic approach. But that deserves another discussion...
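One direction for bringing interaction closer to training is to let the model update as events arrive instead of fitting once on a historic snapshot. Below is a hedged sketch of that idea, plain online gradient descent on a single weight with made-up numbers, not a production streaming algorithm.

```python
# Online update: each incoming event (x, y) nudges the weight,
# instead of fitting once on a frozen historic batch.
def online_step(w, x, y, lr=0.1):
    prediction = w * x
    error = prediction - y
    return w - lr * error * x  # gradient step on squared error

w = 0.0
stream = [(1.0, 3.0), (2.0, 6.0), (1.5, 4.5), (3.0, 9.0)] * 25  # events obeying y = 3x
for x, y in stream:
    w = online_step(w, x, y)

print(round(w, 2))  # converges toward 3.0
```

The contrast with the batch approach is that the model never sees "the data set"; it only sees a sequence of interactions, and each one leaves a trace in its state.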
  
To sum up, I have 2 questions:
  1. Can we survive unless we create our authentic algorithms?
  2. How can we better add the notion of interaction to the analytical model development process?
One of my bosses, whom I respect a lot, used to say "build the clock". I think we should follow that advice.
