I, Robot



I have been thinking of composing my ideas on robotic process automation for a while. My initial gut feeling was that “there is something wrong with it”. To be more rigorous, though, I read about it, examined the current market and tools, and learnt the typical use cases and challenges, postponing this blog post until today. And today, I am still saying that “something is wrong”.
I have been making my living from enterprise software development and information systems automation for about 20 years. I have seen mainframe systems, client-server systems, three-tier systems, n-tier systems, distributed ones, Comet servers, cloud-based systems, the re-invention of virtualization, and I am still experiencing new models of computation and automation.

Treat a macro as what it is: a macro

In my interpretation, the RPA approach is a type of macro. We have been familiar with macros since the days when assembly language dominated the realm of software development. Macros are good for improving a person's performance on repetitive tasks, and not good for enterprise automation.
People have been using macros for decades, solving their information processing problems in a so-called IT-independent way. Remember the times you faced big, macro-laden spreadsheets actively in use in a department where you were conducting systems analysis for better automation. You had to read the macro code, extract the logic and, after perfecting it, design the enterprise automation in a more manageable and accountable way. We used to call those hidden silos of macro code shadow IT. Our mission was illuminating the shadow.
Nowadays, people call those department-based macros “robots”. Moreover, they claim to optimize and automate processes by using those robots. The only difference is that they don't want you to code your macros in spreadsheet applications but with RPA tools.
It is wrong. It is still a macro. And it must remain in personal, self-service use within departments, whether the department is part of the IT organization or not. You must not try to automate your enterprise-wide processes with macros. Full stop.

Record 'n Play

We are living in the 21st century, in the middle of an information boom, hyper-connectivity and remotely operated rovers on the surface of Mars, yet we call fancy record-and-play software a robot. Asimov must be turning in his grave. Please have a look at the video below.



Those were the days when we called macros macros, not robots. Today, many RPA tools are still recording and playing. Very interesting, right?


Process engineering and automation

The RPA approach proposes that if your processes include rule-based repetitive tasks handled by human beings, you can create a macro to eliminate the human workforce from the given operations, so that you can reduce operational expenses very quickly. Trivial jobs get done by macros: faster than a human, more accurate than a human, cheaper than human man-hours. Sounds good. Some limitations of contemporary RPA projects are (a) no critical communications can be handled, (b) no critical decisions can be made, (c) no rules, no actions. That means no intelligence.
Although attempts are being made to embed AI algorithms into RPA macros, companies cannot take the risk of millions of wrong messages being sent to customers, or of delegating critical decision making to algorithms, because of the possible legal consequences. So the so-called cognitive RPA is still some way off. That means no intelligence, again.
When it comes to process automation, it gets more interesting. Imagine you are reviewing a company process and you find out that a team of people is watching a screen, checking the values in some text boxes against some rules, and approving or rejecting records based on those rules. At the end of the shift, they send e-mails reporting the number of approved and rejected records to their bosses. It sounds like this operation can be fully automated, because structured digital data is used to execute deterministic rules, and finally the number of processed records is reported to a parametrically known authority.
The healthy steps of automation can be listed as follows:
  1. Investigate the software system carrying data to the operators and find the data sources that are displayed to them
  2. Check the data flows, assure the data validation steps, define the exceptions
  3. Define the rules of approval and rejection to run on the identified data sources
  4. Develop a software module for processing the approval and rejection rules with no human interaction
  5. Develop a software module for consolidating and counting the records processed in step 4 and sending automated e-mails to the defined recipients for reporting (a minimal sketch of steps 4 and 5 follows this list)
  6. Test the modules
  7. Launch the modules
  8. Discard the approval screens in the systems
  9. Discard the human operations in the process
  10. Document the jobs done
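To make steps 4 and 5 concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the rule fields (an amount threshold and a customer status flag), the sender address, the SMTP host and the recipients are placeholders, and the calls that persist each decision are left as comments. It shows the shape of the two modules, not a production implementation.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical deterministic rule: field names and the threshold are placeholders.
def is_approved(record: dict) -> bool:
    """Step 4: apply the approval/rejection rules with no human interaction."""
    return record["amount"] <= 10_000 and record["customer_status"] == "ACTIVE"

def process_records(records: list[dict]) -> dict:
    counts = {"approved": 0, "rejected": 0}
    for record in records:
        if is_approved(record):
            counts["approved"] += 1
            # ... call the module/service that persists the approval ...
        else:
            counts["rejected"] += 1
            # ... call the module/service that persists the rejection ...
    return counts

def send_shift_report(counts: dict, recipients: list[str], smtp_host: str) -> None:
    """Step 5: consolidate the counts and e-mail them to the defined recipients."""
    msg = EmailMessage()
    msg["Subject"] = "End-of-shift approval report"
    msg["From"] = "automation@example.com"  # placeholder sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"Approved: {counts['approved']}\nRejected: {counts['rejected']}")
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```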
Of course, you need systems analysts, programmers and testers to make this automation happen. The meta-process regulating those actions is called the SDLC, owned by CIOs. It is business as usual in the IT world. By following this methodology, you guarantee the portability, performance, maintainability, usability and functionality of the software systems used for process automation.
What RPA suggests for the same situation is GUI-to-GUI operator mimicking through macro coding, and calling this a sort of process automation. This is a fast but weak way of integration, because software systems have distinct layers for data, business logic execution, data exchange, service integration and presentation (the graphical user interface). If, in this scenario, you do not make the necessary changes in the proper layers of the software systems, you end up with non-maintainable systems. Systems are not designed to be integrated through their graphical user interfaces. Doing this by recording and playing some GUI steps is, in decent words, sub-optimal. From a software engineering perspective, I would call it ugly. Forget the ugliness and just think of the regression effect: GUIs will change over time and your macros will become obsolete. Since no software provider considers its GUIs as system-to-system integration points, they will not warn you about integration complications after a GUI upgrade they release. As a result, your so-called automated process will break. Actually, if it is rule based, it will break; if you used cognitive RPA with adaptive algorithms, you will never understand what's happening. And who will be responsible?
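To see why GUI-level mimicking is fragile, compare the two sketches below. Both are hypothetical: the element identifiers are the kind a record-and-play tool would capture from today's screen, and the service call stands for a versioned interface that the providing system commits to keep compatible.

```python
# Fragile: a record-and-play macro driving the approval screen itself.
# The element identifiers are whatever the recorder captured from today's GUI;
# after the next GUI upgrade they may silently stop matching anything.
def approve_via_gui(gui, record_id):
    gui.type_into("txtRecordId", record_id)
    gui.click("btnLoad")
    gui.click("btnApprove")  # renamed, moved or redesigned -> the "robot" breaks

# Robust: the same action expressed against an exposed, versioned service,
# which the owning system maps onto its internals and keeps backward compatible.
def approve_via_service(approval_api, record_id):
    return approval_api.approve(record_id)  # e.g. POST /approvals/v1/{record_id}
```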

Silos against central IT

RPA tool providers use the argument of IT-independent process automation. If the aim is a non-maintainable system-to-system glue application, it is OK. It is OK for a very short term, indeed.
Imagine you have non-IT RPA departments using RPA tools to glue systems together. After five years, there will be piles of RPA macros that are managed not by IT but by RPA departments trying to keep their macros in sync with enterprise software modules managed by IT. Those enterprise systems will more likely be developed in a continuous delivery/continuous integration fashion, so massive software change will be inevitable. What will happen to the RPA macros? What will happen to the so-called automated systems? Who will be the responsible party keeping things together? Are RPA departments going to become part of the IT organization? And if so, is it still an IT-independent thing?

So what?

In short, RPA is a macro. Use it as a macro. Embed it into operating systems or make it a personal tool for empowering professionals who want to formulate their repetitive operations and save time. Actually, hundreds of thousands of people are using macros today: they are part of programming languages, code editors and IDEs, spreadsheets, operating systems, BPM tools, software testing tools etc. Macros have been a commodity for years. The idea is just not to turn them into a sort of enterprise IT solution. Keep them as a self-service feature.
Or maybe I am old fashioned.
No robots detected.

Tekne

the poor man's mother kneads his bread in the trough
five siblings pile in
a holiday visit to the neighbouring village
the scrawny horse pulls slowly
four-wheeled
a painted trough
the tractor's trailer
the pickup truck's bed
the boot of the station-wagon toros
a blanket spread underneath
dark eyes in the fast mersedes
the lane lines pass one by one
the count of the poles comes to one thousand three hundred and fifty-five
the wind licks his hair hard
at every bump his heart falls onto the road
breaths are the mist on the glass
childish dreams with faded colours
before dawn the wheel turns
seventeen privates climb onto the reo
rifles gripped tight
his mind drifts to the horse cart
three treacherous bullets burn
he falls to the ground without a sigh
one end on his father's shoulder
the villagers carry the four-handled trough
his fiancée softly moans
the poor man's life is always spent in the trough.

Data warehouse, no more!


Introduction

I am sure you have heard the slogans circling around data and related technologies: “Data is the new oil”, “Data is the most important corporate asset”, “Data is the new gold”, and so on and so forth. Following the data trends, companies are trying to modernize their data infrastructures, founding new data organizations, creating board awareness and making data a part of their digital journey. Moreover, regulators are pushing data management and data governance obligations onto enterprises harder every day.

All those are true, and right.

However, I have doubts about whether we are investing for the future in the proper way. Do we know what we are doing? Are we capable enough to build the future?

I will explain my question marks in a structured way, considering a number of dimensions, and while discussing the problems, I will also propose possible solutions in abstract form.

Mother of all evils

Data warehouses. Have you heard the stories about them? Many projects have failed while people were trying to build them. Many managers have been fired. Many DBAs have been cursed, and many programmers have forgotten their professional principles while feeding data into them. A dark shadow coming from the 1980s and 1990s, reaching into today and struggling to expand into tomorrow.

A typical data warehouse conventionally offers a number of fundamental values to the technological infrastructure of a corporation, such as accumulating historical data, integrating high-quality data on a single platform, bridging the application silos, providing cohesive data organised by business subject, and enabling data analytics, business intelligence, robust reporting etc.

Very attractive value propositions indeed. Therefore, the whole universe invested in data warehouses. Data architects, DBAs, data engineers and others started working hard to find the best way of copying data from the source systems, transforming it efficiently and storing it in the correct target data models: some built data marts, some went for grand foundational layers in third normal form. Some picked special-purpose database appliances to overcome underperforming queries and data integration software, some procured systems with fat caches fed by InfiniBand wires, some picked database systems specifically designed for data warehousing. Some failed; some ended up with systems that were able to run.

Today, vendors are still playing the same game and companies are still building data warehouses. Let’s analyse the picture of a typical data warehouse habitat.


Figure 1: A typical data warehouse scheme. Source: TDWI

Very familiar: a set of data sources comprising operational systems such as core banking, CRM, ERP, accounting etc.; ETL modules for extracting data from the data structures of the source systems, transforming it and loading it into the data warehouse; and another set of ETL modules, or some sort of views, for feeding BI, analytics-based or business-reporting-oriented data marts.

Problem #1: You always end up with data redundancy. It is in the nature of data warehouses: they store copies of the real data. And no copy is as good as the genuine article. For instance, if a court order requests your company's accounting data, you deliver it from your accounting system, or you validate the data you get from your data warehouse against the data coming from your accounting system. Fresh, hot, operational and reliable as your own blood. No cold fish needed!

Problem #2: ETL is guaranteed bad software engineering. Years ago, I highlighted the characteristics of a high-quality software system in my blog. I will not jump into the details again, but just want to recall two very core characteristics: (a) high cohesion, (b) low coupling.

For a highly cohesive software system, your modules must encapsulate functions and data that are strongly interrelated. If you designed a module for sales management, all the routines, data structures, functions, assertions etc. in your module must be there for handling sales, nothing more, nothing less.

On the contrary, data warehouse systems cannot deliver high cohesion, because they are designed to cover all enterprise data coming from highly cohesive source systems. That’s one of the reasons a very strong metadata management layer is essential for data warehouses. Otherwise, you naturally get lost.
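A compact illustration of the contrast, with purely invented module and method names: the first class has one reason to exist, the second has to understand every business subject at once.

```python
# High cohesion: everything here exists to handle sales, nothing else.
class SalesModule:
    def record_order(self, order: dict) -> None: ...
    def cancel_order(self, order_id: str) -> None: ...
    def monthly_revenue(self, year: int, month: int) -> float: ...

# Low cohesion: a warehouse-style loader that must know sales, payroll,
# the general ledger and everything else, which is why it cannot survive
# without a heavyweight metadata management layer on top.
class EnterpriseWarehouseLoader:
    def load_sales(self) -> None: ...
    def load_payroll(self) -> None: ...
    def load_general_ledger(self) -> None: ...
    def load_everything_else(self) -> None: ...
```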

Low coupling refers to minimising the inter-module dependencies of software systems. To provide such an attribute, you need to hide the implementation details of a module from the other modules, and you have to define interfaces for the must-have communications with other modules.

By doing so, you ensure that changing the internal structure of your module does not affect outer modules in a negative way, such as invalidating them or breaking their functionality. Every module is responsible for the interfaces it opens to other modules: keep the interface compatible with the original communication contract, then enhance your internals independently. Be message oriented, with no unknown or unwanted dependencies and no unneeded details of outer systems to manage.

Low coupling is good, tight coupling is bad. 

Again, ETL innately brings the attribute of tight coupling. All ETL tools and techniques force data integration engineers to know the very internals of the source systems: all the tables, the columns, the meanings of the flags stored in the columns, all the keys, the lookup structures, everything. ETL is like a dagger inserted into the heart of the source systems. It hurts. It does no good unless you are in the business of assassination. After your ETL systems are plugged into the depths of the ERP, CRM, sales and core banking systems, your data flow towards the data warehouse is vulnerable to being invalidated every time a data structure change occurs in the source systems. You must continuously try to synchronize yourself with the new data structures and the new syntax and semantics attached to them.

That is mission impossible. That is bad software engineering. That is prying into others’ affairs. It is impossible to always be aware of every implementation detail of outer systems for the sake of copying data. A thoroughly bad practice.
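The difference is easy to see side by side. The table, column and flag names below are invented, as is the service endpoint; the point is where the knowledge of the source system's internals lives.

```python
# Tightly coupled: the ETL job queries the CRM's internal tables directly.
# Any internal rename, remodel or new flag value silently invalidates it.
EXTRACT_ACTIVE_CUSTOMERS_SQL = """
    SELECT c.cust_id, c.cust_nm, f.stat_flg
    FROM crm_owner.t_cust c
    JOIN crm_owner.t_cust_flg f ON f.cust_id = c.cust_id
    WHERE f.stat_flg IN ('A', 'P')  -- what will 'A' and 'P' mean next release?
"""

# Loosely coupled: the CRM exposes customers through a published service and
# owns the mapping from its internals to that contract.
def fetch_active_customers(crm_client):
    # crm_client is a hypothetical API client for something like
    # GET /customers/v1?status=active
    return crm_client.get("/customers/v1", params={"status": "active"})
```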

Perhaps ETL developers don’t call themselves software engineers because of this unhealthy heritage of the ETL approach. Most of the ETL professionals I have interviewed are confident with the ETL tools they are familiar with; they know how to use the tool, and they are not interested in the underlying infrastructure or the general idea. Some call themselves data integration specialists, which is funny, because ETL integrates “nothing”; it is just an ugly between-systems dependency technique.

Problem #3: The world is spending millions on enabling Cloud Foundry, open APIs, microservices, continuous delivery, continuous integration, hyper-connectivity etc.; meanwhile, what is the data warehouse world doing? Copying terabytes of data between systems in batch fashion to provide already-old data, being affected by every change made in any source system, getting bigger and bigger, and so on.

What is it good for? 

How will IT managers ensure the coordination of super-fast micro systems, responsible for core operations, running in private, public or hybrid clouds, communicating through events, deployed several times intraday by well-devised automation, with “reactive, slow, cold and fat data warehouses”?

Imagine you devised the best impact analysis system for alerting data warehouse people to any sort of source system change; what will happen in a continuous delivery world? You will get hundreds of alerts and will be thrashing while trying to cope with them in your slow and fragile data warehouse hinterland.

Architecture for the future

So far I have mainly declared the negative aspects. A dark picture; it seems like there is no way to escape. Any solutions? Yes.

Firstly, stop investing in traditional data warehouses. Vendors are motivating you in a false way: the more database instances you have, the more license fees they earn. This is just the legacy sales business. Paying for data warehouses is like sending your money back to 1980. Full stop.

Second, uninstall the ETL tools. Re-educate your ETL programmers to extract the high-profile software engineer in them, if any is left.

Third, focus on software module design. Classify your modules as data providers and data consumers. Make your modules provide and/or consume data in a responsible way through exposed, well-defined services. It is possible to follow industry-specific guidelines for identifying the key data services (e.g. BIAN for banking).
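A minimal sketch of that classification, with invented names: the provider exposes one well-defined contract, and the consumer depends on the contract rather than on the provider's internals. In banking, BIAN-style service domains could guide which contracts to expose.

```python
from typing import Protocol

class AccountBalanceProvider(Protocol):
    """A data provider module exposes its data only through this contract."""
    def get_balance(self, account_id: str) -> dict: ...

class RiskReportingConsumer:
    """A data consumer module knows the contract, never the provider's tables."""
    def __init__(self, provider: AccountBalanceProvider) -> None:
        self._provider = provider

    def exposure_for(self, account_id: str) -> float:
        balance = self._provider.get_balance(account_id)
        return float(balance["amount"])  # "amount" is an assumed field name
```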

Fourth, invest in an enterprise data bus which provides robust access to any identified data service of the modules you developed. There can be many options for data access: ad-hoc, continuous, record-based, bulk etc. Harness the necessary technologies for the proper data use cases: in-memory layers, self-service data analysis platforms, embedded data validation layers, volatile and non-volatile data presentations etc. The rule of thumb is “produce data knowingly, consume data knowingly”. Awareness is the key. A sample illustration is below.


Figure 2: Enterprise data bus in the middle of highly cohesive software modules
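To make the picture concrete, here is a deliberately tiny, in-memory sketch of the bus idea, not a product recommendation: providers publish records per named data service, and consumers either subscribe for continuous delivery or make ad-hoc requests for the latest state. The service names and record shapes are invented; a real bus would add durability, bulk transfer, validation and governance.

```python
from collections import defaultdict
from typing import Callable, Optional

class DataBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self._latest: dict[str, dict] = {}

    def publish(self, service: str, record: dict) -> None:
        """A provider module produces data knowingly, per named data service."""
        self._latest[service] = record
        for handler in self._subscribers[service]:
            handler(record)

    def subscribe(self, service: str, handler: Callable[[dict], None]) -> None:
        """Continuous consumption: the handler receives every new record."""
        self._subscribers[service].append(handler)

    def request(self, service: str) -> Optional[dict]:
        """Ad-hoc consumption: fetch the latest known record for a service."""
        return self._latest.get(service)

bus = DataBus()
bus.subscribe("customer-position", lambda rec: print("reporting module got", rec))
bus.publish("customer-position", {"customer_id": "C-1", "balance": 120.0})
print(bus.request("customer-position"))
```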

An enterprise data bus with responsibly published data services is just an idea, not the perfect solution, but unlike the data warehouse and ETL, it bears no intrinsic anomalies. I think we should find better ways of representing data and information, with non-hierarchical data structures and non-linear data processing and handling anyhow. This sort of thinking will yield better talent, a better academy and a better industry. Not the dead sales business of the 21st century.

Know the basics, derive for better.

di

everyday
you take your own photos too much
talking on the stages
telling the stories
to the curious looking
empty faces
oh girl,
you're still in the hazy sunny summer streets
tracing your childhood footsteps
to find'n hug her
look deep in the eye
only one drop of huge cry
and
give seven warm kisses
and
tell her a sweet lie.