“Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.” – Clifford Stoll
That gap – from data to wisdom – will only grow as traditional, proprietary, prohibitively priced, licensed Business Intelligence (BI) solutions fight a losing battle against the inexorable expansion of big data, estimated to grow 40 percent annually to exceed 100 zettabytes by 2020. Then there’s the matter of variety. With physical systems increasingly turning digital, new data sources and formats will keep getting added to an already eclectic mix. Clearly BI, which can do little more than convert limited volumes of structured data in neatly ordered databases into predetermined analyses – and takes weeks to do it – isn’t cutting it when the expectation is for actionable big data insights on the fly. This is a major reason why enterprises are stuck with immense quantities of information simply lying idle.
But now, a viable solution is emerging from the realm of open source that can really make that silent data talk.
While open source has been around for a while, recent focus from the IT industry, academia, and the developer community has sparked outstanding innovation in areas such as big data analytics. This has given open source a significant lead over BI: open source technologies can perform complex analytics on massive quantities of structured and unstructured information using commodity hardware, at a fraction of what it costs traditional BI to process a far smaller amount of data in a limited number of formats. And in near real time, to boot!
Although open source’s cost-to-performance position is unassailable, to see it merely as an efficiency play is to miss the forest for the trees. An open source-powered big data analytics platform is a valuable lever for today’s enterprise, which needs sharp insights to achieve its twin imperative of renewing existing ways of doing things to improve performance while embracing entirely new things to create new value. With its ability to process information from multiple heterogeneous external data sources, besides enterprise transaction systems, for an unlimited number of use cases, the analytics platform produces analyses that are comprehensive, multidimensional, and therefore potentially very valuable to the enterprise’s “renew and new” strategy. However, in order to create meaningful impact, the insights must be deployed within a definite window, and that window is often open only in real time.
This is why an enterprise must be careful to choose an analytics platform that has evolved sufficiently to minimize the lag between the moment an external data event hits a transaction system and the moment value is extracted from it. Apart from looking for a lag tending to zero, the enterprise must also assess the platform based on where it is in the analytics journey: the diagnostic, predictive, or prescriptive stage. In the first, the platform merely diagnoses a problem; in the second, it foretells the problem; and in the last, it suggests, or even triggers, a solution.
Having selected the right open source analytics platform, the enterprise is now ready to embark on its renew and new journey. Large organizations, which typically spend millions on BI applications, are caught in a situation where they can neither jettison those systems nor fulfill their reporting and analytical needs through them. Here, they can deploy an analytics platform to augment their existing solutions and renew their reporting capabilities at a very small marginal cost. Where the BI tool applies a laborious ETL (Extract, Transform, Load) methodology to its data warehouse to deliver a report, which could take as long as several weeks, the analytics platform will extract voluminous and variegated data very quickly from a vast data lake, employ the latest open source technology to transform it on the fly, and answer queries almost instantaneously. Business users can keep throwing questions at the platform and it will keep answering in one unbroken interaction, until they have the insights they need to make a fully informed, timely decision.
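To make the contrast with upfront ETL concrete, here is a minimal sketch of transforming data at query time. It is an illustration only: the record formats, field names, and the helper functions `parse` and `total_by_region` are hypothetical, and a real platform would do this across a distributed cluster rather than an in-memory list.

```python
import json

# Hypothetical raw records as they might land in a data lake:
# mixed formats, no schema imposed up front.
RAW_LAKE = [
    '{"source": "erp", "region": "EU", "amount": "120.50"}',
    '{"source": "web", "region": "eu", "amount": 80}',
    "crm|US|45.25",
]

def parse(raw):
    """Normalize one raw record at read time (the transform-on-the-fly step)."""
    if raw.startswith("{"):
        rec = json.loads(raw)
        return {
            "source": rec["source"],
            "region": rec["region"].upper(),
            "amount": float(rec["amount"]),
        }
    source, region, amount = raw.split("|")
    return {"source": source, "region": region.upper(), "amount": float(amount)}

def total_by_region(lake):
    """Answer an ad hoc question directly against the raw data."""
    totals = {}
    for raw in lake:
        rec = parse(raw)
        totals[rec["region"]] = totals.get(rec["region"], 0.0) + rec["amount"]
    return totals

print(total_by_region(RAW_LAKE))
```

Because nothing is reshaped until a question is asked, a new question needs only a new query function, not a new warehouse load.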
Over time, as data volumes expand, the enterprise can progressively shift its workloads to the platform, which is far more efficient in both cost and build effort, to bring down the unit cost of analysis and reporting. Eventually, the platform’s data lake must upstage the BI data warehouse as the primary data repository of the organization. For illustration, consider the American subsidiary of a Japanese multinational office automation company, which recently renewed financial and HR reporting by moving those systems from a proprietary BI data warehouse to a leading open source analytics platform. In another instance, the same platform was implemented in a pilot program at a manufacturing company, where it took a mere 6 minutes to load and report on 19 million data records; the incumbent solution took more than 2 hours to load and report on 300,000 records.
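The pilot figures translate into a rough throughput comparison. The sketch below uses only the numbers quoted above and treats "more than 2 hours" as exactly 120 minutes, so the incumbent rate is an upper bound and the ratio a lower bound. It assumes the two runs are comparable workloads, which the pilot description does not confirm.

```python
# Figures quoted for the manufacturing pilot.
platform_records = 19_000_000
platform_minutes = 6
incumbent_records = 300_000
incumbent_minutes = 120  # "more than 2 hours": a lower bound on elapsed time

platform_rate = platform_records / platform_minutes      # records per minute
incumbent_rate = incumbent_records / incumbent_minutes   # records per minute, upper bound
speedup = platform_rate / incumbent_rate                 # lower bound on the ratio

print(f"platform:  {platform_rate:,.0f} records/min")
print(f"incumbent: {incumbent_rate:,.0f} records/min at most")
print(f"speedup:   at least {speedup:,.0f}x")
```

On these figures the platform handled roughly 3.2 million records per minute against at most 2,500 for the incumbent, a difference of more than a thousandfold.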
Some organizations have gone beyond renewal to generate new value from their analytics platform. At a telecom major plagued by line faults, the platform was used to study ASDN line data patterns that manifest as vibrations or poor signal quality. These were then compared with historical data patterns recorded during faults, to arrive at a model that predicted when a line would fail and how likely the failure was.
In the case of another company, new value was uncovered by the BPO partner, who ran the analytics platform on the company’s accounts payable and receivable data and found that it was being billed by certain vendors before the contracted date. The company is now investigating the matter further; it affects approximately US$ 27 million worth of working capital.
Open source technologies undoubtedly have the potential to amplify the analytics capabilities of enterprises. But they also pose a few challenges – adopting an unstoppable flow of new solutions without disrupting business, enjoying seamless support across different solutions and their versions, and repeatedly upgrading skills to keep pace with open source evolution. The biggest challenge of all, however, is that old bugbear called security. Enterprises want not only to control data access and rights, but to do so at a highly granular level. And they want flexibility – flexibility to allow or restrict access by table, by column, and even by value. At present, very few open source analytics platforms possess this type of “cell-level” authorization capability.
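What access control by table, by column, and by value might look like can be sketched in a few lines. This is a toy illustration, not the mechanism of any particular platform: the `POLICY` structure, the role name, the sample rows, and the `read` function are all hypothetical.

```python
# Hypothetical policy: the table, columns, and row values a role may read.
POLICY = {
    "hr_analyst": {
        "table": "employees",
        "columns": {"name", "department"},        # the salary column is withheld
        "row_filter": ("department", {"Sales"}),  # only rows with these values
    }
}

# Hypothetical sample data.
EMPLOYEES = [
    {"name": "Asha", "department": "Sales", "salary": 100},
    {"name": "Ben", "department": "Finance", "salary": 120},
]

def read(role, table_name, rows):
    """Return only the rows and columns that the role's policy allows."""
    rule = POLICY.get(role)
    if rule is None or rule["table"] != table_name:
        raise PermissionError(f"{role} may not read {table_name}")
    column, allowed_values = rule["row_filter"]
    return [
        {key: row[key] for key in row if key in rule["columns"]}
        for row in rows
        if row[column] in allowed_values
    ]

print(read("hr_analyst", "employees", EMPLOYEES))
```

Here the analyst sees only the Sales row and never the salary column; the combination of a column restriction and a value restriction is what makes the control effectively cell-level.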
But while security is an important part of a platform’s credentials, so are agility and ease of use. Enterprises should therefore seek to maximize the analytics experience by choosing an end-to-end platform that shields business users from all the underlying complexity, even as it enables them to secure business outcomes quickly, with minimum fuss.
To a large organization, the open source analytics platform is a great way to augment and finally overtake traditional BI, but for a small enterprise, or one that’s just starting out, the platform is probably all it will ever need. To make big data talk more.
By Ganapathy Subramanian, Vice President and Unit Technology Officer, Big Data & Analytics, Infosys