Data pipeline and streams

Trend 4. Shift to streaming data from batch processing

In today’s always-connected world, enterprises are inundated with data from sensors, machines, tracking devices, smartphones and other internet of things (IoT) devices, as well as from banking and trading sources and social media content. Businesses now need engines that can extract strategic value from this data in motion with minimal latency, so that they can deliver real-time insights, train algorithms or update the event-driven logic of mission-critical applications. This differs from traditional analytics tools, which operate on data at rest. Storing all collected data for later analysis is not always useful, because by the time it is analyzed it may have lost relevance and no longer reflect business realities. This need for real-time analytics is driving enterprises to shift from batch processing to real-time data streaming.

Apache Kafka, Flume, Storm and Flink are a few popular tools for building real-time, data-streaming ETL pipelines.
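
To make the batch-versus-streaming contrast concrete, the Python sketch below aggregates the same events in two ways: once over a complete data set at rest, and incrementally over fixed (tumbling) time windows as events arrive. The event tuple shape and the function names are hypothetical. The streaming version assumes events arrive in timestamp order; engines such as Flink add handling for late and out-of-order data that this sketch omits.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event shape: (timestamp_in_seconds, key, value)
Event = Tuple[int, str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: wait for the full data set at rest, then aggregate once."""
    totals: Dict[str, float] = defaultdict(float)
    for _, key, value in events:
        totals[key] += value
    return dict(totals)


def streaming_window_totals(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Streaming style: emit per-window totals as soon as each window closes.

    Assumes events arrive in timestamp order (no late or out-of-order data).
    """
    current_window: Optional[int] = None
    totals: Dict[str, float] = defaultdict(float)
    for ts, key, value in events:
        window_start = ts - ts % window_seconds
        if current_window is not None and window_start != current_window:
            # A newer window has started, so the previous one is complete
            yield current_window, dict(totals)
            totals = defaultdict(float)
        current_window = window_start
        totals[key] += value
    if current_window is not None:
        # Flush the last open window when a finite input ends
        yield current_window, dict(totals)


if __name__ == "__main__":
    clicks = [(0, "app", 1), (3, "web", 1), (5, "app", 1), (12, "app", 1)]
    print(batch_totals(clicks))
    for start, window in streaming_window_totals(clicks, window_seconds=10):
        print(start, window)
```

Because `streaming_window_totals` is a generator, each window's result is available as soon as that window closes, even if the input never ends; `batch_totals` can only answer after all the data has been collected.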

Infosys partnered with a global footwear and apparel giant to build a near-real-time, event stream-based solution that ingests large volumes of events generated through mobile apps and the web and analyzes them to understand user activity.

Infosys collaborated with a global financial services client to enable near-real-time movement of data from its core banking systems to Google Cloud Platform, which accelerated the onboarding of new clients in the small and medium enterprise (SME) banking segment.

Trend 5. Event processing of device or sensor data

Streaming data opens up new use cases for AI and event-driven applications, and it has given rise to a range of tools and frameworks for building and running scalable event stream processing.

The rise of event stream processing is a result of the growing volume of real-time data, and the internet of things has further fueled demand for it. Events are created throughout the enterprise: at the edge, sensors and devices detect them, and whenever a business process starts, finishes or fails, an event is created within the network. Based on the outcome of these events, an enterprise can alter its activity. Event processing covers the capture, emission, routing and any further processing of these events, as well as their consumption.
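
The capture, emission, routing and consumption steps can be illustrated with a minimal in-process publish/subscribe router in Python. The `Event` and `EventRouter` names and the event type strings are hypothetical; a production system would typically use a durable broker such as Apache Kafka rather than in-memory handler lists.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    # Hypothetical event shape: a type used for routing, its origin and a payload
    event_type: str
    source: str
    payload: dict
    timestamp: float = field(default_factory=time.time)


Handler = Callable[[Event], None]


class EventRouter:
    """In-memory stand-in for a broker: routes each event to its type's subscribers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # A consumer registers interest in one event type
        self._handlers[event_type].append(handler)

    def emit(self, event: Event) -> int:
        # Route the event to every subscriber and return how many consumed it
        handlers = self._handlers.get(event.event_type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    alerts: List[str] = []
    router = EventRouter()
    # A consumer that reacts only when a business process fails
    router.subscribe("process.failed", lambda e: alerts.append(f"{e.source}: {e.payload['reason']}"))
    router.emit(Event("process.started", "line-7", {}))
    router.emit(Event("process.failed", "line-7", {"reason": "jam"}))
    print(alerts)  # ['line-7: jam']
```

Decoupling producers from consumers this way lets new reactions to an event (an alert, a rerouting decision, a dashboard update) be added by subscribing another handler, without changing the device or process that emits the event.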

As an example of this trend, Infosys partnered with a Belgian courier company to implement bar code scanners that track parcels in real time and enable corrective actions in parcel routing.

In another example, Infosys collaborated with a Japanese multinational car manufacturer to process telemetry data from connected cars in real time and generate vehicle history reports instantly.

Trend 6. Metadata-driven ETL pipelines fueling agility

Enterprises have countless data sources and need scalable, robust pipelines to maintain data integrity. ETL development tools generally require specialized expertise, and building pipelines with them can be time-consuming and error-prone.

A metadata-driven ETL pipeline provides an easy, flexible abstraction layer that simplifies implementation. It delivers the required speed and agility, and it includes generated templates for exception handling, rules management and data migration controls. Data schemas, physical data source locations, job control parameters and error-handling logic can be stored in configuration files, which the framework processes and maintains to generate executable ETL jobs. Loading data into a traditional enterprise data warehouse becomes more efficient, making data more readily available for reporting, for use by other applications and for analytics. Because the generated code is standardized, it is simple to maintain and review.
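
As a rough illustration of the idea, the Python sketch below keeps a job's schema, source location, target table and error-handling policy in metadata and generates the executable load step from it. The configuration keys, job name and table names are hypothetical; the inline JSON stands in for a configuration file, and the `location` value is illustrative only, since the demo passes an in-memory file instead.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, Iterable

# Hypothetical job metadata; in practice this would be read from a config file.
# "location" is illustrative: the demo below reads an in-memory file instead.
JOB_CONFIG: Dict[str, Any] = json.loads("""
{
  "job_name": "load_orders",
  "source": {"format": "csv", "location": "orders.csv"},
  "schema": {"order_id": "int", "amount": "float", "region": "str"},
  "target_table": "dw.fact_orders",
  "error_handling": {"on_bad_row": "reject", "max_rejects": 10}
}
""")

# Maps type names allowed in the metadata to Python conversion functions
CASTS: Dict[str, Callable[[str], Any]] = {"int": int, "float": float, "str": str}


def build_job(config: Dict[str, Any]) -> Callable[[Iterable[str]], Dict[str, Any]]:
    """Generate an executable ETL step from metadata instead of hand-written code."""
    schema = config["schema"]
    policy = config["error_handling"]

    def run(source_lines: Iterable[str]) -> Dict[str, Any]:
        loaded, rejected = [], []
        for row in csv.DictReader(source_lines):
            try:
                # Cast each column to the type declared in the schema metadata
                loaded.append({col: CASTS[typ](row[col]) for col, typ in schema.items()})
            except (KeyError, TypeError, ValueError) as exc:
                if policy["on_bad_row"] != "reject":
                    raise  # the metadata says to fail the whole job on a bad row
                rejected.append({"row": row, "error": repr(exc)})
                if len(rejected) > policy["max_rejects"]:
                    raise RuntimeError(f"{config['job_name']}: too many rejected rows")
        # A real job would write `loaded` to the target table; here it is returned
        return {"target": config["target_table"], "loaded": loaded, "rejected": rejected}

    return run


if __name__ == "__main__":
    sample = io.StringIO("order_id,amount,region\n1,19.99,EU\n2,oops,US\n3,5,APAC\n")
    result = build_job(JOB_CONFIG)(sample)
    print(result["target"], len(result["loaded"]), len(result["rejected"]))  # dw.fact_orders 2 1
```

Under this design, onboarding a new dataset means writing a new configuration rather than new pipeline code, which is where the standardization and reduced effort described above would come from.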

Infosys helped a German pharmaceutical company achieve a 35%-40% reduction in the effort needed to onboard and process new or existing datasets in oncology and women’s health.

Infosys partnered with a global coffeehouse giant to accelerate onboarding of new data products through a multipoint, metadata-driven ingestion framework that uses extreme automation to auto-generate data flow pipelines.

Infosys TechCompass