Streamlined Data Refinery

Blend, enrich, and refine any data source into secure, analytics-ready data sets on demand with a Streamlined Data Refinery. Using Hadoop as a big data processing hub, Pentaho Data Integration processes and refines specific data sets. With a single click, data sets are automatically modeled, published, and delivered to users for immediate visual analytics.

Deliver Governed, Analytic Data Sets

With Pentaho’s data integration and analytics platform, Hadoop becomes a high-performance, multi-source business information hub where you can stream data, blend it, and then automatically publish refined data sets into one of the popular analytic databases (such as Amazon Redshift or HP Vertica). For the end user, a rich set of data discovery, reporting, dashboard, and visualization capabilities is immediately available for high-performance analytics.

Analytics-Ready Blended Data Sets at Scale

  • A pragmatic approach to delivering analytic data sets at scale for immediate, high-performance analytics
  • A self-service data integration process for blending and enriching vast volumes of highly diverse data
  • An agile data integration process that includes data transformation steps and tools for simplified in-cluster data processing in Hadoop
  • An automated, self-service analytics experience that includes a high-performance analytic database for high-speed queries and visualizations
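The automated flow described above can be sketched in miniature. This is an illustrative outline only, assuming hypothetical stage names (`blend`, `refine`, `model_and_publish`); it is not Pentaho Data Integration's actual API.

```python
# Hypothetical sketch of a refinery pipeline: blend raw sources,
# refine the blended stream, then model and publish the result.
# Function and table names are illustrative assumptions.

def blend(sources):
    """Merge records from several raw sources into one stream."""
    blended = []
    for source in sources:
        blended.extend(source)
    return blended

def refine(records):
    """Keep only complete records and normalize field names."""
    return [
        {key.lower(): value for key, value in record.items()}
        for record in records
        if all(value is not None for value in record.values())
    ]

def model_and_publish(records, table):
    """Stand-in for auto-modeling and publishing to an analytic database."""
    print(f"Published {len(records)} rows to table '{table}'")
    return {"table": table, "rows": len(records)}

raw_sources = [
    [{"ID": 1, "Region": "EMEA"}, {"ID": 2, "Region": None}],
    [{"ID": 3, "Region": "APAC"}],
]
result = model_and_publish(refine(blend(raw_sources)), "refined_sales")
```

In a real deployment the refine step would run in-cluster on Hadoop and the publish step would load an analytic database such as Redshift or Vertica; the sketch only shows the shape of the hand-offs.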

Example of how this may look within an IT landscape:

  • An electronic marketing firm created a refinery architecture for delivering personalized offers
  • Online campaign, enrollment, and transaction data is ingested into Hadoop, processed via Pentaho Data Integration, automatically modeled, and delivered to an analytic database
  • Analytic data sets are generated on demand, driven by user requests
  • A business analytics front end provides reporting and ad hoc analysis for business users
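The blending step in this example joins the three feeds on a shared customer key. A minimal sketch, assuming hypothetical feed contents and a `blend_by_customer` helper (not part of any Pentaho product):

```python
# Illustrative blend of the three example feeds -- campaign, enrollment,
# and transaction data -- into one per-customer record suitable for
# loading into an analytic database. All data below is made up.

campaigns = {"c1": {"campaign": "spring_promo"}}
enrollments = {"c1": {"plan": "gold"}, "c2": {"plan": "silver"}}
transactions = {"c1": {"spend": 120.0}, "c2": {"spend": 45.5}}

def blend_by_customer(*feeds):
    """Join feeds on customer id; a customer missing from a feed
    simply lacks that feed's fields in the blended record."""
    customer_ids = set().union(*(feed.keys() for feed in feeds))
    blended = {}
    for cid in sorted(customer_ids):
        record = {"customer_id": cid}
        for feed in feeds:
            record.update(feed.get(cid, {}))
        blended[cid] = record
    return blended

refined = blend_by_customer(campaigns, enrollments, transactions)
```

Customer `c1` ends up with campaign, plan, and spend fields in one row, while `c2`, who saw no campaign, gets only plan and spend; the personalized-offer logic would then run against these blended rows.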

Streamlined Data Refinery Architecture

The Results:

  • Business users have access to reliable, highly governed data generated from diverse, high-volume sources with limited support from IT
  • ETL and data management costs are reduced by applying the right technology to the most appropriate purpose
  • New data sets for predictive analytics are engineered more quickly thanks to rapid ingestion and powerful processing
  • Governed data sets are automatically modeled and published for immediate visualization