Five New Pentaho Data Integration Enhancements, Including SQL on Spark, Deliver Value Faster and Future Proof Big Data Projects

New Spark and Kafka support, Metadata Injection enhancements and Hadoop security alleviate big data complexity

September 26, 2016, New York City (Strata + Hadoop World, Booth #533) —

Pentaho, a Hitachi Group Company, today announced five new improvements, including SQL on Spark, to help enterprises overcome big data complexity, skills shortages and integration challenges in complex enterprise environments. These big data integration enhancements help IT teams deliver value from big data projects faster with existing resources by eliminating the need for manual coding, providing tighter security and supporting more of the big data technology ecosystem.

1. More Apache Spark integration

Pentaho expands its existing Spark integration in the Pentaho platform, for customers that want to incorporate this popular technology to:

  • Lower the skills barrier for Spark – data analysts can now query and process Spark data via Pentaho Data Integration (PDI) using SQL on Spark
  • Coordinate, schedule, reuse, and manage Spark applications in data pipelines more easily and flexibly – expanded PDI orchestration for Spark Streaming, Spark SQL and Spark machine learning (Spark MLlib and Spark ML) to support the growing number of developers who use multiple Spark libraries
  • Integrate Spark apps into larger data-driven processes and get more out of them – PDI orchestration of Spark applications written in Python benefits developers writing Spark applications in this popular language

2. Expanded metadata injection capabilities

Pentaho’s unique metadata injection capability helps onboard multiple data types faster: data engineers can dynamically generate PDI transformations at runtime instead of hand-coding one for each data source, reducing costs by 10X. Pentaho adds over 30 PDI transformation steps that are compatible with metadata injection, including operations related to Hadoop, HBase, JSON, XML, Vertica, Greenplum, and other big data sources.
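The idea behind metadata injection can be illustrated outside of PDI. The following is a minimal plain-Python sketch, not Pentaho code: the names `SOURCE_METADATA` and `build_reader` and the sample sources are hypothetical, and the sketch only assumes that each source can be described by a delimiter and a list of typed fields. One generic template is configured from metadata at runtime, so no reader is hand-coded per source.

```python
import csv
import io

# Hypothetical per-source metadata. In PDI, comparable metadata would be
# injected into a template transformation; here it configures a Python reader.
SOURCE_METADATA = {
    "customers": {"delimiter": ",", "fields": [("id", int), ("name", str)]},
    "payments": {"delimiter": ";", "fields": [("account", str), ("amount", float)]},
}


def build_reader(metadata):
    """Return a parsing function configured entirely from the given metadata."""
    names = [name for name, _ in metadata["fields"]]
    casts = [cast for _, cast in metadata["fields"]]

    def read(text):
        rows = csv.reader(io.StringIO(text), delimiter=metadata["delimiter"])
        # Convert each raw string value with the type declared for its field.
        return [
            dict(zip(names, (cast(value) for cast, value in zip(casts, row))))
            for row in rows
        ]

    return read


# One generic template, many sources: readers are generated at runtime.
readers = {source: build_reader(meta) for source, meta in SOURCE_METADATA.items()}

customers = readers["customers"]("1,Ada\n2,Grace\n")
payments = readers["payments"]("A-1;10.5\n")
```

In this sketch, onboarding a new source means adding one metadata entry rather than writing a new reader, which is the kind of saving the capability is aimed at.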

3. Expanded Hadoop data security integrations

Securing big data environments can be extremely difficult because the technologies that define authentication and access are continuously evolving. Pentaho expands its Hadoop data security integration to promote better big data governance, protecting clusters from intruders. The additions include enhanced Kerberos integration for secure multi-user authentication and Apache Sentry integration to enforce rules that control access to specific Hadoop data assets.

4. Apache Kafka support

Apache Kafka’s increasingly popular publish/subscribe messaging system handles the large data volumes common in today’s big data and IoT solutions. Pentaho now provides Enterprise customers with support for sending data to and receiving data from Kafka, to facilitate continuous data processing use cases in PDI.

5. Enhanced support for popular Hadoop file formats

Pentaho now supports the output of files in Avro and Parquet formats in PDI, both popular for storing data in Hadoop in big data onboarding use cases.   

“Our latest enhancements reflect Pentaho’s continued mission to quickly make big data projects operational and deliver value by strengthening and supporting analytic data pipelines,” says Donna Prlich, Senior Vice President, Product Management, Product Marketing & Solutions, at Pentaho. “Enterprises can focus on their big data deployments, removing the complexity and time involved in data preparation by taking advantage of new, high potential technologies like Spark and Kafka in the big data ecosystem.”

Quote Sheet

“Veikkaus, the Finnish lottery, uses Pentaho Data Integration to rapidly consolidate and process both relational and semi-structured data to drive a better understanding of our customers and enhance loyalty,” said Harri Räsänen, Architect at Veikkaus. “Pentaho has helped us rapidly solve complex data problems and establish a future-proof data foundation in the face of an ever-evolving big data landscape.”

“Pentaho helps organizations create business competitive advantage with data by accelerating the incorporation of technologies like Spark into existing data environments by managing risk to facilitate alignment with big data security policies,” said Tim Stevens, vice president of Corporate and Business Development, Cloudera. “Cloudera’s partnership with Pentaho empowers joint customers to bring innovative, enterprise-grade Hadoop analytic applications to market more quickly.”

USAble Life of Little Rock, AR, was created in 1993 and is an independent life, disability, accident, and specialty insurance company. According to Jason Brannon, Supervisor of Data Architecture, USAble, “To synchronize the ongoing changes to enrolment information between our customers and partners, above all we need flexibility in our data architecture and as few bottlenecks as possible. PDI’s metadata injection gives us unparalleled flexibility by automatically transforming customer data into outbound partner feeds. This means we can devote our time to analysing data and improving customer relationships.”

North American Bancard Holdings, a leader in the payments industry, processes and analyzes more than $34 billion per year in transactions in order to enhance its operations and improve customer service. According to Krishna Swargam, Business Intelligence Architect at North American Bancard Holdings, “Pentaho plays a crucial role in orchestrating and automating this data pipeline, delivering analytic-ready data in a complex environment, and we are excited how Pentaho’s new big data enhancements will further drive business transformation throughout the organization.”

Resources

  • Learn more about Pentaho’s big data enhancements
  • Register for a webinar discussing Pentaho’s latest release
  • Visit us at booth #533 at Strata the week of September 26th
  • Join Pentaho for our Strata session “Filling the Data Lake” on Sept 28 at 2:05 pm E.T.  

About Hitachi Vantara

Hitachi Vantara, a wholly owned subsidiary of Hitachi, Ltd., helps data-driven leaders find and use the value in their data to innovate intelligently and reach outcomes that matter for business and society. We combine technology, intellectual property and industry knowledge to deliver data-managing solutions that help enterprises improve their customers’ experiences, develop new revenue streams, and lower the costs of business. Only Hitachi Vantara elevates your innovation advantage by combining deep information technology (IT), operational technology (OT) and domain expertise. We work with organizations everywhere to drive data to meaningful outcomes. Visit us at www.HitachiVantara.com.

About Pentaho

Pentaho data integration and analytics at Hitachi Vantara is an open source-based, enterprise-class platform for big data deployments. The unified data integration and analytics platform is comprehensive, completely embeddable and delivers governed data to power any analytics in any environment. Pentaho has enabled early big data and emerging IoT deployments, making our customers some of the most innovative in the industry – connecting people, things, and data to drive their digital transformation. Learn more at www.pentaho.com.

About Hitachi, Ltd.

Hitachi, Ltd. (TSE: 6501), headquartered in Tokyo, Japan, delivers innovations that answer society’s challenges. The company’s consolidated revenues for fiscal 2016 (ended March 31, 2017) totaled 9,162.2 billion yen ($81.8 billion). The Hitachi Group is a global leader in Social Innovation and has approximately 304,000 employees worldwide. Through collaborative creation, Hitachi is providing solutions to customers in a broad range of sectors, including Power / Energy, Industry / Distribution / Water, Urban Development, and Finance / Government & Public / Healthcare. For more information, please visit www.hitachi.com.