Pentaho Big Data Capabilities
Modern, Fully Integrated Big Data Analytics Platform
Pentaho offers a complete set of data preparation, data discovery and predictive analytics capabilities, with deep, native support for all leading big data sources, traditional relational sources and many other data stores critical to business.
Data preparation and modeling
For IT and developers, Pentaho provides a complete visual design environment to simplify and accelerate data preparation and modeling. Its rich visual interfaces support loading, extracting, integrating and transforming data within big data stores.
- Load and extract – Visual input steps provide an easy way to load data into and extract data from any kind of big data store, including Hadoop, NoSQL databases and analytic databases. This includes native interfaces to HDFS, MapReduce and Hive for Hadoop, and native parallel bulk loader utilities for many of the leading analytic databases.
- Integrating – Visual steps provide a fast and easy way to merge data from multiple sources, both big data sources and traditional sources. Also available are a rich library of transformation logic, data consistency and performance optimizations, and the ability to cache lookup data in memory.
- Transforming – Pentaho provides an extensive library of data transformation capabilities, including calculations, string substitution, splitting fields, mapping values and more. These visual transforms can be used to meet the needs of even the most complex data processing requirements.
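In Pentaho these operations are configured as visual steps rather than written as code. As a rough illustration of what such steps do to a single row, here is a minimal sketch in plain Python. It does not use any Pentaho API; the field names, the `STATUS_MAP` and `COUNTRY_LOOKUP` tables and the `transform_row` function are all hypothetical.

```python
# Illustrative only: plain Python stand-ins for the kinds of row-level
# operations described above. None of these names are Pentaho APIs.

# Hypothetical lookup data, held in memory as a dict (a simple cache)
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany"}

# Hypothetical value mapping from status codes to labels
STATUS_MAP = {"A": "active", "I": "inactive"}


def transform_row(row):
    """Apply a calculation, a string substitution, a field split,
    a value mapping and a cached lookup to one input row."""
    out = dict(row)  # copy so the input row is left unchanged

    # Calculation: derive a total from quantity and unit price
    out["total"] = row["quantity"] * row["unit_price"]

    # String substitution: strip dashes from the phone number
    out["phone"] = row["phone"].replace("-", "")

    # Splitting fields: break the full name at the first space
    first, _, last = row["full_name"].partition(" ")
    out["first_name"], out["last_name"] = first, last

    # Mapping values: translate a status code into a label
    out["status"] = STATUS_MAP.get(row["status"], "unknown")

    # Lookup: enrich the row from the in-memory lookup table
    out["country_name"] = COUNTRY_LOOKUP.get(row["country"], "n/a")

    return out
```

Calling `transform_row` on each incoming record mimics chaining several transformation steps one after another.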
Job orchestration
Pentaho provides an intuitive visual user interface for orchestration of data processing and data integration jobs for all big data stores (Hadoop, NoSQL, analytic databases) as well as traditional relational databases and other data stores.
In addition, Pentaho’s job orchestration and workflow capabilities interoperate with solution-specific tools, leveraging previous investment in these tools.
Learn more about big data integration
Instant and interactive reporting and dashboards
Pentaho connects directly to any big data store to provide instant ‘frictionless’ reporting, without having to extract data and load it elsewhere, rely on proprietary data sampling techniques, or fall back on the lowest-common-denominator connection methods (e.g. Hive for Hadoop) used by other tools.
The Pentaho report designer provides a rich, intuitive graphical interface for designing even the most complex and sophisticated reports.
Interactive visualization and exploration of big data
Pentaho provides interactive analysis and visualization of large volumes of data at a glance, along with the ability to explore data to find valuable patterns and anomalies.
Visualization types include geo-maps, heat grids and scatter/bubble charts. Interactive capabilities include drill-down into supporting reports and dashboards, while extreme-scale in-memory data caching enables speed-of-thought analysis of large data volumes.
Learn more about big data analytics
Instaview
Instaview, Pentaho’s big data analytics application, dramatically reduces the time required for data analysts to discover, visualize and explore large volumes of diverse data. With Instaview, data scientists and data analysts can move from data to analytics in three simple steps:
Step 1 - Choose Your Big Data Source

Instaview connects to new and diverse data sources as well as traditional transactional data, giving data analysts a complete view of customers, business operations and performance. Data sources include:
- Hadoop Data: HDFS, Hive
- NoSQL Data: HBase, Cassandra, MongoDB
- Social and Web Data: Twitter, Facebook, Log Files, Web Logs
Step 2 - Auto Prepare Your Data

Instaview automatically turns raw and unstructured data into self-service, analytic-ready data sets.
Instaview simplifies, groups, sorts and aggregates large volumes of unruly data without requiring the help of IT or developers.
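Instaview performs this preparation automatically, without code. Purely to illustrate what grouping, sorting and aggregating raw records means, here is a minimal sketch in plain Python. The `summarize` function and the record fields are hypothetical and are not part of Instaview.

```python
from collections import defaultdict


def summarize(records, group_key, value_key):
    """Group records by one field, aggregate another field per group,
    and return the groups sorted by their total, largest first.
    Hypothetical helper for illustration; not an Instaview API."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})

    # Group and aggregate in a single pass over the raw records
    for rec in records:
        bucket = totals[rec[group_key]]
        bucket["count"] += 1
        bucket["sum"] += rec[value_key]

    rows = [
        {"group": group, "count": agg["count"], "sum": agg["sum"]}
        for group, agg in totals.items()
    ]

    # Sort so the largest groups come first
    rows.sort(key=lambda r: r["sum"], reverse=True)
    return rows
```

For example, passing a list of web log records with `group_key="page"` and `value_key="bytes"` would yield one row per page with its hit count and total bytes served.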
Step 3 - Interactively Visualize and Explore

Instaview provides an interactive user interface for exploration and analysis of big data. Instaview’s interactive visualizations include:
- Geo-mapping
- Heat grids
- Scatter/bubble charts
- Bar/column charts
Learn more: Pentaho Instaview
Technical Benefits for IT and Developers
Instaview not only provides the fastest and most engaging big data exploration and visualization for data analysts, but also helps IT and developers working with big data sources to:
- View leading big data stores in an instant with out-of-the-box templates
- Create or refine big data templates and enrich them with data from multiple sources
- Ensure optimal security and performance against big data sources with managed data access
