Cloudera Delivers Enterprise-Grade Real-Time Streaming and Machine Learning with Apache Spark 2.0 and Drives Community Innovation with Apache Kudu 1.0

STRATA+HADOOP WORLD NEW YORK, NY, Sept. 29, 2016 -- Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, today announced its release built on Apache Spark 2.0 (Beta), with enhancements to the API experience, performance improvements, and enhanced machine learning capabilities. In addition, Cloudera is working with the community to continue developing Apache Kudu 1.0, recently released by the Apache Software Foundation. Cloudera’s latest contributions to these open source projects, alongside deeper integration for its platform, recognize the growing need for streaming and analyzing real-time data in high-demand workloads, including machine learning models deployed in production by Cloudera’s enterprise customers.

Apache Spark

Cloudera’s commitment to open source innovation is evidenced by its strong leadership to drive the features and capabilities that enterprises demand, particularly around security, stability, and broad integration. These capabilities are critical to making projects a reality for enterprise adoption. Cloudera was the first Hadoop big data analytics vendor to deliver a commercially supported version of Spark, and has participated actively in the open source community to enhance Spark for the enterprise through its One Platform Initiative. With Spark 2.0, organizations are better able to take advantage of streaming data, develop richer machine learning models, and deploy them in real time, enabling more workloads to go into production.

Spark 2.0 features include:

Better performance and enhanced usability with the new Dataset API
Structured Streaming for better performance and easier ingest of traditional structured data for time series, tabular, and Internet of Things (IoT) data
Compile-time type safety for user-defined functions for improved reliability in mission-critical applications
Machine learning model, pipeline persistence, and newly supported machine learning libraries to take on new data sets and analytic applications

"Cloudera was the first vendor to offer a commercially supported version of Apache Spark in our big data platform. In the years since then, Spark has become a standard for stream processing and machine learning workloads across the industry," said Mike Olson, founder and chief strategy officer at Cloudera. "As a component of a Cloudera enterprise data hub, Spark benefits from the security, manageability, data governance, and compliance services that customers demand. It can handle high-scale, high-performance workloads reliably. We are part of the global Spark community and committed to continued enhancements for demanding enterprises."

Apache Kudu

In September of 2015, Cloudera announced the public beta release of Apache Kudu, its high-performance columnar store for Hadoop that enabled the powerful combination of fast analytics on fast data. Two months later, Cloudera donated Kudu to the Apache Software Foundation (ASF) to open it to the broader developer community to expand the type and variety of fast analytic use cases. While Spark 2.0 will give businesses better access to streaming data, Kudu 1.0 will enable enterprises to adopt real-time use cases at a greater pace.

“Kudu is a response to the increase in prevalence of real-time analytic use cases in the market,” said Charles Zedlewski, vice president, Products at Cloudera. “As far back as 2012, Cloudera recognized the analytic gap in the Hadoop ecosystem that was leading architects to create complex hybrid architectures for real-time analytics. With the Apache Kudu 1.0 launch, the original vision is coming to fruition as users can now rely on a single, simplified project for fast analytics on fast data. We’ve seen the community quickly adopt Kudu and apply it to numerous high-scale, real-time analytic use cases.”

Kudu offers fast scans across data for analytics, and instant read/write capabilities for frequent updates and searches. Kudu also enables enterprises to adopt real-time use cases at a greater rate. Along with its integration with Spark, Kudu 1.0 is also tightly integrated with MapReduce and Impala to enable best-in-class processing.

Kudu 1.0 features include:

A simplified architecture that enables very fast batch and stream processing
Fault tolerance and scalability into the hundreds of nodes
A columnar structure that enables analysis on the latest data, for real-time use cases such as time series data, machine data analytics and online reporting

About Cloudera

Cloudera delivers the modern data management and analytics platform built on Apache Hadoop and the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform available for the modern world. Our customers efficiently capture, store, process and analyze vast amounts of data, empowering them to use advanced analytics to drive business decisions quickly, flexibly and at lower cost than has been possible before. To ensure our customers are successful, we offer comprehensive support, training and professional services. Learn more at http://cloudera.com.

Connect with Cloudera

About Cloudera: cloudera.com/about-cloudera.html

Read our blogs: cloudera.com/blog and vision.cloudera.com

Visit us on Facebook: facebook.com/cloudera

Join the Cloudera Community: cloudera.com/community

Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition, Cloudera Navigator Optimizer and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trademarks of their respective owners.

Press Contact:

Deborah Wiltshire

Cloudera

[email protected]

+1 (650) 644-3900
