The Big Data (Hadoop) ecosystem has evolved over the years from batch processing (Hadoop 1.0), to streaming and near real-time analytics (Hadoop 2.0), to Hadoop meets AI (Hadoop 3.0). These technical capabilities continue to evolve, delivering the data lake as a private cloud with separation of storage and compute. Future enhancements include support for hybrid cloud and multi-cloud deployments.
Cloudera and NVIDIA Partnerships
Cloudera released the following two software platforms in the second half of 2020, which together enable the data lake as a private cloud:
- Cloudera Data Platform Private Cloud Base – Provides storage and supports traditional data lake environments; introduces Apache Ozone, the next-generation file system for the data lake.
- Cloudera Data Platform Private Cloud Experiences – Allows experience- or persona-based processing of workloads (such as data analyst, data scientist, data engineer) for data stored in the CDP Private Cloud Base.
Last year, Cisco announced its collaboration with NVIDIA to accelerate Apache Spark 3.0, at the same time that NVIDIA announced its commitment to GPU-native acceleration support for Apache Spark 3.0.
Cisco Data Intelligence Platform (CDIP)
Cisco Data Intelligence Platform (CDIP) is a thoughtfully designed private cloud for data lake requirements. It supports data-intensive workloads with the Cloudera Data Platform (CDP) Private Cloud Base, and compute-rich (AI/ML) and compute-intensive workloads with the Cloudera Data Platform Private Cloud Experiences, while providing storage consolidation with Apache Ozone on the Cisco UCS infrastructure. The whole platform is fully managed through Cisco Intersight, which simplifies hybrid cloud management and, among other things, moves the management of servers from the network into the cloud.
CDIP as a private cloud is based on the new Cisco UCS M6 family of servers that support NVIDIA GPUs and 3rd Gen Intel Xeon Scalable family processors with PCIe Gen 4 capabilities. These servers include the following:
- Cisco UCS C240 M6 Server for Storage (Apache Ozone and HDFS) with CDP Private Cloud Base – Extends the capabilities of the Cisco UCS rack server portfolio with 3rd Gen Intel Xeon Scalable Processors, supporting over 43% more cores per socket and 33% more memory than the previous generation.
- Cisco UCS X-Series for CDP Private Cloud Experiences – A modular system managed from the cloud (Cisco Intersight). Its adaptable, future-ready, modular design meets the needs of modern applications and improves operational efficiency, agility, and scale.
CDIP is designed for hybrid clouds to help customers address the needs of modern apps and extensible data platforms. Customers can further accelerate AI/ML and ETL workloads on their data lake now that Apache Spark 3.0 is generally available (GA) in CDP Private Cloud Base 7.1.6, enabling GPU-accelerated workloads powered by the NVIDIA RAPIDS data science libraries.
The NVIDIA RAPIDS suite of open-source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS uses NVIDIA CUDA and exposes GPU parallelism to accelerate ETL and machine-learning workloads. The NVIDIA RAPIDS Accelerator for Apache Spark leverages GPUs to accelerate data processing in Apache Spark 3.0 using the RAPIDS libraries. This allows users to run existing Apache Spark applications ten times faster with no code changes.
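To illustrate what "no code changes" can look like in practice, here is a minimal sketch of how the RAPIDS Accelerator is typically switched on through Spark configuration rather than through application code. The property names (spark.plugins, spark.rapids.sql.enabled, and the GPU resource amounts) follow the RAPIDS Accelerator and Apache Spark 3.0 documentation as we understand it; the jar name, the application file name, and the helper functions are hypothetical placeholders, and the values are illustrative rather than tuned. Verify everything against the versions shipped with your CDP release before using it.

```python
"""Sketch: build a spark-submit command that enables the RAPIDS Accelerator."""
import shlex


def rapids_spark_conf(tasks_per_gpu=4, enabled=True):
    """Return Spark properties that load the RAPIDS Accelerator plugin.

    Assumption: one GPU per executor, shared by `tasks_per_gpu` tasks.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be >= 1")
    return {
        # Loads the accelerator; the application code itself is unchanged.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Switch that turns GPU SQL/DataFrame acceleration on or off.
        "spark.rapids.sql.enabled": str(enabled).lower(),
        # Spark 3.0 resource scheduling: GPUs per executor, GPU share per task.
        "spark.executor.resource.gpu.amount": "1",
        "spark.task.resource.gpu.amount": f"{1 / tasks_per_gpu:g}",
    }


def build_spark_submit(app, plugin_jar, conf):
    """Render a shell-safe spark-submit command line (helper is hypothetical)."""
    args = ["spark-submit", "--jars", plugin_jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return shlex.join(args)


if __name__ == "__main__":
    # "rapids-4-spark.jar" and "etl_job.py" are placeholder names.
    print(build_spark_submit("etl_job.py", "rapids-4-spark.jar",
                             rapids_spark_conf(tasks_per_gpu=4)))
```

Because the plugin is enabled purely through configuration, the same etl_job.py can be submitted with or without these options, which makes it straightforward to compare CPU and GPU runs of an existing job.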
On the AI/ML side, NVIDIA GPUs integrate with libraries like TensorFlow and PyTorch to accelerate the training of neural networks for various use cases, such as computer vision and natural language processing, on a single GPU node or on multiple nodes, reducing the training time from weeks to days (or hours). This saves our customers valuable time.
The three-way partnership between Cisco, NVIDIA, and Cloudera brings our joint customers a much richer data lake experience through solution technology advancements, validated designs, and full product support.