
Conquer Big Data through Spark

Course Background:

Apache Spark is a fast and general engine for large-scale data processing. Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing. You can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
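To make "DAG execution engine" and "in-memory computing" concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark's API. Transformations only record a lineage (here a simple chain, the most basic DAG), nothing runs until an action is called, and a node marked with cache() keeps its result in memory so later actions reuse it instead of recomputing. The class `ToyRDD` and its `compute_count` and `lineage` members are hypothetical names for illustration. In PySpark, the analogous real calls are `SparkContext.parallelize`, `RDD.map`, `RDD.filter`, `RDD.cache`, `RDD.count`, `RDD.collect`, and `RDD.toDebugString` for inspecting lineage.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical single-machine stand-in for a Spark RDD (not Spark's API).

    Transformations only record a lineage (a chain, i.e. the simplest DAG);
    nothing is computed until an action such as count() or collect() runs.
    """

    def __init__(self, compute: Callable[[], List], parent: Optional["ToyRDD"] = None,
                 name: str = "source") -> None:
        self._compute = compute        # produces this node's data from its parent
        self.parent = parent           # lineage edge back to the parent node
        self.name = name
        self._cached: Optional[List] = None
        self._keep_in_memory = False
        self.compute_count = 0         # times this node's data was (re)computed

    @staticmethod
    def parallelize(data: Iterable) -> "ToyRDD":
        items = list(data)
        return ToyRDD(lambda: list(items))

    # Transformations: lazy, they return a new node and compute nothing.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [f(x) for x in self._materialize()], self, "map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [x for x in self._materialize() if pred(x)], self, "filter")

    def cache(self) -> "ToyRDD":
        # Mark this node so its result is kept in memory after the first action.
        self._keep_in_memory = True
        return self

    def _materialize(self) -> List:
        if self._cached is not None:
            return self._cached        # reuse the in-memory copy, no recomputation
        self.compute_count += 1
        result = self._compute()       # walks back up the lineage as needed
        if self._keep_in_memory:
            self._cached = result
        return result

    # Actions: these trigger evaluation of the lineage.
    def count(self) -> int:
        return len(self._materialize())

    def collect(self) -> List:
        return list(self._materialize())

    def lineage(self) -> List[str]:
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


if __name__ == "__main__":
    squares = ToyRDD.parallelize(range(10)).map(lambda x: x * x)
    evens = squares.filter(lambda x: x % 2 == 0).cache()
    print(squares.compute_count)   # 0: nothing has run yet
    print(evens.lineage())         # ['source', 'map', 'filter']
    print(evens.count())           # 5
    print(evens.collect())         # [0, 4, 16, 36, 64], served from the cache
    print(squares.compute_count)   # 1: computed once, for count()
```

Because `squares` itself is not cached, calling `squares.count()` repeatedly would recompute it each time. Avoiding that repeated work is the point of keeping intermediate results in memory.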

Spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

You can run Spark readily using its standalone cluster mode, on EC2, or on Hadoop YARN or Apache Mesos. It can read from HDFS, HBase, Cassandra, and any Hadoop data source.

Write applications quickly in Java, Scala, or Python. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala and Python shells.
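The classic illustration of chaining these high-level operators is word count. In PySpark it is usually written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(add)`, assuming an existing SparkContext `sc` and an input `path`. The sketch below mirrors that flatMap, map, reduceByKey chain with plain Python on one machine, so it runs without a cluster. The helper names `reduce_by_key` and `word_count` are hypothetical, not part of any library.

```python
from operator import add
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]], f: Callable[[V, V], V]) -> Dict[K, V]:
    """Local stand-in for reduceByKey: merge all values that share a key with f."""
    merged: Dict[K, V] = {}
    for key, value in pairs:
        merged[key] = f(merged[key], value) if key in merged else value
    return merged


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # flatMap: one line -> many words
    words = (word for line in lines for word in line.split())
    # map: each word -> (word, 1)
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the 1s per word
    return reduce_by_key(pairs, add)


if __name__ == "__main__":
    print(word_count(["to be or not to be", "to see"]))
    # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'see': 1}
```

In real Spark, the same chain runs across partitions on many machines, and reduceByKey also merges values within each partition before shuffling. This local version only shows what the operators compute, not how Spark distributes the work.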

Apache Spark has seen phenomenal adoption: it is widely regarded as the successor to Hadoop MapReduce and is deployed in clusters ranging from a handful to thousands of nodes.

In the past few years, Databricks, with the help of the Spark community, has contributed many improvements to Apache Spark's performance, stability, and scalability. This enabled Databricks to use Apache Spark to sort 100 TB of data on 206 machines in 23 minutes, 3X faster than the previous Hadoop 100 TB result on 2100 machines. Similarly, Databricks sorted 1 PB of data on 190 machines in less than 4 hours, over 4X faster than the previous Hadoop 1 PB result on 3800 machines.
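The sort figures above also imply a large drop in total machine time. Machine time is the number of machines multiplied by the run time, and it is used here as a rough cost proxy. Since old machine time / new machine time = (old machines / new machines) × speedup, the ratio follows directly from the stated numbers. This sketch assumes the machines in both clusters are comparable, which the figures do not establish. It also treats "3X" as exact and "over 4X" as a lower bound. The function name `machine_time_ratio` is hypothetical.

```python
def machine_time_ratio(old_machines: int, new_machines: int, speedup: float) -> float:
    """How many times more total machine time the old run used.

    old machine time / new machine time
      = (old_machines * old_minutes) / (new_machines * new_minutes)
      = (old_machines / new_machines) * speedup,  where speedup = old_minutes / new_minutes
    Assumes machines in both clusters are comparable (a simplification).
    """
    return (old_machines / new_machines) * speedup


if __name__ == "__main__":
    # 100 TB: Hadoop on 2100 machines vs. Spark on 206 machines, "3X faster"
    print(round(machine_time_ratio(2100, 206, 3), 1))   # 30.6
    # 1 PB: Hadoop on 3800 machines vs. Spark on 190 machines, "over 4X faster"
    print(machine_time_ratio(3800, 190, 4))             # 80.0, a lower bound
```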

Spark is fulfilling its promise to serve as a faster and more scalable engine for data processing of all sizes. Spark enables equally dramatic improvements in time and cost for all Big Data users.

Course Introduction:

This course covers nearly everything an application developer needs to build diverse Spark applications that meet all kinds of business requirements: Spark architecture, the Spark programming model, Spark internals, Spark SQL, MLlib, GraphX, Spark Streaming, testing, tuning, Spark on YARN, JobServer, and SparkR.

In addition, this course covers the essential Scala skills needed to write Spark code, for those who are not yet familiar with Scala.

Who Needs to Attend

Anyone interested in Big Data development;

Hadoop developers;

Other Big Data developers;

Instructor: Wang Jialin (email: 186********@https://www.sodocs.net/doc/385320774.html, phone: 186********, QQ: 1740415547, WeChat ID: 186********)

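President and Chief Expert of the Spark Asia-Pacific Research Institute; currently the only expert in China to have mastered the full range of mobile internet, cloud computing, and big data technologies.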

Source-code-level expert in Docker, today's hottest technology in cloud computing, and one of the earliest practitioners of Docker in China.

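Extensive source-code, hands-on, and performance-tuning experience with Spark, Hadoop, Android, Docker, and more. Has thoroughly studied the source code of 18 Spark versions, from 0.5.0 to 1.1.0.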

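Source-code-level Hadoop expert. Formerly led the development of a Hadoop-like framework at a well-known company and focuses on providing one-stop Hadoop solutions. One of the earliest practitioners of distributed big data processing in cloud computing and a Hadoop enthusiast who keeps applying Hadoop in practice to process and store big data efficiently across different domains; currently responsible for Hadoop R&D for search engines, among other work. Author of "The Road to Hadoop Mastery in Cloud Computing and Distributed Big Data: Starting from Scratch", "The Road to Hadoop Mastery in Cloud Computing and Distributed Big Data: The Rise of the Expert", "The Road to Hadoop Mastery in Cloud Computing and Distributed Big Data: The Peak of Expertise", and other books;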

Has customized multiple browsers; a leader in HTML5 technology in mainland China.

Has provided integrated hardware and software solutions based on Linux and Android to more than 50 companies.

Skilled at building systems and frameworks, especially frameworks that combine Java with C/C++.

Android architect, senior engineer, consultant, and training expert;

Well versed in Android, HTML5, and Hadoop; passionate about English broadcasting and bodybuilding;

Dedicated to one-stop solutions that integrate software, hardware, and cloud for Android, HTML5, and Hadoop;

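One of the earliest technical experts and tech entrepreneurs in China (starting in 2007) to work on Android system porting, hardware/software integration, framework modification, and application development, as well as Android system testing and application testing.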

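One of the earliest practitioners in the HTML5 field (starting in 2009); has built several custom HTML5 browsers for multiple organizations and took part in developing a well-known HTML5 browser;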

Author of more than 10 best-selling IT books;

"Winning the Big Data Era" free public lecture series (100 sessions):

https://www.sodocs.net/doc/385320774.html,/course/course_id-1659.html

Complete "The Road to Spark Mastery" course series:

https://www.sodocs.net/doc/385320774.html,/pack/view/id-144.html

The Road to Spark Mastery:

https://www.sodocs.net/doc/385320774.html,/art/201408/448416.htm

Spark special issue:

https://www.sodocs.net/doc/385320774.html,/tag-spark%E4%B8%93%E5%88%8A.html

Spark documentation in Chinese:

https://www.sodocs.net/doc/385320774.html,/tag-spark%E7%BF%BB%E8%AF%91.html

Prerequisites

Be familiar with the basics of object-oriented programming.

Course Outline
