from spark site
Apache Spark™ is a fast and general engine for large-scale data processing
- 
    speed - 
        run programs up to 100x faster than hadoop mapreduce in memory, or 10x faster on disk 
- 
        spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing
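
A rough way to picture the in-memory part: each dataset records the steps that produce it, nothing runs until a result is requested, and a dataset marked for caching is kept in memory for later reuse. The sketch below is single-process plain Python, not Spark's engine; `LazyDataset` and its methods are hypothetical names chosen to mirror the idea.

```python
class LazyDataset:
    """Hypothetical stand-in: a dataset described by the steps that build it."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source    # callable producing the base records (root node only)
        self._parent = parent    # upstream node this dataset is derived from
        self._fn = fn            # step applied to the parent's records
        self._keep = False       # whether to hold the result in memory
        self._cached = None      # in-memory copy, filled on first computation

    def map(self, fn):
        # Only record the step; no data is touched here.
        return LazyDataset(parent=self, fn=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return LazyDataset(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        # Ask for the computed records to be kept in memory once available.
        self._keep = True
        return self

    def collect(self):
        # Walk back through the recorded steps, reusing any cached result.
        if self._cached is not None:
            return self._cached
        if self._parent is None:
            rows = self._source()
        else:
            rows = self._fn(self._parent.collect())
        if self._keep:
            self._cached = rows
        return rows
```

With this sketch, two different results derived from one cached dataset read the source only once; the second result starts from the in-memory copy.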
 
- 
    ease of use - 
        write apps quickly in java, scala or python 
- 
        spark offers over 80 high-level operators that make it easy to build parallel apps 
- 
        and you can use it interactively from the scala and python shells 
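
To give a feel for the operator style, here is a word count written as a chain of small steps. This is a minimal single-machine sketch in plain Python, not Spark itself: `flat_map` and `reduce_by_key` are hypothetical helpers named after the `flatMap` and `reduceByKey` operators on Spark's RDD API, which apply the same kind of steps in parallel across a cluster.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def flat_map(fn, items):
    """Apply fn to each item and flatten the resulting iterables into one list."""
    return list(chain.from_iterable(fn(x) for x in items))


def reduce_by_key(fn, pairs):
    """Group (key, value) pairs by key, then fold each group's values with fn."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


def word_count(lines):
    words = flat_map(str.split, lines)             # lines -> words
    pairs = [(word, 1) for word in words]          # word -> (word, 1)
    return reduce_by_key(lambda a, b: a + b, pairs)  # sum the 1s per word
```

For example, `word_count(["spark is fast", "spark is general"])` returns `{"spark": 2, "is": 2, "fast": 1, "general": 1}`.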
 
- 
    generality - 
        combine sql, streaming, and complex analytics 
- 
        spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming 
- 
        you can combine these libraries seamlessly in the same app
        +-----------+ +-----------+ +-----------+ +-----------+
        |           | |           | |           | |           |
        |   spark   | |   spark   | |   MLlib   | |  GraphX   |
        |    sql    | | streaming | |  machine  | |   graph   |
        |           | |           | | learning  | |           |
        |           | |           | |           | |           |
        +-----------+ +-----------+ +-----------+ +-----------+
        +-----------------------------------------------------+
        |                                                     |
        |                    apache spark                     |
        |                                                     |
        +-----------------------------------------------------+
 
- 
    runs everywhere - 
        runs on - 
            hadoop 
- 
            mesos 
- 
            standalone 
- 
            in the cloud 
 
- 
        it can access diverse data sources including - 
            hdfs 
- 
            cassandra 
- 
            hbase 
- 
            s3 
 
- 
        you can run spark readily using its standalone cluster mode,
- 
            on ec2 
- 
            or run it on hadoop yarn or apache mesos 
- 
            it can read from hdfs, hbase, cassandra, and any hadoop data source 
 
 