Spark作为图数据计算组件的集成案例

 

Here’s the table of contents:

  1. 在Spark集群安装neo4j-spark插件
  2. 基础组件依赖信息
  3. 创建测试数据
  4. 备注
  5. 报错
    1. 报错一

@TOC

如果需要在ONgDB集群集成Spark作图计算相关的任务,可以参考这个ongdb-spark-java-scala-example项目,避免踩坑:)

在Spark集群安装neo4j-spark插件

  • 下载组件
    https://github.com/ongdb-contrib/neo4j-spark-connector/releases/tag/2.4.1-M1
    
  • 下载组件放在spark安装目录的jars文件夹
    E:\software\ongdb-spark\spark-2.4.0-bin-hadoop2.7\jars
    

基础组件依赖信息

  • 版本信息
    Spark 2.4.0  http://archive.apache.org/dist/spark/spark-2.4.0/
    ONgDB 3.5.x
    Neo4j-Java-Driver 1.7.5
    Scala 2.11
    JDK 1.8
    hadoop-2.7.7
    https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
    neo4j-spark-connector-full-2.4.1-M1 https://github.com/neo4j-contrib/neo4j-spark-connector
    
  • 下载的安装包
    hadoop-2.7.7
    spark-2.4.0-bin-hadoop2.7
    winutils
    neo4j-spark-connector-full-2.4.1-M1 【把jar包放到spark/jars文件夹里】
    scala-2.11.12
    

创建测试数据

UNWIND range(1,100) as id
CREATE (p:Person {id:id}) WITH collect(p) as people
UNWIND people as p1
UNWIND range(1,10) as friend
WITH p1, people[(p1.id + friend) % size(people)] as p2
CREATE (p1)-[:KNOWS {years: abs(p2.id - p2.id)}]->(p2)
FOREACH (x in range(1,1000000) | CREATE (:Person {name:"name"+x, age: x%100}));
UNWIND range(1,1000000) as x
MATCH (n),(m) WHERE id(n) = x AND id(m)=toInt(rand()*1000000)
CREATE (n)-[:KNOWS]->(m);

备注

  • 案例项目【为了避免踩坑下面这个Java-Scala混编案例项目可以参考一下】
    https://github.com/ongdb-contrib/ongdb-spark-java-scala-example
    

    下载依赖包如果出现问题请检查下面网址是否可以正常下载Spark相关的JAR包

    http://dl.bintray.com/spark-packages/maven
    
  • 案例项目截图【使用前在本地启动Spark】 Spark运行界面 案例程序运行界面

  • 阅读原文
    https://crazyyanchao.github.io/blog/2020/12/16/Spark%E4%BD%9C%E4%B8%BA%E5%9B%BE%E6%95%B0%E6%8D%AE%E8%AE%A1%E7%AE%97%E7%BB%84%E4%BB%B6%E7%9A%84%E9%9B%86%E6%88%90%E6%A1%88%E4%BE%8B.html
    
  • 相关组件安装的其它参考资料
    https://crazyyanchao.github.io/blog/2020/11/10/ONgDB%E9%9B%86%E6%88%90%E5%9B%BE%E8%AE%A1%E7%AE%97%E7%BB%84%E4%BB%B6Spark.html
    

报错

报错一

  • 测试代码编译错误
    Error:(6, 12) object neo4j is not a member of package org
    import org.neo4j.spark.Neo4j
    Error:(31, 15) not found: value Neo4j
      val neo = Neo4j(sc)
    
  • 发现是graphframes-JAR包冲突 ``` omitted for conflict with
graphframes graphframes 0.8.0-spark2.4-s_2.11 graphframes graphframes 0.7.0-spark2.4-s_2.11
### 报错二
- 报错现象

Error running ‘GraphTest.runMatrixQuery’: Command line is too long. Shorten command line for GraphTest.runMatrixQuery or also for JUnit default configuration.

- 设置Run/Debug Configurations

Shorten command line:JAR mainifest ```