股权网络弱连通子图分析

 

Here’s the table of contents:

  1. 弱连通子图分析过程
  2. 弱连通子图分析结果
  3. 高并发求最大团测试
    1. 2个并发寻找最大团
    2. 4个并发寻找最大团
    3. 8个并发寻找最大团
    4. 16个并发寻找最大团
    5. 24个并发寻找最大团
    6. 32个并发寻找最大团
    7. 40个并发寻找最大团
    8. 48个并发寻找最大团
    9. 56个并发寻找最大团
    10. 160个并发寻找最大团
  4. 在超大团【1189万的团】上执行一百层路径穿透测试
    1. 案例一【返回开始结束节点数据】
    2. 案例二【返回全部数据】
    3. 穿透一百层返回开始结束节点数据使用SKIP参数

弱连通子图分析过程

CALL algo.unionFind.stream(label:String, relationship:String, {weightProperty:'propertyName', threshold:0.42, defaultValue:1.0,concurrency:4)
YIELD nodeId, setId - yields a setId to each node id

弱连通子图分析结果

  • WCC-弱连通分量计算【使用默认权重参数配置】
    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:8}) YIELD nodeId,setId
    RETURN nodeId,algo.asNode(nodeId).name AS name,setId
    
  • WCC-弱连通分量计算并找到最大团【使用默认权重参数配置】
    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:8}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    
    // 运行结果:最大团包含1189万节点;耗时228823毫秒,228.823秒
    {
    "count": 11891440,
    "item": 14739839
    }
    
  • WCC-弱连通分量计算并找到最小团【使用默认权重参数配置】
    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:8}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'^count') AS sortMaps
    WITH FILTER(map IN sortMaps WHERE map.count>1) AS reSortMaps
    RETURN reSortMaps[0] AS minClique
    
    // 运行结果:最小团包含1189万节点;耗时67478毫秒,67.478秒
    {
    "count": 2,
    "item": 133559
    }
    
  • WCC-弱连通分量计算并找到最大团,从最大团找到一个实体进行穿透测试 【使用默认权重参数配置】
    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:8}) YIELD nodeId,setId
    WITH nodeId,setId
    WITH COLLECT(nodeId) AS nodeIdList,setId
    WITH SIZE(nodeIdList) AS size,nodeIdList[0] AS oneId,setId
    RETURN size,oneId,setId ORDER BY size DESC LIMIT 1
    
    // 运行结果:耗时29682毫秒,29.682秒
    ╒════════╤═════════╤═══════╕
    │"size"  │"oneId"  │"setId"│
    ╞════════╪═════════╪═══════╡
    │11891440│396614326│6022290│
    └────────┴─────────┴───────┘
    

高并发求最大团测试

关于并发参数的设置:并发数concurrency通常设置成分配给数据库服务运行的CPU内核数的整数倍。例如,图数据库服务运行在8个CPU内核的虚拟或物理主机上,那么concurrency可以是8、16、24等值。

  • 查看服务器CPU配置
    // 可以看到CPU为四核八线程
    [ongdb@ip200 ongdb-enterprise-3.5.22]$ grep 'core id' /proc/cpuinfo | sort -u | wc -l
    4
    [ongdb@ip200 ongdb-enterprise-3.5.22]$ grep 'processor' /proc/cpuinfo | sort -u | wc -l
    8
    [ongdb@ip200 ongdb-enterprise-3.5.22]$
    

    2个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:2}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时68652毫秒,68.652秒
    

    4个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:4}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时68186毫秒,68.186秒
    

    8个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:8}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时66595毫秒,66.595秒
    

    16个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:16}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时64600毫秒,64.600秒
    

    24个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:24}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时67082毫秒,67.082秒
    

    32个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:32}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时65864毫秒,65.864秒
    

    40个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:40}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时65841毫秒,65.841秒
    

    48个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:48}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时65807毫秒,65.807秒
    

    56个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:56}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时66567毫秒,66.567秒
    

    160个并发寻找最大团

    CALL algo.unionFind.stream('HORGShareHoldV002','持股',{concurrency:160}) YIELD nodeId,setId
    WITH nodeId,algo.asNode(nodeId).name AS name,setId
    WITH COLLECT(setId) AS setIdList
    WITH apoc.coll.sortMaps(apoc.coll.frequencies(setIdList),'count') AS sortMaps
    RETURN sortMaps[0] AS maxClique
    // 运行结果:耗时67828毫秒,67.828秒
    

在超大团【1189万的团】上执行一百层路径穿透测试

穿透起点为国资委国务院国有资产监督管理委员会,向下穿透100层。

  • 一百层以内团数量统计
    MATCH p=(n:HORGShareHoldV002)-[*..100]-(m) WITH NODES(p) AS nodes
    UNWIND nodes AS node
    WITH COLLECT(DISTINCT ID(node)) AS disIds
    RETURN SIZE(disIds) AS size
    // 节点数量为:
    
    MATCH p=(n:HORGShareHoldV002)-[*..100]-(m) WITH RELATIONSHIPS(p) AS rels
    UNWIND rels AS rel
    WITH COLLECT(DISTINCT ID(rel)) AS disIds
    RETURN SIZE(disIds) AS size
    // 关系数据量为:
    

案例一【返回开始结束节点数据】

  • 入参
    {
      "statements": [
          {
              "statement": "MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 0 LIMIT 1",
              "resultDataContents": [
                "graph"
              ]
          }
      ]
    }
    
  • 结果
    1:58ms
    2:60ms
    3:60ms
    

    案例二【返回全部数据】

  • 入参
    {
      "statements": [
          {
              "statement": "MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN p SKIP 0 LIMIT 1",
              "resultDataContents": [
                "graph"
              ]
          }
      ]
    }
    
  • 结果
    1:90ms
    2:90ms
    3:86ms
    

穿透一百层返回开始结束节点数据使用SKIP参数

  • SKIP 100 LIMIT 1
    MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 100 LIMIT 1
    // 结果:耗时66毫秒
    
  • SKIP 1000 LIMIT 1
    MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 1000 LIMIT 1
    // 结果:耗时72毫秒
    
  • SKIP 10000 LIMIT 1
    MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 10000 LIMIT 1
    // 结果:耗时160ms
    
  • SKIP 100000 LIMIT 1
    MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 100000 LIMIT 1
    // 结果:耗时1.098秒
    
  • SKIP 1000000 LIMIT 1
    MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 1000000 LIMIT 1
    // 结果:耗时10.50秒
    
  • SKIP 10000000 LIMIT 1
    MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 10000000 LIMIT 1
    // 结果:耗时1分钟50.64秒
    
  • SKIP 100000000 LIMIT 1
    MATCH p=(n:HORGShareHoldV002)-[r:`持股`*100]->(m) WHERE n.name='国务院国有资产监督管理委员会' RETURN n,m SKIP 100000000 LIMIT 1
    // 结果:耗时17分钟40.74秒