deepgraph.deepgraph.DeepGraph.partition_graph

DeepGraph.partition_graph(features, feature_funcs=None, relation_funcs=None, n_nodes=True, n_edges=True, return_gve=False)

Return supergraph DataFrames sv and se.

This method partitions the graph represented by v and e into a supergraph, represented by sv and se. It creates an (intersection) partition of the nodes in v by the type(s) of feature(s) features, together with the corresponding partition of the edges in e.

Essentially, this method is a wrapper around pandas groupby methods: sv = v.groupby(features).agg(feature_funcs) and se = e.groupby(features_s+features_t).agg(relation_funcs). In order to group e by features_s and features_t, the features of type features are transferred to e (if they are not already present), with ‘_s’ and ‘_t’ appended to the corresponding column names to indicate source and target features, respectively.
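
As a rough sketch of what this grouping amounts to, the following uses pandas only and does not call DeepGraph. The node table and the edge pairs are copied from the Examples on this page. How the ‘color’ column is transferred to e here is an assumption made for illustration, not the library's implementation.

```python
import pandas as pd

# Node table from the Examples section of this page.
v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 1, 2, 5, 9],
                  'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

# Edge table indexed by (source, target) node pairs, as in g.e.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
e = pd.DataFrame(index=pd.MultiIndex.from_tuples(pairs, names=['s', 't']))

# Transfer the feature to the edges as source ('_s') and target ('_t')
# columns (illustrative approach, assumed for this sketch).
s = e.index.get_level_values('s')
t = e.index.get_level_values('t')
e['color_s'] = v.loc[s, 'color'].to_numpy()
e['color_t'] = v.loc[t, 'color'].to_numpy()

# Supernodes: one row per color, counting the constituent nodes.
sv = v.groupby('color').size().to_frame('n_nodes')

# Superedges: one row per observed (source color, target color) pair.
se = e.groupby(['color_s', 'color_t']).size().to_frame('n_edges')
```

The resulting counts should agree with the n_nodes and n_edges columns of the first partition_graph('color') example below.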

Passing a dictionary of functions, feature_funcs (relation_funcs), that act on the features (relations) of v (e) aggregates user-defined values over each element of the partition, producing the supernodes’ (superedges’) features (relations). If n_nodes (n_edges) is True, a column with the number of each supernode’s (superedge’s) constituent nodes (edges) is created.
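
To illustrate the shape of such a dictionary, here is a minimal pandas-only sketch of the aggregation step. It assumes, as stated above, that the dictionary is handed to the groupby object's agg method. The chosen functions ('min', 'max', 'mean') are arbitrary examples.

```python
import pandas as pd

# Node table from the Examples section of this page.
v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 1, 2, 5, 9],
                  'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

gv = v.groupby('color')

# Keys are column names of v; values are a single function or a list of
# functions. A list yields one output column per function, so the result
# has two-level column labels such as ('time', 'min').
funcs = {'time': ['min', 'max'], 'x': 'mean'}
agg = gv.agg(funcs)

# Number of nodes per group, comparable to the n_nodes column.
n_nodes = gv.size()
```

Because one of the values is a list, agg has MultiIndex columns. A single value can be read with, for instance, agg.loc['g', ('time', 'max')].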

If return_gve is True, return the created groupby objects to facilitate additional operations, such as gv.apply(func, *args, **kwargs) or ge.apply(func, *args, **kwargs).
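
The kind of follow-up operation this enables can be sketched with a groupby object built directly in pandas. The sketch assumes that the returned gv is equivalent to v.groupby(features), as listed under Returns. The per-group range of x is a hypothetical quantity chosen only for illustration.

```python
import pandas as pd

# Node table from the Examples section of this page.
v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 1, 2, 5, 9],
                  'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

# Stand-in for the gv returned when return_gve=True (assumption).
gv = v.groupby('color')

# Apply an arbitrary function to one column of every group:
# here the spread of x within each supernode.
x_range = gv['x'].apply(lambda col: col.max() - col.min())

# Look up which node indices make up each supernode.
members = {color: list(idx) for color, idx in gv.groups.items()}
```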

For details, type help(g.v.groupby), and/or inspect the available methods of gv.

For examples, see below. For an in-depth description and mathematical details of graph partitioning, see https://arxiv.org/pdf/1604.00971v1.pdf, in particular Sec. III C, E and F.

Parameters:
  • features (str, int or array_like) – Column name(s) of v, indicating the type(s) of feature(s) used to induce an (intersection) partition of v, and its corresponding partition of the edges in e. Creates pandas groupby objects, gv and ge.
  • feature_funcs (dict, optional (default=None)) – Each key must be a column name of v, each value either a function, or a list of functions, working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of gv.agg for details: help(gv.agg).
  • relation_funcs (dict, optional (default=None)) – Each key must be a column name of e, each value either a function, or a list of functions, working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of ge.agg for details: help(ge.agg).
  • n_nodes (bool, optional (default=True)) – Whether to create an n_nodes column in sv, indicating the number of nodes in each supernode.
  • n_edges (bool, optional (default=True)) – Whether to create an n_edges column in se, indicating the number of edges in each superedge.
  • return_gve (bool, optional (default=False)) – If True, also return the pandas groupby objects, gv and ge.
Returns:

  • sv (pd.DataFrame) – The aggregated DataFrame of supernodes, sv.
  • se (pd.DataFrame) – The aggregated DataFrame of superedges, se.
  • gv (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, v.groupby(features). Only returned if return_gve is True.
  • ge (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, e.groupby(features_s+features_t). Only returned if return_gve is True.

Notes

Currently, pandas silently excludes NA groups in GroupBy. One workaround is to replace NA values with a placeholder (e.g., -1, ‘none’) before calling this method. See http://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values and https://github.com/pydata/pandas/issues/3729.
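
The workaround can be sketched in pandas alone. The small table and the placeholder string 'none' below are made up for illustration.

```python
import pandas as pd

# Hypothetical node table with missing values in the grouping column.
v = pd.DataFrame({'color': ['g', None, 'b', 'g', None],
                  'size': [1, 3, 2, 3, 1]})

# Default behaviour: rows whose group key is NA are left out of the result.
dropped = v.groupby('color').size()

# Workaround: replace NA with a placeholder before grouping, so those
# rows form a group of their own.
filled = v.fillna({'color': 'none'}).groupby('color').size()
```

Here dropped covers only three of the five rows, while filled accounts for all five, with the two former NA rows collected under 'none'. The placeholder must be a value that does not otherwise occur in the column.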

Examples

First, we need to create a graph in order to demonstrate its partitioning into a supergraph.

Create a node table:

>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'x': [-3.4,2.1,-1.1,0.9,2.3],
...                   'time': [0,1,2,5,9],
...                   'color': ['g','g','b','g','r'],
...                   'size': [1,3,2,3,1]})
>>> g = dg.DeepGraph(v)
>>> g.v
  color  size  time    x
0     g     1     0 -3.4
1     g     3     1  2.1
2     b     2     2 -1.1
3     g     3     5  0.9
4     r     1     9  2.3

Create an edge table:

>>> def some_relations(ft_r, x_s, x_t, color_s, color_t, size_s, size_t):
...     dx = x_t - x_s
...     v = dx / ft_r
...     same_color = color_s == color_t
...     larger_than = size_s > size_t
...     return dx, v, same_color, larger_than
>>> g.create_edges_ft(('time', 5), connectors=some_relations)
>>> g.e.rename(columns={'ft_r': 'dt'}, inplace=True)
>>> g.e['inds'] = g.e.index.values  # to ease the eyes
>>> g.e
      dx  dt larger_than same_color         v    inds
s t
0 1  5.5   1       False       True  5.500000  (0, 1)
  2  2.3   2       False      False  1.150000  (0, 2)
  3  4.3   5       False       True  0.860000  (0, 3)
1 2 -3.2   1        True      False -3.200000  (1, 2)
  3 -1.2   4       False       True -0.300000  (1, 3)
2 3  2.0   3       False      False  0.666667  (2, 3)
3 4  1.4   4        True      False  0.350000  (3, 4)

Create a supergraph by partitioning by the type of feature ‘color’:

>>> sv, se = g.partition_graph('color')
>>> sv
       n_nodes
color
b            1
g            3
r            1
>>> se
                 n_edges
color_s color_t
b       g              1
g       b              2
        g              3
        r              1

Create intersection partitions by the types of features ‘color’ and ‘size’ (which further refine the previous partitions):

>>> sv, se = g.partition_graph(
...     ['color', 'size'],
...     relation_funcs={'inds': lambda x: tuple(x)})
>>> sv
            n_nodes
color size
b     2           1
g     1           1
      3           2
r     1           1
>>> se
                               n_edges              inds
color_s size_s color_t size_t
b       2      g       3             1         ((2, 3),)
g       1      b       2             1         ((0, 2),)
               g       3             2  ((0, 1), (0, 3))
        3      b       2             1         ((1, 2),)
               g       3             1         ((1, 3),)
               r       1             1         ((3, 4),)

Partition by ‘color’ and aggregate some properties:

>>> sv, se = g.partition_graph('color',
...     feature_funcs={'time': lambda x: list(x)},
...     relation_funcs={'larger_than': 'sum', 'same_color': 'sum'})
>>> sv
       n_nodes       time
color
b            1        [2]
g            3  [0, 1, 5]
r            1        [9]
>>> se
                 n_edges larger_than  same_color
color_s color_t
b       g              1       False           0
g       b              2        True           0
        g              3       False           3
        r              1        True           0