deepgraph.deepgraph.DeepGraph.partition_edges

DeepGraph.partition_edges(relations=None, source_features=None, target_features=None, relation_funcs=None, n_edges=True, return_ge=False)

Return a superedge DataFrame se.

This method partitions the edges in e by their types of relations, by the types of features of their incident source and target nodes, or by any combination of the three.

Essentially, this method is a wrapper around the pandas groupby method: se = e.groupby(relations + features_s + features_t).agg(relation_funcs), where relations are column names of e. To group e by features_s and/or features_t, the features named in source_features and/or target_features (column names of v) are first transferred to e, with ‘_s’ and/or ‘_t’ appended to the corresponding column names (unless those columns are already present in e). The only requirement on the combination of relations, source_features and target_features is that at least one of the lists has length >= 1.
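To make the groupby equivalence concrete, the following pandas-only sketch mimics the feature transfer and aggregation for source_features='color'. It does not use DeepGraph; the tables are copied in reduced form from the Examples section, and the way color_s is attached to e is an assumption about the mechanics, not the library's actual implementation.

```python
import pandas as pd

# Node table v and a reduced edge table e, copied from the Examples section.
v = pd.DataFrame({'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
e = pd.DataFrame(
    {'same_color': [True, False, True, False, True, False, False]},
    index=pd.MultiIndex.from_tuples(pairs, names=['s', 't']))

# Transfer the source nodes' 'color' to e as 'color_s' (assumed mechanics).
e['color_s'] = v.loc[e.index.get_level_values('s'), 'color'].values

# Group by the transferred feature and aggregate one relation.
ge = e.groupby('color_s')
se = ge.agg({'same_color': 'sum'})

# Count the constituent edges of each group, as n_edges=True would.
se.insert(0, 'n_edges', ge.size())
print(se)
```

The resulting counts agree with the source_features='color' example in the Examples section (one edge with a 'b' source, six with a 'g' source).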

By passing a dictionary of functions on the relations of e, relation_funcs, one may aggregate user-defined values of the partition’s elements, the superedges’ relations. If n_edges is True, a column with the number of each superedge’s constituent edges is created. If return_ge is True, the created groupby object is returned in addition to se, to facilitate further operations such as ge.apply(func, *args, **kwargs).
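As a rough illustration of what a groupby object allows beyond agg, the sketch below builds one directly with pandas (standing in for the ge returned when return_ge=True) and derives the edge counts and a custom per-group value. The edge values are taken from the Examples section; dx_range is a hypothetical quantity chosen only for illustration.

```python
import pandas as pd

# Two relations of the example edge table (see the Examples section).
e = pd.DataFrame({
    'larger_than': [False, False, False, True, False, False, True],
    'dx': [5.5, 2.3, 4.3, -3.2, -1.2, 2.0, 1.4],
})

# Stand-in for the groupby object that return_ge=True would hand back.
ge = e.groupby('larger_than')

# Number of edges per group, analogous to the n_edges column.
n_edges = ge.size()

# A custom per-group computation via apply: the spread of dx in each group.
dx_range = ge['dx'].apply(lambda s: s.max() - s.min())
```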

For details, type help(g.e.groupby), and/or inspect the available methods of ge.

For examples, see below. For an in-depth description and mathematical details of graph partitioning, see https://arxiv.org/pdf/1604.00971v1.pdf, in particular Sec. III B, E and F.

Parameters:
  • relations (str, int or array_like, optional (default=None)) – Column name(s) of e, indicating the type(s) of relation(s) used to induce an (intersection) partition of e (in conjunction with source_features and target_features).
  • source_features (str, int or array_like, optional (default=None)) – Column name(s) of v, indicating the type(s) of feature(s) of the edges’ incident source nodes used to induce an (intersection) partition of e (in conjunction with relations and target_features).
  • target_features (str, int or array_like, optional (default=None)) – Column name(s) of v, indicating the type(s) of feature(s) of the edges’ incident target nodes used to induce an (intersection) partition of e (in conjunction with relations and source_features).
  • relation_funcs (dict, optional (default=None)) – Each key must be a column name of e, each value a (list of) function(s), working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of ge.agg for details: help(ge.agg).
  • n_edges (bool, optional (default=True)) – Whether to create an n_edges column in se, indicating the number of edges in each superedge.
  • return_ge (bool, optional (default=False)) – If True, also return the pandas groupby object, ge.
Returns:
  • se (pd.DataFrame) – The aggregated DataFrame of superedges, se.
  • ge (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, ge. Only returned if return_ge is True.

Notes

Currently, NA groups in GroupBy are automatically excluded (silently). One workaround is to use a placeholder (e.g., -1, ‘none’) for NA values before doing the groupby (calling this method). See http://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values and https://github.com/pydata/pandas/issues/3729.
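The placeholder workaround can be sketched with plain pandas as follows. The edge table and its kind column are hypothetical; the same substitution would be applied to the relevant columns of g.e (or g.v) before calling this method.

```python
import numpy as np
import pandas as pd

# Hypothetical edge table with a missing value in the grouping column.
e = pd.DataFrame({'kind': ['a', np.nan, 'a', 'b'],
                  'dx': [1.0, 2.0, 3.0, 4.0]})

# Default behaviour: the row whose 'kind' is NA is silently dropped.
dropped = e.groupby('kind').size()

# Workaround: substitute a placeholder for NA before grouping.
filled = e.assign(kind=e['kind'].fillna('none'))
kept = filled.groupby('kind').size()

print(dropped.sum(), kept.sum())
```

With the placeholder in place, the previously excluded row forms its own group, so the group sizes again add up to the number of rows.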

Examples

First, we need to create a graph in order to demonstrate how to partition its edge set.

Create a node table:

>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'x': [-3.4,2.1,-1.1,0.9,2.3],
...                   'time': [0,1,2,5,9],
...                   'color': ['g','g','b','g','r'],
...                   'size': [1,3,2,3,1]})
>>> g = dg.DeepGraph(v)
>>> g.v
  color  size  time    x
0     g     1     0 -3.4
1     g     3     1  2.1
2     b     2     2 -1.1
3     g     3     5  0.9
4     r     1     9  2.3

Create an edge table:

>>> def some_relations(ft_r, x_s, x_t, color_s, color_t, size_s, size_t):
...     dx = x_t - x_s
...     v = dx / ft_r
...     same_color = color_s == color_t
...     larger_than = size_s > size_t
...     return dx, v, same_color, larger_than
>>> g.create_edges_ft(('time', 5), connectors=some_relations)
>>> g.e.rename(columns={'ft_r': 'dt'}, inplace=True)
>>> g.e['inds'] = g.e.index.values  # to ease the eyes
>>> g.e
      dx  dt larger_than same_color         v    inds
s t
0 1  5.5   1       False       True  5.500000  (0, 1)
  2  2.3   2       False      False  1.150000  (0, 2)
  3  4.3   5       False       True  0.860000  (0, 3)
1 2 -3.2   1        True      False -3.200000  (1, 2)
  3 -1.2   4       False       True -0.300000  (1, 3)
2 3  2.0   3       False      False  0.666667  (2, 3)
3 4  1.4   4        True      False  0.350000  (3, 4)

Partitioning by the type of relation ‘larger_than’:

>>> g.partition_edges(relations='larger_than',
...                   relation_funcs={'dx': ['mean', 'std'],
...                                   'same_color': 'sum'})
             n_edges  same_color_sum  dx_mean    dx_std
larger_than
False              5               3     2.58  2.558711
True               2               0    -0.90  3.252691
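
The aggregated values can be checked by hand with the standard library. This sketch copies the dx values from the edge table above and assumes that 'std' denotes the sample standard deviation (n - 1 in the denominator), which is the pandas default.

```python
import statistics

# dx values from the edge table, split by the 'larger_than' relation.
dx_false = [5.5, 2.3, 4.3, -1.2, 2.0]   # edges with larger_than == False
dx_true = [-3.2, 1.4]                   # edges with larger_than == True

mean_false = statistics.mean(dx_false)
std_false = statistics.stdev(dx_false)  # sample std (n - 1 denominator)
mean_true = statistics.mean(dx_true)
std_true = statistics.stdev(dx_true)

print(round(mean_false, 2), round(std_false, 6))
print(round(mean_true, 2), round(std_true, 6))
```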

A refinement of the last partition by the type of relation ‘same_color’:

>>> g.partition_edges(relations=['larger_than', 'same_color'],
...                   relation_funcs={'dx': ['mean', 'std'],
...                                   'dt': lambda x: tuple(x)})
                        n_edges dt_<lambda>   dx_mean    dx_std
larger_than same_color
False       False             2      (2, 3)  2.150000  0.212132
            True              3   (1, 5, 4)  2.866667  3.572581
True        False             2      (1, 4) -0.900000  3.252691

Partitioning by the type of source feature ‘color’:

>>> g.partition_edges(source_features='color',
...                   relation_funcs={'same_color': 'sum'})
         n_edges  same_color
color_s
b              1           0
g              6           3

As one can see, the type of feature ‘color’ of the source nodes has been transferred to e:

>>> g.e
      dx  dt larger_than same_color         v    inds color_s
s t
0 1  5.5   1       False       True  5.500000  (0, 1)       g
  2  2.3   2       False      False  1.150000  (0, 2)       g
  3  4.3   5       False       True  0.860000  (0, 3)       g
1 2 -3.2   1        True      False -3.200000  (1, 2)       g
  3 -1.2   4       False       True -0.300000  (1, 3)       g
2 3  2.0   3       False      False  0.666667  (2, 3)       b
3 4  1.4   4        True      False  0.350000  (3, 4)       g

A further refinement of the last partition by the type of source feature ‘size’:

>>> g.partition_edges(source_features=['color', 'size'],
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                      inds
color_s size_s
b       2             1           0                 ((2, 3),)
g       1             3           2  ((0, 1), (0, 2), (0, 3))
        3             3           1  ((1, 2), (1, 3), (3, 4))

Partitioning by the types of target features (‘color’, ‘size’):

>>> g.partition_edges(target_features=['color', 'size'],
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                              inds
color_t size_t
b       2             2           0                  ((0, 2), (1, 2))
g       3             4           3  ((0, 1), (0, 3), (1, 3), (2, 3))
r       1             1           0                         ((3, 4),)

Partitioning by the type of source feature ‘color’ and the type of target feature ‘size’:

>>> g.partition_edges(source_features='color', target_features='size',
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                      inds
color_s size_t
b       3             1           0                 ((2, 3),)
g       1             1           0                 ((3, 4),)
        2             2           0          ((0, 2), (1, 2))
        3             3           3  ((0, 1), (0, 3), (1, 3))

A further refinement of the last partition by the type of relation ‘larger_than’:

>>> g.partition_edges(relations='larger_than',
...                   source_features='color', target_features='size',
...                   relation_funcs={'inds': lambda x: tuple(x)})
                            n_edges                      inds
larger_than color_s size_t
False       b       3             1                 ((2, 3),)
            g       2             1                 ((0, 2),)
                    3             3  ((0, 1), (0, 3), (1, 3))
True        g       1             1                 ((3, 4),)
                    2             1                 ((1, 2),)