deepgraph.deepgraph.DeepGraph.partition_edges

DeepGraph.partition_edges(relations=None, source_features=None, target_features=None, relation_funcs=None, n_edges=True, return_ge=False)

Return a superedge DataFrame se.

This method allows you to partition the edges in e by their types of relations, by the types of features of their incident source and target nodes, or by any combination of the three.

Essentially, this method is a wrapper around the pandas groupby method: se = e.groupby(relations + features_s + features_t).agg(relation_funcs), where relations are column names of e. In order to group e by features_s and/or features_t, the features of type source_features and/or target_features (column names of v) are transferred to e, appending ‘_s’ and/or ‘_t’ to the corresponding column names of e (if they are not already present). The only requirement on the combination of relations, source_features and target_features is that at least one of the lists has to be of length >= 1.

By passing a dictionary of functions on the relations of e, relation_funcs, one may aggregate user-defined values of the partition’s elements, the superedges’ relations. If n_edges is True, create a column with the number of each superedge’s constituent edges. If return_ge is True, return the created groupby object to facilitate additional operations, such as ge.apply(func, *args, **kwargs).

For details, type help(g.e.groupby), and/or inspect the available methods of ge.

For examples, see below. For an in-depth description and mathematical details of graph partitioning, see https://arxiv.org/pdf/1604.00971v1.pdf, in particular Sec. III B, E and F.
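The groupby call described above can be sketched in plain pandas. This is a minimal illustration, not deepgraph's actual implementation: the tables v and e are small hand-built stand-ins, and the manual transfer of 'color' to 'color_s' only assumes that it mirrors what the method does internally.

```python
import pandas as pd

# Stand-in node table (features) and edge table (relations),
# the latter indexed by (source, target) node pairs.
v = pd.DataFrame({'color': ['g', 'g', 'b', 'g', 'r']})
e = pd.DataFrame(
    {'dx': [5.5, 2.3, 4.3, -3.2, -1.2, 2.0, 1.4],
     'larger_than': [False, False, False, True, False, False, True]},
    index=pd.MultiIndex.from_tuples(
        [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)],
        names=['s', 't']),
)

# Transfer the source-node feature 'color' to e, suffixed with '_s'.
e['color_s'] = v.loc[e.index.get_level_values('s'), 'color'].values

# Group by one relation and one source feature, then aggregate.
ge = e.groupby(['larger_than', 'color_s'])
se = ge.agg({'dx': 'mean'})

# Count the constituent edges of each superedge.
se['n_edges'] = ge.size()
```

Keeping the groupby object ge around, as return_ge=True does, makes it possible to run further operations on the same partition without regrouping.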
Parameters:
- relations (str, int or array_like, optional (default=None)) – Column name(s) of e, indicating the type(s) of relation(s) used to induce an (intersection) partition of e (in conjunction with source_features and target_features).
- source_features (str, int or array_like, optional (default=None)) – Column name(s) of v, indicating the type(s) of feature(s) of the edges’ incident source nodes used to induce an (intersection) partition of e (in conjunction with relations and target_features).
- target_features (str, int or array_like, optional (default=None)) – Column name(s) of v, indicating the type(s) of feature(s) of the edges’ incident target nodes used to induce an (intersection) partition of e (in conjunction with relations and source_features).
- relation_funcs (dict, optional (default=None)) – Each key must be a column name of e, each value a (list of) function(s), working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of ge.agg for details: help(ge.agg).
- n_edges (bool, optional (default=True)) – Whether to create an n_edges column in se, indicating the number of edges in each superedge.
- return_ge (bool, optional (default=False)) – If True, also return the pandas groupby object, ge.
Returns:
- se (pd.DataFrame) – The aggregated DataFrame of superedges, se.
- ge (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, ge (only returned if return_ge is True).
See also
Notes
Currently, NA groups in GroupBy are automatically excluded (silently). One workaround is to use a placeholder (e.g., -1, ‘none’) for NA values before doing the groupby (calling this method). See http://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values and https://github.com/pydata/pandas/issues/3729.
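The placeholder workaround mentioned above can be illustrated with plain pandas. This is a sketch on a made-up edge table (the column names and values are hypothetical), not a call into deepgraph itself.

```python
import pandas as pd

# Hypothetical edge table in which one grouping key is missing.
e = pd.DataFrame({'color_s': ['g', None, 'g', 'b'],
                  'dx': [1.0, 2.0, 3.0, 4.0]})

# By default, the row with the NA key silently drops out of the result.
dropped = e.groupby('color_s')['dx'].sum()

# Filling NA with a placeholder beforehand keeps it as its own group.
kept = e.fillna({'color_s': 'none'}).groupby('color_s')['dx'].sum()
```

Here dropped contains only the groups 'b' and 'g', while kept additionally has a 'none' group. pandas 1.1 and later also accept groupby(..., dropna=False); whether partition_edges exposes that option is not stated in this documentation.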
Examples
First, we need to create a graph in order to demonstrate how to partition its edge set.
Create a node table:
>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'x': [-3.4,2.1,-1.1,0.9,2.3],
...                   'time': [0,1,2,5,9],
...                   'color': ['g','g','b','g','r'],
...                   'size': [1,3,2,3,1]})
>>> g = dg.DeepGraph(v)

>>> g.v
  color  size  time    x
0     g     1     0 -3.4
1     g     3     1  2.1
2     b     2     2 -1.1
3     g     3     5  0.9
4     r     1     9  2.3
Create an edge table:
>>> def some_relations(ft_r, x_s, x_t, color_s, color_t, size_s, size_t):
...     dx = x_t - x_s
...     v = dx / ft_r
...     same_color = color_s == color_t
...     larger_than = size_s > size_t
...     return dx, v, same_color, larger_than
>>> g.create_edges_ft(('time', 5), connectors=some_relations)
>>> g.e.rename(columns={'ft_r': 'dt'}, inplace=True)
>>> g.e['inds'] = g.e.index.values  # to ease the eyes

>>> g.e
      dx  dt larger_than same_color         v    inds
s t
0 1  5.5   1       False       True  5.500000  (0, 1)
  2  2.3   2       False      False  1.150000  (0, 2)
  3  4.3   5       False       True  0.860000  (0, 3)
1 2 -3.2   1        True      False -3.200000  (1, 2)
  3 -1.2   4       False       True -0.300000  (1, 3)
2 3  2.0   3       False      False  0.666667  (2, 3)
3 4  1.4   4        True      False  0.350000  (3, 4)
Partitioning by the type of relation ‘larger_than’:
>>> g.partition_edges(relations='larger_than',
...                   relation_funcs={'dx': ['mean', 'std'],
...                                   'same_color': 'sum'})
             n_edges  same_color_sum  dx_mean    dx_std
larger_than
False              5               3     2.58  2.558711
True               2               0    -0.90  3.252691
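Column names such as dx_mean and same_color_sum in the output above are consistent with flattening the two-level columns that pandas produces when a relation_funcs-style dictionary contains a list of functions. The following sketch reproduces that naming with plain pandas on a made-up edge table; the '_' join is an assumption about how the names are formed, not a confirmed detail of deepgraph.

```python
import pandas as pd

# Hypothetical edge table with two relations to aggregate.
e = pd.DataFrame({'larger_than': [False, False, True, True],
                  'dx': [1.0, 3.0, -2.0, 4.0],
                  'same_color': [True, False, False, True]})

# A dict with at least one list value yields two-level columns:
# ('dx', 'mean'), ('dx', 'std'), ('same_color', 'sum').
se = e.groupby('larger_than').agg({'dx': ['mean', 'std'],
                                   'same_color': 'sum'})

# Assumed flattening step: join both levels with an underscore.
se.columns = ['_'.join(col) for col in se.columns]
```

After this step, se has the columns dx_mean, dx_std and same_color_sum.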
A refinement of the last partition by the type of relation ‘same_color’:
>>> g.partition_edges(relations=['larger_than', 'same_color'],
...                   relation_funcs={'dx': ['mean', 'std'],
...                                   'dt': lambda x: tuple(x)})
                        n_edges dt_<lambda>   dx_mean    dx_std
larger_than same_color
False       False             2      (2, 3)  2.150000  0.212132
            True              3   (1, 5, 4)  2.866667  3.572581
True        False             2      (1, 4) -0.900000  3.252691
Partitioning by the type of source feature ‘color’:
>>> g.partition_edges(source_features='color',
...                   relation_funcs={'same_color': 'sum'})
         n_edges  same_color
color_s
b              1           0
g              6           3
As one can see, the type of feature ‘color’ of the source nodes has been transferred to e:

>>> g.e
      dx  dt larger_than same_color         v    inds color_s
s t
0 1  5.5   1       False       True  5.500000  (0, 1)       g
  2  2.3   2       False      False  1.150000  (0, 2)       g
  3  4.3   5       False       True  0.860000  (0, 3)       g
1 2 -3.2   1        True      False -3.200000  (1, 2)       g
  3 -1.2   4       False       True -0.300000  (1, 3)       g
2 3  2.0   3       False      False  0.666667  (2, 3)       b
3 4  1.4   4        True      False  0.350000  (3, 4)       g
A further refinement of the last partition by the type of source feature ‘size’:
>>> g.partition_edges(source_features=['color', 'size'],
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                      inds
color_s size_s
b       2             1           0                 ((2, 3),)
g       1             3           2  ((0, 1), (0, 2), (0, 3))
        3             3           1  ((1, 2), (1, 3), (3, 4))
Partitioning by the types of target features (‘color’, ‘size’):
>>> g.partition_edges(target_features=['color', 'size'],
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                              inds
color_t size_t
b       2             2           0                  ((0, 2), (1, 2))
g       3             4           3  ((0, 1), (0, 3), (1, 3), (2, 3))
r       1             1           0                         ((3, 4),)
Partitioning by the type of source feature ‘color’ and the type of target feature ‘size’:
>>> g.partition_edges(source_features='color', target_features='size',
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                      inds
color_s size_t
b       3             1           0                 ((2, 3),)
g       1             1           0                 ((3, 4),)
        2             2           0          ((0, 2), (1, 2))
        3             3           3  ((0, 1), (0, 3), (1, 3))
A further refinement of the last partition by the type of relation ‘larger_than’:
>>> g.partition_edges(relations='larger_than',
...                   source_features='color', target_features='size',
...                   relation_funcs={'inds': lambda x: tuple(x)})
                            n_edges                      inds
larger_than color_s size_t
False       b       3             1                 ((2, 3),)
            g       2             1                 ((0, 2),)
                    3             3  ((0, 1), (0, 3), (1, 3))
True        g       1             1                 ((3, 4),)
                    2             1                 ((1, 2),)