deepgraph.DeepGraph.partition_edges
DeepGraph.partition_edges(relations=None, source_features=None, target_features=None, relation_funcs=None, n_edges=True, return_ge=False)
Return a superedge DataFrame se.

This method allows you to partition the edges in e by their types of relations, but also by the types of features of their incident source and target nodes, and any combination of the three.

Essentially, this method is a wrapper around the pandas groupby method: se = e.groupby(relations + features_s + features_t).agg(relation_funcs), where relations are column names of e. In order to group e by features_s and/or features_t, the features of type source_features and/or target_features (column names of v) are transferred to e, appending ‘_s’ and/or ‘_t’ to the corresponding column names of e (if they are not already present). The only requirement on the combination of relations, source_features and target_features is that at least one of the lists has to be of length >= 1.

By passing a dictionary of functions on the relations of e, relation_funcs, one may aggregate user-defined values of the partition’s elements, the superedges’ relations. If n_edges is True, create a column with the number of each superedge’s constituent edges. If return_ge is True, return the created groupby object to facilitate additional operations, such as ge.apply(func, *args, **kwargs).

For details, type help(g.e.groupby), and/or inspect the available methods of ge.

For examples, see below. For an in-depth description and mathematical details of graph partitioning, see https://arxiv.org/pdf/1604.00971v1.pdf, in particular Sec. III B, E and F.
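The groupby-and-aggregate idea described above can be sketched in plain pandas. This is only an illustration of the pattern, not DeepGraph's actual implementation: the small node and edge tables are made up, and the way the source feature is copied to the edge table is an assumption about one reasonable way to do it.

```python
import pandas as pd

# Hypothetical node table and edge table (not taken from the DeepGraph docs).
v = pd.DataFrame({'color': ['g', 'g', 'b']})
e = pd.DataFrame(
    {'dx': [5.5, 2.3, -3.2], 'larger_than': [False, False, True]},
    index=pd.MultiIndex.from_tuples([(0, 1), (0, 2), (2, 1)], names=['s', 't']),
)

# Transfer the source nodes' 'color' to the edge table as 'color_s'
# (assumed mechanism: look up each edge's source index in v).
e['color_s'] = v.loc[e.index.get_level_values('s'), 'color'].values

# Group by one relation and one transferred source feature, then aggregate.
ge = e.groupby(['larger_than', 'color_s'])
se = ge.agg({'dx': 'mean'})

# Add the number of constituent edges per group, like the n_edges column.
se.insert(0, 'n_edges', ge.size())
print(se)
```

Each row of the resulting table corresponds to one combination of relation type and source feature type, which is the role a superedge plays in se.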
- Parameters:
relations (str, int or array_like, optional (default=None)) – Column name(s) of e, indicating the type(s) of relation(s) used to induce a (intersection) partition of e (in conjunction with source_features and target_features).
source_features (str, int or array_like, optional (default=None)) – Column name(s) of v, indicating the type(s) of feature(s) of the edges’ incident source nodes used to induce a (intersection) partition of e (in conjunction with relations and target_features).
target_features (str, int or array_like, optional (default=None)) – Column name(s) of v, indicating the type(s) of feature(s) of the edges’ incident target nodes used to induce a (intersection) partition of e (in conjunction with relations and source_features).
relation_funcs (dict, optional (default=None)) – Each key must be a column name of e, each value a (list of) function(s), working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of ge.agg for details: help(ge.agg).
n_edges (bool, optional (default=True)) – Whether to create a n_edges column in se, indicating the number of edges in each superedge.
return_ge (bool, optional (default=False)) – If True, also return the pandas groupby object, ge.
- Returns:
se (pd.DataFrame) – The aggregated DataFrame of superedges, se.
ge (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, ge. Only returned if return_ge is True.
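As a rough illustration of the kind of follow-up operation a groupby object allows, the sketch below builds a stand-in ge directly with pandas on a made-up edge table. It does not call partition_edges, so the names and values are purely hypothetical.

```python
import pandas as pd

# Hypothetical edge table; 'ge' is a stand-in built directly with pandas,
# playing the role of the groupby object returned when return_ge is True.
e = pd.DataFrame({'larger_than': [False, False, True, True],
                  'dx': [5.5, 2.3, -3.2, 1.4]})
ge = e.groupby('larger_than')

# A custom per-group computation: the range of dx within each group.
dx_range = ge['dx'].apply(lambda s: s.max() - s.min())
print(dx_range)
```

Selecting the column before calling apply keeps the custom function focused on the values of interest and avoids passing the grouping column to it.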
Notes
Currently, NA groups in GroupBy are automatically excluded (silently). One workaround is to use a placeholder (e.g., -1, ‘none’) for NA values before doing the groupby (calling this method). See http://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values and https://github.com/pydata/pandas/issues/3729.
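The placeholder workaround can be sketched in plain pandas as follows. The column names and the placeholder 'none' are illustrative choices, and the table is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical edge table with missing values in the grouping column.
e = pd.DataFrame({'kind': ['a', np.nan, 'a', np.nan],
                  'dx': [1.0, 2.0, 3.0, 4.0]})

# With default settings, rows whose key is NA are dropped from the groups.
dropped = e.groupby('kind').size()

# Filling NA with a placeholder first keeps those rows as their own group.
kept = e.assign(kind=e['kind'].fillna('none')).groupby('kind').size()

print(dropped)
print(kept)
```

Choose a placeholder that cannot collide with a genuine value of the column, otherwise real and missing entries end up in the same group.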
Examples
First, we need to create a graph in order to demonstrate how to partition its edge set.
Create a node table:
>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'x': [-3.4,2.1,-1.1,0.9,2.3],
...                   'time': [0,1,2,5,9],
...                   'color': ['g','g','b','g','r'],
...                   'size': [1,3,2,3,1]})
>>> g = dg.DeepGraph(v)

>>> g.v
  color  size  time    x
0     g     1     0 -3.4
1     g     3     1  2.1
2     b     2     2 -1.1
3     g     3     5  0.9
4     r     1     9  2.3
Create an edge table:
>>> def some_relations(ft_r, x_s,x_t,color_s,color_t,size_s,size_t):
...     dx = x_t - x_s
...     v = dx / ft_r
...     same_color = color_s == color_t
...     larger_than = size_s > size_t
...     return dx, v, same_color, larger_than
>>> g.create_edges_ft(('time', 5), connectors=some_relations)
>>> g.e.rename(columns={'ft_r': 'dt'}, inplace=True)
>>> g.e['inds'] = g.e.index.values  # to ease the eyes

>>> g.e
      dx  dt larger_than same_color         v    inds
s t
0 1  5.5   1       False       True  5.500000  (0, 1)
  2  2.3   2       False      False  1.150000  (0, 2)
  3  4.3   5       False       True  0.860000  (0, 3)
1 2 -3.2   1        True      False -3.200000  (1, 2)
  3 -1.2   4       False       True -0.300000  (1, 3)
2 3  2.0   3       False      False  0.666667  (2, 3)
3 4  1.4   4        True      False  0.350000  (3, 4)
Partitioning by the type of relation ‘larger_than’:
>>> g.partition_edges(relations='larger_than',
...                   relation_funcs={'dx': ['mean', 'std'],
...                                   'same_color': 'sum'})
             n_edges  same_color_sum  dx_mean    dx_std
larger_than
False              5               3     2.58  2.558711
True               2               0    -0.90  3.252691
A refinement of the last partition by the type of relation ‘same_color’:
>>> g.partition_edges(relations=['larger_than', 'same_color'],
...                   relation_funcs={'dx': ['mean', 'std'],
...                                   'dt': lambda x: tuple(x)})
                        n_edges dt_<lambda>   dx_mean    dx_std
larger_than same_color
False       False             2      (2, 3)  2.150000  0.212132
            True              3   (1, 5, 4)  2.866667  3.572581
True        False             2      (1, 4) -0.900000  3.252691
Partitioning by the type of source feature ‘color’:
>>> g.partition_edges(source_features='color',
...                   relation_funcs={'same_color': 'sum'})
         n_edges  same_color
color_s
b              1           0
g              6           3
As one can see, the type of feature ‘color’ of the source nodes has been transferred to e:

>>> g.e
      dx  dt larger_than same_color         v    inds color_s
s t
0 1  5.5   1       False       True  5.500000  (0, 1)       g
  2  2.3   2       False      False  1.150000  (0, 2)       g
  3  4.3   5       False       True  0.860000  (0, 3)       g
1 2 -3.2   1        True      False -3.200000  (1, 2)       g
  3 -1.2   4       False       True -0.300000  (1, 3)       g
2 3  2.0   3       False      False  0.666667  (2, 3)       b
3 4  1.4   4        True      False  0.350000  (3, 4)       g
A further refinement of the last partition by the type of source feature ‘size’:
>>> g.partition_edges(source_features=['color', 'size'],
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                      inds
color_s size_s
b       2             1           0                 ((2, 3),)
g       1             3           2  ((0, 1), (0, 2), (0, 3))
        3             3           1  ((1, 2), (1, 3), (3, 4))
Partitioning by the types of target features (‘color’, ‘size’):
>>> g.partition_edges(target_features=['color', 'size'],
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                              inds
color_t size_t
b       2             2           0                  ((0, 2), (1, 2))
g       3             4           3  ((0, 1), (0, 3), (1, 3), (2, 3))
r       1             1           0                         ((3, 4),)
Partitioning by the type of source feature ‘color’ and the type of target feature ‘size’:
>>> g.partition_edges(source_features='color', target_features='size',
...                   relation_funcs={'same_color': 'sum',
...                                   'inds': lambda x: tuple(x)})
                n_edges  same_color                      inds
color_s size_t
b       3             1           0                 ((2, 3),)
g       1             1           0                 ((3, 4),)
        2             2           0          ((0, 2), (1, 2))
        3             3           3  ((0, 1), (0, 3), (1, 3))
A further refinement of the last partition by the type of relation ‘larger_than’:
>>> g.partition_edges(relations='larger_than',
...                   source_features='color', target_features='size',
...                   relation_funcs={'inds': lambda x: tuple(x)})
                            n_edges                      inds
larger_than color_s size_t
False       b       3             1                 ((2, 3),)
            g       2             1                 ((0, 2),)
                    3             3  ((0, 1), (0, 3), (1, 3))
True        g       1             1                 ((3, 4),)
                    2             1                 ((1, 2),)