deepgraph.deepgraph.DeepGraph.partition_graph
DeepGraph.partition_graph(features, feature_funcs=None, relation_funcs=None, n_nodes=True, n_edges=True, return_gve=False)
Return supergraph DataFrames sv and se.
This method allows partitioning of the graph represented by v and e into a supergraph, sv and se. It creates an (intersection) partition of the nodes in v by the type(s) of feature(s) features, together with the (intersection) partition’s corresponding partition of the edges in e.
Essentially, this method is a wrapper around pandas groupby methods: sv = v.groupby(features).agg(feature_funcs) and se = e.groupby(features_s+features_t).agg(relation_funcs). In order to group e by features_s and features_t, the features of type features are transferred to e, appending ‘_s’ and ‘_t’ to the corresponding column names of e, indicating source and target features, respectively (if they are not already present).
By passing a dictionary of functions on the features (relations) of v (e), feature_funcs (relation_funcs), one may aggregate user-defined values of the partition’s elements, the supernodes’ (superedges’) features (relations). If n_nodes (n_edges) is True, a column is created with the number of each supernode’s (superedge’s) constituent nodes (edges).
If return_gve is True, the created groupby objects are returned as well, to facilitate additional operations such as gv.apply(func, *args, **kwargs) or ge.apply(func, *args, **kwargs).
For details, type help(g.v.groupby), and/or inspect the available methods of gv.
For examples, see below. For an in-depth description and mathematical details of graph partitioning, see https://arxiv.org/pdf/1604.00971v1.pdf, in particular Sec. III C, E and F.
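The groupby wrapper described above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the library's implementation: the toy tables, the dx column and the dx_sum aggregate are made up for the example, and the feature transfer is written out by hand.

```python
import pandas as pd

# Hypothetical toy graph: node table v and edge table e indexed by (s, t).
v = pd.DataFrame({'color': ['g', 'g', 'b', 'g', 'r']})
e = pd.DataFrame(
    {'dx': [5.5, 2.3, -3.2, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 1), (0, 2), (1, 2), (2, 3)], names=['s', 't']),
)

features = ['color']

# Supernodes: group the nodes by the chosen feature(s) and count members.
sv = v.groupby(features).size().to_frame('n_nodes')

# Transfer the features to the edge table, once for the source node
# ('_s' suffix) and once for the target node ('_t' suffix).
s = e.index.get_level_values('s')
t = e.index.get_level_values('t')
for f in features:
    e[f + '_s'] = v.loc[s, f].to_numpy()
    e[f + '_t'] = v.loc[t, f].to_numpy()

# Superedges: group the edges by the source and target features.
by = [f + '_s' for f in features] + [f + '_t' for f in features]
se = e.groupby(by).agg(n_edges=('dx', 'count'), dx_sum=('dx', 'sum'))

print(sv)
print(se)
```

Each row of se then stands for one superedge, i.e. one ordered pair of supernodes connected by at least one edge.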
Parameters:
- features (str, int or array_like) – Column name(s) of v, indicating the type(s) of feature(s) used to induce an (intersection) partition of v, and its corresponding partition of the edges in e. Creates pandas groupby objects, gv and ge.
- feature_funcs (dict, optional (default=None)) – Each key must be a column name of v, each value either a function, or a list of functions, working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of gv.agg for details: help(gv.agg).
- relation_funcs (dict, optional (default=None)) – Each key must be a column name of e, each value either a function, or a list of functions, working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of ge.agg for details: help(ge.agg).
- n_nodes (bool, optional (default=True)) – Whether to create an n_nodes column in sv, indicating the number of nodes in each supernode.
- n_edges (bool, optional (default=True)) – Whether to create an n_edges column in se, indicating the number of edges in each superedge.
- return_gve (bool, optional (default=False)) – If True, also return the pandas groupby objects, gv and ge.
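To make the shape of feature_funcs concrete, here is a small sketch that uses pandas groupby directly as a stand-in for the node side of the method. The particular aggregation functions are illustrative choices, and the n_nodes series only mimics what the n_nodes option is described as adding.

```python
import pandas as pd

v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 1, 2, 5, 9],
                  'color': ['g', 'g', 'b', 'g', 'r']})

# Illustrative feature_funcs: keys are columns of v, values are a single
# function (or function name) or a list of them.
feature_funcs = {'time': ['min', 'max'], 'x': 'mean'}

gv = v.groupby('color')
sv = gv.agg(feature_funcs)

# Analogue of n_nodes=True: the number of nodes in each group.
n_nodes = gv.size()

print(sv)
print(n_nodes)
```

Because one of the values is a list, pandas returns two-level column labels such as ('time', 'min'). A relation_funcs dictionary has the same shape, with keys taken from the columns of e.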
Returns:
- sv (pd.DataFrame) – The aggregated DataFrame of supernodes, sv.
- se (pd.DataFrame) – The aggregated DataFrame of superedges, se.
- gv (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, v.groupby(features). Only returned if return_gve is True.
- ge (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, e.groupby(features_s+features_t). Only returned if return_gve is True.
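As a rough illustration of what the returned groupby objects are good for, the sketch below builds a plain pandas groupby as a stand-in for gv and applies a custom function with extra arguments. The helper x_range and its scale parameter are hypothetical names introduced only for this example.

```python
import pandas as pd

v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'color': ['g', 'g', 'b', 'g', 'r']})

# Stand-in for the gv object that return_gve=True is described as returning.
gv = v.groupby('color')


def x_range(x, scale=1.0):
    """Scaled spread of the x values within one group (hypothetical helper)."""
    return scale * (x.max() - x.min())


# Extra keyword arguments are forwarded to the applied function.
spread = gv['x'].apply(x_range, scale=2.0)
print(spread)
```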
Notes
Currently, NA groups in GroupBy are automatically excluded (silently). One workaround is to use a placeholder (e.g., -1, ‘none’) for NA values before doing the groupby (calling this method). See http://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values and https://github.com/pydata/pandas/issues/3729.
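The placeholder workaround mentioned in the note can be sketched in plain pandas as follows. The column name and the placeholder string 'none' are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

v = pd.DataFrame({'color': ['g', np.nan, 'b', 'g', np.nan]})

# Default behaviour: rows with a missing group key silently drop out.
dropped = v.groupby('color').size()

# Workaround: substitute a placeholder before grouping.
filled = v.fillna({'color': 'none'}).groupby('color').size()

print(dropped)
print(filled)
```

In the first result only three of the five rows are counted. After filling, the two rows with missing values form their own group.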
Examples
First, we need to create a graph in order to demonstrate its partitioning into a supergraph.
Create a node table:
>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'x': [-3.4,2.1,-1.1,0.9,2.3],
...                   'time': [0,1,2,5,9],
...                   'color': ['g','g','b','g','r'],
...                   'size': [1,3,2,3,1]})
>>> g = dg.DeepGraph(v)
>>> g.v
  color  size  time    x
0     g     1     0 -3.4
1     g     3     1  2.1
2     b     2     2 -1.1
3     g     3     5  0.9
4     r     1     9  2.3
Create an edge table:
>>> def some_relations(ft_r, x_s, x_t, color_s, color_t, size_s, size_t):
...     dx = x_t - x_s
...     v = dx / ft_r
...     same_color = color_s == color_t
...     larger_than = size_s > size_t
...     return dx, v, same_color, larger_than
>>> g.create_edges_ft(('time', 5), connectors=some_relations)
>>> g.e.rename(columns={'ft_r': 'dt'}, inplace=True)
>>> g.e['inds'] = g.e.index.values  # to ease the eyes
>>> g.e
      dx  dt larger_than same_color         v    inds
s t
0 1  5.5   1       False       True  5.500000  (0, 1)
  2  2.3   2       False      False  1.150000  (0, 2)
  3  4.3   5       False       True  0.860000  (0, 3)
1 2 -3.2   1        True      False -3.200000  (1, 2)
  3 -1.2   4       False       True -0.300000  (1, 3)
2 3  2.0   3       False      False  0.666667  (2, 3)
3 4  1.4   4        True      False  0.350000  (3, 4)
Create a supergraph by partitioning by the type of feature ‘color’:
>>> sv, se = g.partition_graph('color')
>>> sv
       n_nodes
color
b            1
g            3
r            1
>>> se
                 n_edges
color_s color_t
b       g              1
g       b              2
        g              3
        r              1
Create intersection partitions by the types of features ‘color’ and ‘size’ (which are further refinements of the last partitions):
>>> sv, se = g.partition_graph(
...     ['color', 'size'],
...     relation_funcs={'inds': lambda x: tuple(x)})
>>> sv
            n_nodes
color size
b     2           1
g     1           1
      3           2
r     1           1
>>> se
                               n_edges              inds
color_s size_s color_t size_t
b       2      g       3             1         ((2, 3),)
g       1      b       2             1         ((0, 2),)
               g       3             2  ((0, 1), (0, 3))
        3      b       2             1         ((1, 2),)
               g       3             1         ((1, 3),)
               r       1             1         ((3, 4),)
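That the ('color', 'size') partition refines the 'color' partition can be checked with a short pandas-only sketch on the same node features. It works on a bare DataFrame rather than a DeepGraph instance, so it only mirrors the node counts.

```python
import pandas as pd

v = pd.DataFrame({'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

# Node counts of the coarse and the refined partition.
coarse = v.groupby('color').size()
fine = v.groupby(['color', 'size']).size()

# Summing the refined counts over 'size' recovers the coarse counts,
# since every refined group lies inside exactly one coarse group.
recovered = fine.groupby(level='color').sum()

print(fine)
print(recovered)
```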
Partition by ‘color’ and aggregate some properties:
>>> sv, se = g.partition_graph('color',
...     feature_funcs={'time': lambda x: list(x)},
...     relation_funcs={'larger_than': 'sum', 'same_color': 'sum'})
>>> sv
       n_nodes       time
color
b            1        [2]
g            3  [0, 1, 5]
r            1        [9]
>>> se
                 n_edges larger_than  same_color
color_s color_t
b       g              1       False           0
g       b              2        True           0
        g              3       False           3
        r              1        True           0