deepgraph.DeepGraph.partition_graph
- DeepGraph.partition_graph(features, feature_funcs=None, relation_funcs=None, n_nodes=True, n_edges=True, return_gve=False)
Return supergraph DataFrames sv and se.

This method allows partitioning of the graph represented by v and e into a supergraph, sv and se. It creates an (intersection) partition of the nodes in v by the type(s) of feature(s) features, together with the (intersection) partition’s corresponding partition of the edges in e.

Essentially, this method is a wrapper around pandas groupby methods: sv = v.groupby(features).agg(feature_funcs) and se = e.groupby(features_s+features_t).agg(relation_funcs). In order to group e by features_s and features_t, the features of type features are transferred to e, appending ‘_s’ and ‘_t’ to the corresponding column names of e, indicating source and target features, respectively (if they are not already present).

By passing a dictionary of functions on the features (relations) of v (e), feature_funcs (relation_funcs), one may aggregate user-defined values of the partition’s elements, the supernodes’ (superedges’) features (relations). If n_nodes (n_edges) is True, create a column with the number of each supernode’s (superedge’s) constituent nodes (edges).

If return_gve is True, return the created groupby objects to facilitate additional operations, such as gv.apply(func, *args, **kwargs) or ge.apply(func, *args, **kwargs).

For details, type help(g.v.groupby), and/or inspect the available methods of gv.

For examples, see below. For an in-depth description and mathematical details of graph partitioning, see https://arxiv.org/pdf/1604.00971v1.pdf, in particular Sec. III C, E and F.
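The wrapper logic described above can be sketched with pandas alone. This is a minimal illustration, not DeepGraph's actual implementation: the tables v and e are built by hand (e is assumed to carry a MultiIndex with levels named s and t), and the column transfer is written out as an explicit loop.

```python
import pandas as pd

# Node table: the index holds node ids, the columns hold features.
v = pd.DataFrame({'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

# Edge table: a MultiIndex (s, t) of source and target node ids (assumed layout).
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
e = pd.DataFrame({'dt': [1, 2, 5, 1, 4, 3, 4]},
                 index=pd.MultiIndex.from_tuples(pairs, names=['s', 't']))

features = ['color']

# Supernodes: group the nodes by the chosen feature(s) and count members.
sv = v.groupby(features).size().to_frame('n_nodes')

# Transfer the features to the edge table, suffixed with '_s' and '_t'.
s = e.index.get_level_values('s')
t = e.index.get_level_values('t')
for f in features:
    e[f + '_s'] = v.loc[s, f].values
    e[f + '_t'] = v.loc[t, f].values

# Superedges: group the edges by source and target features and count members.
by = [f + '_s' for f in features] + [f + '_t' for f in features]
se = e.groupby(by).size().to_frame('n_edges')
```

Here sv and se contain only the n_nodes and n_edges counts; aggregating further columns would correspond to passing feature_funcs or relation_funcs.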
- Parameters:
features (str, int or array_like) – Column name(s) of v, indicating the type(s) of feature(s) used to induce a (intersection) partition of v, and its corresponding partition of the edges in e. Creates pandas groupby objects, gv and ge.

feature_funcs (dict, optional (default=None)) – Each key must be a column name of v, each value either a function, or a list of functions, working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of gv.agg for details: help(gv.agg).

relation_funcs (dict, optional (default=None)) – Each key must be a column name of e, each value either a function, or a list of functions, working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of ge.agg for details: help(ge.agg).

n_nodes (bool, optional (default=True)) – Whether to create a n_nodes column in sv, indicating the number of nodes in each supernode.

n_edges (bool, optional (default=True)) – Whether to create a n_edges column in se, indicating the number of edges in each superedge.

return_gve (bool, optional (default=False)) – If True, also return the pandas groupby objects, gv and ge.
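To illustrate the shape of a feature_funcs dictionary, the sketch below passes one directly to pandas' own groupby(...).agg, which the description names as the underlying call. It assumes that partition_graph hands the dictionary to agg unchanged. The string aliases ('min', 'max', 'mean') are pandas shorthand for the corresponding aggregation functions.

```python
import pandas as pd

v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 1, 2, 5, 9],
                  'color': ['g', 'g', 'b', 'g', 'r']})

# Keys are column names of v; values are a single function or a list of functions.
feature_funcs = {'time': ['min', 'max'], 'x': 'mean'}

# Stand-in for what the method is described as doing with feature_funcs.
agg = v.groupby('color').agg(feature_funcs)

# A list of functions yields one result column per function, so the columns
# form a MultiIndex of (column name, function name) pairs.
print(agg)
```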
- Returns:
sv (pd.DataFrame) – The aggregated DataFrame of supernodes, sv.

se (pd.DataFrame) – The aggregated DataFrame of superedges, se.

gv (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, v.groupby(features). Only returned if return_gve is True.

ge (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, e.groupby(features_s+features_t). Only returned if return_gve is True.
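Since gv is described as v.groupby(features), the kind of additional operation it enables can be sketched on a plain pandas groupby object. The gv below is a stand-in built by hand, not one returned by partition_graph, and the per-group range of x is just an illustrative choice of function.

```python
import pandas as pd

v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'color': ['g', 'g', 'b', 'g', 'r']})

# Stand-in for the gv object returned when return_gve=True (assumption).
gv = v.groupby('color')

# Apply an arbitrary function per group: here, the spread of 'x' in each group.
x_range = gv['x'].apply(lambda s: s.max() - s.min())
print(x_range)
```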
Notes
Currently, NA groups in GroupBy are automatically excluded (silently). One workaround is to use a placeholder (e.g., -1, ‘none’) for NA values before doing the groupby (calling this method). See http://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values and https://github.com/pydata/pandas/issues/3729.
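The placeholder workaround can be sketched in plain pandas as follows. The node table and the placeholder value 'none' are made up for illustration; the same fillna step would be applied to the node table before calling this method.

```python
import numpy as np
import pandas as pd

v = pd.DataFrame({'color': ['g', np.nan, 'b', 'g'],
                  'size': [1, 3, 2, 3]})

# Default behaviour: the node with a missing 'color' silently drops out.
dropped = v.groupby('color').size()

# Workaround: replace NA with a placeholder before grouping.
filled = v.fillna({'color': 'none'}).groupby('color').size()

print(dropped.sum(), filled.sum())
```

Newer pandas versions (1.1 and later) also accept groupby(..., dropna=False); whether partition_graph exposes that option is not stated here.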
Examples
First, we need to create a graph in order to demonstrate its partitioning into a supergraph.
Create a node table:
>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'x': [-3.4,2.1,-1.1,0.9,2.3],
...                   'time': [0,1,2,5,9],
...                   'color': ['g','g','b','g','r'],
...                   'size': [1,3,2,3,1]})
>>> g = dg.DeepGraph(v)

>>> g.v
  color  size  time    x
0     g     1     0 -3.4
1     g     3     1  2.1
2     b     2     2 -1.1
3     g     3     5  0.9
4     r     1     9  2.3
Create an edge table:
>>> def some_relations(ft_r, x_s,x_t,color_s,color_t,size_s,size_t):
...     dx = x_t - x_s
...     v = dx / ft_r
...     same_color = color_s == color_t
...     larger_than = size_s > size_t
...     return dx, v, same_color, larger_than
>>> g.create_edges_ft(('time', 5), connectors=some_relations)
>>> g.e.rename(columns={'ft_r': 'dt'}, inplace=True)
>>> g.e['inds'] = g.e.index.values  # to ease the eyes

>>> g.e
      dx  dt larger_than same_color         v    inds
s t
0 1  5.5   1       False       True  5.500000  (0, 1)
  2  2.3   2       False      False  1.150000  (0, 2)
  3  4.3   5       False       True  0.860000  (0, 3)
1 2 -3.2   1        True      False -3.200000  (1, 2)
  3 -1.2   4       False       True -0.300000  (1, 3)
2 3  2.0   3       False      False  0.666667  (2, 3)
3 4  1.4   4        True      False  0.350000  (3, 4)
Create a supergraph by partitioning by the type of feature ‘color’:
>>> sv, se = g.partition_graph('color')
>>> sv
       n_nodes
color
b            1
g            3
r            1

>>> se
                 n_edges
color_s color_t
b       g              1
g       b              2
        g              3
        r              1
Create intersection partitions by the types of features ‘color’ and ‘size’ (which are further refinements of the last partitions):
>>> sv, se = g.partition_graph(
...     ['color', 'size'],
...     relation_funcs={'inds': lambda x: tuple(x)})

>>> sv
            n_nodes
color size
b     2           1
g     1           1
      3           2
r     1           1

>>> se
                               n_edges              inds
color_s size_s color_t size_t
b       2      g       3             1         ((2, 3),)
g       1      b       2             1         ((0, 2),)
               g       3             2  ((0, 1), (0, 3))
        3      b       2             1         ((1, 2),)
               g       3             1         ((1, 3),)
               r       1             1         ((3, 4),)
Partition by ‘color’ and aggregate some properties:
>>> sv, se = g.partition_graph('color',
...     feature_funcs={'time': lambda x: list(x)},
...     relation_funcs={'larger_than': 'sum', 'same_color': 'sum'})

>>> sv
       n_nodes       time
color
b            1        [2]
g            3  [0, 1, 5]
r            1        [9]

>>> se
                 n_edges larger_than  same_color
color_s color_t
b       g              1       False           0
g       b              2        True           0
        g              3       False           3
        r              1        True           0