deepgraph.DeepGraph.partition_nodes
- DeepGraph.partition_nodes(features, feature_funcs=None, n_nodes=True, return_gv=False)
Return a supernode DataFrame sv.

This is essentially a wrapper around the pandas groupby method: sv = v.groupby(features).agg(feature_funcs). It creates a (intersection) partition of the nodes in v by the type(s) of feature(s) features, resulting in a supernode DataFrame sv. By passing a dictionary of functions on the features of v, feature_funcs, one may aggregate user-defined values of the partition’s elements, the supernodes’ features. If n_nodes is True, create a column with the number of each supernode’s constituent nodes. If return_gv is True, return the created groupby object to facilitate additional operations, such as gv.apply(func, *args, **kwargs).

For details, type help(v.groupby), and/or inspect the available methods of gv.

For examples, see below. For an in-depth description and mathematical details of graph partitioning, see https://arxiv.org/pdf/1604.00971v1.pdf, in particular Sec. III A, E and F.
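The groupby-plus-agg relationship described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name partition_nodes_sketch is hypothetical, it only handles a dictionary with one function per column, and it always adds the n_nodes column.

```python
import pandas as pd

# Toy node table, same values as in the Examples section of this page.
v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 0, 2, 2, 9],
                  'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})


def partition_nodes_sketch(v, features, feature_funcs=None):
    """Hypothetical helper: groupby + agg, plus an n_nodes column."""
    gv = v.groupby(features)
    # Number of constituent nodes per supernode.
    sv = gv.size().to_frame('n_nodes')
    if feature_funcs is not None:
        # One aggregated column per key of feature_funcs.
        sv = sv.join(gv.agg(feature_funcs))
    return sv, gv


sv, gv = partition_nodes_sketch(v, 'color', {'time': 'max', 'x': 'sum'})
print(sv)
```

Returning gv alongside sv mirrors what return_gv=True is documented to do, so further groupby operations can reuse the same grouping.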
- Parameters:
features (str, int or array_like) – Column name(s) of v, indicating the type(s) of feature(s) used to induce a (intersection) partition. Creates a pandas groupby object, gv = v.groupby(features).
feature_funcs (dict, optional (default=None)) – Each key must be a column name of v, each value either a function, or a list of functions, working when passed a pandas.DataFrame or when passed to pandas.DataFrame.apply. See the docstring of gv.agg for details: help(gv.agg).
n_nodes (bool, optional (default=True)) – Whether to create a n_nodes column in sv, indicating the number of nodes in each supernode.
return_gv (bool, optional (default=False)) – If True, also return the v.groupby(features) object, gv.
- Returns:
sv (pd.DataFrame) – The aggregated DataFrame of supernodes, sv.
gv (pandas.core.groupby.DataFrameGroupBy) – The pandas groupby object, v.groupby(features). Returned only if return_gv is True.
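As a rough illustration of what the returned groupby object allows, the sketch below builds gv directly with pandas, on the assumption (taken from the description above) that it is the same object as v.groupby(features). The per-group spread of x is an arbitrary example operation, not something the method computes itself.

```python
import pandas as pd

v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 0, 2, 2, 9],
                  'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

# Stand-in for the gv returned when return_gv=True.
gv = v.groupby('color')

# Inspect the constituent nodes of a single supernode.
green_nodes = gv.get_group('g')

# Apply a custom function to one feature of every supernode.
x_spread = gv['x'].apply(lambda s: s.max() - s.min())
print(x_spread)
```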
See also
Notes
Currently, NA groups in GroupBy are automatically excluded (silently). One workaround is to use a placeholder (e.g., -1, ‘none’) for NA values before doing the groupby (calling this method). See http://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values and https://github.com/pydata/pandas/issues/3729.
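The placeholder workaround can be sketched with plain pandas as follows. The table and the placeholder string 'none' are made up for illustration; the same substitution would be applied to the node table before calling this method.

```python
import pandas as pd

# Hypothetical node table with missing values in the grouping column.
v = pd.DataFrame({'color': ['g', None, 'b', 'g', None],
                  'time': [0, 0, 2, 2, 9]})

# Default behaviour: rows whose group key is NA drop out of the result.
dropped = v.groupby('color').size()

# Workaround: substitute a placeholder for NA before grouping.
v_filled = v.assign(color=v['color'].fillna('none'))
kept = v_filled.groupby('color').size()

print(dropped.sum(), kept.sum())
```

pandas 1.1 and later also accept dropna=False in DataFrame.groupby. Whether partition_nodes forwards such an option is not stated here, so the placeholder route is the one shown.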
Examples
First, we need a node table, in order to demonstrate its partitioning:
>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'x': [-3.4,2.1,-1.1,0.9,2.3],
...                   'time': [0,0,2,2,9],
...                   'color': ['g','g','b','g','r'],
...                   'size': [1,3,2,3,1]})
>>> g = dg.DeepGraph(v)
>>> g.v
  color  size  time    x
0     g     1     0 -3.4
1     g     3     0  2.1
2     b     2     2 -1.1
3     g     3     2  0.9
4     r     1     9  2.3
Create a partition by the type of feature ‘color’:
>>> g.partition_nodes('color')
       n_nodes
color
b            1
g            3
r            1
Create an intersection partition by the types of features ‘color’ and ‘size’ (which is a further refinement of the last partition):
>>> g.partition_nodes(['color', 'size'])
            n_nodes
color size
b     2           1
g     1           1
      3           2
r     1           1
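That the intersection partition refines the coarser one can be checked with plain pandas: summing the supernode sizes over the extra feature should recover the sizes of the 'color' partition. The sketch below uses v.groupby directly as a stand-in for the method, and the variable names are illustrative only.

```python
import pandas as pd

v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 0, 2, 2, 9],
                  'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

# Node counts of the coarse and the refined partition.
by_color = v.groupby('color').size()
by_color_size = v.groupby(['color', 'size']).size()

# Collapsing the 'size' level merges the refined supernodes back together.
recovered = by_color_size.groupby(level='color').sum()
print(recovered.to_dict() == by_color.to_dict())
```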
Partition by ‘color’ and collect time values:
>>> g.partition_nodes('color', {'time': lambda x: list(x)})
       n_nodes       time
color
b            1        [2]
g            3  [0, 0, 2]
r            1        [9]
Partition by ‘color’ and aggregate with different functions:
>>> import numpy as np
>>> g.partition_nodes('color', {'time': [lambda x: list(x), np.max],
...                             'x': [np.mean, np.sum, np.std]})
       n_nodes    x_mean  x_sum     x_std time_<lambda>  time_amax
color
b            1 -1.100000   -1.1       NaN           [2]          2
g            3 -0.133333   -0.4  2.891943     [0, 0, 2]          2
r            1  2.300000    2.3       NaN           [9]          9
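Column names such as x_mean in the output above combine the feature name with the function name. In plain pandas, passing lists of functions to agg yields two-level column labels, and one way to arrive at similar flat names is sketched below. Whether the library flattens its columns exactly this way is an assumption. The sketch uses string function names, so the last column is time_max rather than time_amax.

```python
import pandas as pd

v = pd.DataFrame({'x': [-3.4, 2.1, -1.1, 0.9, 2.3],
                  'time': [0, 0, 2, 2, 9],
                  'color': ['g', 'g', 'b', 'g', 'r'],
                  'size': [1, 3, 2, 3, 1]})

# Lists of functions produce (feature, function) column labels.
agg = v.groupby('color').agg({'x': ['mean', 'sum', 'std'],
                              'time': ['max']})

# Join each (feature, function) pair into a single flat column name.
flat = agg.copy()
flat.columns = ['_'.join(col) for col in agg.columns]
print(list(flat.columns))
```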