deepgraph.deepgraph.DeepGraph

class DeepGraph(v=None, e=None, supernode_labels_by=None, superedge_labels_by=None)

The core class of DeepGraph (dg).

This class encapsulates the graph representation as pandas.DataFrame objects in its attributes v and e. It can be initialized with a node table v, whose rows represent the nodes of the graph, as well as an edge table e, whose rows represent edges between the nodes.
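To make the table layout concrete, here is a minimal sketch that uses only pandas. The column names ("x", "color", "dx") and the index level names ("s", "t") are illustrative assumptions, not something this page prescribes.

```python
import pandas as pd

# Node table: one row per node, unique index = node indices,
# columns = feature types (the column names here are made up).
v = pd.DataFrame(
    {"x": [0.0, 1.0, 3.0], "color": ["red", "red", "blue"]},
    index=[0, 1, 2],
)

# Edge table: one row per edge, indexed by (source, target) pairs,
# columns = relation types (again hypothetical).
e = pd.DataFrame(
    {"dx": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples([(0, 1), (1, 2)], names=["s", "t"]),
)

# With the deepgraph package installed, these tables would be passed to
# the constructor documented on this page:
#   import deepgraph as dg
#   g = dg.DeepGraph(v, e)
```

Because only a reference to each DataFrame is stored (see Parameters), later in-place changes to v or e would presumably be visible through the graph object as well.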

Given a node table v, it provides methods to iteratively compute pairwise relations between the nodes using arbitrary, user-defined functions. These methods provide arguments to parallelize the computation and control memory consumption (see create_edges and create_edges_ft).
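The idea of computing pairwise relations with a user-defined function can be sketched in plain pandas and NumPy. This is not how create_edges is implemented: it evaluates all pairs at once rather than iteratively, and the function name dx, its argument names, and the column "x" are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

v = pd.DataFrame({"x": [0.0, 1.0, 3.0]}, index=[10, 11, 12])


def dx(x_s, x_t):
    """Hypothetical user-defined relation: signed difference of feature x."""
    return x_t - x_s


# Positions of all node pairs (s, t) where s comes before t in v.
s_pos, t_pos = np.triu_indices(len(v), k=1)

# Evaluate the relation for every pair in one vectorized call and store
# the result in an edge table indexed by (source, target).
x = v["x"].to_numpy()
e = pd.DataFrame(
    {"dx": dx(x[s_pos], x[t_pos])},
    index=pd.MultiIndex.from_arrays(
        [v.index[s_pos], v.index[t_pos]], names=["s", "t"]
    ),
)
```

For n nodes this materializes n(n-1)/2 pairs in memory at once, which is exactly the cost that the iterative, memory-controlled methods mentioned above are meant to avoid.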

It also provides methods to partition nodes, edges or an entire graph by the graph’s properties and labels, and to create common network representations and graph objects of popular Python network packages.

Furthermore, it provides methods to visualize graphs and their properties and to benchmark the graph construction parameters.

Optionally, the convenience parameter supernode_labels_by can be passed to create supernode labels by enumerating all distinct values (or tuples of values) of one or more columns of v. Superedge labels can be created analogously by passing the parameter superedge_labels_by.
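"Enumerating all distinct (tuples of) values" can be illustrated with a pandas groupby. The sketch below is an assumption about the general technique, not DeepGraph's actual code; the column names and the dictionary contents are hypothetical, and the order in which DeepGraph numbers the labels may differ.

```python
import pandas as pd

v = pd.DataFrame(
    {"color": ["red", "blue", "red", "blue"], "weight": [1, 1, 2, 1]}
)

# Hypothetical input: new column name -> column name(s) of v.
supernode_labels_by = {"color_id": "color", "group_id": ["color", "weight"]}

for new_col, cols in supernode_labels_by.items():
    # ngroup() assigns an integer to each distinct value (or tuple of
    # values), numbering the groups in sorted key order.
    v[new_col] = v.groupby(cols).ngroup()
```

Nodes that share a label then belong to the same supernode, which is the kind of grouping that methods such as partition_nodes operate on.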

Parameters:
  • v (pandas.DataFrame or pandas.HDFStore, optional (default=None)) – The node table, a table representation of the nodes of a graph. The index of v must be unique and represents the node indices. The column names of v represent the types of features of the nodes, and each cell represents a feature of a node. Only a reference to the input DataFrame is created, not a copy. v may also be a pandas.HDFStore, in which case only create_edges and create_edges_ft can be used (so far).
  • e (pandas.DataFrame, optional (default=None)) – The edge table, a table representation of the edges between the nodes given by v. Its index has to be a pandas.MultiIndex whose first level contains the indices of the source nodes and whose second level contains the indices of the target nodes. Each row of e represents an edge, the column names of e represent the types of relations of the edges, and each cell in e represents a relation of an edge. Only a reference to the input DataFrame is created, not a copy.
  • supernode_labels_by (dict, optional (default=None)) – A dictionary whose keys are strings and whose values are column names (or lists of column names) of v. For each key, a column with that name is appended to v. Its values are supernode labels that enumerate all distinct values (or tuples of values) of the column(s) given by the dict’s value.
  • superedge_labels_by (dict, optional (default=None)) – A dictionary whose keys are strings and whose values are column names (or lists of column names) of e. For each key, a column with that name is appended to e. Its values are superedge labels that enumerate all distinct values (or tuples of values) of the column(s) given by the dict’s value.
v

See Parameters.

Type: pandas.DataFrame
e

See Parameters.

Type: pandas.DataFrame
n

Property: Number of nodes.

Type: int
m

Property: Number of edges.

Type: int
f

Property: Types of features and number of features of corresponding type.

Type: pandas.DataFrame
r

Property: Types of relations and number of relations of corresponding type.

Type: pandas.DataFrame
__init__(v=None, e=None, supernode_labels_by=None, superedge_labels_by=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([v, e, supernode_labels_by, …]) Initialize self.
append_binning_labels_v(col, col_name[, …]) Append a column with binning labels of the values in v[col].
append_cp([directed, connection, col_name, …]) Append a component membership column to v.
append_datetime_categories_v([col, …]) Append datetime categories to v.
create_edges([connectors, selectors, …]) Create an edge table e linking the nodes in v.
create_edges_ft(ft_feature[, connectors, …]) Create (ft) an edge table e linking the nodes in v.
filter_by_interval_e(col, interval[, endpoint]) Keep only edges in e with relations of type col in interval.
filter_by_interval_v(col, interval[, endpoint]) Keep only nodes in v with features of type col in interval.
filter_by_values_e(col, values) Keep only edges in e with relations of type col in values.
filter_by_values_v(col, values) Keep only nodes in v with features of type col in values.
partition_edges([relations, …]) Return a superedge DataFrame se.
partition_graph(features[, feature_funcs, …]) Return supergraph DataFrames sv and se.
partition_nodes(features[, feature_funcs, …]) Return a supernode DataFrame sv.
plot_2d(x, y[, edges, C, C_split_0, …]) Plot nodes and corresponding edges in 2 dimensions.
plot_2d_generator(x, y, by[, edges, C, …]) Plot nodes and corresponding edges by groups.
plot_3d(x, y, z[, edges, kwds_scatter, …]) Work in progress!
plot_hist(x[, bins, log_bins, density, …]) Plot a histogram (or pdf) of x.
plot_logfile(logfile) Plot a logfile.
plot_map(lon, lat[, edges, C, C_split_0, …]) Plot nodes and corresponding edges on a basemap.
plot_map_generator(lon, lat, by[, edges, C, …]) Plot nodes and corresponding edges by groups, on basemaps.
plot_raster(label[, time, ax]) Work in progress!
plot_rects_label_numeric(label, xl, xr[, …]) Work in progress!
plot_rects_numeric_numeric(yb, yt, xl, xr[, …]) Work in progress!
return_cs_graph([relations, dropna]) Return scipy.sparse.coo_matrix representation(s).
return_gt_graph([features, relations, …]) Return a graph_tool.Graph representation.
return_nx_graph([features, relations, dropna]) Return a networkx.DiGraph representation.
return_nx_multigraph([features, relations, …]) Return a networkx.MultiDiGraph representation.
update_edges() After removing nodes in v, update e.
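The interplay between node filtering and update_edges can be sketched in plain pandas. This shows the general idea only and is an assumption about the behaviour: this page does not say whether the filter methods already adjust e themselves, and the column and level names are hypothetical.

```python
import pandas as pd

v = pd.DataFrame({"color": ["red", "blue", "red"]}, index=[0, 1, 2])
e = pd.DataFrame(
    {"dx": [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 1), (1, 2), (0, 2)], names=["s", "t"]
    ),
)

# Keep only nodes whose feature "color" is among the given values
# (the operation filter_by_values_v is described as performing).
v = v[v["color"].isin(["red"])]

# Drop every edge whose source or target node is no longer in v
# (the operation update_edges is described as performing).
s = e.index.get_level_values(0)
t = e.index.get_level_values(1)
e = e[s.isin(v.index) & t.isin(v.index)]
```

After these steps only nodes 0 and 2 remain, and with them the single edge (0, 2).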

Attributes

f Types of features and number of features of corresponding type.
m The number of edges.
n The number of nodes.
r Types of relations and number of relations of corresponding type.
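As a rough pandas analogue of these attributes, n and m correspond to the row counts of v and e. For f and r, one plausible reading (an assumption, not confirmed by this page) is a per-column count of non-missing entries. The result column names below are made up.

```python
import pandas as pd

v = pd.DataFrame({"x": [0.0, 1.0, None], "color": ["red", None, None]})
e = pd.DataFrame(
    {"dx": [1.0]},
    index=pd.MultiIndex.from_tuples([(0, 1)], names=["s", "t"]),
)

n = len(v)  # number of nodes (rows of v)
m = len(e)  # number of edges (rows of e)

# Assumed interpretation: per feature/relation type, count the
# non-missing cells in the corresponding column.
f = v.count().to_frame("n_features")
r = e.count().to_frame("n_relations")
```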