DeepGraph - Analyze Data with Pandas-based Networks¶
Release: 0.2.3 Date: Jun 14, 2021 Code: GitHub
What is DeepGraph¶
DeepGraph is an open source Python implementation of a new network representation introduced here. Its purpose is to facilitate data analysis by interpreting data in terms of network theory.
The basis of this software package is Pandas, a fast and flexible data analysis tool for the Python programming language. Utilizing one of its primary data structures, the DataFrame, we represent objects (i.e. the nodes of a network) by one DataFrame, and their pairwise relations (i.e. the edges of a network) by another DataFrame.
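As a minimal sketch of this representation (the column and index names below are illustrative choices, not requirements of DeepGraph):

import pandas as pd
import deepgraph as dg

# a node table: one row per object, columns hold the objects' features
v = pd.DataFrame({'x': [0.0, 1.0, 2.5]})

# an edge table: one row per pairwise relation, indexed by the
# source/target node indices s and t
e = pd.DataFrame({'dx': [1.0, 2.5, 1.5]},
                 index=pd.MultiIndex.from_tuples(
                     [(0, 1), (0, 2), (1, 2)], names=['s', 't']))

g = dg.DeepGraph(v, e)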
One of the main features of DeepGraph is an efficient and scalable creation of edges. Given a set of nodes in the form of a DataFrame (or an on-disc HDFStore), DeepGraph’s core class provides methods to iteratively compute pairwise relations between the nodes (e.g. similarity/distance measures) using arbitrary, user-defined functions on the nodes’ features. These methods provide arguments to parallelize the computation and control memory consumption, making them suitable for very large data sets and adjustable to whatever hardware you have at hand (from netbooks to cluster architectures).
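A hedged sketch of that workflow, reusing the toy node table from above (the connector is a toy example; step_size is one of the arguments for controlling memory consumption per iteration):

# a toy connector: operates on the 'x' feature of source/target nodes
def x_dist(x_s, x_t):
    dx = x_t - x_s
    return dx

g = dg.DeepGraph(v)
g.create_edges(connectors=x_dist, step_size=1e6)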
Furthermore, once a graph is constructed, DeepGraph allows you to partition its nodes, edges or the entire graph by the graph’s properties and labels, enabling the aggregation, computation and allocation of information on and between arbitrary groups of nodes. These methods also let you express elaborate queries on the information contained in a deep graph.
DeepGraph is not meant to replace or compete with already existing Python network libraries, such as NetworkX or graph_tool, but rather to combine and extend their capabilities with the merits of Pandas. For that matter, the core class of DeepGraph provides interfacing methods to convert to common network representations and graph objects of popular Python network packages.
DeepGraph also implements a number of useful plotting methods, including drawings on geographical map projections.
It’s also possible to represent multilayer networks by deep graphs. We’re thinking of implementing an interface to a suitable package dedicated to the analysis of multilayer networks.
Note
Please acknowledge and cite the use of this software and its authors when results are used in publications or published elsewhere. You can use the following BibTeX entry

@Article{traxl-2016-deep,
    author      = {Dominik Traxl AND Niklas Boers AND J\"urgen Kurths},
    title       = {Deep Graphs - A general framework to represent and analyze heterogeneous complex systems across scales},
    journal     = {Chaos},
    year        = {2016},
    volume      = {26},
    number      = {6},
    eid         = {065303},
    doi         = {http://dx.doi.org/10.1063/1.4952963},
    eprinttype  = {arxiv},
    eprintclass = {physics.data-an, cs.SI, physics.ao-ph, physics.soc-ph},
    eprint      = {http://arxiv.org/abs/1604.00971v1},
    version     = {1},
    date        = {2016-04-04},
    url         = {http://arxiv.org/abs/1604.00971v1}
}
To get started, have a look at the Tutorials section.
Want to share feedback, or contribute?
So far the package has only been developed by me, a fact that I would like to change very much. So if you feel like contributing in any way, shape or form, please feel free to contact me, report bugs, create pull requests, milestones, etc. You can contact me via email: dominik.traxl@posteo.org
Note
This documentation assumes general familiarity with NumPy and Pandas. If you haven’t used these packages, do invest some time in learning about them first.
Note
DeepGraph is free software; you can redistribute it and/or modify it under the terms of the BSD License. We highly welcome contributions from the community.
Installation¶
Quick Install¶
DeepGraph can be installed via pip from PyPI
$ pip install deepgraph
Depending on your system, you may need root privileges. On UNIX-based operating systems (Linux, Mac OS X etc.) this is achieved with sudo
$ sudo pip install deepgraph
Alternatively, if you’re using Conda, install with
$ conda install -c conda-forge deepgraph
Installing from Source¶
Alternatively, you can install DeepGraph from source by downloading a source archive file (tar.gz or zip).
Source Archive File¶
- Download the source (tar.gz or zip file) from https://pypi.python.org/pypi/deepgraph/ or https://github.com/deepgraph/deepgraph/
- Unpack and change directory to the source directory (it should have the files README.rst and setup.py).
- Run python setup.py install to build and install. As a developer, you may want to install using Cython: python setup.py install --use-cython
- (Optional) Run py.test to execute the tests if you have pytest installed.
GitHub¶
- Clone the deepgraph repository
- Change directory to deepgraph
- Run python setup.py install to build and install. As a developer, you may want to install using Cython: python setup.py install --use-cython
- (Optional) Run py.test to execute the tests if you have pytest installed.
Installing without Root Privileges¶
If you don’t have permission to install software on your system, you can install into another directory using the --user, --prefix, or --home flags to setup.py.
For example
$ python setup.py install --prefix=/home/username/python
or
$ python setup.py install --home=~
or
$ python setup.py install --user
Note: If you didn’t install in the standard Python site-packages directory, you will need to set your PYTHONPATH variable to the alternate location. See here for further details.
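For example, assuming the --prefix installation above and Python 3.9 (adjust the path and Python version to your system):

$ export PYTHONPATH=/home/username/python/lib/python3.9/site-packages:$PYTHONPATH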
Requirements¶
The easiest way to get Python and the required/optional packages is to use Conda (or Miniconda), a cross-platform (Linux, Mac OS X, Windows) Python distribution for data analytics and scientific computing.
Pandas¶
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Pandas is the core dependency of DeepGraph, and it is highly recommended to install the recommended and optional dependencies of Pandas as well.
Recommended Packages¶
The following are recommended packages that DeepGraph can use to provide additional functionality.
Matplotlib¶
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
Allows you to use the plotting methods of DeepGraph.
Matplotlib Basemap Toolkit¶
basemap is an add-on toolkit for matplotlib that lets you plot data on map projections with coastlines, lakes, rivers and political boundaries. See the basemap tutorial for documentation and examples of what it can do.
Used by plot_map and plot_map_generator to plot networks on map projections.
PyTables¶
PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.
Necessary for HDF5-based storage of pandas DataFrames. DeepGraph’s core class may be initialized with an HDFStore containing a node table in order to iteratively create edges directly from disc (see create_edges and create_edges_ft).
SciPy¶
SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
Allows you to convert from DeepGraph’s network representation to sparse adjacency matrices (see return_cs_graph).
NetworkX¶
NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
Allows you to convert from DeepGraph’s network representation to NetworkX’s network representation (see return_nx_graph).
Graph-Tool¶
graph_tool is an efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks).
Allows you to convert from DeepGraph’s network representation to Graph-Tool’s network representation (see return_gt_graph).
Conda users can install graph_tool by adding the following channels to their ~/.condarc
$ conda config --add channels conda-forge
$ conda config --add channels ostrokach-forge
Then, install graph-tool
$ conda install graph-tool
You can test your graph-tool installation by
$ python -c "from graph_tool.all import *"
Optional Packages¶
The following packages are considered to provide very useful tools and methods.
Scikit-Learn¶
sklearn is a Python module integrating classical machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib).
Sklearn-pandas¶
sklearn-pandas provides a bridge between Scikit-Learn’s machine learning methods and pandas-style Data Frames.
Tutorials¶
10 Minutes to DeepGraph¶
[ipython notebook] [python script] [data]
This is a short introduction to DeepGraph. In the following, we demonstrate DeepGraph’s core functionalities using a toy data set, “flying balls”.
First of all, we need to import some packages
# for plots
import matplotlib.pyplot as plt
# the usual
import numpy as np
import pandas as pd
import deepgraph as dg
# notebook display
%matplotlib inline
plt.rcParams['figure.figsize'] = 8, 6
pd.options.display.max_rows = 10
pd.set_option('expand_frame_repr', False)
Loading Toy Data
Then, we need data in the form of a pandas DataFrame, representing the nodes of our graph
v = pd.read_csv('flying_balls.csv', index_col=0)
print(v)
time x y ball_id
0 0 1692.000000 0.000000 0
1 0 8681.000000 0.000000 1
2 0 490.000000 0.000000 2
3 0 7439.000000 0.000000 3
4 0 4998.000000 0.000000 4
... ... ... ... ...
1163 45 2812.552734 16.503178 39
1164 46 5686.915998 14.161693 10
1165 46 3161.729086 19.381823 14
1166 46 5594.233413 57.701712 37
1167 47 5572.216748 20.588750 37
[1168 rows x 4 columns]
The data consists of 1168 space-time measurements of 50 different toy balls in two-dimensional space. Each space-time measurement (i.e. row of v) represents a node.
Let’s plot the data such that each ball has its own color
plt.scatter(v.x, v.y, s=v.time, c=v.ball_id)

Creating Edges¶
In order to create edges between these nodes, we now initiate a dg.DeepGraph instance
g = dg.DeepGraph(v)
g
<DeepGraph object, with n=1168 node(s) and m=0 edge(s) at 0x7facf3b35dd8>
and use it to create edges between the nodes given by g.v. For that matter, we may define a connector function
def x_dist(x_s, x_t):
    dx = x_t - x_s
    return dx
and pass it to g.create_edges in order to compute the distance in the x-coordinate of each pair of nodes
g.create_edges(connectors=x_dist)
g
<DeepGraph object, with n=1168 node(s) and m=681528 edge(s) at 0x7facf3b35dd8>
print(g.e)
dx
s t
0 1 6989.000000
2 -1202.000000
3 5747.000000
4 3306.000000
5 2812.000000
... ...
1164 1166 -92.682585
1167 -114.699250
1165 1166 2432.504327
1167 2410.487662
1166 1167 -22.016665
[681528 rows x 1 columns]
Let’s say we’re only interested in creating edges between nodes with an x-distance smaller than 1000. Then we may additionally define a selector
def x_dist_selector(dx, sources, targets):
    dxa = np.abs(dx)
    sources = sources[dxa <= 1000]
    targets = targets[dxa <= 1000]
    return sources, targets
and pass both the connector and selector to g.create_edges
g.create_edges(connectors=x_dist, selectors=x_dist_selector)
g
<DeepGraph object, with n=1168 node(s) and m=156938 edge(s) at 0x7facf3b35dd8>
print(g.e)
dx
s t
0 6 416.000000
7 848.000000
19 -973.000000
24 437.000000
38 778.000000
... ...
1162 1167 -44.033330
1163 1165 349.176351
1164 1166 -92.682585
1167 -114.699250
1166 1167 -22.016665
[156938 rows x 1 columns]
There is, however, a much more efficient way of creating edges that involves a simple distance threshold such as the one above
Creating Edges on a FastTrack¶
In order to efficiently create edges including a selection of edges via a simple distance threshold as above, one should use the create_edges_ft method. It relies on a sorted DataFrame, so we need to sort g.v first
g.v.sort_values('x', inplace=True)
g.create_edges_ft(ft_feature=('x', 1000))
g
<DeepGraph object, with n=1168 node(s) and m=156938 edge(s) at 0x7facf3b35dd8>
Let’s compare the efficiency
%timeit -n3 -r3 g.create_edges(connectors=x_dist, selectors=x_dist_selector)
3 loops, best of 3: 557 ms per loop
%timeit -n3 -r3 g.create_edges_ft(ft_feature=('x', 1000))
3 loops, best of 3: 167 ms per loop
The create_edges_ft method also accepts connectors and selectors as input. Let’s connect only those measurements that are close in space and time
def y_dist(y_s, y_t):
    dy = y_t - y_s
    return dy

def time_dist(time_t, time_s):
    dt = time_t - time_s
    return dt

def y_dist_selector(dy, sources, targets):
    dya = np.abs(dy)
    sources = sources[dya <= 100]
    targets = targets[dya <= 100]
    return sources, targets

def time_dist_selector(dt, sources, targets):
    dta = np.abs(dt)
    sources = sources[dta <= 1]
    targets = targets[dta <= 1]
    return sources, targets
g.create_edges_ft(ft_feature=('x', 100),
                  connectors=[y_dist, time_dist],
                  selectors=[y_dist_selector, time_dist_selector])
g
<DeepGraph object, with n=1168 node(s) and m=1899 edge(s) at 0x7facf3b35dd8>
print(g.e)
dt dy ft_r
s t
890 867 -1 19.311136 33.415831
867 843 -1 17.678482 33.415831
843 818 -1 16.045829 33.415831
818 792 -1 14.413176 33.415831
792 766 -1 12.780523 33.415831
... .. ... ...
244 203 -1 -10.825226 15.455612
203 159 -1 -12.457879 15.455612
159 114 -1 -14.090532 15.455612
114 65 -1 -15.723185 15.455612
65 16 -1 -17.355838 15.455612
[1899 rows x 3 columns]
We can now plot the flying balls and the edges we just created with the plot_2d method
obj = g.plot_2d('x', 'y', edges=True,
                kwds_scatter={'c': g.v.ball_id, 's': g.v.time})
obj['ax'].set_xlim(1000, 3000)

Graph Partitioning¶
The DeepGraph class also offers methods to partition nodes, edges and an entire graph. See the docstrings and the other tutorials for details and examples.
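For instance, a minimal sketch on the toy data, grouping the measurements by ball:

# partition the node table by ball; partition_nodes returns a
# supernode DataFrame with an n_nodes count per group (a sketch)
sv = g.partition_nodes('ball_id')
print(sv.head())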
Graph Interfaces¶
Furthermore, you may inspect the docstrings of return_cs_graph, return_nx_graph and return_gt_graph to see how to convert from DeepGraph’s DataFrame representation of a network to sparse adjacency matrices, NetworkX’s network representation and graph_tool’s network representation.
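A quick sketch of these interfaces (each call requires the corresponding package from the Requirements section):

csgraph = g.return_cs_graph()  # SciPy sparse adjacency representation
nx_g = g.return_nx_graph()     # NetworkX graph object
gt_g = g.return_gt_graph()     # graph_tool graph object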
Plotting Methods¶
DeepGraph also offers a number of useful plotting methods. See the plotting methods for details and have a look at the other tutorials for examples.
Computing Very Large Correlation Matrices in Parallel¶
[ipython notebook] [python script]
In this short tutorial, we’ll demonstrate how DeepGraph can be used to efficiently compute very large correlation matrices in parallel, with full control over RAM usage.
Assume you have a set of n_samples samples, each comprised of n_features features, and you want to compute the Pearson correlation coefficients between all pairs of features (for the Spearman’s rank correlation coefficients, see the Note-box below). If your data is small enough, you may use scipy.stats.pearsonr or numpy.corrcoef, but for large data, neither of these methods is feasible. Scipy’s pearsonr would be very slow, since you’d have to compute pair-wise correlations in a double loop, and numpy’s corrcoef would most likely blow your RAM.
Using DeepGraph’s create_edges method, you can compute all pair-wise correlations efficiently. In this tutorial, the data is stored on disc and only the relevant subset of features for each iteration will be loaded into memory by the computing nodes. Parallelization is achieved by using Python’s standard library multiprocessing, but it should be straightforward to modify the code to accommodate other parallelization libraries. It should also be straightforward to modify the code in order to compute other correlation/distance/similarity measures between a set of features.
First of all, we need to import some packages
# data i/o
import os
# compute in parallel
from multiprocessing import Pool
# the usual
import numpy as np
import pandas as pd
import deepgraph as dg
Let’s create a set of variables and store it as a 2d-matrix X (shape=(n_features, n_samples)) on disc. To speed up the computation of the correlation coefficients later on, we whiten each variable.
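The reason whitening helps: for variables with zero mean and unit variance, the Pearson correlation coefficient of two features \(s\) and \(t\) reduces to a plain dot product,

\(r_{st} = \frac{1}{n} \sum_{i=1}^{n} x_{si} \, x_{ti},\)

which is exactly what the np.einsum call in the connector below computes.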
# create observations
from numpy.random import RandomState
prng = RandomState(0)
n_features = int(5e3)
n_samples = int(1e2)
X = prng.randint(100, size=(n_features, n_samples)).astype(np.float64)
# uncomment the next line to compute ranked variables for Spearman's correlation coefficients
# X = X.argsort(axis=1).argsort(axis=1)
# whiten variables for fast parallel computation later on
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
# save in binary format
np.save('samples', X)
Note
On the computation of the Spearman’s rank correlation coefficients: Since the Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables, it suffices to uncomment the indicated line in the above code-block in order to compute the Spearman’s rank correlation coefficients in the following.
Now we can compute the pair-wise correlations using DeepGraph’s create_edges method. Note that the node table v only stores references to the mem-mapped array containing the samples.
# parameters (change these to control RAM usage)
step_size = 1e5
n_processes = 100
# load samples as memory-map
X = np.load('samples.npy', mmap_mode='r')
# create node table that stores references to the mem-mapped samples
v = pd.DataFrame({'index': range(X.shape[0])})

# connector function to compute pairwise pearson correlations
def corr(index_s, index_t):
    features_s = X[index_s]
    features_t = X[index_t]
    corr = np.einsum('ij,ij->i', features_s, features_t) / n_samples
    return corr

# index array for parallelization
pos_array = np.array(np.linspace(0, n_features*(n_features-1)//2, n_processes), dtype=int)

# parallel computation
def create_ei(i):
    from_pos = pos_array[i]
    to_pos = pos_array[i+1]
    # initiate DeepGraph
    g = dg.DeepGraph(v)
    # create edges
    g.create_edges(connectors=corr, step_size=step_size,
                   from_pos=from_pos, to_pos=to_pos)
    # store edge table
    g.e.to_pickle('tmp/correlations/{}.pickle'.format(str(i).zfill(3)))

# computation
if __name__ == '__main__':
    os.makedirs("tmp/correlations", exist_ok=True)
    indices = np.arange(0, n_processes - 1)
    p = Pool()
    for _ in p.imap_unordered(create_ei, indices):
        pass
Let’s collect the computed correlation values and store them in an HDF file.
# store correlation values
files = os.listdir('tmp/correlations/')
files.sort()
store = pd.HDFStore('e.h5', mode='w')
for f in files:
    et = pd.read_pickle('tmp/correlations/{}'.format(f))
    store.append('e', et, format='t', data_columns=True, index=False)
store.close()
Let’s have a quick look at the correlations.
# load correlation table
e = pd.read_hdf('e.h5')
print(e)
corr
s t
0 1 -0.006066
2 0.094063
3 -0.025529
4 0.074080
5 0.035490
6 0.005221
7 0.032064
8 0.000378
9 -0.049318
10 -0.084853
11 0.026407
12 0.028543
13 -0.013347
14 -0.180113
15 0.151164
16 -0.094398
17 -0.124582
18 -0.000781
19 -0.044138
20 -0.193609
21 0.003877
22 0.048305
23 0.006477
24 -0.021291
25 -0.070756
26 -0.014906
27 -0.197605
28 -0.103509
29 0.071503
30 0.120718
... ...
4991 4998 -0.012007
4999 -0.252836
4992 4993 0.202024
4994 -0.046088
4995 -0.028314
4996 -0.052319
4997 -0.010797
4998 -0.025321
4999 -0.093721
4993 4994 -0.027568
4995 0.045602
4996 -0.102075
4997 0.035370
4998 -0.069946
4999 -0.031208
4994 4995 0.108063
4996 0.144441
4997 0.078353
4998 -0.024799
4999 -0.026432
4995 4996 -0.019991
4997 -0.178458
4998 -0.162406
4999 0.102835
4996 4997 0.115812
4998 -0.061167
4999 0.018606
4997 4998 -0.151932
4999 -0.271358
4998 4999 0.106453
[12497500 rows x 1 columns]
And finally, let’s see where most of the computation time is spent.
g = dg.DeepGraph(v)
p = %prun -r g.create_edges(connectors=corr, step_size=step_size)
p.print_stats(20)
244867 function calls (239629 primitive calls) in 14.193 seconds
Ordered by: internal time
List reduced from 541 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
250 9.355 0.037 9.361 0.037 memmap.py:334(__getitem__)
125 1.584 0.013 1.584 0.013 {built-in method numpy.core.multiarray.c_einsum}
125 1.012 0.008 12.013 0.096 deepgraph.py:4558(map)
2 0.581 0.290 0.581 0.290 {method 'get_labels' of 'pandas._libs.hashtable.Int64HashTable' objects}
1 0.301 0.301 0.414 0.414 multi.py:795(_engine)
5 0.157 0.031 0.157 0.031 {built-in method numpy.core.multiarray.concatenate}
250 0.157 0.001 0.170 0.001 internals.py:5017(_stack_arrays)
2 0.105 0.053 0.105 0.053 {pandas._libs.algos.take_1d_int64_int64}
889 0.094 0.000 0.094 0.000 {method 'reduce' of 'numpy.ufunc' objects}
125 0.089 0.001 12.489 0.100 deepgraph.py:5294(_select_and_return)
125 0.074 0.001 0.074 0.001 {deepgraph._triu_indices._reduce_triu_indices}
125 0.066 0.001 0.066 0.001 {built-in method deepgraph._triu_indices._triu_indices}
4 0.038 0.009 0.038 0.009 {built-in method pandas._libs.algos.ensure_int16}
125 0.033 0.000 10.979 0.088 <ipython-input-3-26c4f59cd911>:12(corr)
2 0.028 0.014 0.028 0.014 function_base.py:4703(delete)
1 0.027 0.027 14.163 14.163 deepgraph.py:4788(_matrix_iterator)
1 0.027 0.027 0.113 0.113 multi.py:56(_codes_to_ints)
45771/45222 0.020 0.000 0.043 0.000 {built-in method builtins.isinstance}
1 0.019 0.019 14.193 14.193 deepgraph.py:183(create_edges)
2 0.012 0.006 0.700 0.350 algorithms.py:576(factorize)
As you can see, most of the time is spent getting the requested features within the corr function, followed by the computation of the correlation values themselves.
Building a DeepGraph of Extreme Precipitation¶
[ipython notebook] [python script] [data]
In the following we build a deep graph of a high-resolution dataset of precipitation measurements.
The goal is to first detect spatiotemporal clusters of extreme precipitation events and then to create families of these clusters based on a spatial correlation measure. Finally, we create and plot some informative (intersection) partitions of the deep graph.
For further details see section V of the original paper: https://arxiv.org/abs/1604.00971
First of all, we need to import some packages
# data i/o
import os
import xarray
# for plots
import matplotlib.pyplot as plt
# the usual
import numpy as np
import pandas as pd
import deepgraph as dg
# notebook display
from IPython.display import HTML
%matplotlib inline
plt.rcParams['figure.figsize'] = 8, 6
pd.options.display.max_rows = 10
pd.set_option('expand_frame_repr', False)
Selecting and Preprocessing the Precipitation Data¶
Selection¶
If you want to select your own spatiotemporal box of precipitation events, you may follow the instructions below and change the filename in the next box of code.
- Go to https://disc.gsfc.nasa.gov/datasets/TRMM_3B42_V7/summary?keywords=TRMM_3B42_V7
- click on “Simple Subset Wizard”
- select the “Date Range” (and if desired a “Spatial Bounding Box”) you’re interested in
- click on “Search for Data Sets”
- expand the list by clicking on the “+” symbol
- mark the check box “precipitation”
- (optional, but recommended) click on the selector to change from “netCDF” to “gzipped netCDF”
- click on “Subset Selected Data Sets”
- click on “View Subset Results”
- right click on the “Get list of URLs for this subset in a file” link, and choose “Save Link As…”
- the downloaded file will have a name similar to “SSW_download_2016-05-03T20_19_28_23621_2oIe06xp.inp”. Note which directory the downloaded file is saved to, and in your Unix shell, set your current working directory to that directory.
- register an account to get authentication credentials using these instructions: https://disc.gsfc.nasa.gov/information/howto/5761bc6a5ad5a18811681bae?keywords=wget
- get the files via
os.system("wget --content-disposition --directory-prefix=tmp --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies -i SSW_download_2016-05-03T20_19_28_23621_2oIe06xp.inp")
Preprocessing¶
Next, we need to convert the downloaded netCDF files to a pandas DataFrame, which we can then use to initiate a dg.DeepGraph
# choose "wet times" threshold
r = .1
# choose "extreme" precipitation threshold
p = .9

v_list = []
for file in os.listdir('tmp'):
    if file.startswith('3B42.'):
        # open the downloaded netCDF file
        # unfortunately, we have to decode times ourselves, since
        # the format of the downloaded files doesn't work
        # see also: https://github.com/pydata/xarray/issues/521
        f = xarray.open_dataset('tmp/{}'.format(file), decode_times=False)
        # create integer-based (x,y) coordinates
        f['x'] = (('longitude'), np.arange(len(f.longitude)))
        f['y'] = (('latitude'), np.arange(len(f.latitude)))
        # convert to pd.DataFrame
        vt = f.to_dataframe()
        # we only consider "wet times", pcp >= 0.1mm/h
        vt = vt[vt.pcp >= r]
        # reset index
        vt.reset_index(inplace=True)
        # add correct times
        ftime = f.time.units.split()[2:]
        year, month, day = ftime[0].split('-')
        hour = ftime[1]
        time = pd.Timestamp(int(year), int(month), int(day), int(hour))
        vt['time'] = time
        # compute "area" for each event
        vt['area'] = 111**2 * .25**2 * np.cos(2*np.pi*vt.latitude / 360.)
        # compute "volume of water precipitated" for each event
        vt['vol'] = vt.pcp * 3 * vt.area
        # set dtypes -> economize ram
        vt['pcp'] = vt['pcp'] * 100
        vt['pcp'] = vt['pcp'].astype(np.uint16)
        vt['latitude'] = vt['latitude'].astype(np.float16)
        vt['longitude'] = vt['longitude'].astype(np.float16)
        vt['area'] = vt['area'].astype(np.uint16)
        vt['vol'] = vt['vol'].astype(np.uint32)
        vt['x'] = vt['x'].astype(np.uint16)
        vt['y'] = vt['y'].astype(np.uint16)
        # append to list
        v_list.append(vt)
        f.close()
# concatenate the DataFrames
v = pd.concat(v_list)
# append a column indicating geographical locations (i.e., supernode labels)
v['g_id'] = v.groupby(['longitude', 'latitude']).grouper.group_info[0]
v['g_id'] = v['g_id'].astype(np.uint32)
# select `p`th percentile of precipitation events for each geographical location
v = v.groupby('g_id').apply(lambda x: x[x.pcp >= x.pcp.quantile(p)])
# append integer-based time
dtimes = pd.date_range(v.time.min(), v.time.max(), freq='3H')
dtdic = {dtime: itime for itime, dtime in enumerate(dtimes)}
v['itime'] = v.time.apply(lambda x: dtdic[x])
v['itime'] = v['itime'].astype(np.uint16)
# sort by time
v.sort_values('time', inplace=True)
# set unique node index
v.set_index(np.arange(len(v)), inplace=True)
# shorten column names
v.rename(columns={'pcp': 'r',
                  'latitude': 'lat',
                  'longitude': 'lon',
                  'time': 'dtime',
                  'itime': 'time'},
         inplace=True)
The created DataFrame of extreme precipitation measurements looks like this
print(v)
lat lon dtime r x y area vol g_id time
0 15.125 -118.125 2004-08-20 1084 28 101 743 24174 5652 0
1 44.875 -30.625 2004-08-20 392 378 220 545 6433 85341 0
2 45.125 -30.625 2004-08-20 454 378 221 543 7416 85342 0
3 45.375 -30.625 2004-08-20 909 378 222 540 14767 85343 0
4 45.625 -30.625 2004-08-20 907 378 223 538 14669 85344 0
... ... ... ... ... ... ... ... ... ... ...
382306 26.875 -46.625 2004-09-27 503 314 148 686 10385 70380 304
382307 38.375 -37.125 2004-09-27 453 352 194 603 8222 79095 304
382308 8.125 -105.125 2004-09-27 509 80 73 762 11663 17007 304
382309 21.875 -42.875 2004-09-27 260 329 128 714 5595 73875 304
382310 6.625 -111.125 2004-09-27 192 56 67 764 4428 11790 304
[382311 rows x 10 columns]
We identify each row of this table as a node of our DeepGraph
g = dg.DeepGraph(v)
Plot the Data¶
Let’s take a look at the data by creating a video of the time-evolution of precipitation measurements. Using the plot_map_generator method, this is straightforward.
# configure map projection
kwds_basemap = {'llcrnrlon': v.lon.min() - 1,
                'urcrnrlon': v.lon.max() + 1,
                'llcrnrlat': v.lat.min() - 1,
                'urcrnrlat': v.lat.max() + 1,
                'resolution': 'i'}
# configure scatter plots
kwds_scatter = {'s': 1.5,
                'c': g.v.r.values / 100.,
                'edgecolors': 'none',
                'cmap': 'viridis_r'}
# create generator of scatter plots on map
objs = g.plot_map_generator('lon', 'lat', 'dtime',
                            kwds_basemap=kwds_basemap,
                            kwds_scatter=kwds_scatter)
# plot and store frames
for i, obj in enumerate(objs):
    # configure plots
    cb = obj['fig'].colorbar(obj['pc'], fraction=0.025, pad=0.01)
    cb.set_label('[mm/h]')
    obj['m'].fillcontinents(color='0.2', zorder=0, alpha=.4)
    obj['ax'].set_title('{}'.format(obj['group']))
    # store and close
    obj['fig'].savefig('tmp/pcp_{:03d}.png'.format(i),
                       dpi=300, bbox_inches='tight')
    plt.close(obj['fig'])
# create video with ffmpeg
cmd = "ffmpeg -y -r 5 -i tmp/pcp_%03d.png -c:v libx264 -r 20 -vf scale=2052:1004 {}.mp4"
os.system(cmd.format('precipitation_files/pcp'))
# embed video
HTML("""
<video width="700" height="350" controls>
<source src="precipitation_files/pcp.mp4" type="video/mp4">
</video>
""")
Detecting SpatioTemporal Clusters of Extreme Precipitation¶
In this tutorial, we’re interested in local formations of spatiotemporal clusters of extreme precipitation events. For that matter, we now use DeepGraph to identify such clusters and track their temporal evolution.
Create Edges¶
We now use DeepGraph to create edges between the nodes given by g.v.
The edges of g will be utilized to detect spatiotemporal clusters in the data, or in more technical terms: to partition the set of all nodes into subsets of connected grid points. One can imagine the nodes to be elements of a \(3\)-dimensional grid box (x, y, time), where we allow every node to have \(26\) possible neighbours: \(8\) neighbours in the time slice of the measurement, \(t_i\), and \(9\) neighbours in each of the time slices \(t_i - 1\) and \(t_i + 1\) (\(8 + 9 + 9 = 26\)).
For that matter, we pass the following connectors
def grid_2d_dx(x_s, x_t):
    dx = x_t - x_s
    return dx

def grid_2d_dy(y_s, y_t):
    dy = y_t - y_s
    return dy
and selectors
def s_grid_2d_dx(dx, sources, targets):
    dxa = np.abs(dx)
    sources = sources[dxa <= 1]
    targets = targets[dxa <= 1]
    return sources, targets

def s_grid_2d_dy(dy, sources, targets):
    dya = np.abs(dy)
    sources = sources[dya <= 1]
    targets = targets[dya <= 1]
    return sources, targets
to the create_edges_ft
method
g.create_edges_ft(ft_feature=('time', 1),
                  connectors=[grid_2d_dx, grid_2d_dy],
                  selectors=[s_grid_2d_dx, s_grid_2d_dy],
                  r_dtype_dic={'ft_r': bool,
                               'dx': np.int8,
                               'dy': np.int8},
                  logfile='create_e',
                  max_pairs=1e7)
# rename fast track relation
g.e.rename(columns={'ft_r': 'dt'}, inplace=True)
To see how many nodes and edges our graph is comprised of, one may simply type
g
<DeepGraph object, with n=382311 node(s) and m=567225 edge(s) at 0x7f7a4c3de160>
The edges we just created look like this
print(g.e)
dx dy dt
s t
0 1362 0 1 False
1432 1 0 False
1433 1 1 False
1696 1 0 True
1699 1 1 True
... .. .. ...
382284 382291 0 1 False
382295 382296 0 1 False
382296 382299 0 1 False
382299 382309 0 1 False
382304 382306 0 1 False
[567225 rows x 3 columns]
Logfile Plot
To see how long it took to create the edges, one may use the plot_logfile method
g.plot_logfile('create_e')

Find the Connected Components¶
Having linked all neighbouring nodes, the spatiotemporal clusters can be identified as the connected components of the graph. For practical reasons, DeepGraph directly implements a method to find the connected components of a graph, append_cp
# all singular components (components comprised of one node only)
# are consolidated under the label 0
g.append_cp(consolidate_singles=True)
# we don't need the edges any more
del g.e
The node table now has a component membership column appended
print(g.v)
lat lon dtime r x y area vol g_id time cp
0 15.125 -118.125 2004-08-20 1084 28 101 743 24174 5652 0 865
1 44.875 -30.625 2004-08-20 392 378 220 545 6433 85341 0 5079
2 45.125 -30.625 2004-08-20 454 378 221 543 7416 85342 0 5079
3 45.375 -30.625 2004-08-20 909 378 222 540 14767 85343 0 5079
4 45.625 -30.625 2004-08-20 907 378 223 538 14669 85344 0 5079
... ... ... ... ... ... ... ... ... ... ... ...
382306 26.875 -46.625 2004-09-27 503 314 148 686 10385 70380 304 609
382307 38.375 -37.125 2004-09-27 453 352 194 603 8222 79095 304 0
382308 8.125 -105.125 2004-09-27 509 80 73 762 11663 17007 304 174
382309 21.875 -42.875 2004-09-27 260 329 128 714 5595 73875 304 8
382310 6.625 -111.125 2004-09-27 192 56 67 764 4428 11790 304 15610
[382311 rows x 11 columns]
Let’s see how many spatiotemporal clusters g is comprised of (discarding singular components)
g.v.cp.max()
33169
and how many nodes there are in the components
print(g.v.cp.value_counts())
0 64678
1 16460
2 8519
3 6381
4 3403
...
29601 2
27554 2
25507 2
23460 2
20159 2
Name: cp, dtype: int64
Partition the Nodes Into a Component Supernode Table¶
In order to aggregate and compute some information about the precipitation clusters, we now partition the nodes by the type of feature cp, the component membership labels of the nodes just created. This can be done with the partition_nodes method
# feature functions, will be applied to each component of g
feature_funcs = {'dtime': [np.min, np.max],
                 'time': [np.min, np.max],
                 'vol': [np.sum],
                 'lat': [np.mean],
                 'lon': [np.mean]}
# partition the node table
cpv, gv = g.partition_nodes('cp', feature_funcs, return_gv=True)
# append geographical id sets
cpv['g_ids'] = gv['g_id'].apply(set)
# append cardinality of g_id sets
cpv['n_unique_g_ids'] = cpv['g_ids'].apply(len)
# append time spans
cpv['dt'] = cpv['dtime_amax'] - cpv['dtime_amin']
# append spatial coverage
def area(group):
    return group.drop_duplicates('g_id').area.sum()
cpv['area'] = gv.apply(area)
The clusters look like this
print(cpv)
n_nodes dtime_amin dtime_amax time_amin time_amax lat_mean vol_sum lon_mean g_ids n_unique_g_ids dt area
cp
0 64678 2004-08-20 00:00:00 2004-09-27 00:00:00 0 304 17.609375 627097323 -63.40625 {0, 1, 2, 6, 7, 10, 12, 13, 14, 22, 23, 24, 25... 49808 38 days 00:00:00 34781178
1 16460 2004-09-01 06:00:00 2004-09-17 18:00:00 98 230 17.281250 351187150 -65.12500 {65536, 65537, 65538, 65539, 65540, 65541, 655... 6629 16 days 12:00:00 4803624
2 8519 2004-09-17 03:00:00 2004-09-24 15:00:00 225 285 26.906250 133698579 -44.62500 {73728, 73729, 73730, 73731, 73732, 73733, 737... 3730 7 days 12:00:00 2507350
3 6381 2004-08-26 09:00:00 2004-09-06 03:00:00 51 137 21.062500 113782748 -64.12500 {65555, 65556, 65557, 65558, 65559, 65560, 655... 2442 10 days 18:00:00 1749673
4 3403 2004-08-21 21:00:00 2004-08-24 12:00:00 15 36 10.578125 66675326 -111.93750 {8141, 14654, 11805, 16363, 8142, 11806, 20490... 1294 2 days 15:00:00 978604
... ... ... ... ... ... ... ... ... ... ... ... ...
33165 2 2004-08-23 18:00:00 2004-08-23 18:00:00 30 30 15.500000 20212 -103.87500 {18115, 18116} 2 0 days 00:00:00 1483
33166 2 2004-09-05 18:00:00 2004-09-05 18:00:00 134 134 27.250000 9366 -121.87500 {2688, 2687} 2 0 days 00:00:00 1368
33167 2 2004-08-30 15:00:00 2004-08-30 15:00:00 85 85 9.250000 43096 0.62500 {112116, 112117} 2 0 days 00:00:00 1519
33168 2 2004-09-09 03:00:00 2004-09-09 03:00:00 161 161 6.750000 24156 -13.62500 {100613, 100614} 2 0 days 00:00:00 1528
33169 2 2004-09-11 03:00:00 2004-09-11 03:00:00 177 177 15.500000 46798 -16.12500 {98523, 98524} 2 0 days 00:00:00 1483
[33170 rows x 12 columns]
Plot the Largest Component¶
Let’s see how the largest cluster of extreme precipitation evolves over time, again using the plot_map_generator method
# temporary DeepGraph instance containing
# only the largest component
gt = dg.DeepGraph(g.v)
gt.filter_by_values_v('cp', 1)
# configure map projection
from mpl_toolkits.basemap import Basemap
m1 = Basemap(projection='ortho',
             lon_0=cpv.loc[1].lon_mean + 12,
             lat_0=cpv.loc[1].lat_mean + 8,
             resolution=None)
width = (m1.urcrnrx - m1.llcrnrx) * .65
height = (m1.urcrnry - m1.llcrnry) * .45
kwds_basemap = {'projection': 'ortho',
                'lon_0': cpv.loc[1].lon_mean + 12,
                'lat_0': cpv.loc[1].lat_mean + 8,
                'llcrnrx': -0.5 * width,
                'llcrnry': -0.5 * height,
                'urcrnrx': 0.5 * width,
                'urcrnry': 0.5 * height,
                'resolution': 'i'}
# configure scatter plots
kwds_scatter = {'s': 2,
                'c': np.log(gt.v.r.values / 100.),
                'edgecolors': 'none',
                'cmap': 'viridis_r'}
# create generator of scatter plots on map
objs = gt.plot_map_generator('lon', 'lat', 'dtime',
                             kwds_basemap=kwds_basemap,
                             kwds_scatter=kwds_scatter)
# plot and store frames
for i, obj in enumerate(objs):
    # configure plots
    obj['m'].fillcontinents(color='0.2', zorder=0, alpha=.4)
    obj['m'].drawparallels(range(-50, 50, 20), linewidth=.2)
    obj['m'].drawmeridians(range(0, 360, 20), linewidth=.2)
    obj['ax'].set_title('{}'.format(obj['group']))
    # store and close
    obj['fig'].savefig('tmp/cp1_ortho_{:03d}.png'.format(i),
                       dpi=300, bbox_inches='tight')
    plt.close(obj['fig'])
# create video with ffmpeg
cmd = "ffmpeg -y -r 5 -i tmp/cp1_ortho_%03d.png -c:v libx264 -r 20 -vf scale=1919:1406 {}.mp4"
os.system(cmd.format('precipitation_files/cp1_ortho'))
# embed video
HTML("""
<video width="700" height="500" controls>
<source src="precipitation_files/cp1_ortho.mp4" type="video/mp4">
</video>
""")
Create and Plot Informative (Intersection) Partitions¶
In this last section, we create some useful (intersection) partitions of the deep graph, which we then use to create some plots.
Geographical Locations¶
# how many components have hit a certain
# geographical location (discarding singular cps)
def count(cp):
    return len(set(cp[cp != 0]))
# feature functions, will be applied to each g_id
feature_funcs = {'cp': [count],
                 'vol': [np.sum],
                 'lat': np.min,
                 'lon': np.min}

gv = g.partition_nodes('g_id', feature_funcs)
gv.rename(columns={'lat_amin': 'lat',
                   'lon_amin': 'lon'}, inplace=True)
print(gv)
n_nodes cp_count lat vol_sum lon
g_id
0 2 1 -10.125 10142 -125.125
1 2 1 -9.875 8716 -125.125
2 2 0 -9.625 4372 -125.125
3 2 2 -9.375 5310 -125.125
4 2 2 -9.125 6409 -125.125
... ... ... ... ... ...
115618 2 1 48.875 14319 5.125
115619 1 1 49.125 10129 5.125
115620 2 1 49.375 12826 5.125
115621 2 2 49.625 9117 5.125
115622 2 1 49.875 12101 5.125
[115623 rows x 5 columns]
cols = {'n_nodes': gv.n_nodes,
        'vol sum': gv.vol_sum,
        'cp count': gv.cp_count}

for name, col in cols.items():
    # for easy filtering, we create a new DeepGraph instance for
    # each component
    gt = dg.DeepGraph(gv)
    # configure map projection
    kwds_basemap = {'llcrnrlon': v.lon.min() - 1,
                    'urcrnrlon': v.lon.max() + 1,
                    'llcrnrlat': v.lat.min() - 1,
                    'urcrnrlat': v.lat.max() + 1}
    # configure scatter plots
    kwds_scatter = {'s': 1,
                    'c': col.values,
                    'cmap': 'viridis_r',
                    'alpha': .5,
                    'edgecolors': 'none'}
    # create scatter plot on map
    obj = gt.plot_map(lon='lon', lat='lat',
                      kwds_basemap=kwds_basemap,
                      kwds_scatter=kwds_scatter)
    # configure plots
    obj['m'].drawcoastlines(linewidth=.8)
    obj['m'].drawparallels(range(-50, 50, 20), linewidth=.2)
    obj['m'].drawmeridians(range(0, 360, 20), linewidth=.2)
    obj['ax'].set_title(name)
    # colorbar
    cb = obj['fig'].colorbar(obj['pc'], fraction=.022, pad=.02)
    cb.set_label('{}'.format(name), fontsize=15)



Geographical Locations and Families¶
In order to create the intersection partition of geographical locations and families, we first need to append a family membership column to v
# create F col
v['F'] = np.ones(len(v), dtype=int) * -1
gcpv = cpv.groupby('F')
it = gcpv.apply(lambda x: x.index.values)
for F in range(len(it)):
    cp_index = v.cp.isin(it.iloc[F])
    v.loc[cp_index, 'F'] = F
Then we create the intersection partition
# feature funcs
def n_cp_nodes(cp):
    return len(cp.unique())

feature_funcs = {'vol': [np.sum],
                 'lat': np.min,
                 'lon': np.min,
                 'cp': n_cp_nodes}

# create family-g_id intersection graph
fgv = g.partition_nodes(['F', 'g_id'], feature_funcs=feature_funcs)
fgv.rename(columns={'lat_amin': 'lat',
                    'lon_amin': 'lon',
                    'cp_n_cp_nodes': 'n_cp_nodes'}, inplace=True)
which looks like this
print(fgv)
n_nodes n_cp_nodes lat vol_sum lon
F g_id
-1 0 2 2 -10.125 10142 -125.125
1 2 2 -9.875 8716 -125.125
2 2 1 -9.625 4372 -125.125
3 2 2 -9.375 5310 -125.125
4 2 2 -9.125 6409 -125.125
... ... ... ... ... ...
998 26685 1 1 -8.875 593 -93.625
26686 1 1 -8.625 411 -93.625
26887 1 1 -9.375 364 -93.375
26888 1 1 -9.125 478 -93.375
26889 1 1 -8.875 456 -93.375
[186903 rows x 5 columns]
families = [0, 1, 2, 3]

for F in families:
    # for easy filtering, we create a new DeepGraph instance for
    # each component
    gt = dg.DeepGraph(fgv.loc[F])
    # configure map projection
    kwds_basemap = {'llcrnrlon': v.lon.min() - 1,
                    'urcrnrlon': v.lon.max() + 1,
                    'llcrnrlat': v.lat.min() - 1,
                    'urcrnrlat': v.lat.max() + 1}
    # configure scatter plots
    kwds_scatter = {'s': 1,
                    'c': gt.v.n_cp_nodes.values,
                    'cmap': 'viridis_r',
                    'edgecolors': 'none'}
    # create scatter plot on map
    obj = gt.plot_map(
        lat='lat', lon='lon',
        kwds_basemap=kwds_basemap, kwds_scatter=kwds_scatter)
    # configure plots
    obj['m'].drawcoastlines(linewidth=.8)
    obj['m'].drawparallels(range(-50, 50, 20), linewidth=.2)
    obj['m'].drawmeridians(range(0, 360, 20), linewidth=.2)
    cb = obj['fig'].colorbar(obj['pc'], fraction=.022, pad=.02)
    cb.set_label('n_cps', fontsize=15)
    obj['ax'].set_title('Family {}'.format(F))




Geographical Locations and Components¶
# feature functions, will be applied on each [g_id, cp] group of g
feature_funcs = {'vol': [np.sum],
                 'lat': np.min,
                 'lon': np.min}

# create gcpv
gcpv = g.partition_nodes(['cp', 'g_id'], feature_funcs)
gcpv.rename(columns={'lat_amin': 'lat',
                     'lon_amin': 'lon'}, inplace=True)
print(gcpv)
n_nodes lat vol_sum lon
cp g_id
0 0 1 -10.125 5071 -125.125
1 1 -9.875 4415 -125.125
2 2 -9.625 4372 -125.125
6 3 -8.375 1026 -125.125
7 1 -8.125 594 -125.125
... ... ... ... ...
33167 112117 1 9.375 24618 0.625
33168 100613 1 6.625 11450 -13.625
100614 1 6.875 12706 -13.625
33169 98523 1 15.375 31057 -16.125
98524 1 15.625 15741 -16.125
[287301 rows x 4 columns]
# select the components to plot
comps = [1, 2, 3, 4]

fig, axs = plt.subplots(2, 2, figsize=[10, 8])
axs = axs.flatten()

for comp, ax in zip(comps, axs):
    # for easy filtering, we create a new DeepGraph instance for
    # each component
    gt = dg.DeepGraph(gcpv[gcpv.index.get_level_values('cp') == comp])
    # configure map projection
    kwds_basemap = {'projection': 'ortho',
                    'lon_0': cpv.loc[comp].lon_mean,
                    'lat_0': cpv.loc[comp].lat_mean,
                    'resolution': 'c'}
    # configure scatter plots
    kwds_scatter = {'s': .5,
                    'c': gt.v.vol_sum.values,
                    'cmap': 'viridis_r',
                    'edgecolors': 'none'}
    # create scatter plot on map
    obj = gt.plot_map(lon='lon', lat='lat',
                      kwds_basemap=kwds_basemap,
                      kwds_scatter=kwds_scatter,
                      ax=ax)
    # configure plots
    obj['m'].fillcontinents(color='0.2', zorder=0, alpha=.2)
    obj['m'].drawparallels(range(-50, 50, 20), linewidth=.2)
    obj['m'].drawmeridians(range(0, 360, 20), linewidth=.2)
    obj['ax'].set_title('cp {}'.format(comp))

From Multilayer Networks to Deep Graphs¶
[ipython notebook] [python script]
In this tutorial we exemplify the representation of multilayer networks (MLNs) by deep graphs and demonstrate some of the advantages of DeepGraph’s network representation.
We start by converting the Noordin Top Terrorist MLN into a graph g - comprised of two DataFrames, a node table g.v and an edge table g.e - that corresponds to the supra-graph representation of the multilayer network. We then partition the graph g by the information attributed to its layers, resulting in different supergraphs on the partition lattice of g that correspond to different representations of a MLN (including its tensor representation).
In the next part, we demonstrate how additional information that might be at hand or computed during the analysis can be used to induce further supergraphs, or metaphorically speaking, how additional information corresponds to “hidden layers” of a MLN.
Finally, we briefly show how to use the nodes’ properties to partition the edges of a MLN.
References
For a short summary of the multilayer network representation, see Appendix C of the Deep Graphs paper.
For a more in-depth introduction to MLNs, I recommend the following papers:
- Multilayer Networks (review paper of MLNs)
- The Structure and Dynamics of Multilayer Networks (review paper of MLNs)
- Mathematical Formulation of Multilayer Networks (tensor formalism for MLNs)
For a discussion of how Deep Graphs relates to the multilayer network representation, see Sec. IV B and Appendix D of the Deep Graphs paper.
The Noordin Top Terrorist Data¶

[high-res version] [python plot script]
The data we use in this tutorial is the Noordin Top Terrorist Network, which has previously been represented as a multilayer network (e.g., http://arxiv.org/abs/1308.3182).
It includes relational data on 79 Indonesian terrorists belonging to the so-called Noordin Top Terrorist Network.
For information about the individuals’ attributes and their relations, see http://www.thearda.com/archive/files/codebooks/origCB/Noordin%20Subset%20Codebook.pdf and http://arxiv.org/pdf/1308.3182v3.pdf.
Preprocessing¶
We download the data from here, and process it into two pandas DataFrames, a node table and an edge table. The preprocessing is quite lengthy, so you might want to proceed directly to the next section.
First of all, we need to import some packages
# data i/o
import os
import subprocess
import zipfile
# for plots
import matplotlib.pyplot as plt
# the usual
import numpy as np
import pandas as pd
import deepgraph as dg
# notebook display
%matplotlib inline
pd.options.display.max_rows = 10
pd.set_option('expand_frame_repr', False)
Preprocessing the Nodes¶
# zip file containing node attributes
os.makedirs("tmp", exist_ok=True)
get_nodes_zip = ("wget -O tmp/terrorist_nodes.zip "
                 "https://sites.google.com/site/sfeverton18/"
                 "research/appendix-1/Noordin%20Subset%20%28ORA%29.zip?"
                 "attredirects=0&d=1")
subprocess.call(get_nodes_zip.split())
# unzip
zf = zipfile.ZipFile('tmp/terrorist_nodes.zip')
zf.extract('Attributes.csv', path='tmp/')
zf.close()
# create node table
v = pd.read_csv('tmp/Attributes.csv')
v.rename(columns={'Unnamed: 0': 'Name'}, inplace=True)
# create a copy of all nodes for each layer (i.e., create "node-layers")
# there are 10 layers and 79 nodes on each layer
v = pd.concat(10*[v])
# add "aspect" as column to v
layer_names = ['Business', 'Communication', 'O Logistics', 'O Meetings',
               'O Operations', 'O Training', 'T Classmates', 'T Friendship',
               'T Kinship', 'T Soulmates']
layers = [[name]*79 for name in layer_names]
layers = [item for sublist in layers for item in sublist]
v['layer'] = layers
# set unique node index
v.reset_index(inplace=True)
v.rename(columns={'index': 'V_N'}, inplace=True)
# swap columns
cols = list(v)
cols[1], cols[10] = cols[10], cols[1]
v = v[cols]
# get rid of the attribute columns for demonstrational purposes,
# will be inserted again later
v, vinfo = v.iloc[:, :2], v.iloc[:, 2:]
Preprocessing the Edges¶
# paj file containing edges for different layers
get_paj = ("wget -O tmp/terrorists.paj "
           "https://sites.google.com/site/sfeverton18/"
           "research/appendix-1/Noordin%20Subset%20%28Pajek%29.paj?"
           "attredirects=0&d=1")
subprocess.call(get_paj.split())
# get data blocks from paj file
with open('tmp/terrorists.paj') as txtfile:
    comments = []
    data = []
    part = []
    for line in txtfile:
        if line.startswith('*'):
            # comment lines
            comment = line
            comments.append(comment)
            if part:
                data.append(part)
                part = []
        else:
            # vertices
            if comment.startswith('*Vertices') and len(line.split()) > 1:
                sublist = line.split('"')
                sublist = sublist[:2] + sublist[-1].split()
                part.append(sublist)
            # edges or partitions
            elif not line.isspace():
                part.append(line.split())
# append last block
data.append(part)
# extract edge tables from data blocks
ecomments = []
eparts = []
for i, c in enumerate(comments):
    if c.startswith('*Network'):
        del data[0]
    elif c.startswith('*Partition'):
        del data[0]
    elif c.startswith('*Vector'):
        del data[0]
    elif c.startswith('*Arcs') or c.startswith('*Edges'):
        ecomments.append(c)
        eparts.append(data.pop(0))
# layer data parts (indices found manually via comments)
inds = [11, 10, 5, 6, 7, 8, 0, 1, 2, 3]
eparts = [eparts[ind] for ind in inds]
# convert to DataFrames
layer_frames = []
for name, epart in zip(layer_names, eparts):
    frame = pd.DataFrame(epart, dtype=np.int16)
    # get rid of self-loops, bidirectional edges
    frame = frame[frame[0] < frame[1]]
    # rename columns
    frame.rename(columns={0: 's', 1: 't', 2: name}, inplace=True)
    frame['s'] -= 1
    frame['t'] -= 1
    layer_frames.append(frame)

# set indices
for i, e in enumerate(layer_frames):
    e['s'] += i*79
    e['t'] += i*79
    e.set_index(['s', 't'], inplace=True)

# concat the layers
e = pd.concat(layer_frames)
# edge table as described in the paper
e_paper = e.copy()
# alternative representation of e
e['type'] = 0
e['weight'] = 0
for layer in layer_names:
    where = e[layer].notnull()
    e.loc[where, 'type'] = layer
    e.loc[where, 'weight'] = e.loc[where, layer]
e = e[['type', 'weight']]
DeepGraph’s Supra-Graph Representation of a MLN, \(G = (V, E)\)¶
Above, we have processed the downloaded data into a node table v and an edge table e, which correspond to the supra-graph representation of a multilayer network. This is the preferred representation of a MLN by a deep graph, since all other representations are entailed in the supra-graph’s partition lattice, as we will demonstrate below.
g = dg.DeepGraph(v, e)
print(g)
<DeepGraph object, with n=790 node(s) and m=1014 edge(s) at 0x7fb8e13499e8>
Let’s have a look at the node table first
print(g.v)
V_N layer
0 0 Business
1 1 Business
2 2 Business
3 3 Business
4 4 Business
.. ... ...
785 74 T Soulmates
786 75 T Soulmates
787 76 T Soulmates
788 77 T Soulmates
789 78 T Soulmates
[790 rows x 2 columns]
As you can see, there are 790 nodes in total. Each of the 10 layers,
print(g.v.layer.unique())
['Business' 'Communication' 'O Logistics' 'O Meetings' 'O Operations'
'O Training' 'T Classmates' 'T Friendship' 'T Kinship' 'T Soulmates']
is comprised of 79 nodes. Every node has a feature of type V_N, indicating the individual the node belongs to, and a feature of type layer, corresponding to the layer the node belongs to. Each of the 790 nodes corresponds to a node-layer of the MLN representation of this data.
The edge table,
print(g.e)
type weight
s t
9 67 Business 2.0
69 Business 1.0
77 Business 1.0
11 61 Business 1.0
20 59 Business 1.0
... ... ...
733 769 T Soulmates 1.0
755 769 T Soulmates 1.0
787 T Soulmates 1.0
771 788 T Soulmates 1.0
783 788 T Soulmates 1.0
[1014 rows x 2 columns]
is comprised of 1014 edges between the nodes in v. Each edge has two relations. The first relation (of type type) is determined by the tuple of features \((layer_i, layer_j)\) of the adjacent nodes \(V_i\) and \(V_j\). The second relation (of type weight) indicates the “weight” of the connection. This representation of the edges of a MLN deviates from the one you can find in the paper, which is described in the last section.
There are 10 types of relations in the above edge table
g.e['type'].unique()
array(['Business', 'Communication', 'O Logistics', 'O Meetings',
'O Operations', 'O Training', 'T Classmates', 'T Friendship',
'T Kinship', 'T Soulmates'], dtype=object)
which - in the case of this data set - correspond to the layers of the nodes. This is due to the fact that there are no inter-layer connections in the Noordin Top Terrorist Network (such as, e.g., an edge from layer Business to layer Communication would be). The edges here are all (undirected) intra-layer edges (e.g., Business \(\rightarrow\) Business, Operations \(\rightarrow\) Operations).
To see how the edges are distributed among the different types, you can simply type
g.e['type'].value_counts()
O Operations 267
Communication 200
T Classmates 175
O Training 147
T Friendship 91
O Meetings 63
O Logistics 29
T Kinship 16
Business 15
T Soulmates 11
Name: type, dtype: int64
Let’s have a look at how many “actors” (nodes with at least one connection) there are within each layer
# append degree
gtg = g.return_gt_graph()
g.v['deg'] = gtg.degree_property_map('total').a
# how many "actors" are there per layer?
g.v[g.v.deg != 0].groupby('layer').size()
layer
Business 13
Communication 74
O Logistics 16
O Meetings 26
O Operations 39
O Training 38
T Classmates 39
T Friendship 61
T Kinship 24
T Soulmates 9
dtype: int64
For the purpose of this tutorial, the fact that the Noordin Top Terrorist Network is a MLN with only one aspect, and without inter-layer edges, is of little importance. The generalization of what we’re showing in the following to more general MLNs is straightforward (and explained in detail in Appendix D of the paper).
Let’s illustrate the supra-graph representation of this MLN by a plot
# create graph_tool graph for layout
import graph_tool.draw as gtd
gtg = g.return_gt_graph()
gtg.set_directed(False)
# get sfdp layout positions
pos = gtd.sfdp_layout(gtg, gamma=.5)
pos = pos.get_2d_array([0, 1])
g.v['x'] = pos[0]
g.v['y'] = pos[1]
# configure nodes
kwds_scatter = {'s': 1,
                'c': 'k'}
# configure edges
kwds_quiver = {'headwidth': 1,
               'alpha': .3,
               'cmap': 'prism'}
# color by type
C = g.e.groupby('type').grouper.group_info[0]
# plot
fig, ax = plt.subplots(1, 2, figsize=(15, 7))
g.plot_2d('x', 'y', edges=True, C=C,
          kwds_scatter=kwds_scatter,
          kwds_quiver=kwds_quiver, ax=ax[0])
# turn axis off, set x/y-lim
ax[0].axis('off')
ax[0].set_xlim((g.v.x.min() - 1, g.v.x.max() + 1))
ax[0].set_ylim((g.v.y.min() - 1, g.v.y.max() + 1))
# plot adjacency matrix
adj = g.return_cs_graph().todense()
adj = adj + adj.T
inds = np.where(adj != 0)
ax[1].scatter(inds[0], inds[1], c='k', marker='.')
ax[1].grid()
ax[1].set_xlim(-1, 791)
ax[1].set_ylim(-1,791)

The supra-graph representation of a MLN is by itself a powerful representation and exploitable in various ways (see, e.g., section 2.3 of this paper). However, in the following, we will demonstrate how to use the additional information attributed to the layers of the MLN, in order to “structure” and partition the MLN into different representations.
Redistributing Information on the Partition Lattice of the MLN¶
Based on the types of features V_N and layer, we can now redistribute the information contained in the supra-graph g. This redistribution allows for several representations of the graph, which we will demonstrate in the following.
The SuperGraph \(G^L = (V^L, E^L)\)¶
Partitioning by the type of feature layer leads to the supergraph \(G^L = (V^L, E^L)\), where every supernode \(V^{L}_{i^L} \in V^{L}\) corresponds to a distinct layer, encompassing all its respective nodes. Superedges \(E^{L}_{i^L, j^L} \in E^{L}\) with either \(i^L = j^L\) or \(i^L \neq j^L\) correspond to collections of intra- and inter-layer edges of the MLN, respectively.
# partition the graph
lv, le = g.partition_graph('layer',
                           relation_funcs={'weight': ['sum', 'mean', 'std']})
lg = dg.DeepGraph(lv, le)
print(lg)
<DeepGraph object, with n=10 node(s) and m=10 edge(s) at 0x7fb8e1349c50>
print(lg.v)
n_nodes
layer
Business 79
Communication 79
O Logistics 79
O Meetings 79
O Operations 79
O Training 79
T Classmates 79
T Friendship 79
T Kinship 79
T Soulmates 79
print(lg.e)
n_edges weight_sum weight_mean weight_std
layer_s layer_t
Business Business 15 16.0 1.066667 0.258199
Communication Communication 200 200.0 1.000000 0.000000
O Logistics O Logistics 29 58.0 2.000000 0.000000
O Meetings O Meetings 63 170.0 2.698413 1.612801
O Operations O Operations 267 574.0 2.149813 0.699107
O Training O Training 147 334.0 2.272109 0.763534
T Classmates T Classmates 175 175.0 1.000000 0.000000
T Friendship T Friendship 91 91.0 1.000000 0.000000
T Kinship T Kinship 16 16.0 1.000000 0.000000
T Soulmates T Soulmates 11 11.0 1.000000 0.000000
Let’s plot the graph g grouped by its layers.
# append layer_id to group nodes by layers
g.v['layer_id'] = g.v.groupby('layer').grouper.group_info[0].astype(np.int32)
# create graph_tool graph object
gtg = g.return_gt_graph(features=['layer_id'])
gtg.set_directed(False)
# get sfdp layout positions
pos = gtd.sfdp_layout(gtg, groups=gtg.vp['layer_id'], mu=.15)
pos = pos.get_2d_array([0, 1])
g.v['x'] = pos[0]
g.v['y'] = pos[1]
# configure nodes
kwds_scatter = {'s': 10,
                'c': 'k'}
# configure edges
kwds_quiver = {'headwidth': 1,
               'alpha': .4,
               'cmap': 'viridis'}
# color by weight
C = g.e.weight.values
# plot
fig, ax = plt.subplots(figsize=(12, 12))
obj = g.plot_2d('x', 'y', edges=True, C=C,
                kwds_scatter=kwds_scatter,
                kwds_quiver=kwds_quiver, ax=ax)
# turn axis off, set x/y-lim and name layers
ax.axis('off')
margin = 10
ax.set_xlim((g.v.x.min() - margin, g.v.x.max() + margin))
ax.set_ylim((g.v.y.min() - margin, g.v.y.max() + margin))
for layer in layer_names:
    plt.text(g.v[g.v['layer'] == layer].x.mean() - margin * 3,
             g.v[g.v['layer'] == layer].y.max() + margin,
             layer, fontsize=15)

We can also plot the supergraph \(G^L = (V^L, E^L)\)
# create graph_tool graph of lg
gtg = lg.return_gt_graph(relations=True, node_indices=True, edge_indices=True)
# create plot
gtd.graph_draw(gtg,
               vertex_text=gtg.vp['i'], vertex_text_position=-2,
               vertex_fill_color='w',
               vertex_text_color='k',
               edge_text=gtg.ep['n_edges'],
               inline=True, fit_view=.8,
               output_size=(400, 400))

The SuperGraph \(G^N = (V^N, E^N)\)¶
Partitioning by the type of feature V_N leads to the supergraph \(G^{N} = (V^{N}, E^{N})\), where each supernode \(V^{N}_{i^N} \in V^{N}\) corresponds to a node of the MLN. Superedges \(E^{N}_{i^N j^N} \in E^{N}\) with \(i^N = j^N\) correspond to the coupling edges of a MLN.
# partition by MLN's node indices
nv, ne, gv, ge = g.partition_graph('V_N', return_gve=True)
# for each superedge, get types of edges and their weights
def type_weights(group):
    index = group['type'].values
    data = group['weight'].values
    return pd.Series(data=data, index=index)
ne_weights = ge.apply(type_weights).unstack()
ne = pd.concat((ne, ne_weights), axis=1)
# create graph
ng = dg.DeepGraph(nv, ne)
ng
<DeepGraph object, with n=79 node(s) and m=623 edge(s) at 0x7fb8d1da8b70>
print(ng.v)
n_nodes
V_N
0 10
1 10
2 10
3 10
4 10
.. ...
74 10
75 10
76 10
77 10
78 10
[79 rows x 1 columns]
print(ng.e)
n_edges Business Communication O Logistics O Meetings O Operations O Training T Classmates T Friendship T Kinship T Soulmates
V_N_s V_N_t
0 15 3 NaN 1.0 2.0 NaN NaN NaN NaN NaN 1.0 NaN
1 4 1 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN
5 1 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN
16 1 NaN NaN NaN NaN 2.0 NaN NaN NaN NaN NaN
21 1 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ...
72 73 4 NaN 1.0 NaN NaN 2.0 2.0 NaN NaN 1.0 NaN
76 6 NaN 1.0 NaN 2.0 2.0 2.0 1.0 1.0 NaN NaN
77 2 NaN NaN 2.0 NaN NaN NaN NaN NaN NaN 1.0
73 76 2 NaN NaN NaN NaN 2.0 2.0 NaN NaN NaN NaN
75 78 2 NaN NaN NaN NaN NaN 2.0 NaN 1.0 NaN NaN
[623 rows x 11 columns]
Let’s plot the graph g grouped by V_N.
# create graph_tool graph object
g.v['V_N'] = g.v['V_N'].astype(np.int32)  # sfdp only takes int32
g_tmp = dg.DeepGraph(v)
gtg = g_tmp.return_gt_graph(features='V_N')
gtg.set_directed(False)
# get sfdp layout positions
pos = gtd.sfdp_layout(gtg, groups=gtg.vp['V_N'], mu=.3, gamma=.01)
pos = pos.get_2d_array([0, 1])
g.v['x'] = pos[0]
g.v['y'] = pos[1]
# configure nodes
kwds_scatter = {'c': 'k'}
# configure edges
kwds_quiver = {'headwidth': 1,
               'alpha': .2,
               'cmap': 'viridis_r'}
# color by type
C = g.e.groupby('type').grouper.group_info[0]
# plot
fig, ax = plt.subplots(figsize=(15, 15))
g.plot_2d('x', 'y', edges=True,
          kwds_scatter=kwds_scatter, C=C,
          kwds_quiver=kwds_quiver, ax=ax)
# turn axis off, set x/y-lim and name nodes
name_dic = {i: name for i, name in enumerate(vinfo.iloc[:79].Name)}
ax.axis('off')
ax.set_xlim((g.v.x.min() - 1, g.v.x.max() + 1))
ax.set_ylim((g.v.y.min() - 1, g.v.y.max() + 1))
for node in g.v['V_N'].unique():
    plt.text(g.v[g.v['V_N'] == node].x.mean() - 1,
             g.v[g.v['V_N'] == node].y.max() + 1,
             name_dic[node], fontsize=12)

Let’s also plot the supergraph \(G^N = (V^N, E^N)\), where the color of the superedges corresponds to the number of edges within the respective superedge.
# get rid of isolated node for a nicer layout
ng.v.drop(57, inplace=True, errors='ignore')
# create graph_tool graph object
gtg = ng.return_gt_graph(features=True, relations='n_edges')
gtg.set_directed(False)
# get sfdp layout positions
pos = gtd.sfdp_layout(gtg)
pos = pos.get_2d_array([0, 1])
ng.v['x'] = pos[0]
ng.v['y'] = pos[1]
# configure nodes
kwds_scatter = {'s': 100,
                'c': 'k'}
# configure edges
# split off edges with only one type of connection
C_split_0 = ng.e['n_edges'].values.copy()
C_split_0[C_split_0 == 1] = 0
# edges with one type of connection
kwds_quiver_0 = {'alpha': .3,
                 'width': .001}
# edges with more than one type
kwds_quiver = {'headwidth': 1,
               'width': .003,
               'alpha': .7,
               'cmap': 'Blues',
               'clim': (1, ng.e.n_edges.max())}
# create plot
fig, ax = plt.subplots(figsize=(15, 15))
ng.plot_2d('x', 'y', edges=True, C_split_0=C_split_0,
           kwds_scatter=kwds_scatter, kwds_quiver_0=kwds_quiver_0,
           kwds_quiver=kwds_quiver, ax=ax)
# turn axis off, set x/y-lim and name nodes
ax.axis('off')
ax.set_xlim(ng.v.x.min() - 1, ng.v.x.max() + 1)
ax.set_ylim(ng.v.y.min() - 1, ng.v.y.max() + 1)
for i in ng.v.index:
    plt.text(ng.v.at[i, 'x'], ng.v.at[i, 'y'] + .3, i, fontsize=12)

The Tensor-Like Representation \(G^{NL} = (V^{NL}, E^{NL})\)¶
Considering only the information attributed to the layers of the MLN, and the fact that this MLN has just one aspect, there is only one more supergraph we can create of g. It is given by the intersection partition (see section III E of the Deep Graphs paper) of the types of features V_N and layer. The resulting supergraph \(G^{N \cdot L} = (V^{N \cdot L}, E^{N \cdot L})\) corresponds one-to-one to the graph \(G = (V, E)\), and therefore to the supra-graph representation of the MLN. The only difference is the indexing, which is tensor-like for the supergraph \(G^{N \cdot L}\).
# partition the graph
relation_funcs = {'type': 'sum', 'weight': 'sum'} # just to transfer relations
nlv, nle = g.partition_graph(['V_N', 'layer'], relation_funcs=relation_funcs)
nlg = dg.DeepGraph(nlv, nle)
nlg
<DeepGraph object, with n=790 node(s) and m=1014 edge(s) at 0x7fb8d5325550>
print(nlg.v)
n_nodes
V_N layer
0 Business 1
Communication 1
O Logistics 1
O Meetings 1
O Operations 1
... ...
78 O Training 1
T Classmates 1
T Friendship 1
T Kinship 1
T Soulmates 1
[790 rows x 1 columns]
print(nlg.e)
n_edges weight type
V_N_s layer_s V_N_t layer_t
0 Communication 15 Communication 1 1.0 Communication
O Logistics 15 O Logistics 1 2.0 O Logistics
T Kinship 15 T Kinship 1 1.0 T Kinship
1 O Operations 16 O Operations 1 2.0 O Operations
22 O Operations 1 2.0 O Operations
... ... ... ...
72 T Soulmates 77 T Soulmates 1 1.0 T Soulmates
73 O Operations 76 O Operations 1 2.0 O Operations
O Training 76 O Training 1 2.0 O Training
75 O Training 78 O Training 1 2.0 O Training
T Friendship 78 T Friendship 1 1.0 T Friendship
[1014 rows x 3 columns]
This tensor-like index allows you to use the advanced indexing features of pandas.
print(nlg.e.loc[2, 'Communication', :, 'Communication'])
n_edges weight type
V_N_s layer_s V_N_t layer_t
2 Communication 5 Communication 1 1.0 Communication
12 Communication 1 1.0 Communication
30 Communication 1 1.0 Communication
58 Communication 1 1.0 Communication
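For example, pd.IndexSlice lets you slice on all four index levels at once. A minimal sketch, using the layer name 'O Operations' from the tables above (call sort_index() first should pandas complain about an unsorted MultiIndex):
import pandas as pd

# slice on all four index levels at once: every intra-layer edge
# of the 'O Operations' layer
idx = pd.IndexSlice
print(nlg.e.loc[idx[:, 'O Operations', :, 'O Operations'], :])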
In the future, we might implement a method to convert this tensor representation of an MLN to a sparse-tensor data structure (e.g., https://github.com/mnick/scikit-tensor). Another idea is to create an interface to a suitable multilayer network package that implements the measures and models developed particularly for MLNs.
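A minimal sketch of what such a conversion could look like, building plain COO-style coordinate and value arrays from nlg.e (which sparse-tensor class consumes them is left open here):
import numpy as np

# map layer names to integers with one shared mapping, so that source
# and target layer indices agree
idx = nlg.e.index
layers = sorted(set(idx.get_level_values('layer_s'))
                | set(idx.get_level_values('layer_t')))
lmap = {layer: i for i, layer in enumerate(layers)}
# coordinate array of the 4-way tensor (node_s, layer_s, node_t, layer_t)
coords = np.array([
    idx.get_level_values('V_N_s').values,
    [lmap[l] for l in idx.get_level_values('layer_s')],
    idx.get_level_values('V_N_t').values,
    [lmap[l] for l in idx.get_level_values('layer_t')],
])
values = nlg.e['weight'].values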
Partitioning Edges Based on Node Properties¶
Here, we briefly demonstrate how to use the additional information of the nodes to perform queries on the edges.
# create "undirected" edge table (swap-copy all edges)
g.e = pd.concat((e, e.swaplevel(0,1)))
g.e.sort_index(inplace=True)
print(g.partition_edges(source_features=['Nationality']))
n_edges
Nationality_s
3 1655
4 351
5 22
print(g.partition_edges(source_features=['Nationality'], target_features=['Military Training']))
n_edges
Nationality_s Military Training_t
3 0 185
1 51
3 847
4 60
5 115
... ...
5 4 3
5 1
7 1
9 1
10 1
[26 rows x 1 columns]
print(g.partition_edges(source_features=['Nationality'],
                        target_features=['Military Training'],
                        relations='type'))
n_edges
type Nationality_s Military Training_t
Business 3 0 3
3 16
4 1
9 2
10 2
... ...
T Soulmates 3 9 1
10 2
4 3 3
9 1
10 3
[138 rows x 1 columns]
Alternative Representation of the MLN Edges¶
The edges of the supra-graph representation, as presented in the paper, look like this:
print(e_paper)
Business Communication O Logistics O Meetings O Operations O Training T Classmates T Friendship T Kinship T Soulmates
s t
9 67 2.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
69 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
77 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
11 61 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
20 59 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ...
733 769 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0
755 769 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0
787 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0
771 788 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0
783 788 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0
[1014 rows x 10 columns]
As you can see, this edge table also comprises 1014 edges between the nodes in v. However, every type of connection gets its own column, where a NaN value means that an edge does not have a relation of the corresponding type.
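If you want to rebuild this wide format from the long-format edge table yourself, a minimal sketch (assuming the table e with its (s, t) index and 'type' and 'weight' columns used throughout this tutorial; each (s, t, type) combination must be unique for unstack to work):
# pivot the 'type' column into one weight column per connection type
e_wide = e.set_index('type', append=True)['weight'].unstack('type')
print(e_wide)  # should match the [1014 rows x 10 columns] shown above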
What Next¶
Now that you have an idea of what the DeepGraph package provides, you should investigate the parts of the package most useful for you. See API Reference for details.
API Reference¶
The API reference summarizes DeepGraph’s core class, its methods and the functions subpackage.
The DeepGraph class¶
DeepGraph([v, e, supernode_labels_by, …]) - The core class of DeepGraph (dg).
Creating Edges¶
DeepGraph.create_edges([connectors, …]) - Create an edge table e linking the nodes in v.
DeepGraph.create_edges_ft(ft_feature[, …]) - Create (ft) an edge table e linking the nodes in v.
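The fast-track (ft) variant restricts pair generation to nodes that are close with respect to a sorted feature. A hedged sketch on a hypothetical node table with a 'time' column (not part of this tutorial's data):
import pandas as pd
import deepgraph as dg

# hypothetical node table, sorted by the fast-track feature 'time'
v_toy = pd.DataFrame({'time': [0., 2., 3., 9.]})
g_toy = dg.DeepGraph(v_toy)
# only node pairs whose 'time' values differ by at most 5 are considered
g_toy.create_edges_ft(ft_feature=('time', 5))
print(g_toy.e)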
Graph Partitioning¶
DeepGraph.partition_nodes(features[, …]) - Return a supernode DataFrame sv.
DeepGraph.partition_edges([relations, …]) - Return a superedge DataFrame se.
DeepGraph.partition_graph(features[, …]) - Return supergraph DataFrames sv and se.
Graph Interfaces¶
DeepGraph.return_cs_graph([relations, dropna]) - Return scipy.sparse.coo_matrix representation(s).
DeepGraph.return_nx_graph([features, …]) - Return a networkx.DiGraph representation.
DeepGraph.return_nx_multigraph([features, …]) - Return a networkx.MultiDiGraph representation.
DeepGraph.return_gt_graph([features, …]) - Return a graph_tool.Graph representation.
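As a quick illustration of these interfaces, a hedged sketch handing this tutorial's supra-graph g over to NetworkX, assuming return_nx_graph accepts column lists the same way return_gt_graph does above:
import networkx as nx

# convert g to a networkx.DiGraph, carrying 'layer' as a node attribute
# and 'weight' as an edge attribute
nxg = g.return_nx_graph(features=['layer'], relations=['weight'])
print(nxg.number_of_nodes(), nxg.number_of_edges())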
Plotting Methods¶
DeepGraph.plot_2d(x, y[, edges, C, …]) - Plot nodes and corresponding edges in 2 dimensions.
DeepGraph.plot_2d_generator(x, y, by[, …]) - Plot nodes and corresponding edges by groups.
DeepGraph.plot_map(lon, lat[, edges, C, …]) - Plot nodes and corresponding edges on a basemap.
DeepGraph.plot_map_generator(lon, lat, by[, …]) - Plot nodes and corresponding edges by groups, on basemaps.
DeepGraph.plot_hist(x[, bins, log_bins, …]) - Plot a histogram (or pdf) of x.
DeepGraph.plot_logfile(logfile) - Plot a logfile.
Other Methods¶
DeepGraph.append_binning_labels_v(col, col_name) - Append a column with binning labels of the values in v[col].
DeepGraph.append_cp([directed, connection, …]) - Append a component membership column to v.
DeepGraph.filter_by_values_v(col, values) - Keep only nodes in v with features of type col in values.
DeepGraph.filter_by_values_e(col, values) - Keep only edges in e with relations of type col in values.
DeepGraph.filter_by_interval_v(col, interval) - Keep only nodes in v with features of type col in interval.
DeepGraph.filter_by_interval_e(col, interval) - Keep only edges in e with relations of type col in interval.
DeepGraph.update_edges() - After removing nodes in v, update e.
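As the summaries above suggest, filtering v leaves dangling edges in e, which update_edges then discards. A minimal sketch of that workflow on this tutorial's graph g (note that it mutates g in place):
# keep only the nodes of two layers, then drop edges that lost an endpoint
g.filter_by_values_v('layer', ['Business', 'Communication'])
g.update_edges()
print(g)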
The Functions Module¶
deepgraph.functions - Auxiliary connector and selector functions to create edges.
Connector Functions¶
great_circle_dist(lat_s, lat_t, lon_s, lon_t) - Return the great circle distance between nodes.
cp_node_intersection(supernode_ids, sources, …) - Work in progress!
cp_intersection_strength(n_unique_nodes, …) - Work in progress!
hypergeometric_p_value(n_unique_nodes, …) - Work in progress!
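To illustrate what a connector function looks like in general, a minimal sketch on a hypothetical one-column node table: the argument names select the features, the _s/_t suffixes their source/target values, and the name of the returned variable becomes the edge column.
import numpy as np
import pandas as pd
import deepgraph as dg

# user-defined connector: absolute difference of the 'x' feature;
# the returned variable 'dist' becomes a column of the edge table
def x_dist(x_s, x_t):
    dist = np.abs(x_t - x_s)
    return dist

v_toy = pd.DataFrame({'x': [0., 1., 3.]})
g_toy = dg.DeepGraph(v_toy)
g_toy.create_edges(connectors=x_dist)
print(g_toy.e)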
Selector Functions¶
Contact¶
Email¶
Please feel free to contact me if you have questions or suggestions regarding DeepGraph:
Dominik Traxl <dominik.traxl@posteo.org>
Authors¶
DeepGraph was written as part of a PhD thesis in physics by Dominik Traxl at Humboldt University Berlin, the Bernstein Center for Computational Neuroscience and the Potsdam Institute for Climate Impact Research.