A Short Introduction to DeepGraph
[ipython notebook] [python script] [data]
In the following, we demonstrate DeepGraph’s core functionalities by a toy data-set, “flying balls”.
First of all, we need to import some packages
# for plots
import matplotlib.pyplot as plt
# the usual
import numpy as np
import pandas as pd
import deepgraph as dg
# notebook display
%matplotlib inline
plt.rcParams['figure.figsize'] = 8, 6
pd.options.display.max_rows = 10
pd.set_option('expand_frame_repr', False)
Loading Toy Data
Then, we need data in the form of a pandas DataFrame, representing the nodes of our graph
v = pd.read_csv('flying_balls.csv', index_col=0)
print(v)
time x y ball_id
0 0 1692.000000 0.000000 0
1 0 8681.000000 0.000000 1
2 0 490.000000 0.000000 2
3 0 7439.000000 0.000000 3
4 0 4998.000000 0.000000 4
... ... ... ... ...
1163 45 2812.552734 16.503178 39
1164 46 5686.915998 14.161693 10
1165 46 3161.729086 19.381823 14
1166 46 5594.233413 57.701712 37
1167 47 5572.216748 20.588750 37
[1168 rows x 4 columns]
The data consists of 1168 space-time measurements of 50 different toy
balls in two-dimensional space. Each space-time measurement (i.e. row of
v) represents a node.
Let’s plot the data such that each ball has it’s own color
plt.scatter(v.x, v.y, s=v.time, c=v.ball_id)
Creating Edges
In order to create edges between these nodes, we now initiate a dg.DeepGraph instance
g = dg.DeepGraph(v)
g
<DeepGraph object, with n=1168 node(s) and m=0 edge(s) at 0x7facf3b35dd8>
and use it to create edges between the nodes given by g.v. For that matter, we may define a connector function
def x_dist(x_s, x_t):
dx = x_t - x_s
return dx
and pass it to g.create_edges in order to compute the distance in the x-coordinate of each pair of nodes
g.create_edges(connectors=x_dist)
g
<DeepGraph object, with n=1168 node(s) and m=681528 edge(s) at 0x7facf3b35dd8>
print(g.e)
dx
s t
0 1 6989.000000
2 -1202.000000
3 5747.000000
4 3306.000000
5 2812.000000
... ...
1164 1166 -92.682585
1167 -114.699250
1165 1166 2432.504327
1167 2410.487662
1166 1167 -22.016665
[681528 rows x 1 columns]
Let’s say we’re only interested in creating edges between nodes with a x-distance smaller than 1000. Then we may additionally define a selector
def x_dist_selector(dx, sources, targets):
dxa = np.abs(dx)
sources = sources[dxa <= 1000]
targets = targets[dxa <= 1000]
return sources, targets
and pass both the connector and selector to g.create_edges
g.create_edges(connectors=x_dist, selectors=x_dist_selector)
g
<DeepGraph object, with n=1168 node(s) and m=156938 edge(s) at 0x7facf3b35dd8>
print(g.e)
dx
s t
0 6 416.000000
7 848.000000
19 -973.000000
24 437.000000
38 778.000000
... ...
1162 1167 -44.033330
1163 1165 349.176351
1164 1166 -92.682585
1167 -114.699250
1166 1167 -22.016665
[156938 rows x 1 columns]
There is, however, a much more efficient way of creating edges that involve a simple distance threshold such as the one above
Creating Edges on a FastTrack
In order to efficiently create edges including a selection of edges via a simple distance threshold as above, one should use the create_edges_ft method. It relies on a sorted DataFrame, so we need to sort g.v first
g.v.sort_values('x', inplace=True)
g.create_edges_ft(ft_feature=('x', 1000))
g
<DeepGraph object, with n=1168 node(s) and m=156938 edge(s) at 0x7facf3b35dd8>
Let’s compare the efficiency
%timeit -n3 -r3 g.create_edges(connectors=x_dist, selectors=x_dist_selector)
3 loops, best of 3: 557 ms per loop
%timeit -n3 -r3 g.create_edges_ft(ft_feature=('x', 1000))
3 loops, best of 3: 167 ms per loop
The create_edges_ft method also accepts connectors and selectors as input. Let’s connect only those measurements that are close in space and time
def y_dist(y_s, y_t):
dy = y_t - y_s
return dy
def time_dist(time_t, time_s):
dt = time_t - time_s
return dt
def y_dist_selector(dy, sources, targets):
dya = np.abs(dy)
sources = sources[dya <= 100]
targets = targets[dya <= 100]
return sources, targets
def time_dist_selector(dt, sources, targets):
dta = np.abs(dt)
sources = sources[dta <= 1]
targets = targets[dta <= 1]
return sources, targets
g.create_edges_ft(ft_feature=('x', 100),
connectors=[y_dist, time_dist],
selectors=[y_dist_selector, time_dist_selector])
g
<DeepGraph object, with n=1168 node(s) and m=1899 edge(s) at 0x7facf3b35dd8>
print(g.e)
dt dy ft_r
s t
890 867 -1 19.311136 33.415831
867 843 -1 17.678482 33.415831
843 818 -1 16.045829 33.415831
818 792 -1 14.413176 33.415831
792 766 -1 12.780523 33.415831
... .. ... ...
244 203 -1 -10.825226 15.455612
203 159 -1 -12.457879 15.455612
159 114 -1 -14.090532 15.455612
114 65 -1 -15.723185 15.455612
65 16 -1 -17.355838 15.455612
[1899 rows x 3 columns]
We can now plot the flying balls and the edges we just created with the plot_2d method
obj = g.plot_2d('x', 'y', edges=True,
kwds_scatter={'c': g.v.ball_id, 's': g.v.time})
obj['ax'].set_xlim(1000,3000)
Graph Partitioning
The DeepGraph class also offers methods to partition nodes, edges and an entire graph. See the docstrings and the other tutorials for details and examples.
Graph Interfaces
Furthermore, you may inspect the docstrings of return_cs_graph, return_nx_graph and return_gt_graph to see how to convert from DeepGraph’s DataFrame representation of a network to sparse adjacency matrices, NetworkX’s network representation and graph_tool’s network representation.
Plotting Methods
DeepGraph also offers a number of useful Plotting methods. See plotting methods for details and have a look at the other tutorials for examples.