What is DeepGraph¶
DeepGraph is an open source Python implementation of a new network representation introduced here. Its purpose is to facilitate data analysis by interpreting data in terms of network theory.
The basis of this software package is Pandas, a fast and flexible data analysis tool for the Python programming language. Utilizing one of its primary data structures, the DataFrame, we represent objects (i.e. the nodes of a network) by one DataFrame, and their pairwise relations (i.e. the edges of a network) by another DataFrame.
One of the main features of DeepGraph is an efficient and scalable creation of
edges. Given a set of nodes in the form of a DataFrame (or an on disc
HDFStore),
DeepGraph’s core class
provides
methods to iteratively compute pairwise relations
between the nodes (e.g. similarity/distance measures) using arbitrary, user-defined
functions on the nodes’ features. These methods provide arguments to
parallelize the computation and control memory consumption, making them
suitable for very large data-sets and adjustable to whatever hardware you have
at hand (from netbooks to cluster architectures).
Furthermore, once a graph is constructed, DeepGraph allows you to partition its
nodes
,
edges
or the entire
graph
by the
graph’s properties and labels, enabling the aggregation, computation and
allocation of information on and between arbitrary groups of nodes. These
methods also let you express elaborate queries on the information contained in
a deep graph.
DeepGraph is not meant to replace or compete with already existing Python network libraries, such as NetworkX or graph_tool, but rather to combine and extend their capabilities with the merits of Pandas. For that matter, the core class of DeepGraph provides interfacing methods to convert to common network representations and graph objects of popular Python network packages.
Deepgraph also implements a number of useful plotting methods, including drawings on geographical map projections.
It’s also possible to represent multilayer networks by deep graphs. We’re thinking of implementing an interface to a suitable package dedicated to the analysis of multilayer networks.
Note
Please acknowledge the authors and cite the use of this software when results are used in publications or published elsewhere. Various citation formats are available here: https://aip.scitation.org/action/showCitFormats?type=show&doi=10.1063%2F1.4952963
For your convenience, you can find the BibTex entry below:
@Article{traxl-2016-deep,
author = {Dominik Traxl AND Niklas Boers AND J\"urgen Kurths},
title = {Deep Graphs - A general framework to represent and analyze
heterogeneous complex systems across scales},
journal = {Chaos},
year = {2016},
volume = {26},
number = {6},
eid = {065303},
doi = {http://dx.doi.org/10.1063/1.4952963},
eprinttype = {arxiv},
eprintclass = {physics.data-an, cs.SI, physics.ao-ph, physics.soc-ph},
eprint = {http://arxiv.org/abs/1604.00971v1},
version = {1},
date = {2016-04-04},
url = {http://arxiv.org/abs/1604.00971v1}
}
To get started, have a look at
Want to share feedback, or contribute?
So far the package has only been developed by me, a fact that I would like to change very much. So if you feel like contributing in any way, shape or form, please feel free to contact me, report bugs, create pull requestes, milestones, etc. You can contact me via email: dominik.traxl@posteo.org
Note
This documentation assumes general familiarity with NumPy and Pandas. If you haven’t used these packages, do invest some time in learning about them first.
Note
DeepGraph is free software; you can redistribute it and/or modify it under the terms of the BSD License. We highly welcome contributions from the community.