deepgraph.deepgraph.DeepGraph.append_binning_labels_v

DeepGraph.append_binning_labels_v(col, col_name, bins=10, log_bins=False, floor=False, return_bin_edges=False)

Append a column with binning labels of the values in v[col].

Append a column col_name to v containing, for each value in v[col], the index of the bin to which it belongs.

If bins is an int, it determines the number of bins to create. For linear bins, it is the number of bins for the whole range of values. If log_bins is True, it is the (approximate) number of bins to create per order of magnitude. If floor is True, the bin edges are rounded down to the nearest integer. If return_bin_edges is True, the created bin edges are returned.

If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.

See np.digitize for details.
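
As a brief illustration of how np.digitize labels values, here is a minimal sketch using NumPy only. The edges and values are hypothetical, and the sketch assumes the method passes the bin edges to np.digitize with its default arguments, which this page suggests but does not state explicitly.

```python
import numpy as np

# Hypothetical, non-uniform bin edges (not taken from the deepgraph docs).
edges = np.array([0, 10, 100, 1000])
values = np.array([-5, 1, 2, 12, 105, 899, 1000])

# With the default right=False, label i means edges[i-1] <= value < edges[i].
labels = np.digitize(values, edges)
print(labels)
```

Values below the first edge are labelled 0, and values at or above the last edge are labelled len(edges).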

Parameters:
  • col (int or str) – A column name of v, whose corresponding values are binned and labelled.
  • col_name (str) – The column name for the created labels.
  • bins (int or array_like, optional (default=10)) – If bins is an int, it determines the number of bins to create. For linear bins, it is the number of bins for the whole range of values. If log_bins is True, it is the (approximate) number of bins to create per order of magnitude. If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
  • log_bins (bool, optional (default=False)) – Whether to use logarithmically or linearly spaced bins.
  • floor (bool, optional (default=False)) – Whether to round the bin edges down to the nearest integers.
  • return_bin_edges (bool, optional (default=False)) – Whether to return the bin edges.
Returns:
  • v (pd.DataFrame) – The column col_name with the binning labels is appended to v.
  • bin_edges (np.ndarray) – The created bin edges, returned only if return_bin_edges is True.

Examples

First, we need a node table:

>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'time': [1,2,12,105,899]})
>>> g = dg.DeepGraph(v)
>>> g.v
   time
0     1
1     2
2    12
3   105
4   899

Binning time values with default arguments:

>>> bin_edges = g.append_binning_labels_v('time', 'time_l',
...                                       return_bin_edges=True)
>>> bin_edges
array([   1.        ,  100.77777778,  200.55555556,  300.33333333,
        400.11111111,  499.88888889,  599.66666667,  699.44444444,
        799.22222222,  899.        ])
>>> g.v
   time  time_l
0     1       1
1     2       1
2    12       1
3   105       2
4   899      10
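
The edges and labels above are consistent with evenly spaced edges between the minimum and maximum of the column, passed to np.digitize. The following NumPy-only sketch reproduces them under that assumption; it is not taken from the deepgraph source.

```python
import numpy as np

time = np.array([1, 2, 12, 105, 899])

# Assumption: bins=10 yields 10 evenly spaced edges from min to max.
edges = np.linspace(time.min(), time.max(), 10)

# Label i means edges[i-1] <= value < edges[i] (np.digitize default).
labels = np.digitize(time, edges)
print(edges)
print(labels)
```

Note that the maximum value lies exactly on the last edge, which is why it receives the label 10 rather than 9.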

Binning time values with logarithmically spaced bins:

>>> bin_edges = g.append_binning_labels_v('time', 'time_l', bins=5,
...                                       log_bins=True,
...                                       return_bin_edges=True)
>>> bin_edges
array([   1.        ,    1.62548451,    2.64219989,    4.29485499,
          6.98122026,   11.34786539,   18.44577941,   29.9833287 ,
         48.73743635,   79.22194781,  128.77404899,  209.32022185,
        340.24677814,  553.06586728,  899.        ])
>>> g.v
   time  time_l
0     1       1
1     2       2
2    12       6
3   105      10
4   899      15
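
One way to arrive at 15 logarithmically spaced edges for this data is sketched below with NumPy only. The formula for the number of edges (bins per order of magnitude times the number of magnitudes spanned, rounded up) is an assumption inferred from the output above, not a confirmed description of the deepgraph implementation.

```python
import numpy as np

time = np.array([1.0, 2.0, 12.0, 105.0, 899.0])
bins_per_magnitude = 5

# Orders of magnitude spanned by the data: log10(899 / 1), roughly 2.95.
n_magnitudes = np.log10(time.max() / time.min())

# Assumed rule: round up to a whole number of edges.
n_edges = int(np.ceil(bins_per_magnitude * n_magnitudes))

# Geometrically spaced edges from the minimum to the maximum.
edges = np.geomspace(time.min(), time.max(), n_edges)
labels = np.digitize(time, edges)
print(n_edges)
print(edges)
print(labels)
```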

Binning time values with logarithmically spaced bins (floored):

>>> bin_edges = g.append_binning_labels_v('time', 'time_l', bins=5,
...                                       log_bins=True, floor=True,
...                                       return_bin_edges=True)
>>> bin_edges
array([   1.,    2.,    4.,    6.,   11.,   18.,   29.,   48.,   79.,
        128.,  209.,  340.,  553.,  899.])
>>> g.v
   time  time_l
0     1       1
1     2       2
2    12       5
3   105       9
4   899      14
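
The floored result has 14 edges instead of 15 because two of the unfloored edges (1.0 and 1.625...) both round down to 1. The NumPy-only sketch below illustrates this, assuming that duplicate edges are dropped after flooring; that step is inferred from the output above and is not confirmed by this page.

```python
import numpy as np

time = np.array([1.0, 2.0, 12.0, 105.0, 899.0])

# The 15 unfloored, geometrically spaced edges from the previous example.
raw_edges = np.geomspace(time.min(), time.max(), 15)

# Round each edge down, then drop duplicates (np.unique also sorts).
floored_edges = np.unique(np.floor(raw_edges))
labels = np.digitize(time, floored_edges)
print(floored_edges)
print(labels)
```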