deepgraph.deepgraph.DeepGraph.append_binning_labels_v
DeepGraph.append_binning_labels_v(col, col_name, bins=10, log_bins=False, floor=False, return_bin_edges=False)

Append a column with binning labels of the values in v[col].

Append a column col_name to v with the indices of the bins to which each value in v[col] belongs.

If bins is an int, it determines the number of bins to create. If log_bins is True, this number determines the (approximate) number of bins to create for each order of magnitude. For linear bins, it is the number of bins for the whole range of values. If floor is set True, the bin edges are floored (rounded down to integers). If return_bin_edges is set True, the created bin edges are returned.

If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.

See np.digitize for details.

Parameters:
- col (int or str) – A column name of v, whose corresponding values are binned and labelled.
- col_name (str) – The column name for the created labels.
- bins (int or array_like, optional (default=10)) – If bins is an int, it determines the number of bins to create. If log_bins is True, this number determines the (approximate) number of bins to create for each order of magnitude. For linear bins, it is the number of bins for the whole range of values. If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
- log_bins (bool, optional (default=False)) – Whether to use logarithmically or linearly spaced bins.
- floor (bool, optional (default=False)) – Whether to floor the bin edges, i.e., round them down to integers.
- return_bin_edges (bool, optional (default=False)) – Whether to return the bin edges.
Returns:
- v (pd.DataFrame) – Appends an extra column col_name to v with the binning labels.
- bin_edges (np.ndarray) – The created bin edges, returned only if return_bin_edges is True.
Examples
First, we need a node table:
>>> import pandas as pd
>>> import deepgraph as dg
>>> v = pd.DataFrame({'time': [1,2,12,105,899]})
>>> g = dg.DeepGraph(v)

>>> g.v
   time
0     1
1     2
2    12
3   105
4   899
Binning time values with default arguments:
>>> bin_edges = g.append_binning_labels_v('time', 'time_l',
...                                       return_bin_edges=True)

>>> bin_edges
array([  1.        , 100.77777778, 200.55555556, 300.33333333,
       400.11111111, 499.88888889, 599.66666667, 699.44444444,
       799.22222222, 899.        ])

>>> g.v
   time  time_l
0     1       1
1     2       1
2    12       1
3   105       2
4   899      10
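The linear labels above can be reproduced with NumPy alone. This sketch assumes that, for bins=10 and log_bins=False, the edges are 10 evenly spaced points from the minimum to the maximum value; that assumption matches the bin_edges printed above but is not taken from the DeepGraph source.

```python
import numpy as np

values = np.array([1, 2, 12, 105, 899])

# Assumption: bins=10 -> 10 evenly spaced edges from min to max
# (np.linspace places the last edge exactly at the maximum).
edges = np.linspace(values.min(), values.max(), 10)

labels = np.digitize(values, edges)
print(labels.tolist())  # [1, 1, 1, 2, 10]
```

Because the rightmost edge equals the maximum, np.digitize gives that value the label len(edges) = 10, which is why 899 is labelled 10 in the example.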
Binning time values with logarithmically spaced bins:
>>> bin_edges = g.append_binning_labels_v('time', 'time_l', bins=5,
...                                       log_bins=True,
...                                       return_bin_edges=True)

>>> bin_edges
array([  1.        ,   1.62548451,   2.64219989,   4.29485499,
         6.98122026,  11.34786539,  18.44577941,  29.9833287 ,
        48.73743635,  79.22194781, 128.77404899, 209.32022185,
       340.24677814, 553.06586728, 899.        ])

>>> g.v
   time  time_l
0     1       1
1     2       2
2    12       6
3   105      10
4   899      15
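A rough NumPy sketch of the logarithmic case, for intuition only. The rule used here for the number of edges, ceil(bins * log10(max / min)), is an assumption chosen because it reproduces the 15 edges printed above; DeepGraph's actual rule may differ. Log spacing also requires all values to be positive.

```python
import math
import numpy as np

values = np.array([1, 2, 12, 105, 899])  # must be > 0 for log spacing
bins_per_decade = 5                      # corresponds to bins=5

vmin, vmax = values.min(), values.max()

# Assumed rule: edge count = ceil(bins * number of decades spanned).
n_edges = math.ceil(bins_per_decade * np.log10(vmax / vmin))
edges = np.logspace(np.log10(vmin), np.log10(vmax), n_edges)

# Pin the end points so floating-point round-off cannot shift them.
edges[0], edges[-1] = vmin, vmax

labels = np.digitize(values, edges)
print(n_edges, labels.tolist())  # 15 [1, 2, 6, 10, 15]
```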
Binning time values with logarithmically spaced bins (floored):
>>> bin_edges = g.append_binning_labels_v('time', 'time_l', bins=5,
...                                       log_bins=True, floor=True,
...                                       return_bin_edges=True)

>>> bin_edges
array([  1.,   2.,   4.,   6.,  11.,  18.,  29.,  48.,  79., 128., 209.,
       340., 553., 899.])

>>> g.v
   time  time_l
0     1       1
1     2       2
2    12       5
3   105       9
4   899      14
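The floored example prints 14 edges instead of 15, which is consistent with edges that coincide after flooring being kept only once (floor(1.0) and floor(1.625) are both 1). The sketch below mimics that with np.floor followed by np.unique; both the de-duplication step and the edge-count rule are assumptions inferred from the printed output, not from the DeepGraph source.

```python
import math
import numpy as np

values = np.array([1, 2, 12, 105, 899])
vmin, vmax = values.min(), values.max()

# Same assumed edge-count rule as for the unfloored log bins (bins=5).
n_edges = math.ceil(5 * np.log10(vmax / vmin))
edges = np.logspace(np.log10(vmin), np.log10(vmax), n_edges)
edges[0], edges[-1] = vmin, vmax

# Round the edges down, then drop duplicates (np.unique also sorts them).
floored = np.unique(np.floor(edges))

labels = np.digitize(values, floored)
print(len(floored), labels.tolist())  # 14 [1, 2, 5, 9, 14]
```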