SplitAsFederatedData

class fedartml.SplitAsFederatedData(random_state=None)

Creates federated data from the provided centralized data (features and labels) to exemplify identically and non-identically distributed labels and features across the local nodes (clients). It allows one to select between two methods of data federation (percent_noniid and dirichlet). It works only for classification problems (labels as classes).

Parameters

random_state: int

Seed for the pseudorandom number generator. Pass an int for reproducible output across multiple function calls.

static add_gaussian_noise(feat, mu=0, sigma=0, client_id=0, local_nodes=4, random_state=None)

Add Gaussian random noise to given features.

Parameters

feat: array-like

List of NumPy arrays (or a pandas DataFrame) with images (i.e. features).

mu: float

Mean (“centre”) of the Gaussian distribution.

sigma: float

Standard deviation (noise) of the Gaussian distribution. Must be non-negative.

client_id: int

Identifier (number) of the client to which the noise is added.

local_nodes: int

Number of local nodes (clients) used in the federated learning paradigm.

random_state: int

Seed for the pseudorandom number generator. Pass an int for reproducible output across multiple function calls.

Returns

feat: array-like

List of NumPy arrays (or a pandas DataFrame) with images (i.e. features) with the random noise applied.
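The idea of perturbing features with Gaussian noise can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper name add_noise_sketch is hypothetical, it works on plain nested lists, and it does not model how client_id or local_nodes influence the noise in fedartml.

```python
import random


def add_noise_sketch(feat, mu=0.0, sigma=0.0, random_state=None):
    """Return a copy of `feat` (a list of rows) with N(mu, sigma) noise added.

    Hypothetical helper for illustration only; not part of fedartml.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = random.Random(random_state)  # seeded generator for reproducibility
    return [[x + rng.gauss(mu, sigma) for x in row] for row in feat]


features = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
noisy = add_noise_sketch(features, mu=0.0, sigma=0.1, random_state=0)
```

With sigma=0 and mu=0 the features come back unchanged, and reusing the same random_state reproduces the same noisy values.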

create_clients(image_list, label_list, num_clients=4, prefix_cli='client', method='dirichlet', alpha=1000, percent_noniid=0, sigma_noise=0, bins='n_samples', feat_sample_rate=0.1, feat_skew_method='gaussian-noise', alpha_feat_split=1000, idx_feat='feat-mean', feat_quantile=20, quant_skew_method='no-quant-skew', alpha_quant_split=1000, spa_temp_skew_method='no-spatemp-skew', alpha_spa_temp=1000, spa_temp_var=None)

Create a federated dataset partitioned across the local nodes (clients) using the desired method (percent_noniid or dirichlet). It works only for classification problems (labels as classes) with quantitative (numeric) features.

Parameters

image_list: array-like

List of NumPy arrays (or a pandas DataFrame) with images (i.e. features) from the centralized data.

label_list: array-like

The target values (class labels in classification) from the centralized data.

num_clients: int

Number of local nodes (clients) used in the federated learning paradigm.

prefix_cli: str

The clients’ name prefix, e.g., client_1, client_2, etc.

method: str

Method to create the federated data based on label skew. Possible options: "percent_noniid", "dirichlet" (default), "no-label-skew".

alpha: float

Concentration parameter of the Dirichlet distribution defining the desired degree of non-IID-ness for the labels of the federated data.

percent_noniid: float

Desired percentage (between 0 and 100) of non-IID-ness for the labels of the federated data.

sigma_noise: float

Noise (sigma parameter of the Gaussian distribution) to be added to the features. Applicable only for feat_skew_method="gaussian-noise".

bins: int or str

Number of bins used to create the histogram of features when checking feature skew. It can be the string 'n_samples' or the integer number of bins to use. If 'n_samples' (default) is selected, the number of bins is set to the number of examples in image_list. Applicable only for feat_skew_method="gaussian-noise".

feat_sample_rate: float

Proportion (between 0 and 1) to be sampled from the features. This parameter is useful when dealing with datasets with many features (e.g. images). Applicable only for feat_skew_method="gaussian-noise".

feat_skew_method: str

Method to create the federated data based on feature skew. Possible options: "gaussian-noise" (default), "hist-dirichlet".

alpha_feat_split: float

Concentration parameter of the Dirichlet distribution defining the desired degree of non-IID-ness for the features of the federated data. Applicable only for feat_skew_method="hist-dirichlet".

idx_feat: int or str

Position (index) of the feature used to simulate feature skew. It can be the string 'feat-mean' or the integer position to use. If 'feat-mean' (default) is selected, the mean of all the features is computed as representative of the features. Applicable only for feat_skew_method="hist-dirichlet".

feat_quantile: int

Number of quantiles to use in the feature skew simulation: 20 for ventiles (default), 10 for deciles, 4 for quartiles, etc. Applicable only for feat_skew_method="hist-dirichlet".

quant_skew_method: str

Method to create the federated data based on quantity skew. Possible options: "no-quant-skew" (default), "dirichlet", "minsize-dirichlet".

alpha_quant_split: float

Concentration parameter of the Dirichlet distribution defining the desired degree of non-IID-ness for the quantity skew of the federated data. Applicable only for quant_skew_method="dirichlet".

spa_temp_skew_method: str

Method to create the federated data based on spatio-temporal skew. Possible options: "no-spatemp-skew" (default), "st-dirichlet".

alpha_spa_temp: float

Concentration parameter of the Dirichlet distribution defining the desired degree of non-IID-ness for the spatio-temporal skew of the federated data. Applicable only for spa_temp_skew_method="st-dirichlet".

spa_temp_var: array-like

The spatio-temporal variable from the centralized data. Applicable only for spa_temp_skew_method="st-dirichlet".
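To make the 'feat-mean' and feat_quantile options more concrete, the sketch below reduces each example to the mean of its features and assigns it to a quantile bin. It only illustrates the binning step under that assumption; the helper name quantile_bins_sketch is hypothetical, and how fedartml distributes the resulting bins across clients is not shown here.

```python
import bisect
import statistics


def quantile_bins_sketch(features, n_quantiles=20):
    """Assign each example to a quantile bin of its per-example feature mean.

    Hypothetical helper for illustration only; not part of fedartml.
    """
    means = [statistics.fmean(row) for row in features]
    # statistics.quantiles returns n_quantiles - 1 cut points
    cuts = statistics.quantiles(means, n=n_quantiles)
    return [bisect.bisect_right(cuts, m) for m in means]


# 40 toy examples with 3 features each; quartiles (n_quantiles=4) for brevity
features = [[float(i), i + 1.0, i + 2.0] for i in range(40)]
bins = quantile_bins_sketch(features, n_quantiles=4)
```

Here each of the four bins receives ten examples, since the per-example means are evenly spaced.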

Returns

fed_data: dict

Contains features (images) and labels for each local node (client) after federating the data. Includes “with_class_completion” and “without_class_completion” cases.

ids_list_fed_data: array-like

Indexes of examples (partition) taken for each local node (client).

num_missing_classes: array-like

Number of missing classes for each local node when creating the federated dataset.

distances: dict

Distances calculated while measuring heterogeneity (non-IID-ness) of the label distribution among clients. Includes “with_class_completion” and “without_class_completion” cases.

spatemp_fed_data: dict

Contains categories of the spatio-temporal variable for each local node (client) after federating the data. It is generated only when spa_temp_skew_method = “st-dirichlet”.

Note: When creating federated data with heterogeneous distributions (i.e. high values of percent_noniid or small values of alpha), it is more likely that clients hold examples from only one class. For this reason, two cases (for labels and features) are returned for fed_data and distances:
  • “with_class_completion”: Each client is given one (random) example of every class it is missing, so that every client holds all the label classes.

  • “without_class_completion”: The clients are NOT completed with one (random) example of each missing class. Consequently, the per-client example counts sum to the total number of examples (the number of rows in image_list).
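The difference between the two cases can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the helper complete_classes_sketch and the toy partition are hypothetical, and clients are represented as lists of example indexes.

```python
import random


def complete_classes_sketch(client_ids, labels, random_state=None):
    """Give every client one random example of each class it lacks.

    Hypothetical helper for illustration only; not part of fedartml.
    """
    rng = random.Random(random_state)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    completed, n_missing = {}, {}
    for client, ids in client_ids.items():
        present = {labels[i] for i in ids}
        missing = sorted(set(by_class) - present)
        extra = [rng.choice(by_class[c]) for c in missing]
        completed[client] = list(ids) + extra
        n_missing[client] = len(missing)
    return completed, n_missing


labels = [0, 0, 1, 1, 2, 2]
without_completion = {"client_1": [0, 1], "client_2": [2, 3, 4, 5]}
with_completion, n_missing = complete_classes_sketch(
    without_completion, labels, random_state=0
)
```

In the "without" case the client sizes add up to the six original examples; in the "with" case three extra examples are added (two for client_1, one for client_2), so the total exceeds the size of the centralized data.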

Examples

>>> from fedartml import SplitAsFederatedData
>>> from keras.datasets import mnist
>>> (train_X, train_y), (test_X, test_y) = mnist.load_data()
>>> my_federater = SplitAsFederatedData(random_state=0)
>>>
>>> # Using percent_noniid method
>>> clients_glob, list_ids_sampled, miss_class_per_node, distances = my_federater.create_clients(
...     image_list=train_X, label_list=train_y, num_clients=4,
...     prefix_cli='Local_node', method="percent_noniid", percent_noniid=0)
>>>
>>> # Using dirichlet method
>>> clients_glob, list_ids_sampled, miss_class_per_node, distances = my_federater.create_clients(
...     image_list=train_X, label_list=train_y, num_clients=4,
...     prefix_cli='Local_node', method="dirichlet", alpha=1000)

static create_histogram(flat_input, bins)

Create a histogram and its bins from the given flattened features.

Parameters

flat_input: array-like (flattened)

Flattened list of NumPy arrays (or a pandas DataFrame) with images (i.e. features).

bins: int

Number of bins to use in the histogram.

Returns

histogram: array-like

The values of the histogram. Normalized to sum up to 1.

bin_edges: array-like

The bin edges.
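A histogram normalized to sum to 1, together with its bin edges, can be sketched in plain Python as below. The helper normalized_histogram_sketch is hypothetical and uses equal-width bins between the minimum and maximum value; whether fedartml computes its histogram exactly this way is an assumption.

```python
def normalized_histogram_sketch(flat_input, bins):
    """Equal-width histogram whose values sum to 1, plus its bin edges.

    Hypothetical helper for illustration only; not part of fedartml.
    """
    lo, hi = min(flat_input), max(flat_input)
    width = (hi - lo) / bins
    if width == 0:  # all values equal: avoid dividing by zero below
        width = 1.0
    edges = [lo + k * width for k in range(bins + 1)]
    counts = [0] * bins
    for x in flat_input:
        # the maximum value falls into the last bin rather than a new one
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    total = len(flat_input)
    return [c / total for c in counts], edges


flat = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]
histogram, bin_edges = normalized_histogram_sketch(flat, bins=2)
```

For this input the two bins each hold three of the six values, so the histogram is [0.5, 0.5] with edges [0.0, 0.5, 1.0].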

static dirichlet_method(labels, local_nodes, alpha=1000, random_state=None)

Create a federated dataset divided per each local node (client) using the Dirichlet (dirichlet) method.

Parameters

labels: array-like

The target values (class labels in classification).

local_nodes: int

Number of local nodes (clients) used in the federated learning paradigm.

alpha: float

Concentration parameter of the Dirichlet distribution defining the desired degree of non-IID-ness for the federated data.

random_state: int

Seed for the pseudorandom number generator. Pass an int for reproducible output across multiple function calls.

Returns

pctg_distr: array-like

Percentage (between 0 and 1) distribution of the classes for each local node (client).

num_distr: array-like

Number of examples of each class for each local node (client).

idx_distr: array-like

Indexes of examples (partition) taken for each local node (client).

num_per_node: array-like

Number of examples for each local node (client).
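A common way to build a Dirichlet-based label partition is to draw, for each class, a vector of per-client proportions from a symmetric Dirichlet(alpha) and split that class's examples accordingly. The sketch below follows that scheme with the standard library only; it is an assumption about the general technique, not a copy of the library's code, and the helper names are hypothetical.

```python
import random


def dirichlet_sample(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # a vanishingly small alpha can underflow; use uniform
        return [1.0 / k] * k
    return [d / total for d in draws]


def dirichlet_label_split_sketch(labels, local_nodes, alpha=1000, random_state=None):
    """Return, for each node, the list of example indexes assigned to it.

    Hypothetical helper for illustration only; not part of fedartml.
    """
    rng = random.Random(random_state)
    idx_distr = [[] for _ in range(local_nodes)]
    for cls in sorted(set(labels)):
        ids = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(ids)
        props = dirichlet_sample(alpha, local_nodes, rng)
        start, cum = 0, 0.0
        for node in range(local_nodes):
            cum += props[node]
            # the last node takes whatever is left, so nothing is dropped
            end = len(ids) if node == local_nodes - 1 else round(cum * len(ids))
            idx_distr[node].extend(ids[start:end])
            start = end
    return idx_distr


labels = [c for c in range(4) for _ in range(100)]  # 4 classes, 100 examples each
parts = dirichlet_label_split_sketch(labels, local_nodes=4, alpha=1000, random_state=0)
```

With a large alpha such as 1000 the per-class proportions are nearly uniform, so every node ends up with roughly a quarter of each class; small values of alpha concentrate each class on a few nodes.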

static dirichlet_method_quant_skew(labels, local_nodes, alpha=1000, random_state=None, method='no-quant-skew')

Create a federated dataset divided per each local node (client) using the Dirichlet (dirichlet) method to evaluate quantity skew.

Parameters

labels: array-like

The target values (class labels in classification).

local_nodes: int

Number of local nodes (clients) used in the federated learning paradigm.

alpha: float

Concentration parameter of the Dirichlet distribution defining the desired degree of non-IID-ness for the federated data.

random_state: int

Seed for the pseudorandom number generator. Pass an int for reproducible output across multiple function calls.

method: str

Method to create the federated data based on quantity skew. Possible options: "no-quant-skew" (default), "dirichlet", "minsize-dirichlet".

Returns

pctg_distr: array-like

Percentage (between 0 and 1) distribution of the classes for each local node (client).

num_distr: array-like

Number of examples of each class for each local node (client).

idx_distr: array-like

Indexes of examples (partition) taken for each local node (client).

num_per_node: array-like

Number of examples for each local node (client).
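Quantity skew with a Dirichlet distribution is often simulated by drawing one vector of client shares and giving each client that share of the shuffled examples. The sketch below illustrates this idea only; the helper quantity_skew_split_sketch is hypothetical, and the "minsize-dirichlet" variant is not modelled because its exact rule is not described here.

```python
import random


def quantity_skew_split_sketch(n_examples, local_nodes, alpha=1000, random_state=None):
    """Split example indexes so that client sizes follow one Dirichlet(alpha) draw.

    Hypothetical helper for illustration only; not part of fedartml.
    """
    rng = random.Random(random_state)
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(local_nodes)]
    total = sum(draws) or 1.0  # guard against underflow for a tiny alpha
    shares = [d / total for d in draws]
    ids = list(range(n_examples))
    rng.shuffle(ids)
    idx_distr, start, cum = [], 0, 0.0
    for node, share in enumerate(shares):
        cum += share
        # the last node takes the remainder, so every example is assigned
        end = n_examples if node == local_nodes - 1 else round(cum * n_examples)
        idx_distr.append(ids[start:end])
        start = end
    num_per_node = [len(part) for part in idx_distr]
    return idx_distr, num_per_node


idx_distr, num_per_node = quantity_skew_split_sketch(
    1000, local_nodes=4, alpha=1000, random_state=0
)
```

A large alpha yields clients of nearly equal size, while a small alpha produces a few large clients and several small ones.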

static percent_noniid_method(labels, local_nodes, pct_noniid=0, random_state=None)

Create a federated dataset divided per each local node (client) using the Percentage of Non-IID (percent_noniid) method.

Parameters

labels: array-like

The target values (class labels in classification).

local_nodes: int

Number of local nodes (clients) used in the federated learning paradigm.

pct_noniid: float

Desired percentage (between 0 and 100) of non-IID-ness for the federated data.

random_state: int

Seed for the pseudorandom number generator. Pass an int for reproducible output across multiple function calls.

Returns

pctg_distr: array-like

Percentage (between 0 and 1) distribution of the classes for each local node (client).

num_distr: array-like

Number of examples of each class for each local node (client).

idx_distr: array-like

Indexes of examples (partition) taken for each local node (client).

num_per_node: array-like

Number of examples for each local node (client).

static st_dirichlet_method(labels, local_nodes, alpha=1000, random_state=None, st_variable=None)

Create a federated dataset divided per each local node (client) using the spatio-temporal Dirichlet (st-dirichlet) method.

Parameters

labels: array-like

The target values (class labels in classification).

local_nodes: int

Number of local nodes (clients) used in the federated learning paradigm.

alpha: float

Concentration parameter of the Dirichlet distribution defining the desired degree of non-IID-ness for the federated data.

random_state: int

Seed for the pseudorandom number generator. Pass an int for reproducible output across multiple function calls.

st_variable: array-like

The spatio-temporal variable from the centralized data.

Returns

pctg_distr: array-like

Percentage (between 0 and 1) distribution of the classes for each local node (client).

num_distr: array-like

Number of examples of each class for each local node (client).

idx_distr: array-like

Indexes of examples (partition) taken for each local node (client).

num_per_node: array-like

Number of examples for each local node (client).

pctg_distr_st_var: array-like

Percentage (between 0 and 1) distribution of the spatio-temporal variable’s categories for each local node (client).
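One plausible reading of the per-client category distribution is the share of each category among the examples a client holds. The sketch below computes that from a partition and a categorical variable; the helper category_shares_sketch and the toy data are hypothetical, and the normalization used by fedartml may differ.

```python
from collections import Counter


def category_shares_sketch(idx_distr, st_variable):
    """Share of each category among the examples held by each client.

    Hypothetical helper for illustration only; not part of fedartml.
    """
    categories = sorted(set(st_variable))
    shares = []
    for ids in idx_distr:
        counts = Counter(st_variable[i] for i in ids)
        n = len(ids)
        # an empty client gets all-zero shares instead of a division error
        shares.append([counts[c] / n if n else 0.0 for c in categories])
    return categories, shares


st_variable = ["north", "north", "south", "south", "south", "east"]
idx_distr = [[0, 1, 2], [3, 4, 5]]  # example indexes held by two clients
categories, shares = category_shares_sketch(idx_distr, st_variable)
```

Each row of shares sums to 1 for a non-empty client; here the first client holds no "east" examples and the second holds no "north" examples.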
