Class SMOTEBalancer<U>
java.lang.Object
es.uam.eps.ir.relison.links.data.ml.balance.SMOTEBalancer<U>
All Implemented Interfaces:
Balancer<U>
public class SMOTEBalancer<U> extends java.lang.Object implements Balancer<U>
Balances a dataset using the Synthetic Minority Over-sampling Technique (SMOTE).
This method creates new synthetic instances of a class by combining an existing instance of that class with one of its k nearest neighbours from the same class.
Reference: Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16 (2002), pp. 321-357.
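For numeric features, the reference paper builds each synthetic sample from a minority instance x_i and one of its k nearest neighbours x̂ from the same class, moving a random fraction of the way along each feature difference. The rule below follows the paper's pseudocode, which draws the gap λ_f separately for every feature f (F is the number of numeric features). Whether this class draws the gap per feature or once per instance, and how it combines nominal values, is not stated in this documentation.

```latex
x^{\text{new}}_{f} = x_{i,f} + \lambda_f \left( \hat{x}_{f} - x_{i,f} \right),
\qquad \lambda_f \sim \mathcal{U}(0, 1), \quad f = 1, \dots, F
```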
Constructor Summary
SMOTEBalancer(int k, Generator<U> gen, U init)
    Constructor.
Method Summary
InstanceSet<U> balance(InstanceSet<U> original)
    Given an unbalanced dataset, creates a new dataset where every class has the same number of examples.
private double distance(Instance<U> p1, Instance<U> p2, java.util.List<FeatureType> types)
    Computes the distance between two instances.
private java.util.List<Instance<U>> generateNewInstances(int numNewInstances, int k, java.util.List<Instance<U>> minInstances, java.util.List<FeatureType> types)
    Generates new instances.
private java.util.List<Instance<U>> populate(int numExtra, Instance<U> p, java.util.List<Instance<U>> neighbourhood, java.util.List<FeatureType> types)
    Given an instance and its neighbours, generates a new list of instances.
Constructor Details
SMOTEBalancer
Constructor.
Parameters:
k - number of nearest neighbours considered for each instance.
gen - user identifier generator.
init - initial user value.
Method Details
balance
Description copied from interface: Balancer
Given an unbalanced dataset, creates a new dataset where every class has the same number of examples.
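The counting step behind balance can be sketched as follows, assuming (this documentation does not say so) that smaller classes are oversampled up to the size of the largest class and that class labels can be represented as integers. BalanceCounts and syntheticCounts are hypothetical names, not part of the library.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BalanceCounts {
    // Hypothetical helper: for every class label, how many synthetic instances are
    // needed so that the class reaches the size of the largest class.
    static Map<Integer, Integer> syntheticCounts(List<Integer> labels) {
        Map<Integer, Integer> sizes = new TreeMap<>();
        for (int label : labels) {
            sizes.merge(label, 1, Integer::sum); // count instances per class
        }
        int target = 0;
        for (int size : sizes.values()) {
            target = Math.max(target, size); // size of the largest class
        }
        Map<Integer, Integer> needed = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : sizes.entrySet()) {
            needed.put(e.getKey(), target - e.getValue());
        }
        return needed;
    }

    public static void main(String[] args) {
        // Seven instances of class 0 and three of class 1.
        List<Integer> labels = List.of(0, 0, 0, 0, 0, 0, 0, 1, 1, 1);
        System.out.println(syntheticCounts(labels)); // prints {0=0, 1=4}
    }
}
```

Oversampling only (rather than also undersampling the majority) keeps every original example, which matches SMOTE's purpose as an over-sampling technique.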
populate
private java.util.List<Instance<U>> populate(int numExtra, Instance<U> p, java.util.List<Instance<U>> neighbourhood, java.util.List<FeatureType> types)
Given an instance and its neighbours, generates a new list of instances.
Parameters:
numExtra - number of extra instances to generate.
p - the instance.
neighbourhood - the neighbourhood of the instance.
types - types of the attributes.
Returns:
the list of newly generated instances.
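A minimal sketch of what populate does for a single instance, assuming instances are plain double[] vectors and a boolean[] flag marks nominal features (the real method works on Instance<U> and FeatureType). Numeric features follow the per-feature interpolation of the original SMOTE pseudocode; copying the category of either endpoint for nominal features is an assumption, since this documentation does not say how nominal values are combined. SmotePopulate is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SmotePopulate {
    // Hypothetical sketch: p and the neighbours are feature vectors; nominal[f] marks
    // feature f as nominal (its value is a category code). Not the class's actual API.
    static List<double[]> populate(int numExtra, double[] p, List<double[]> neighbourhood,
                                   boolean[] nominal, Random rng) {
        List<double[]> synthetic = new ArrayList<>();
        for (int n = 0; n < numExtra; n++) {
            // Pick one of the nearest neighbours at random.
            double[] nn = neighbourhood.get(rng.nextInt(neighbourhood.size()));
            double[] s = new double[p.length];
            for (int f = 0; f < p.length; f++) {
                if (nominal[f]) {
                    // Assumption: take the category of either p or the neighbour.
                    s[f] = rng.nextBoolean() ? p[f] : nn[f];
                } else {
                    // Numeric feature: random point between p[f] and nn[f].
                    double gap = rng.nextDouble();
                    s[f] = p[f] + gap * (nn[f] - p[f]);
                }
            }
            synthetic.add(s);
        }
        return synthetic;
    }
}
```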
distance
private double distance(Instance<U> p1, Instance<U> p2, java.util.List<FeatureType> types)
Computes the Euclidean distance between two instances. For nominal features, two different values are considered to be at distance 1.
Parameters:
p1 - first instance.
p2 - second instance.
types - types of the features.
Returns:
the distance.
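The distance described above can be sketched as follows, assuming plain double[] vectors in which nominal features hold category codes and a boolean[] marks which features are nominal. Whether the class normalises numeric features before combining them is not stated, so this sketch uses raw values. MixedDistance is a hypothetical name.

```java
final class MixedDistance {
    // Hypothetical sketch: Euclidean distance where a nominal mismatch counts as 1.
    static double distance(double[] p1, double[] p2, boolean[] nominal) {
        double sum = 0.0;
        for (int f = 0; f < p1.length; f++) {
            double diff;
            if (nominal[f]) {
                // Different categories are at distance 1, equal ones at distance 0.
                diff = (p1[f] == p2[f]) ? 0.0 : 1.0;
            } else {
                diff = p1[f] - p2[f];
            }
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Two numeric features differing by 2 and one mismatched nominal feature:
        // sqrt(4 + 4 + 1) = 3.
        double d = distance(new double[]{0, 0, 5}, new double[]{2, 2, 7},
                            new boolean[]{false, false, true});
        System.out.println(d); // prints 3.0
    }
}
```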
generateNewInstances
private java.util.List<Instance<U>> generateNewInstances(int numNewInstances, int k, java.util.List<Instance<U>> minInstances, java.util.List<FeatureType> types)
Generates new synthetic instances from the instances of the minority class.
Parameters:
numNewInstances - the number of new instances to generate.
k - the number of neighbours of an instance to take.
minInstances - the instances from the minority class.
types - the types of the different features.
Returns:
a list of new instances.
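One way generateNewInstances could tie these pieces together is sketched below: find the k nearest minority neighbours of every minority instance, then create each new instance by interpolating between an instance and a randomly chosen neighbour. The round-robin assignment of new instances to minority instances, the numeric-only features, and the name SmoteGenerate are assumptions for illustration, not the class's documented behaviour.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class SmoteGenerate {
    // Squared Euclidean distance; enough for ranking neighbours (numeric features only).
    private static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int f = 0; f < a.length; f++) {
            double d = a[f] - b[f];
            sum += d * d;
        }
        return sum;
    }

    // Hypothetical sketch: minInstances are numeric vectors of a single minority class.
    static List<double[]> generateNewInstances(int numNewInstances, int k,
                                               List<double[]> minInstances, Random rng) {
        int m = minInstances.size();
        if (m < 2 || k < 1) {
            throw new IllegalArgumentException("need k >= 1 and at least two minority instances");
        }
        // Precompute the k nearest neighbours (by index) of every minority instance.
        List<List<Integer>> neighbours = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            double[] x = minInstances.get(i);
            List<Integer> others = new ArrayList<>();
            for (int j = 0; j < m; j++) {
                if (j != i) {
                    others.add(j);
                }
            }
            others.sort(Comparator.comparingDouble(j -> sqDist(x, minInstances.get(j))));
            neighbours.add(new ArrayList<>(others.subList(0, Math.min(k, others.size()))));
        }
        // Assumption: new instances are spread round-robin over the minority instances.
        List<double[]> result = new ArrayList<>();
        for (int n = 0; n < numNewInstances; n++) {
            int i = n % m;
            double[] x = minInstances.get(i);
            List<Integer> nn = neighbours.get(i);
            double[] y = minInstances.get(nn.get(rng.nextInt(nn.size())));
            double[] s = new double[x.length];
            for (int f = 0; f < x.length; f++) {
                // Per-feature random point between x[f] and y[f].
                s[f] = x[f] + rng.nextDouble() * (y[f] - x[f]);
            }
            result.add(s);
        }
        return result;
    }
}
```

Precomputing the neighbour lists once keeps the cost of the k-nearest-neighbour search independent of numNewInstances.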