Class SMOTEBalancer<U>

java.lang.Object
es.uam.eps.ir.relison.links.data.ml.balance.SMOTEBalancer<U>
All Implemented Interfaces:
Balancer<U>

public class SMOTEBalancer<U>
extends java.lang.Object
implements Balancer<U>
Balances a dataset using the Synthetic Minority Over-sampling Technique (SMOTE). The technique creates each new instance by interpolating between an instance of the minority class and one of its k nearest neighbours from the same class.

Reference: Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16 (2002), pp. 321-357.

  • Field Summary

    Fields 
    Modifier and Type Field Description
    private Generator<U> gen
    Generator of user identifiers.
    private int k
    Number of neighbours.
  • Constructor Summary

    Constructors 
    Constructor Description
    SMOTEBalancer(int k, Generator<U> gen, U init)
    Constructor.
  • Method Summary

    Modifier and Type Method Description
    InstanceSet<U> balance(InstanceSet<U> original)
    Given an unbalanced dataset, creates a new dataset where every class has the same number of examples.
    private double distance(Instance<U> p1, Instance<U> p2, java.util.List<FeatureType> types)
    Computes the distance between two instances.
    private java.util.List<Instance<U>> generateNewInstances(int numNewInstances, int k, java.util.List<Instance<U>> minInstances, java.util.List<FeatureType> types)
    Generates new synthetic instances from the instances of the minority class.
    private java.util.List<Instance<U>> populate(int numExtra, Instance<U> p, java.util.List<Instance<U>> neighbourhood, java.util.List<FeatureType> types)
    Given an instance and its neighbours, generates a new list of instances.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • k

      private final int k
      Number of neighbours.
    • gen

      private final Generator<U> gen
      Generator of user identifiers.
  • Constructor Details

    • SMOTEBalancer

      public SMOTEBalancer(int k, Generator<U> gen, U init)
      Constructor.
      Parameters:
      k - number of neighbours of each instance.
      gen - user identifier generator.
      init - initial user value.
  • Method Details

    • balance

      public InstanceSet<U> balance(InstanceSet<U> original)
      Description copied from interface: Balancer
      Given an unbalanced dataset, creates a new dataset where every class has the same number of examples.
      Specified by:
      balance in interface Balancer<U>
      Parameters:
      original - the original dataset.
      Returns:
      the balanced dataset.
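
A minimal sketch of the bookkeeping that balancing implies, not the library's code. It assumes that "the same number of examples" means raising every class to the size of the largest one, which is the usual reading for an over-sampling method. The class name ClassDeficitSketch, the method deficits and the String class labels are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ClassDeficitSketch {
    /**
     * For each class label, returns how many synthetic instances would be
     * needed to reach the size of the largest class (assumed target).
     */
    static Map<String, Integer> deficits(Map<String, Integer> classCounts) {
        // Size of the majority class.
        int max = 0;
        for (int count : classCounts.values()) {
            max = Math.max(max, count);
        }
        // Number of instances each class is missing with respect to it.
        Map<String, Integer> missing = new HashMap<>();
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            missing.put(e.getKey(), max - e.getValue());
        }
        return missing;
    }
}
```

With 100 negative and 20 positive examples, the sketch reports that 80 synthetic positive instances are needed and none for the negative class.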
    • populate

      private java.util.List<Instance<U>> populate(int numExtra, Instance<U> p, java.util.List<Instance<U>> neighbourhood, java.util.List<FeatureType> types)
      Given an instance and its neighbours, generates a new list of instances.
      Parameters:
      numExtra - number of extra instances to compute.
      p - the instance.
      neighbourhood - the neighbourhood of the instance.
      types - types of the attributes.
      Returns:
      the list of newly generated instances.
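
The generation step of SMOTE, as described in the referenced paper, can be sketched as follows for purely numeric features. This is an illustration under assumptions, not the class's actual implementation: instances are reduced to double[] vectors, and PopulateSketch and its populate signature are hypothetical. Each synthetic point is p + gap * (neighbour - p), with gap drawn uniformly from [0, 1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class PopulateSketch {
    /**
     * Creates numExtra synthetic points, each on the segment between p and
     * a randomly chosen member of its neighbourhood.
     */
    static List<double[]> populate(int numExtra, double[] p,
                                   List<double[]> neighbourhood, Random rng) {
        List<double[]> generated = new ArrayList<>();
        for (int n = 0; n < numExtra; n++) {
            // Pick one neighbour at random and a random position on the segment.
            double[] neighbour = neighbourhood.get(rng.nextInt(neighbourhood.size()));
            double gap = rng.nextDouble();
            double[] synthetic = new double[p.length];
            for (int i = 0; i < p.length; i++) {
                synthetic[i] = p[i] + gap * (neighbour[i] - p[i]);
            }
            generated.add(synthetic);
        }
        return generated;
    }
}
```

Nominal features cannot be interpolated this way; how the class treats them is not stated here, so the sketch leaves them out.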
    • distance

      private double distance(Instance<U> p1, Instance<U> p2, java.util.List<FeatureType> types)
      Computes the distance between two instances using the Euclidean distance. For a nominal feature, two different values are considered to be at distance 1.
      Parameters:
      p1 - first instance.
      p2 - second instance.
      types - types of the features.
      Returns:
      the distance.
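
A sketch of such a mixed-type distance, under stated assumptions. Feature values are stored as doubles (nominal values as category codes), the enum Kind is a hypothetical stand-in for FeatureType, and identical nominal values are assumed to contribute 0. The per-feature differences are squared, summed and square-rooted as in the ordinary Euclidean distance.

```java
import java.util.List;

final class MixedDistanceSketch {
    /** Hypothetical stand-in for the library's FeatureType. */
    enum Kind { CONTINUOUS, NOMINAL }

    static double distance(double[] a, double[] b, List<Kind> kinds) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff;
            if (kinds.get(i) == Kind.NOMINAL) {
                // Nominal: 0 if the category codes match, 1 otherwise.
                diff = (a[i] == b[i]) ? 0.0 : 1.0;
            } else {
                // Continuous: plain difference.
                diff = a[i] - b[i];
            }
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```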
    • generateNewInstances

      private java.util.List<Instance<U>> generateNewInstances(int numNewInstances, int k, java.util.List<Instance<U>> minInstances, java.util.List<FeatureType> types)
      Generates new synthetic instances from the instances of the minority class.
      Parameters:
      numNewInstances - the number of new instances to generate.
      k - the number of neighbours of an instance to take.
      minInstances - the instances from the minority class.
      types - the types of the different features.
      Returns:
      a list of new instances.
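
Selecting the k neighbours of a minority instance could look like the sketch below. It is a brute-force illustration, not the class's code: instances are plain double[] vectors, the distance is the ordinary Euclidean one, and the names NearestNeighboursSketch, euclidean and nearest are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NearestNeighboursSketch {
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    /** Returns the (at most) k minority instances closest to p, nearest first. */
    static List<double[]> nearest(double[] p, List<double[]> minority, int k) {
        List<double[]> others = new ArrayList<>();
        for (double[] q : minority) {
            if (q != p) {           // skip p itself (same array reference)
                others.add(q);
            }
        }
        // Sort the remaining instances by their distance to p.
        others.sort(Comparator.comparingDouble((double[] q) -> euclidean(p, q)));
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }
}
```

Sorting the whole minority class for every instance costs O(n log n) per query, which is adequate for a sketch; a spatial index would be the usual choice for large datasets.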