Module mixture :: Class ProbDistribution

Class ProbDistribution


Base class for all probability distributions.

Instance Methods
 
__init__(self)
Constructor
 
__eq__(self, other)
Interface for the '==' operation
 
__str__(self)
String representation of the distribution
 
__copy__(self)
Interface for the copy.copy function
 
pdf(self, data)
Density function.
 
MStep(self, posterior, data, mix_pi=None)
Maximization step of the EM procedure.
 
sample(self)
Samples a single value from the distribution.
 
sampleSet(self, nr)
Samples several values from the distribution.
 
sufficientStatistics(self, posterior, data)
Returns sufficient statistics for a given data set and posterior.
 
isValid(self, x)
Checks whether 'x' is a valid argument for the distribution and raises InvalidDistributionInput exception if that is not the case.
 
formatData(self, x)
Formats samples 'x' for inclusion into DataSet object.
 
flatStr(self, offset)
Returns the model parameters as a string compatible with the WriteMixture/ReadMixture flat file format.
 
posteriorTraceback(self, x)
Returns the decoupled posterior distribution for each sample in 'x'.
 
update_suff_p(self)
Updates the .suff_p field.
 
merge(self, dlist, weights)
Merges 'self' with the distributions in 'dlist' by a convex combination of the parameters as determined by 'weights'.
Method Details

__eq__(self, other)
(Equality operator)

Interface for the '==' operation

Parameters:
  • other - object to be compared

__str__(self)
(Informal representation operator)

String representation of the distribution

Returns:
string representation

pdf(self, data)

Density function. Must accept either a numpy array or a DataSet object of appropriate values. numpy arrays are used as input for the atomic distributions for efficiency reasons. (The cleaner solution would be to construct DataSet subset objects for the different features, and the implementation might switch over to doing that eventually.)

Parameters:
  • data - DataSet object or numpy array
Returns:
log-value of the density function for each sample in 'data'
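
As a minimal sketch of this contract, a hypothetical subclass for a univariate normal distribution could return one log-density per sample as shown here. The class name `NormalSketch` and its `mu` and `sigma` attributes are assumptions made for illustration and are not taken from the module; a plain Python list stands in for the numpy array to keep the sketch dependency-free.

```python
import math


class NormalSketch:
    """Hypothetical univariate normal distribution (illustration only)."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def pdf(self, data):
        # Return the log-density of each sample, following the documented
        # "log-value of the density function for each sample" contract.
        log_norm = -math.log(self.sigma * math.sqrt(2.0 * math.pi))
        return [log_norm - 0.5 * ((x - self.mu) / self.sigma) ** 2 for x in data]


log_dens = NormalSketch(0.0, 1.0).pdf([0.0, 1.0])
```

Because the values are logarithms, they are negative wherever the density is below one, and differences between them correspond to density ratios.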

MStep(self, posterior, data, mix_pi=None)

Maximization step of the EM procedure. Re-estimates the distribution parameters using the posterior distribution and the data.

Must accept either a numpy array or a DataSet object of appropriate values. numpy arrays are used as input for the atomic distributions for efficiency reasons.

Parameters:
  • posterior - posterior distribution of component membership
  • data - DataSet object or numpy array of samples
  • mix_pi - mixture weights, necessary for MixtureModels as components.
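
For a univariate normal component, the re-estimation could look roughly like the following sketch, which computes posterior-weighted maximum-likelihood estimates. The function name `mstep_normal` is hypothetical and the code is not the module's implementation; it ignores `mix_pi` and uses plain lists rather than numpy arrays.

```python
import math


def mstep_normal(posterior, data):
    """Posterior-weighted ML estimates (mu, sigma) for a normal component.

    Hypothetical helper for illustration; not part of the documented class.
    """
    total = sum(posterior)  # effective number of samples assigned to the component
    mu = sum(p * x for p, x in zip(posterior, data)) / total
    var = sum(p * (x - mu) ** 2 for p, x in zip(posterior, data)) / total
    return mu, math.sqrt(var)


mu, sigma = mstep_normal([0.5, 1.0, 0.5], [1.0, 2.0, 3.0])
```

Samples with a larger posterior weight pull the estimates more strongly, which is what distinguishes this step from an ordinary unweighted fit.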

sample(self)

Samples a single value from the distribution.

Returns:
sampled value

sampleSet(self, nr)

Samples several values from the distribution.

Parameters:
  • nr - number of values to be sampled.
Returns:
sampled values
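
One plausible relationship between `sample` and `sampleSet` is that the latter simply draws `nr` independent values with the former. The sketch below assumes exactly that; the class `NormalSampler`, its parameters, and the `seed` argument are hypothetical and only serve the illustration.

```python
import random


class NormalSampler:
    """Hypothetical sampler used only to illustrate sample/sampleSet."""

    def __init__(self, mu, sigma, seed=None):
        self.mu = mu
        self.sigma = sigma
        self._rng = random.Random(seed)  # private generator for reproducibility

    def sample(self):
        # Draw a single value from the distribution.
        return self._rng.gauss(self.mu, self.sigma)

    def sampleSet(self, nr):
        # Draw 'nr' values by repeated single sampling.
        return [self.sample() for _ in range(nr)]


values = NormalSampler(0.0, 1.0, seed=42).sampleSet(5)
```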

sufficientStatistics(self, posterior, data)

Returns sufficient statistics for a given data set and posterior.

Parameters:
  • posterior - numpy vector of component membership posteriors
  • data - numpy vector holding the data
Returns:
list with dot(posterior, data) and dot(posterior, data**2)
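
The two returned quantities are posterior-weighted sums of the data and of the squared data. A minimal sketch, using plain lists in place of numpy vectors and a hypothetical free function name, is:

```python
def sufficient_statistics(posterior, data):
    """Return [dot(posterior, data), dot(posterior, data**2)] for plain lists."""
    first = sum(p * x for p, x in zip(posterior, data))
    second = sum(p * x * x for p, x in zip(posterior, data))
    return [first, second]


stats = sufficient_statistics([0.5, 1.0, 0.5], [1.0, 2.0, 3.0])
# stats == [4.0, 9.0]
```

Here the first entry is 0.5*1 + 1*2 + 0.5*3 = 4.0 and the second is 0.5*1 + 1*4 + 0.5*9 = 9.0.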

isValid(self, x)

Checks whether 'x' is a valid argument for the distribution and raises InvalidDistributionInput exception if that is not the case.

Parameters:
  • x - single sample in external representation, i.e. an entry of DataSet.dataMatrix
Returns:
True/False flag
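
A validity check for a real-valued distribution might be sketched as follows. The exception class defined here is only a local stand-in for the `InvalidDistributionInput` named above, and `is_valid_real` is a hypothetical function; the actual acceptance rules depend on the concrete distribution.

```python
class InvalidDistributionInput(Exception):
    """Local stand-in for the exception named in the documentation."""


def is_valid_real(x):
    # Hypothetical check: accept anything float() can convert,
    # and raise the documented exception type for everything else.
    try:
        float(x)
    except (TypeError, ValueError):
        raise InvalidDistributionInput("not a valid real-valued sample: %r" % (x,))
    return True


ok = is_valid_real("3.14")
try:
    is_valid_real("abc")
    rejected = False
except InvalidDistributionInput:
    rejected = True
```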

formatData(self, x)

Formats samples 'x' for inclusion into DataSet object. Used by DataSet.internalInit()

Parameters:
  • x - list of samples
Returns:
two element list: first element = dimension of self, second element = sufficient statistics for samples 'x'

flatStr(self, offset)

Returns the model parameters as a string compatible with the WriteMixture/ReadMixture flat file format.

Parameters:
  • offset - number of ' ' characters to be used in the flatfile.

posteriorTraceback(self, x)

Returns the decoupled posterior distribution for each sample in 'x'. Used for analysis of clustering results.

Parameters:
  • x - list of samples
Returns:
decoupled posterior

merge(self, dlist, weights)

Merges 'self' with the distributions in 'dlist' by a convex combination of the parameters as determined by 'weights'.

Parameters:
  • dlist - list of distribution objects of the same type as 'self'
  • weights - list of weights; they need not sum to one
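
Since the weights need not sum to one, a convex combination requires normalising them first. The sketch below does this for a single scalar parameter. It assumes that the first weight belongs to 'self' and the remaining weights to the entries of 'dlist'; that ordering, like the function name `merge_parameter`, is an assumption for illustration and is not stated in the documentation.

```python
def merge_parameter(own_value, other_values, weights):
    """Convex combination of one scalar parameter.

    Assumes weights[0] belongs to 'self' and weights[1:] to 'dlist';
    this ordering is an assumption, not something the documentation states.
    """
    values = [own_value] + list(other_values)
    if len(values) != len(weights):
        raise ValueError("expected one weight per distribution")
    total = float(sum(weights))
    # Dividing by the total makes the combination convex even when
    # the weights do not sum to one.
    return sum(w * v for w, v in zip(weights, values)) / total


merged_mu = merge_parameter(0.0, [2.0, 4.0], [2, 1, 1])
# merged_mu == 1.5
```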