Module mixture

PyMix - Python Mixture Package

The PyMix library implements algorithms and data structures for data mining with finite mixture models. The framework is object oriented and organized in a hierarchical fashion.

Classes
  DataSet
Class DataSet is the central data object.
  MixtureError
Base class for mixture exceptions.
  InvalidPosteriorDistribution
Raised if an invalid posterior distribution occurs.
  InvalidDistributionInput
Raised if a DataSet is found to be incompatible with a given MixtureModel.
  ConvergenceFailureEM
Raised if the EM algorithm fails to converge.
  ProbDistribution
Base class for all probability distributions.
  PriorDistribution
Prior distribution base class for the Bayesian framework
  NormalDistribution
Univariate Normal Distribution
  MultiNormalDistribution
Multivariate Normal Distribution
  ConditionalGaussDistribution
Constructor for conditional Gauss distributions.
  DependenceTreeDistribution
This class implements a tree of conditional Gaussians, including the tree topology learning.
  ExponentialDistribution
Exponential distribution
  UniformDistribution
Uniform distribution over a given interval.
  MultinomialDistribution
Multinomial Distribution
  DiscreteDistribution
This is the special case of a MultinomialDistribution with p = 1, that is a simple univariate discrete distribution.
  DirichletPrior
Dirichlet distribution as Bayesian prior for MultinomialDistribution and derived classes.
  NormalGammaPrior
Inverse-Gamma Normal distribution prior for univariate Normal distribution.
  DirichletMixturePrior
Mixture of Dirichlet distributions prior for multinomial data.
  ConditionalGaussPrior
Prior over ConditionalGaussDistribution.
  ProductDistributionPrior
Prior for ProductDistribution objects.
  ProductDistribution
Class for joint distributions for a vector of random variables with (possibly) different types.
  MixtureModelPrior
Mixture model prior.
  MixtureModel
Class for context-specific independence (CSI) mixture models.
  CandidateGroupHISTORY
  CandidateGroup
CandidateGroup is a simple container class.
  BayesMixtureModel
Bayesian mixture models
  ConstrainedDataSet
Extension of the DataSet object that can hold pairwise or label constraints in the objects.
  LabeledMixtureModel
Class for a mixture model containing the label-constrained version of the E-Step. See A.
  labeledBayesMixtureModel
Bayesian mixture models with labeled data.
  ConstrainedMixtureModel
Class for a mixture model containing the pairwise constrained version of the E-Step
Functions
 
numerize(data)
Cast all elements in a list to numeric values.
 
remove_col(matrix, index)
Removes a column in a Python matrix (list of lists)
 
LMMfromMM(mm)
Convenience function.
 
CMMfromMM(mm)
Convenience function.
 
structureAccuracy(true, m)
Returns the accuracy of two model structures with respect to the component partition they define.
 
modelSelection(data, models, silent=False)
Computes model selection criteria NEC, BIC and AIC for a list of models.
 
kl_dist(d1, d2)
Kullback-Leibler divergence for two distributions.
 
sym_kl_dist(d1, d2)
Symmetric Kullback-Leibler divergence for two distributions.
 
computeErrors(classes, clusters)
For an array of class labels and an array of cluster labels compute true positives, false negatives, true negatives and false positives.
 
accuracy(classes, clusters)
Computes accuracy of a clustering solution
 
sensitivity(classes, clusters)
Computes sensitivity of a clustering solution
 
specificity(classes, clusters)
Computes specificity of a clustering solution
 
random_vector(nr, normal=1.0)
Returns a random probability vector of length 'nr'.
 
variance(data)
 
entropy(p)
Returns the Shannon entropy for the probability vector 'p'.
 
get_loglikelihood(mix_model, data)
 
get_posterior(mix_model, data, logreturn=True)
 
sumlogs_purepy(a)
Given a Numeric.array a of log p_i, return log(sum p_i)
 
sumlogs(a)
Call to C extension function sum_logs.
 
matrixSumlogs(mat)
Call to C extension function matrix_sum_logs
 
dict_intersection(d1, d2)
Computes the intersections between the key sets of two Python dictionaries.
 
writeMixture(model, fileName, silent=False)
Stores model parameters in file 'fileName'.
 
readMixture(fileName)
Reads model from file 'fileName'.
 
parseMix(fileHandle, mtype, G, pi, compFix, leaders=None, groups=None)
Parses a flat file for a mixture model.
 
parseProd(fileHandle, true_p)
Internal function.
 
parseMixPrior(fileHandle, nr_dist, structPrior, nrCompPrior)
 
parseDirichletMixPrior(fileHandle, G, M, pi)
 
parseFile(fileHandle)
Internal function.
 
chomp(string)
Removes a newline character from the end of the string if present
 
sequence(next, token, end)
 
atom(next, token)
 
simple_eval(source)
Variables
  log = logging.getLogger("PyMix")
  hdlr = logging.StreamHandler(sys.stderr)
  fmt = logging.Formatter("%(name)s %(filename)s:%(lineno)d - %(...
Function Details

numerize(data)

Cast all elements in a list to numeric values.

Parameters:
  • data - list of data
Returns:
list of processed data

remove_col(matrix, index)

Removes a column in a Python matrix (list of lists)

Parameters:
  • matrix - Python list of lists
  • index - index of column to be removed
Returns:
matrix with column deleted
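
A minimal sketch of the column removal described above, assuming the matrix is a rectangular list of lists and that a new matrix is returned rather than the input being modified in place (the documentation does not say which). The name remove_col_sketch is hypothetical and is not the PyMix implementation.

```python
def remove_col_sketch(matrix, index):
    """Return a copy of `matrix` (list of lists) without column `index`."""
    # Slice around the unwanted column in every row; the input is left untouched.
    return [row[:index] + row[index + 1:] for row in matrix]


m = [[1, 2, 3], [4, 5, 6]]
print(remove_col_sketch(m, 1))  # [[1, 3], [4, 6]]
```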

LMMfromMM(mm)

Convenience function. Takes a MixtureModel and returns a LabeledMixtureModel with the same parameters.

Parameters:
  • mm - MixtureModel object

CMMfromMM(mm)

Convenience function. Takes a MixtureModel and returns a ConstrainedMixtureModel with the same parameters.

Parameters:
  • mm - MixtureModel object

structureAccuracy(true, m)

Returns the accuracy of two model structures with respect to the component partition they define.

Parameters:
  • true - MixtureModel object with CSI structure
  • m - MixtureModel object with CSI structure
Returns:
agreement of the two structures as measured by the accuracy

modelSelection(data, models, silent=False)

Computes model selection criteria NEC, BIC and AIC for a list of models.

Parameters:
  • data - DataSet object
  • models - list of MixtureModel objects ordered by ascending number of components.
Returns:
list of the optimal number of components according to [NEC, BIC, AIC], in that order.
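
To illustrate how two of these criteria rank candidate models, here is a stand-alone sketch using the textbook definitions BIC = -2 log L + k ln N and AIC = -2 log L + 2k. It does not call PyMix and omits NEC. The function names and the (n_components, log_likelihood, n_free_params) tuple layout are assumptions made for the example, and the numbers are invented.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_components_sketch(candidates, n_samples):
    """candidates: list of (n_components, log_likelihood, n_free_params).

    Returns [best number of components by BIC, best by AIC].
    """
    best_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_samples))
    best_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
    return [best_bic[0], best_aic[0]]


# Illustrative values only: three fitted models on 100 samples.
candidates = [(1, -250.0, 2), (2, -200.0, 5), (3, -195.0, 8)]
print(select_components_sketch(candidates, n_samples=100))  # [2, 3]
```

In this example BIC prefers two components and AIC prefers three, because the BIC penalty per parameter (ln 100, about 4.6) is larger than the AIC penalty (2).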

kl_dist(d1, d2)

Kullback-Leibler divergence for two distributions. Only accepts MultinomialDistribution and NormalDistribution objects for now.

Parameters:
  • d1 - MultinomialDistribution or NormalDistribution instance
  • d2 - MultinomialDistribution or NormalDistribution instance
Returns:
Kullback-Leibler divergence between input distributions
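
For reference, the standard closed forms of the Kullback-Leibler divergence for the two supported families can be sketched as follows. The functions take raw parameters (probability lists, or mean and standard deviation) instead of PyMix distribution objects; their names and signatures are assumptions for illustration, and the natural logarithm is assumed.

```python
import math


def kl_discrete_sketch(p, q):
    """KL(p || q) for two discrete probability vectors of equal length.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_normal_sketch(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) for univariate normals."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)


print(kl_normal_sketch(0.0, 1.0, 1.0, 1.0))  # 0.5
```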

sym_kl_dist(d1, d2)

Symmetric Kullback-Leibler divergence for two distributions. Only accepts MultinomialDistribution and NormalDistribution objects for now.

Parameters:
  • d1 - MultinomialDistribution or NormalDistribution instance
  • d2 - MultinomialDistribution or NormalDistribution instance
Returns:
Symmetric Kullback-Leibler divergence between input distributions

computeErrors(classes, clusters)

For an array of class labels and an array of cluster labels compute true positives, false negatives, true negatives and false positives.

Assumes identical order of objects.

Class and cluster labels can be arbitrary data types supporting '==' operator.

Parameters:
  • classes - list of class labels (true labels)
  • clusters - list of cluster labels (predicted labels)
Returns:
Ratios for true positives, false negatives, true negatives, false positives (tp, fn, tn, fp)
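
One common way to compare a clustering with class labels, when cluster names need not match class names, is to count over pairs of objects. The sketch below assumes that pair-counting scheme and returns raw counts; whether PyMix computes its values exactly this way is not stated here, and the name compute_errors_sketch is hypothetical.

```python
def compute_errors_sketch(classes, clusters):
    """Pair-counting comparison of true classes and predicted clusters.

    For every unordered pair of objects:
      same class, same cluster           -> true positive
      same class, different cluster      -> false negative
      different class, different cluster -> true negative
      different class, same cluster      -> false positive
    Labels only need to support '=='.
    """
    assert len(classes) == len(clusters)
    tp = fn = tn = fp = 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            same_class = classes[i] == classes[j]
            same_cluster = clusters[i] == clusters[j]
            if same_class and same_cluster:
                tp += 1
            elif same_class:
                fn += 1
            elif same_cluster:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


print(compute_errors_sketch([0, 0, 1, 1], ["a", "a", "b", "a"]))  # (1, 1, 2, 2)
```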

accuracy(classes, clusters)

Computes accuracy of a clustering solution

Parameters:
  • classes - list of true class labels
  • clusters - list of cluster labels
Returns:
accuracy

sensitivity(classes, clusters)

Computes sensitivity of a clustering solution

Parameters:
  • classes - list of true class labels
  • clusters - list of cluster labels
Returns:
sensitivity

specificity(classes, clusters)

Computes specificity of a clustering solution

Parameters:
  • classes - list of true class labels
  • clusters - list of cluster labels
Returns:
specificity
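
Accuracy, sensitivity and specificity have standard definitions in terms of the four values (tp, fn, tn, fp). The sketch below applies those textbook formulas to a tuple of counts. It assumes non-zero denominators and is not taken from the PyMix source; the function names are hypothetical.

```python
def accuracy_sketch(tp, fn, tn, fp):
    """Fraction of all decisions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


def sensitivity_sketch(tp, fn, tn, fp):
    """Fraction of actual positives that are recovered: tp / (tp + fn)."""
    return tp / (tp + fn)


def specificity_sketch(tp, fn, tn, fp):
    """Fraction of actual negatives that are recovered: tn / (tn + fp)."""
    return tn / (tn + fp)


counts = (3, 1, 4, 2)  # illustrative (tp, fn, tn, fp)
print(accuracy_sketch(*counts))     # 0.7
print(sensitivity_sketch(*counts))  # 0.75
```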

random_vector(nr, normal=1.0)

Returns a random probability vector of length 'nr'. Can be used to generate random parametrizations of a multinomial distribution with M = 'nr'.

Parameters:
  • nr - length of output vector
  • normal - sum over output vector, default 1.0
Returns:
list with random entries sampled from a uniform distribution on [0,1] and normalized to 'normal'
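
The sample-then-normalize procedure described in the return value can be sketched in a few lines. This is a stand-alone illustration using the standard library, with a hypothetical function name; it is not the PyMix implementation.

```python
import random


def random_vector_sketch(nr, normal=1.0):
    """Return `nr` non-negative random numbers that sum to `normal`."""
    # Draw each entry uniformly from [0, 1) ...
    raw = [random.random() for _ in range(nr)]
    total = sum(raw)
    # ... then rescale so the entries add up to `normal`.
    return [normal * x / total for x in raw]


v = random_vector_sketch(4)
print(len(v), round(sum(v), 10))  # 4 1.0
```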

entropy(p)

Returns the Shannon entropy for the probability vector 'p'.

Parameters:
  • p - 'numpy' vector that sums up to 1.0
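
As a reminder of the quantity involved, the Shannon entropy is H(p) = -sum_i p_i log p_i. The sketch below assumes the natural logarithm (the base is not stated above), treats 0 log 0 as 0, and accepts any iterable of probabilities instead of a numpy vector. The function name is hypothetical.

```python
import math


def entropy_sketch(p):
    """Shannon entropy (in nats) of a probability vector summing to 1.0."""
    # Zero entries are skipped, following the convention 0 * log(0) = 0.
    return -sum(x * math.log(x) for x in p if x > 0.0)


print(entropy_sketch([0.5, 0.5]))  # log(2), about 0.693
```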

sumlogs_purepy(a)

Given a Numeric.array a of log p_i, return log(sum p_i)

Uses (assuming p_1 is maximal): log(\sum_i p_i) = log(p_1) + log(1 + \sum_{i \ge 2} exp(log(p_i) - log(p_1)))

NOTE: The sumlogs functions return the sum over values != -Inf
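
The identity above is the usual log-sum-exp trick: factoring out the largest term keeps every exponent at or below zero, so nothing overflows. A minimal stand-alone sketch follows, using plain Python lists in place of Numeric arrays. The function name is hypothetical, and returning -Inf when every entry is -Inf is an assumption of this sketch.

```python
import math


def sumlogs_sketch(a):
    """Given log p_i values, return log(sum_i p_i) without leaving log space."""
    # Entries equal to -inf correspond to p_i = 0 and are ignored.
    finite = [x for x in a if x != float("-inf")]
    if not finite:
        return float("-inf")  # assumed behaviour for an all-zero input
    m = max(finite)
    # log(sum p_i) = m + log(sum exp(log p_i - m)), with every exponent <= 0.
    return m + math.log(sum(math.exp(x - m) for x in finite))


print(sumlogs_sketch([-1000.0, -1000.0]))  # -1000 + log(2), no underflow
```

Exponentiating -1000 directly would underflow to 0.0 and make the naive log(sum(exp(...))) fail, which is why the maximum is subtracted first.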

dict_intersection(d1, d2)

Computes the intersections between the key sets of two Python dictionaries. Returns another dictionary with the intersection as keys.

Parameters:
  • d1 - dictionary object
  • d2 - dictionary object
Returns:
dictionary with keys equal to the intersection of keys between d1 and d2.
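
A short sketch of a key intersection between two dictionaries. The documentation above specifies only the keys of the result; taking the values from d1 is an assumption of this example, and the function name is hypothetical.

```python
def dict_intersection_sketch(d1, d2):
    """Return a dict whose keys are those present in both d1 and d2."""
    # Values are copied from d1 (an assumption; only the keys are specified).
    return {key: d1[key] for key in d1 if key in d2}


print(dict_intersection_sketch({"a": 1, "b": 2}, {"b": 20, "c": 30}))  # {'b': 2}
```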

writeMixture(model, fileName, silent=False)

Stores model parameters in file 'fileName'.

Parameters:
  • model - MixtureModel object
  • fileName - file name the model is to be written to

readMixture(fileName)

Reads model from file 'fileName'.

Parameters:
  • fileName - file to be read
Returns:
MixtureModel object

parseMix(fileHandle, mtype, G, pi, compFix, leaders=None, groups=None)

Parses a flat file for a mixture model. Internal function invoked from readMixture.

parseProd(fileHandle, true_p)

Internal function. Parses product distribution.

parseFile(fileHandle)

Internal function. Parses flat files.

chomp(string)

Removes a newline character from the end of the string if present

Parameters:
  • string - input string
Returns:
the argument without the trailing newline.
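
A minimal sketch of this behaviour, assuming that exactly one trailing "\n" is removed and that everything else (including other whitespace) is left alone. The function name is hypothetical.

```python
def chomp_sketch(string):
    """Return `string` without a single trailing newline, if one is present."""
    return string[:-1] if string.endswith("\n") else string


print(repr(chomp_sketch("pi 0.5 0.5\n")))  # 'pi 0.5 0.5'
```

Unlike str.rstrip(), this leaves trailing spaces and any additional newlines in place.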

Variables Details

fmt

Value:
logging.Formatter("%(name)s %(filename)s:%(lineno)d - %(message)s")
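
The three module variables log, hdlr and fmt are the usual ingredients of a standard-library logging setup. The sketch below shows how such objects are typically wired together. The setFormatter, addHandler and setLevel calls, the INFO level and the sample message are assumptions about usage and are not quoted from the module.

```python
import logging
import sys

log = logging.getLogger("PyMix")
hdlr = logging.StreamHandler(sys.stderr)
fmt = logging.Formatter("%(name)s %(filename)s:%(lineno)d - %(message)s")

# Assumed wiring: attach the formatter to the handler, the handler to the logger.
hdlr.setFormatter(fmt)
log.addHandler(hdlr)
log.setLevel(logging.INFO)  # assumed level, chosen so the message below is emitted

# Writes a line of the form "PyMix <file>:<line> - example message" to stderr.
log.info("example message")
```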