Module mixture :: Class DataSet

Class DataSet


Class DataSet is the central data object.

Instance Methods
 
__init__(self)
Creates and returns an empty DataSet object.
 
__len__(self)
Returns the number of samples in the DataSet.
 
__copy__(self)
Interface to copy.copy function.
 
fromArray(self, array, IDs=None, col_header=None)
Initializes the data set from a numpy array.
 
fromList(self, List, IDs=None, col_header=None)
Initializes the data set from a Python list.
 
fromFiles(self, fileNames, sep='\t', missing='*', fileID=None, IDheader=False, IDindex=None)
Initializes the data set from a list of data flat files.
 
__str__(self)
String representation of the DataSet.
 
printClustering(self, c, col_width=None)
Pretty-prints a clustering.
 
internalInit(self, m)
Initializes the internal representation of the data used by the EM algorithm.
 
getInternalFeature(self, i)
Returns the columns of self.internalData containing the data of the feature with index 'i'.
 
removeFeatures(self, ids, silent=0)
Remove a list of features from the data set.
 
removeSamples(self, ids, silent=0)
Remove a list of samples from the data set.
 
filterSamples(self, fid, min_value, max_value)
Removes all samples with values < 'min_value' or > 'max_value' in feature 'fid'.
 
maskDataSet(self, valueToMask, maskValue, silent=False)
Replaces every occurrence of one value with another throughout the data matrix.
 
maskFeatures(self, headerList, valueToMask, maskValue)
Equivalent to maskDataSet, but restricted to a subset of features.
 
getExternalFeature(self, fid)
Returns the external data representation of a given feature.
 
extractSubset(self, ids)
Removes all samples in 'ids' from 'self' and returns a new DataSet initialised with these samples.
 
singleFeatureSubset(self, index)
Returns a DataSet for the feature with internal index 'index' in 'self'.
 
setMissingSymbols(self, findices, missing)
Assigns missing value placeholders to features.
 
getMissingIndices(self, ind)
Gets the indices of missing values in one feature.
 
writeClusteringFasta(self, fn_pref, m)
Writes a clustering based on model 'm' into files in FASTA format.
Method Details

__len__(self)
(Length operator)

Returns the number of samples in the DataSet.

Returns:
Number of samples in the DataSet.

__copy__(self)

Interface to copy.copy function.

Returns:
deep copy of 'self'
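
The two special methods above hook into Python's built-in len() and copy.copy(). The class below is not PyMix code; it is a minimal stand-in (the name TinyDataSet and its two attributes are hypothetical) sketching how a container can report its sample count through __len__ and hand back a deep copy from __copy__, as the descriptions state.

```python
import copy


class TinyDataSet:
    """Hypothetical stand-in for illustration only; not the PyMix DataSet."""

    def __init__(self, rows, ids):
        self.rows = rows  # one list of feature values per sample
        self.ids = ids    # one identifier per sample

    def __len__(self):
        # len(obj) reports the number of samples
        return len(self.rows)

    def __copy__(self):
        # copy.copy(obj) lands here; the nested rows are duplicated as well
        return TinyDataSet(copy.deepcopy(self.rows), list(self.ids))


original = TinyDataSet([[1.0, 2.0], [3.0, 4.0]], ["s1", "s2"])
duplicate = copy.copy(original)
duplicate.rows[0][0] = 99.0  # changes the copy only
```

Because the copy is deep, editing the duplicate leaves the original samples untouched.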

fromArray(self, array, IDs=None, col_header=None)

Initializes the data set from a numpy array.

Parameters:
  • array - numpy array containing the data
  • IDs - sample IDs (optional)
  • col_header - feature headers (optional)

fromList(self, List, IDs=None, col_header=None)

Initializes the data set from a Python list.

Parameters:
  • List - Python list containing the data
  • IDs - sample IDs (optional)
  • col_header - feature headers (optional)

fromFiles(self, fileNames, sep='\t', missing='*', fileID=None, IDheader=False, IDindex=None)

Initializes the data set from a list of data flat files.

Parameters:
  • fileNames - list of data flat files
  • sep - separator string between values in the flat files; tab is the default
  • missing - symbol for missing data; '*' is the default
  • fileID - optional prefix for all features in the file
  • IDheader - flag indicating whether the sample ID column has a header in the first line of the flat files
  • IDindex - index at which the sample IDs can be found; 0 by default
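
To make the roles of sep, missing, IDheader and IDindex concrete, here is a stand-alone sketch that parses flat-file text held in a string. It does not call PyMix. The function name parse_flat_text and the file layout (first line holds the feature names, each later line holds one sample) are assumptions made for illustration and may differ from what fromFiles actually expects.

```python
def parse_flat_text(text, sep="\t", missing="*", id_header=False, id_index=0):
    """Split flat-file text into (headers, sample_ids, rows, gaps).

    Assumed layout (not confirmed by the documentation): the first line
    holds the feature names, every later line holds one sample.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    headers = lines[0].split(sep)
    if id_header:
        # the ID column has its own header entry, so drop it
        del headers[id_index]
    sample_ids, rows = [], []
    for line in lines[1:]:
        fields = line.split(sep)
        sample_ids.append(fields.pop(id_index))  # take the ID out of the row
        rows.append(fields)
    # positions of the missing-data symbol, as (row, column) pairs
    gaps = [(r, c) for r, row in enumerate(rows)
            for c, value in enumerate(row) if value == missing]
    return headers, sample_ids, rows, gaps


text = "height\tweight\ns1\t1.80\t75\ns2\t*\t62\n"
headers, sample_ids, rows, gaps = parse_flat_text(text)
```

In this sketch the values stay as strings; converting them to numbers is left out on purpose.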

__str__(self)
(Informal representation operator)

String representation of the DataSet.

Returns:
string representation

printClustering(self, c, col_width=None)

Pretty-prints a clustering.

Parameters:
  • c - numpy array of integer cluster labels for each sample
  • col_width - column width in spaces (optional)
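
The parameter 'c' assigns one integer label to each sample. The sketch below, written without PyMix, shows one way such labels can be turned into a per-cluster listing padded to a column width. The helper names and the output layout are hypothetical; the text printed by printClustering itself may look different.

```python
from collections import defaultdict


def group_by_cluster(sample_ids, labels):
    """Collect sample IDs under their integer cluster label."""
    groups = defaultdict(list)
    for sample_id, label in zip(sample_ids, labels):
        groups[int(label)].append(sample_id)
    return dict(groups)


def format_clusters(groups, col_width=8):
    """Render one line per cluster, padding each ID to col_width characters."""
    lines = []
    for label in sorted(groups):
        cells = "".join(str(s).ljust(col_width) for s in groups[label])
        lines.append("Cluster %d: %s" % (label, cells.rstrip()))
    return "\n".join(lines)


# labels would be a numpy array in PyMix; a plain list works the same way here
groups = group_by_cluster(["s1", "s2", "s3", "s4"], [0, 1, 0, 1])
print(format_clusters(groups, col_width=4))
```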

internalInit(self, m)

Initializes the internal representation of the data used by the EM algorithm.

Parameters:
  • m - MixtureModel object

getInternalFeature(self, i)

Returns the columns of self.internalData containing the data of the feature with index 'i'.

Parameters:
  • i - feature index
Returns:
numpy array containing the data of feature 'i'

removeFeatures(self, ids, silent=0)

Remove a list of features from the data set.

Parameters:
  • ids - list of feature identifiers
  • silent - verbosity control

removeSamples(self, ids, silent=0)

Remove a list of samples from the data set.

Parameters:
  • ids - list of sample identifiers
  • silent - verbosity control

filterSamples(self, fid, min_value, max_value)

Removes all samples with values < 'min_value' or > 'max_value' in feature 'fid'.

Parameters:
  • fid - feature ID in self.headers
  • min_value - minimum allowed value
  • max_value - maximum allowed value
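
The filtering rule can be stated as "keep a sample only if min_value <= value <= max_value". The following stand-alone sketch applies that rule to plain lists; it is not the PyMix implementation, and the function name and the use of a column index in place of a feature ID are assumptions for illustration.

```python
def filter_samples(rows, sample_ids, column, min_value, max_value):
    """Return the rows and IDs whose value in 'column' lies within the bounds.

    Samples with a value < min_value or > max_value are dropped, so values
    equal to either bound are kept.
    """
    kept_rows, kept_ids = [], []
    for row, sample_id in zip(rows, sample_ids):
        if min_value <= row[column] <= max_value:
            kept_rows.append(row)
            kept_ids.append(sample_id)
    return kept_rows, kept_ids


rows = [[0.5, 10.0], [2.0, 11.0], [3.0, 12.0], [7.5, 13.0]]
ids = ["s1", "s2", "s3", "s4"]
kept_rows, kept_ids = filter_samples(rows, ids, column=0,
                                     min_value=2.0, max_value=3.0)
```

Here s1 and s4 fall outside the range, while s2 and s3 sit exactly on the bounds and are kept.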

maskDataSet(self, valueToMask, maskValue, silent=False)

Replaces every occurrence of one value with another throughout the data matrix.

Parameters:
  • valueToMask - value to be masked
  • maskValue - value substituted for each occurrence of valueToMask
  • silent - verbosity control (False is default)

maskFeatures(self, headerList, valueToMask, maskValue)

Equivalent to maskDataSet, but restricted to a subset of features.

Parameters:
  • headerList - list of feature IDs
  • valueToMask - value to be masked
  • maskValue - value substituted for each occurrence of valueToMask
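
The difference between maskDataSet and maskFeatures is only the scope of the substitution. A minimal sketch on plain lists, independent of PyMix (the function name mask_values and its arguments are hypothetical, and unlike the methods documented here it returns a new matrix instead of changing the data in place):

```python
def mask_values(rows, headers, value_to_mask, mask_value, header_list=None):
    """Return a copy of 'rows' with value_to_mask replaced by mask_value.

    header_list=None masks the whole matrix; otherwise only the columns
    whose header appears in header_list are touched.
    """
    if header_list is None:
        columns = set(range(len(headers)))
    else:
        columns = {headers.index(h) for h in header_list}
    return [
        [mask_value if (c in columns and value == value_to_mask) else value
         for c, value in enumerate(row)]
        for row in rows
    ]


headers = ["f1", "f2"]
rows = [[-1, 4], [3, -1]]
everywhere = mask_values(rows, headers, -1, 0)
only_f2 = mask_values(rows, headers, -1, 0, header_list=["f2"])
```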

getExternalFeature(self, fid)

Returns the external data representation of a given feature.

Parameters:
  • fid - feature ID in self.headers
Returns:
list of data samples for feature fid

extractSubset(self, ids)

Removes all samples in 'ids' from 'self' and returns a new DataSet initialised with these samples.

Parameters:
  • ids - list of sample indices
Returns:
DataSet object containing the samples in ids
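
Note that extractSubset both shrinks the original and returns the removed samples. The sketch below mimics that split on plain lists; it does not use PyMix, the function name is hypothetical, and returning the samples in ascending index order is an assumption.

```python
def extract_subset(rows, sample_ids, indices):
    """Remove the samples at 'indices' in place and return them."""
    wanted = sorted(set(indices))
    taken_rows = [rows[i] for i in wanted]
    taken_ids = [sample_ids[i] for i in wanted]
    # delete from the back so the remaining indices stay valid
    for i in reversed(wanted):
        del rows[i]
        del sample_ids[i]
    return taken_rows, taken_ids


rows = [[1], [2], [3], [4]]
ids = ["s1", "s2", "s3", "s4"]
subset_rows, subset_ids = extract_subset(rows, ids, [0, 2])
```

After the call, the input lists hold only the samples that were not extracted.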

singleFeatureSubset(self, index)

Returns a DataSet for the feature with internal index 'index' in 'self'. For internal use.

Parameters:
  • index - feature index
Returns:
DataSet object

setMissingSymbols(self, findices, missing)

Assigns missing value placeholders to features.

Parameters:
  • findices - list of internal feature indices
  • missing - list of missing symbols/values

getMissingIndices(self, ind)

Gets the indices of missing values in one feature.

Parameters:
  • ind - feature index
Returns:
list of indices of missing values
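
setMissingSymbols and getMissingIndices work as a pair: one records which placeholder marks a missing value in each feature, the other looks those placeholders up. A stand-alone sketch of that idea follows; the dictionary-based bookkeeping and both function names are assumptions for illustration, not the PyMix internals.

```python
def set_missing_symbols(symbols, findices, missing):
    """Record one missing-value placeholder per feature index."""
    for index, symbol in zip(findices, missing):
        symbols[index] = symbol


def get_missing_indices(rows, symbols, ind):
    """Return the sample indices whose value in feature 'ind' is missing."""
    symbol = symbols[ind]
    return [i for i, row in enumerate(rows) if row[ind] == symbol]


rows = [["A", 1.0], ["*", -9999.0], ["C", 3.0], ["*", 4.0]]
symbols = {}
set_missing_symbols(symbols, [0, 1], ["*", -9999.0])
missing_in_first = get_missing_indices(rows, symbols, 0)
missing_in_second = get_missing_indices(rows, symbols, 1)
```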

writeClusteringFasta(self, fn_pref, m)

Writes a clustering based on model 'm' into files in FASTA format. Note that this implies sequence data.

Parameters:
  • fn_pref - Filename prefix. The full name of each output file consists of the prefix, the cluster number and the extension .fa
  • m - MixtureModel object
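
The file naming rule for fn_pref can be illustrated without a model. In the sketch below the cluster labels are passed in directly, whereas writeClusteringFasta derives them from the MixtureModel 'm'. The function name, the direct concatenation of prefix and cluster number, and the use of sample IDs as FASTA header lines are all assumptions.

```python
import os
import tempfile


def write_clusters_fasta(fn_pref, sample_ids, sequences, labels):
    """Write one FASTA file per cluster and return the file paths."""
    by_cluster = {}
    for sample_id, sequence, label in zip(sample_ids, sequences, labels):
        by_cluster.setdefault(int(label), []).append((sample_id, sequence))
    paths = []
    for label in sorted(by_cluster):
        # prefix + cluster number + ".fa" (exact concatenation is assumed)
        path = "%s%d.fa" % (fn_pref, label)
        with open(path, "w") as handle:
            for sample_id, sequence in by_cluster[label]:
                handle.write(">%s\n%s\n" % (sample_id, sequence))
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
paths = write_clusters_fasta(os.path.join(out_dir, "clust_"),
                             ["s1", "s2", "s3"],
                             ["ACGT", "TTGA", "ACGA"],
                             [0, 1, 0])
```

Each output file holds the sequences of one cluster, one FASTA record per sample.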