Module bioMixture

This file contains auxiliary functions for the analysis of biological sequences, e.g. searching for transcription factor binding sites using mixtures of PWMs (position weight matrices).

Functions
 
readJASPAR(fileName)
Reads a flat file of JASPAR binding site matrices.
 
readFastaSequences(fileName, out_type='DataSet')
Reads a file in FASTA format and returns the sequences in a DataSet object.
 
readSites(fileName)
Flat file parser for the JASPAR .sites format.
 
readAlnData(fn, reg_str=None, out_type='DataSet')
Parses a CLUSTALW format .aln multiple alignment file and returns a mixture.DataSet object.
 
getModel(G, p)
Constructs a PWM MixtureModel.
 
getBayesModel(G, p, mixPrior=None)
Constructs a PWM CSI BayesMixtureModel.
 
getBackgroundModel(p, dist=None)
Constructs the background model.
 
scanSequence(mix, bg, seq, scoring='mix')
Scores all positions of a sequence with the given model and background.

Variables
  AAlong = {'A': 'Ala', 'C': 'Cys', 'D': 'Asp', 'E': 'Glu', 'F':...

Function Details

readJASPAR(fileName)

Reads a flat file of JASPAR binding site matrices. JASPAR files are essentially FASTA files, but only upper-case letters belong to the binding site proper; lower-case letters are discarded.
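
The case convention can be illustrated with a minimal, self-contained sketch. It does not reproduce the module's parser: extract_binding_sites is a hypothetical helper, and the record layout (a ">" header line followed by sequence lines) is an assumption based on the description above.

```python
def extract_binding_sites(text):
    """Return the upper-case core of each record in FASTA-like text.

    Hypothetical helper for illustration; not part of bioMixture.
    """
    sites = []
    current = None  # symbols collected for the record being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new record; store the previous one first.
            if current is not None:
                sites.append(current)
            current = []
        elif current is not None:
            # Keep upper-case letters only; lower-case flanks are discarded.
            current.extend(ch for ch in line if ch.isupper())
    if current is not None:
        sites.append(current)
    return sites


# Made-up records: lower-case flanks around an upper-case site.
example = """>site1
acgtACGTTGca
>site2
ttGGCATTAAcc
"""
sites = extract_binding_sites(example)
```

Each entry of sites is a list of single-letter symbols, one list per record, with the lower-case flanking letters removed.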

readFastaSequences(fileName, out_type='DataSet')

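Reads a file in FASTA format and returns the sequences in a DataSet object.

Parameters:
  • fileName - name of the input file
  • out_type - type of the returned object, 'DataSet' by default
Returns:
DataSet object containing the sequences (one list of symbols per sequence)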

readSites(fileName)

Flat file parser for the JASPAR .sites format. The files are essentially FASTA files, with a count matrix at the end of the file.

Parameters:
  • fileName - File name of .sites file
Returns:
DataSet object

readAlnData(fn, reg_str=None, out_type='DataSet')

Parses a CLUSTALW format .aln multiple alignment file and returns a mixture.DataSet object.

Parameters:
  • fn - name of the .aln input file
  • reg_str - regular expression for sequence parsing
  • out_type - type of the returned object, 'DataSet' by default
Returns:
DataSet object

getModel(G, p)

Constructs a PWM MixtureModel.

Parameters:
  • G - number of components
  • p - number of positions of the binding site
Returns:
MixtureModel object

getBayesModel(G, p, mixPrior=None)

Constructs a PWM CSI BayesMixtureModel.

Parameters:
  • G - number of components
  • p - number of positions of the binding site
  • mixPrior - optional prior for the mixture model (default None)
Returns:
BayesMixtureModel object

getBackgroundModel(p, dist=None)

Constructs the background model.

Parameters:
  • p - number of positions of the binding site
  • dist - background nucleotide frequencies, uniform is default
Returns:
MixtureModel representing the background
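
As a rough illustration of what a background with uniform default frequencies computes, the sketch below evaluates the log-density of a sequence window. It is not the module's implementation: background_log_density is a hypothetical helper, and the assumption that all positions are independent and share the same nucleotide frequencies is ours.

```python
import math


def background_log_density(window, dist=None):
    """Log-density of a window under a position-independent background.

    Hypothetical helper for illustration; not part of bioMixture.
    dist maps each nucleotide to its frequency; uniform if omitted.
    """
    if dist is None:
        dist = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    # Positions are assumed independent, so log-probabilities add up.
    return sum(math.log(dist[nt]) for nt in window)


uniform_score = background_log_density("ACGT")
# Made-up AT-rich frequencies, for comparison with the uniform default.
at_rich_score = background_log_density(
    "ACGT", dist={"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
)
```

Under the uniform default every window of the same length receives the same background density, so differences in the final score come from the binding site model alone.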

scanSequence(mix, bg, seq, scoring='mix')

Scores all positions of a sequence with the given model and background.

Parameters:
  • mix - MixtureModel object
  • bg - background MixtureModel object
  • seq - sequence as list of nucleotides
  • scoring - flag selecting the scoring scheme used for the mixtures: 'compmax' uses the maximum density over the components, 'mix' (the default) uses the true mixture density
Returns:
list of position-wise log-odds scores
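
The two scoring schemes can be sketched in plain Python. This is an illustration under stated assumptions, not the module's code: scan_sequence_sketch and pwm_log_density are hypothetical helpers, a PWM is represented as a list of per-position frequency dictionaries, the background is position-independent, and 'compmax' is assumed to ignore the mixture weights. All weights and frequencies must be strictly positive, since logarithms are taken.

```python
import math


def pwm_log_density(pwm, window):
    """Log-density of a window under one PWM (list of frequency dicts)."""
    return sum(math.log(col[nt]) for col, nt in zip(pwm, window))


def scan_sequence_sketch(weights, pwms, bg_dist, seq, scoring="mix"):
    """Position-wise log-odds scores of seq against a background.

    Hypothetical helper for illustration; not part of bioMixture.
    """
    p = len(pwms[0])  # number of positions of the binding site
    scores = []
    for start in range(len(seq) - p + 1):
        window = seq[start:start + p]
        comp = [pwm_log_density(pwm, window) for pwm in pwms]
        if scoring == "mix":
            # log(sum_k w_k * f_k(window)), computed via log-sum-exp.
            terms = [math.log(w) + c for w, c in zip(weights, comp)]
            top = max(terms)
            model = top + math.log(sum(math.exp(t - top) for t in terms))
        elif scoring == "compmax":
            # Best single component; weights ignored (our assumption).
            model = max(comp)
        else:
            raise ValueError("unknown scoring scheme: %r" % scoring)
        background = sum(math.log(bg_dist[nt]) for nt in window)
        scores.append(model - background)
    return scores


# Made-up two-component model over binding sites of length 2.
pwm_a = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
         {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
pwm_b = [{"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
         {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
seq = list("ACGT")

mix_scores = scan_sequence_sketch([0.5, 0.5], [pwm_a, pwm_b], uniform, seq, "mix")
max_scores = scan_sequence_sketch([0.5, 0.5], [pwm_a, pwm_b], uniform, seq, "compmax")
```

A sequence of length n yields n - p + 1 scores, one per window start. A positive score means the window is more likely under the binding site model than under the background.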

Variables Details

AAlong

Value:
{'A': 'Ala',
 'C': 'Cys',
 'D': 'Asp',
 'E': 'Glu',
 'F': 'Phe',
 'G': 'Gly',
 'H': 'His',
 'I': 'Ile',
...