clj-biosequence.core
->biosequenceIndex
(->biosequenceIndex index path)
Positional factory function for class clj_biosequence.core.biosequenceIndex.
->biosequenceIndexReader
(->biosequenceIndexReader index strm)
Positional factory function for class clj_biosequence.core.biosequenceIndexReader.
->bzipped
(->bzipped file)
Positional factory function for class clj_biosequence.core.bzipped.
->fastaFile
(->fastaFile file alphabet encoding)
Positional factory function for class clj_biosequence.core.fastaFile.
->fastaReader
(->fastaReader strm alphabet)
Positional factory function for class clj_biosequence.core.fastaReader.
->fastaSequence
(->fastaSequence acc description alphabet sequence)
Positional factory function for class clj_biosequence.core.fastaSequence.
->fastaString
(->fastaString str alphabet)
Positional factory function for class clj_biosequence.core.fastaString.
->gzipped
(->gzipped file)
Positional factory function for class clj_biosequence.core.gzipped.
->uncompressed
(->uncompressed file)
Positional factory function for class clj_biosequence.core.uncompressed.
->zipped
(->zipped file)
Positional factory function for class clj_biosequence.core.zipped.
bioReader
protocol
members
make-reader
(make-reader x opts)
bioreader
(bioreader x & opts)
Biosequence
protocol
members
alphabet
(alphabet this)
Returns the alphabet of a biosequence.
bs-seq
(bs-seq this)
Returns the sequence of a biosequence as a vector.
keywords
(keywords this)
Returns a collection of keywords.
moltype
(moltype this)
Returns the moltype of a biosequence.
protein?
(protein? this)
Returns true if a protein and false otherwise.
biosequence->file
(biosequence->file bs file & {:keys [append func], :or {append true, func fasta-string}})
Takes a collection of biosequences and prints them to file. By
default output is appended to an existing file (`:append true`).
The `:func` argument takes a function used to prepare the printed
output; the default, `fasta-string`, prints the biosequences to the
file in fasta format. Returns the file.
biosequenceCitation
protocol
members
authors
(authors this)
Returns the authors from a citation object.
crossrefs
(crossrefs this)
Returns cross-references (DOI, ISBN, etc.).
journal
(journal this)
Returns the journal of a citation object.
pend
(pend this)
Returns the end page of a citation object.
pstart
(pstart this)
Returns the start page of a citation object.
pubmed
(pubmed this)
Returns the pubmed id of a reference if there is one.
title
(title this)
Returns the title of a citation object.
volume
(volume this)
Returns the volume of a citation object.
year
(year this)
Returns the year of a citation object.
biosequenceCitations
protocol
members
citation-key
(citation-key this)
Returns a citation key from a record.
citations
(citations this)
Returns a collection of references in a sequence record.
biosequenceDbRef
protocol
members
database-name
(database-name this)
Returns the name of the database.
db-properties
(db-properties this)
Returns properties of the reference.
object-id
(object-id this)
Returns the ID of a database object.
biosequenceDbRefs
protocol
members
get-db-refs
(get-db-refs this)
biosequenceDescription
protocol
members
description
(description this)
Returns the description of a biosequence object.
biosequenceEvidence
protocol
members
evidence
(evidence this)
Returns evidence records.
biosequenceFeature
protocol
members
operator
(operator this)
Returns an operator for dealing with intervals.
biosequenceFeatures
protocol
members
feature-seq
(feature-seq this)
Returns a lazy list of features in a sequence.
filter-features
(filter-features this name)
Returns a list of features that return 'name' when called using
bs/bs-name from biosequenceName protocol.
biosequenceFile
protocol
members
bs-path
(bs-path this)
Returns the path of the file as string.
biosequenceGene
protocol
members
locus-tag
(locus-tag this)
map-location
(map-location this)
Returns the map location.
orf
(orf this)
ORF associated with the gene.
biosequenceGenes
protocol
members
genes
(genes this)
Returns sub-seq gene records.
biosequenceGoTerm
protocol
members
go-component
(go-component this)
The GO component, molecular function etc.
biosequenceGoTerms
protocol
biosequenceID
protocol
members
accession
(accession this)
Returns the accession of a biosequence object.
accessions
(accessions this)
Returns a list of accessions for a biosequence object.
creation-date
(creation-date this)
Returns a java date object.
discontinue-date
(discontinue-date this)
update-date
(update-date this)
Returns a java date object.
version
(version this)
Returns the version of the accession, or nil if there is none.
biosequenceInterval
protocol
members
comp?
(comp? this)
Returns true if the interval is complementary to the biosequence
sequence and false otherwise.
end
(end this)
Returns the end position of an interval as an integer.
point
(point this)
Returns a point interval.
start
(start this)
Returns the start position of an interval as an integer.
biosequenceIntervals
protocol
members
intervals
(intervals this)
Returns a list of intervals.
biosequenceIO
protocol
members
bs-reader
(bs-reader this)
Returns a reader for a file containing biosequences. Use with
`with-open'.
biosequenceName
protocol
members
allergen-names
(allergen-names this)
Returns the allergen names.
alternate-names
(alternate-names this)
Returns the alternate names.
biotech-names
(biotech-names this)
Returns the biotech names.
cd-antigen-names
(cd-antigen-names this)
names
(names this)
Returns the default names of a record.
submitted-names
(submitted-names this)
Returns the submitted names.
biosequenceNameObject
protocol
members
obj-description
(obj-description this)
obj-heading
(obj-heading this)
obj-label
(obj-label this)
obj-value
(obj-value this)
biosequenceParameters
protocol
members
parameters
(parameters this)
Returns parameters from a reader.
biosequenceProtein
protocol
members
activities
(activities this)
Returns a list of activities.
calc-mol-wt
(calc-mol-wt this)
The calculated molecular weight.
ecs
(ecs this)
Returns a list of E.C. numbers.
processed
(processed this)
Processing of the protein.
biosequenceProteins
protocol
members
proteins
(proteins this)
Returns protein sub-seq records.
biosequenceReader
protocol
members
biosequence-seq
(biosequence-seq this)
Returns a lazy sequence of biosequence objects.
get-biosequence
(get-biosequence this accession)
Returns the biosequence object with the corresponding
accession.
biosequenceStatus
protocol
biosequenceSubCellLoc
protocol
members
subcell-location
(subcell-location this)
subcell-orient
(subcell-orient this)
subcell-topol
(subcell-topol this)
biosequenceSubCellLocs
protocol
members
subcell-locations
(subcell-locations this)
biosequenceSummary
protocol
members
summary
(summary this)
Returns the summary of a sequence.
biosequenceSynonyms
protocol
members
synonyms
(synonyms this)
Returns a list of synonyms.
biosequenceTaxonomies
protocol
members
tax-refs
(tax-refs src)
Returns taxonomy records.
biosequenceTaxonomy
protocol
members
common-name
(common-name this)
lineage
(lineage this)
Returns a lineage string.
mods
(mods this)
Returns a list of modifications to taxonomy.
tax-name
(tax-name this)
Returns the taxonomic name.
biosequenceTranslation
protocol
members
codon-start
(codon-start this)
frame
(frame this)
Returns the frame a sequence should be translated in.
trans-table
(trans-table this)
Returns the translation table code to be used.
translation
(translation this)
Returns the translation of a sequence.
biosequenceUrl
protocol
members
anchor
(anchor this)
Text to show as highlight.
post-text
(post-text this)
biosequenceVariant
protocol
clean-sequence
(clean-sequence s a)
Removes spaces and newlines and checks that all characters are
legal characters for the supplied alphabet. Replaces non-valid
characters with \X. If `a' is not a defined alphabet throws an
exception.
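The cleaning behaviour described above can be sketched in plain Clojure. This is an illustration, not the library source: the `legal-chars` map and the `clean` function are hypothetical names, the two alphabets are simplified stand-ins for those defined in the library, and returning a vector is an assumption.

```clojure
;; Hypothetical, simplified alphabets used only for this illustration.
(def legal-chars
  {:dna     (set "ACGTN")
   :protein (set "ACDEFGHIKLMNPQRSTVWYX")})

(defn clean
  "Removes spaces and newlines from s, then replaces every character
  that is not legal in alphabet a with X. Throws if a is unknown."
  [s a]
  (let [legal (or (legal-chars a)
                  (throw (IllegalArgumentException.
                          (str "Unknown alphabet: " a))))]
    (->> s
         (remove #{\space \newline})        ; drop spaces and newlines
         (mapv #(if (legal %) % \X)))))     ; substitute illegal characters
```

Calling `(clean "AC Z\nT" :dna)` drops the space and newline and replaces the illegal `Z`, while `(clean "ACGT" :rna)` throws because `:rna` is not defined in the hypothetical map.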
default-biosequence-biosequence
Default implementation of Biosequence protocol.
default-biosequence-citation
Default implementation of biosequenceCitation protocol.
default-biosequence-citations
Default implementation of biosequenceCitations protocol.
default-biosequence-dbref
Default implementation of biosequenceDbRef protocol.
default-biosequence-dbrefs
Default implementation of biosequenceDbRefs protocol.
default-biosequence-description
Default implementation of biosequenceDescription protocol.
default-biosequence-evidence
default-biosequence-feature
Default implementation of biosequenceFeature protocol.
default-biosequence-features
Default implementation of biosequenceFeatures protocol.
default-biosequence-file
Default implementation of biosequenceFile protocol.
default-biosequence-gene
Default implementation of biosequenceGene protocol.
default-biosequence-genes
Default implementation of biosequenceGenes protocol.
default-biosequence-goterm
Default implementation of biosequenceGoTerm protocol.
default-biosequence-goterms
Default implementation of biosequenceGoTerms protocol.
default-biosequence-id
Default implementation of biosequenceID protocol.
default-biosequence-interval
Default implementation of biosequenceInterval protocol.
default-biosequence-intervals
Default implementation of biosequenceIntervals protocol.
default-biosequence-name
Default implementation of biosequenceName protocol.
default-biosequence-nameobject
Default implementation of biosequenceNameObject protocol.
default-biosequence-notes
Default implementation of biosequenceNotes protocol.
default-biosequence-protein
Default implementation of biosequenceProtein protocol.
default-biosequence-proteins
Default implementation of biosequenceProteins protocol.
default-biosequence-status
Default implementation of biosequenceStatus protocol.
default-biosequence-subcell
Default implementation of biosequenceSubCellLoc protocol.
default-biosequence-subcells
Default implementation of biosequenceSubCellLocs protocol.
default-biosequence-summary
Default implementation of biosequenceSummary protocol.
default-biosequence-synonyms
Default implementation of biosequenceSynonyms protocol.
default-biosequence-tax
Default implementation of biosequenceTaxonomy protocol.
default-biosequence-taxonomies
Default implementation of biosequenceTaxonomies protocol.
default-biosequence-translation
Default implementation of biosequenceTranslation protocol.
default-biosequence-url
Default implementation of biosequenceUrl protocol.
default-biosequence-variant
Default implementation of biosequenceVariant protocol.
delete-indexed-biosequence
(delete-indexed-biosequence index-file)
fasta-string
(fasta-string bioseq)
fastaReduce
protocol
members
fasta-reduce
(fasta-reduce this func fold)
Applies a function to sequence data streamed line-by-line and
reduces the results using the supplied `fold` function. Uses the
core reducers library so the fold function needs to have an
'identity' value that is returned when the function is called with
no arguments.
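The requirement on the fold function can be illustrated with the core reducers library on its own. The sketch below does not call `fasta-reduce`; `merge-counts`, `lines` and `residue-counts` are hypothetical names, and the vector of strings stands in for sequence data streamed line by line.

```clojure
(require '[clojure.core.reducers :as r])

;; A combining function suitable for r/fold: called with no arguments it
;; returns its identity value (an empty map); called with two maps it
;; adds their counts together.
(def merge-counts
  (r/monoid (partial merge-with +) (constantly {})))

;; Hypothetical stand-in for lines of sequence data.
(def lines ["ACGT" "AACC" "GGTT"])

;; Apply `frequencies` to each line, then fold the per-line results.
(def residue-counts
  (r/fold merge-counts (r/map frequencies lines)))
```

Here `frequencies` plays the role of `func` and `merge-counts` the role of `fold`. A plain `+` also qualifies as a fold function because `(+)` returns 0.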
get-feature-sequence
(get-feature-sequence feature bs)
Returns a fastaSequence object containing the sequence specified in
a feature object from a biosequence.
get-interval-sequence
(get-interval-sequence interval bs)
Returns a fasta sequence corresponding to the provided interval.
get-list
macro
(get-list obj & keys)
Low level macro for retrieving data from xml elements.
get-one
macro
(get-one obj & keys)
Low level macro for retrieving data from xml elements.
get-text
macro
(get-text obj & keys)
Low level macro for retrieving data from xml elements.
id-convert
(id-convert ids from to email)
Takes a list of accessions and returns a hash-map mapping them to
accessions from another database. Only accessions that had a match
are included, so an empty hash-map is returned if nothing is found.
Uses the Uniprot id mapping utility; a list of supported databases
is available at http://www.uniprot.org/faq/28#id_mapping_examples.
Some common mappings include:
DB Name                  Abbreviation     Direction
UniProtKB AC/ID          ACC+ID           from
UniProtKB AC             ACC              to
EMBL/GenBank/DDBJ        EMBL_ID          both
EMBL/GenBank/DDBJ CDS    EMBL             both
Entrez Gene (GeneID)     P_ENTREZGENEID   both
GI number                P_GI             both
RefSeq Protein           P_REFSEQ_AC      both
RefSeq Nucleotide        REFSEQ_NT_ID     both
WormBase                 WORMBASE_ID      both
Uniprot imposes a limit of 100,000 accessions in a single query.
index-biosequence-file
(index-biosequence-file file & {:keys [func], :or {func accession}})
init-fasta-file
(init-fasta-file path alphabet & opts)
Initialises a fasta file. Accession numbers and description
are processed by splitting the string on the first space, the
accession being the first value and description the second. Encoding
can be specified using the :encoding keyword, defaults to UTF-8.
init-fasta-reader
(init-fasta-reader strm alphabet)
init-fasta-sequence
(init-fasta-sequence accession description alphabet sequence)
Returns a new fastaSequence. Currently :iupacNucleicAcids
and :iupacAminoAcids are supported alphabets.
init-fasta-string
(init-fasta-string str alphabet)
Initialises a fasta string. Accession numbers and description are
processed by splitting the string on the first space, the accession
being the first value and description the second.
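The header handling described here, splitting on the first space only, can be sketched as follows. `split-header` is a hypothetical helper written for illustration and is not part of the library; it assumes the leading `>` has already been removed.

```clojure
(require '[clojure.string :as string])

(defn split-header
  "Splits a fasta header on its first space. Everything before the space
  is the accession and everything after it is the description."
  [header]
  ;; The limit of 2 keeps any later spaces inside the description.
  (let [[acc desc] (string/split header #" " 2)]
    {:accession acc :description desc}))
```

With this sketch a header that contains no space yields a nil description.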
load-biosequence-index
(load-biosequence-index path)
make-date
(make-date str f)
map->biosequenceIndex
(map->biosequenceIndex m__5869__auto__)
Factory function for class clj_biosequence.core.biosequenceIndex, taking a map of keywords to field values.
map->biosequenceIndexReader
(map->biosequenceIndexReader m__5869__auto__)
Factory function for class clj_biosequence.core.biosequenceIndexReader, taking a map of keywords to field values.
map->bzipped
(map->bzipped m__5869__auto__)
Factory function for class clj_biosequence.core.bzipped, taking a map of keywords to field values.
map->fastaFile
(map->fastaFile m__5869__auto__)
Factory function for class clj_biosequence.core.fastaFile, taking a map of keywords to field values.
map->fastaReader
(map->fastaReader m__5869__auto__)
Factory function for class clj_biosequence.core.fastaReader, taking a map of keywords to field values.
map->fastaSequence
(map->fastaSequence m__5869__auto__)
Factory function for class clj_biosequence.core.fastaSequence, taking a map of keywords to field values.
map->fastaString
(map->fastaString m__5869__auto__)
Factory function for class clj_biosequence.core.fastaString, taking a map of keywords to field values.
map->gzipped
(map->gzipped m__5869__auto__)
Factory function for class clj_biosequence.core.gzipped, taking a map of keywords to field values.
map->uncompressed
(map->uncompressed m__5869__auto__)
Factory function for class clj_biosequence.core.uncompressed, taking a map of keywords to field values.
map->zipped
(map->zipped m__5869__auto__)
Factory function for class clj_biosequence.core.zipped, taking a map of keywords to field values.
n50
(n50 reader)
Takes anything that can have `biosequence-seq' called on it and
returns the N50 of the sequences therein.
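For readers unfamiliar with the statistic, the following sketch computes N50 from a collection of sequence lengths using the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. `n50-of-lengths` is a hypothetical name, and whether the library resolves ties and edge cases the same way is an assumption.

```clojure
(defn n50-of-lengths
  "Returns the N50 of a collection of sequence lengths, or nil if empty."
  [lengths]
  (let [sorted (sort > lengths)               ; longest first
        half   (/ (reduce + sorted) 2)]       ; half of the total length
    (->> (map vector sorted (reductions + sorted))
         ;; skip lengths until the running total reaches half the total
         (drop-while (fn [[_ cumulative]] (< cumulative half)))
         ffirst)))
```

For lengths 2 to 10 the total is 54; the running total of the longest sequences reaches 27 at the sequence of length 8, so that is the value returned.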
object->file
(object->file obj file)
Spits an object to file after making sure *print-length* is
temporarily set to false.
post-req
(post-req a param)
protein-charge
(protein-charge p & {:keys [ph disulfides], :or {ph 7, disulfides 0}})
Calculates the theoretical protein charge at the specified
pH (default 7). Uses pKa values set out in the protein alphabets
from clj-biosequence.alphabet. Considers Lys, His, Arg, Glu, Asp, Tyr
and Cys residues only and ignores all other amino acids. The number
of disulfides can be specified and 2 times this figure will be
deducted from the number of Cys residues used in the calculation.
Values used for the pKa of the N-term and C-term are 9.69 and 2.34
respectively.
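One common way to compute such a theoretical charge is to sum Henderson-Hasselbalch terms over the ionisable groups. The sketch below takes that approach as an assumption; it is not taken from the library source. The side-chain pKa values are illustrative textbook figures and may differ from those in the library's alphabets; only the N-term and C-term values (9.69 and 2.34) come from the description above. All names are hypothetical.

```clojure
;; Illustrative side-chain pKa values (an assumption, not the library's).
(def positive-pka {\K 10.5 \R 12.4 \H 6.0})
(def negative-pka {\D 3.86 \E 4.25 \Y 10.07 \C 8.33})

;; Fractional charge of a basic group (+1 when protonated).
(defn pos-charge [pka ph] (/ 1.0 (+ 1.0 (Math/pow 10.0 (- ph pka)))))
;; Fractional charge of an acidic group (-1 when deprotonated).
(defn neg-charge [pka ph] (/ -1.0 (+ 1.0 (Math/pow 10.0 (- pka ph)))))

(defn charge
  "Theoretical charge of a sequence of one-letter residues at the given pH."
  [residues & {:keys [ph disulfides] :or {ph 7 disulfides 0}}]
  (let [counts (frequencies residues)
        ;; Each disulfide removes two Cys residues from the calculation.
        counts (update counts \C #(max 0 (- (or % 0) (* 2 disulfides))))]
    (+ (pos-charge 9.69 ph)                    ; N-terminus
       (neg-charge 2.34 ph)                    ; C-terminus
       (reduce + (for [[aa pka] positive-pka]
                   (* (get counts aa 0) (pos-charge pka ph))))
       (reduce + (for [[aa pka] negative-pka]
                   (* (get counts aa 0) (neg-charge pka ph)))))))
```

Residues outside the two maps contribute nothing, which mirrors the statement that all other amino acids are ignored.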
reverse-comp
(reverse-comp this)
Returns a new fastaSequence with the reverse complement sequence.
reverse-seq
(reverse-seq this)
Returns a new fastaSequence with the reverse sequence.
set-bioseq-proxy!
(set-bioseq-proxy! params)
six-frame-translation
(six-frame-translation nucleotide)
(six-frame-translation nucleotide table)
Returns a lazy list of fastaSequence objects representing
translations of a nucleotide biosequence object in six frames.
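The six frames are the three codon offsets on the given strand plus the three offsets on its reverse complement. The sketch below shows only how the codons of each frame are laid out; it does not translate them and does not use the library. All names are hypothetical, the ordering of the frames is an assumption, and the complement map covers unambiguous DNA bases only, mapping anything else to `N`.

```clojure
;; Complement map for unambiguous DNA bases only (an assumption).
(def complement-base {\A \T, \T \A, \C \G, \G \C})

(defn reverse-complement [bases]
  (mapv #(get complement-base % \N) (reverse bases)))

(defn codons
  "Splits bases into complete codons starting at a zero-based offset.
  Trailing bases that do not fill a codon are dropped."
  [bases offset]
  (->> (drop offset bases)
       (partition 3)
       (mapv #(apply str %))))

(defn six-frames
  "Returns the codons of three forward and three reverse frames."
  [bases]
  (let [fwd (vec bases)
        rev (reverse-complement fwd)]
    (vec (for [strand [fwd rev]
               offset [0 1 2]]
           (codons strand offset)))))
```

Each of the six codon vectors would then be looked up in a codon table to produce one translated sequence.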
sub-bioseq
(sub-bioseq bs beg)
(sub-bioseq bs beg end)
Returns a new fasta sequence object with the sequence corresponding
to 'beg' (inclusive) and 'end' (exclusive) of 'bs'. If no 'end'
argument is given, returns from 'beg' to the end of the sequence.
Zero-based index.
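The indexing convention matches Clojure's `subvec`: zero-based, start inclusive, end exclusive. A minimal sketch on a bare vector of residues, using the hypothetical name `sub-residues` rather than the library function:

```clojure
(defn sub-residues
  "Returns the residues from beg (inclusive) to end (exclusive).
  Without end, returns everything from beg to the end of the sequence."
  ([residues beg] (sub-residues residues beg (count residues)))
  ([residues beg end] (subvec (vec residues) beg end)))
```

So for the residues `ACGT`, positions 1 to 3 select `C` and `G`, and a lone start position of 2 selects `G` and `T`.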
translate
(translate bs frame & {:keys [table id-alter], :or {table (ala/codon-tables 1), id-alter true}})
Returns a fastaSequence sequence representing the translation of
the specified biosequence in the specified frame.