Clj-biosequence

A Clojure library to make the manipulation of biological sequence data easier.

Clojure for biology.

clj-biosequence is a library designed to make working with biological sequence data easier. It provides accessors for common file types and wrappers for various programs so that biological data can be easily accessed and manipulated in a 'clojure-y' style. Basic functionality includes:

Parsers and accessors for Genbank, Uniprot XML, fasta and fastq formats.
A wrapper for BLAST.
A wrapper for signalP.
A wrapper for TMHMM.
Indexing of files for random access.
Mechanisms for lazy processing of sequences from very large files.
Interfaces for search and retrieval of sequences from online databases.
Translation functions for DNA and RNA sequences.
ID mapping functionality using Uniprot's ID mapping tool.
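
To make the idea of lazy processing concrete, here is a minimal sketch in plain Clojure. It is not clj-biosequence's implementation and uses none of its functions; `fasta-records` and `count-records` are hypothetical helpers written only for illustration. The sketch assumes a well-formed fasta file in which every `>` header line is followed by at least one sequence line.

```clojure
(ns fasta-sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as string]))

(defn fasta-records
  "Lazily turns a seq of fasta lines into maps with :header and :sequence.
  Hypothetical helper for illustration, not part of clj-biosequence.
  Assumes every header line is followed by at least one sequence line."
  [lines]
  (->> lines
       (remove string/blank?)
       ;; group consecutive lines into alternating header / sequence runs
       (partition-by #(= \> (first %)))
       ;; pair each header run with the sequence run that follows it
       (partition 2)
       (map (fn [[[header] seq-lines]]
              {:header   (subs header 1)          ; drop the leading \>
               :sequence (apply str seq-lines)}))))

(defn count-records
  "Counts the records in a fasta file. Records are consumed one at a time
  while the reader is open, so the whole file is never held in memory."
  [path]
  (with-open [r (io/reader path)]
    (count (fasta-records (line-seq r)))))
```

Because every step returns a lazy sequence, a record is only built when something downstream asks for it. The consuming call has to finish inside `with-open`, since the lines can no longer be read once the reader is closed.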

clj-biosequence is under heavy development and other file formats and wrappers will be supported in the near future. Community contributions are welcome.

Getting started.

If you have never used Clojure, click here for a guide to getting up and running.

If you already use Clojure, find the clj-biosequence API docs here; the project's GitHub page has an in-depth tutorial.

clj-biosequence is available from Clojars. For the current version add the following to your project.clj file:


[clj-biosequence "0.1.4-SNAPSHOT"]
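
For context, that vector belongs inside the `:dependencies` key of project.clj. A minimal sketch, assuming a Leiningen project called `my-app` and Clojure 1.5.1 (both are placeholders; keep whatever your project already declares):

```clojure
(defproject my-app "0.1.0-SNAPSHOT"
  ;; my-app and the Clojure version below are placeholders, not requirements
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [clj-biosequence "0.1.4-SNAPSHOT"]])
```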

To use the library, require the clj-biosequence.core namespace, which contains the basic functionality and an interface to fasta files. For other uses, require any of the following namespaces as needed:


(ns my-app.core
  (:require [clj-biosequence.core :as cbs] ;; for base functionality and fasta
            [clj-biosequence.uniprot :as up] ;; for Uniprot functionality
            [clj-biosequence.genbank :as gb] ;; for Genbank functionality
            [clj-biosequence.blast :as bl] ;; for BLAST functionality
            [clj-biosequence.fastq :as fq] ;; for fastq functionality
            [clj-biosequence.index :as ind] ;; for indexing functionality
            [clj-biosequence.interproscan :as ips] ;; for interproscan functionality
            [clj-biosequence.signalp :as sp] ;; for a wrapper for signalp
            [clj-biosequence.tmhmm :as tm])) ;; for a wrapper for TMHMM