Logo sequences

It is interesting to investigate the base bias in the distinct reads of each annotation family, since it may give hints about how these small RNAs are processed. ncPRO-seq calculates the frequency of each base at every position of the distinct reads and plots the result as sequence logos [2] (see http://www.bioconductor.org/packages/2.2/bioc/html/seqLogo.html). Sequence logos can be drawn with respect to the 5' end or the 3' end of the reads by setting the LOGO_DIRECTION option to 5 or 3. The IC_SCALE option controls how the logos are displayed: either with uniform column heights, or with column heights proportional to the information content.
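The per-position calculation described above can be sketched in a few lines of Python. This is only an illustration, not the ncPRO-seq implementation: the function names, the `direction` and `width` parameters, and the example reads are all hypothetical, and the information content uses the plain formula 2 minus the Shannon entropy (in bits), without any small-sample correction.

```python
import math
from collections import Counter

BASES = "ACGT"

def position_frequencies(reads, direction=5, width=10):
    """Return one {base: frequency} dict per position (hypothetical helper).

    direction=5 anchors positions at the 5' end of each read;
    direction=3 anchors them at the 3' end, so index 0 is the last base.
    """
    counts = [Counter() for _ in range(width)]
    for read in reads:
        seq = read.upper().replace("U", "T")
        if direction == 3:
            seq = seq[::-1]  # reverse so that index 0 is the 3' terminal base
        for i, base in enumerate(seq[:width]):
            if base in BASES:  # ignore N and other ambiguity codes
                counts[i][base] += 1
    freqs = []
    for column in counts:
        total = sum(column.values())
        freqs.append({b: (column[b] / total if total else 0.0) for b in BASES})
    return freqs

def information_content(freq):
    """Information content of one column in bits (0 for an empty column)."""
    if sum(freq.values()) == 0:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return 2.0 - entropy

# Made-up distinct reads, for illustration only.
example_reads = ["TAGC", "TGGA", "TCCA", "TTAC"]
freqs_5p = position_frequencies(example_reads, direction=5, width=4)
freqs_3p = position_frequencies(example_reads, direction=3, width=4)
```

In this toy input every read starts with T, so the first 5' column carries the maximum of 2 bits, while the second column contains each base once and carries no information. A logo with column heights proportional to information content would draw the first column tall and the second flat; with uniform column heights, both columns would have the same height and only the letter proportions would differ.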

For each annotation family, ncPRO-seq provides two sequence logo figures, built from different subsets of distinct reads. In the first figure, all distinct reads in the annotation family are used to create the sequence logos; this gives processing information for small RNAs such as piRNAs, which can be produced from various positions within a single region. In the second figure, only the distinct read with the highest abundance in each family member is used; this suits small RNAs such as miRNAs, which are accurately processed by enzymes at specific loci.
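The two subsets of distinct reads can be illustrated with a short Python sketch. It assumes a hypothetical input of (family member, read sequence, abundance) records; the record layout, the function names, the example data, and the tie-breaking rule (the first read seen wins) are assumptions made for illustration and are not taken from ncPRO-seq.

```python
def all_distinct_reads(records):
    """Subset 1: every distinct read sequence found in the family."""
    return sorted({seq for _member, seq, _count in records})

def top_read_per_member(records):
    """Subset 2: the most abundant distinct read of each family member.

    On ties, the read encountered first is kept (an arbitrary choice).
    """
    best = {}
    for member, seq, count in records:
        if member not in best or count > best[member][1]:
            best[member] = (seq, count)
    return {member: seq for member, (seq, _count) in best.items()}

# Made-up records: (family member, distinct read, abundance).
example_records = [
    ("member_A", "TGGAATGT", 950),
    ("member_A", "GGAATGTA", 40),
    ("member_B", "TATCACAG", 300),
    ("member_B", "ATCACAGC", 310),
]

subset_all = all_distinct_reads(example_records)
subset_top = top_read_per_member(example_records)
```

Here `subset_all` holds all four sequences, whereas `subset_top` keeps one read per member. Either subset could then be passed to a per-position frequency calculation to build the corresponding logo.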

Chongjian Chen 2012-01-26