Configuration file

ncPRO-seq is a flexible pipeline that lets users set options at each analysis stage, from raw read processing to result generation. In the web interface version, all options can be chosen through the web page (see 3.2). In the command line version (see 3.3), users edit the config-ncrna.txt file by hand, following the option descriptions below (Table 2) and the descriptions inside the file itself. The config-ncrna.txt file contains more options than the web page, especially for the Bowtie mapping step; we do not recommend changing these extra options unless you are an expert.

Table 1: Options from the configuration file
Options Description
LOGFILE File that lists actions that have occurred during the analysis
N_CPU Number of CPUs used by Bowtie for mapping
FASTQ_FORMAT The quality score format of the fastq reads. Three formats are supported: phred33 (Sanger, Solexa version 1.8 or later), solexa (Solexa prior to version 1.3), solexa1.3 (Solexa version 1.3 to 1.7)
BOWTIE_GENOME_REFERENCE Basename of the Bowtie index genome reference file (base space). See the Bowtie manual (http://bowtie-bio.sourceforge.net/manual.shtml) for additional information
BOWTIE_GENOME_REFERENCE_CS Basename of the Bowtie index genome reference file (color space). See the Bowtie manual (http://bowtie-bio.sourceforge.net/manual.shtml) for additional information
BOWTIE_GENOME_OPTIONS_FQ Options for Bowtie to map base space reads in fastq format (Solexa)
BOWTIE_GENOME_OPTIONS_FA Options for Bowtie to map base space reads in fasta format (454)
BOWTIE_GENOME_OPTIONS_CS Options for Bowtie to map color space reads (SOLiD)
GROUP_READ Group reads by sequence. Depending on the input format, either the raw reads are grouped before mapping or the read alignments in the BAM file are grouped. 1: Yes; 0: No
MATURE_MIRNA Annotation against miRNAs from miRBase. Both miRNA with and without an extended item are acceptable (see 5.5.2)
PRECURSOR_MIRNA Annotation against pre-miRNAs from miRBase. Both miRNA with and without an extended item are acceptable (see 5.5.2)
NCRNA_RFAM Comma-separated list of the RFAM ncRNA(s) to focus on - no extension parameter
NCRNA_RFAM_EX Comma-separated list of the RFAM ncRNA(s) to focus on - extension parameter (see 5.5.2)
NCRNA_RMSK Comma-separated list of the repetitive elements to focus on - no extension parameter
NCRNA_RMSK_EX Comma-separated list of the repetitive elements to focus on - extension parameter (see 5.5.2)
TRNA_UCSC Mapping against tRNA sequences. Both tRNA with and without an extended item are acceptable (see 5.5.2)
OTHER_NCRNA_GFF List of custom gff files to intersect with the mapped reads
LOGO_DIRECTION Align the sequence on the 5' or 3' end [5/3]
IC_SCALE Use the information content scale for Logo outputs. 1: Yes; 0: No
GENOME_TRACK_OPTIONS Options for selecting the genome-mapped reads used to generate the track file. Four comma-separated options must be provided to filter reads. min_len=N: the minimum read length (N); max_len=N: the maximum read length (N); min_copy=N: the minimum number (N) of matches in the genome; max_copy=N: the maximum number (N) of matches in the genome. To generate more than one type of track, separate the sets of options with a pipe (|)
SIG_READ_OPTIONS Options for selecting mapped reads for enrichment analysis (see 5.6). The format is the same as for GENOME_TRACK_OPTIONS
SIG_WIN_SIZE The window size used to scan the genome (e.g. 10000) (see 5.6)
SIG_STEP_SIZE The step size (e.g. 50000) (see 5.6)
EXCLUDE_ANN_GFF List of annotation files (gff3). Only reads which are not mapped in these annotated regions are kept for enrichment analysis (see 5.6)
FIT_MODEL The model used to fit window-based read distribution. Three models can be chosen: NB.ML, NB.012, and Poisson (see 5.6)
PVAL_CUTOFF The cut-off used to get regions significantly enriched with reads
GENOME_DESC_FILE Genome description file containing the size of every chromosome. This file can be created with the faidx function of SAMtools (http://samtools.sourceforge.net/samtools.shtml)
RFAM_GFF Annotation of ncRNAs downloaded from the RFAM database (http://rfam.sanger.ac.uk/) (gff3) (see 6.1)
RMSK_GFF Annotation of repeat sequences created from the RepeatMasker results (gff3 with special attributes) (see 6.2)
PRECURSOR_MIRNA_GFF Annotation of precursor miRNAs downloaded from the miRBase database (http://mirbase.org) (gff3)
MATURE_MIRNA_GFF Annotation of mature miRNAs created using files from the miRBase database (gff3) (see 6.3)
TRNA_GFF Annotation of tRNAs (gff3 with special attributes) (see 6.4)
PROTEIN_GENE_GFF Annotation of coding sequences (gff3)
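
The format described for GENOME_TRACK_OPTIONS (and reused by SIG_READ_OPTIONS) can be illustrated with a short Python sketch. This is not code from ncPRO-seq: the names TrackFilter, parse_track_options and accepts are hypothetical, the numeric values are made up for illustration, and treating the bounds as inclusive is an assumption. It only shows how a value made of comma-separated key=N items, with sets separated by a pipe, could be read and applied to a read.

```python
from dataclasses import dataclass
from typing import List

# The four filter keys named in the GENOME_TRACK_OPTIONS description.
REQUIRED_KEYS = ("min_len", "max_len", "min_copy", "max_copy")


@dataclass(frozen=True)
class TrackFilter:
    """One set of read filters (hypothetical helper, not part of ncPRO-seq)."""
    min_len: int
    max_len: int
    min_copy: int
    max_copy: int

    def accepts(self, read_len: int, n_matches: int) -> bool:
        # Assumption: both length and copy-number bounds are inclusive.
        return (self.min_len <= read_len <= self.max_len
                and self.min_copy <= n_matches <= self.max_copy)


def parse_track_options(value: str) -> List[TrackFilter]:
    """Split a value on '|' into option sets, then each set on ',' into key=N items."""
    filters = []
    for option_set in value.split("|"):
        fields = {}
        for item in option_set.split(","):
            key, sep, number = item.partition("=")
            if not sep:
                raise ValueError(f"expected key=N, got {item!r}")
            fields[key.strip()] = int(number)
        # Each set must contain exactly the four documented keys.
        if sorted(fields) != sorted(REQUIRED_KEYS):
            raise ValueError(f"option set must define {REQUIRED_KEYS}: {option_set!r}")
        filters.append(TrackFilter(**fields))
    return filters


if __name__ == "__main__":
    # Illustrative values only: two option sets, hence two track types.
    example = ("min_len=19,max_len=26,min_copy=1,max_copy=20"
               "|min_len=30,max_len=40,min_copy=1,max_copy=1")
    for track_filter in parse_track_options(example):
        print(track_filter, track_filter.accepts(read_len=22, n_matches=5))
```

Each option set yields one filter, so a value with two sets separated by a pipe corresponds to two track types. A set that omits one of the four keys is rejected, which matches the requirement that all four options be provided.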
 

Chongjian Chen 2012-01-26