Configuration file

ncPRO-seq is a flexible pipeline that lets users set options at each stage of the analysis, from raw read processing to the generation of results. In the web interface version, all options can be selected directly in the web page (see 3.2). In the command line version (see 3.3), users edit the config-ncrna.txt file manually, following the option descriptions below (Table 2) and the descriptions inside the file itself. The config-ncrna.txt file contains more options than the web page, especially for the Bowtie mapping step, but we recommend leaving these extra options unchanged unless you are an expert.
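As an illustration of how the options in config-ncrna.txt could be inspected before launching an analysis, here is a minimal Python sketch. It assumes that the file uses one `KEY = VALUE` pair per line and `#` for comments; this syntax is an assumption and should be checked against the file shipped with your installation. The function name `parse_config` and the values in the example are hypothetical.

```python
def parse_config(text):
    """Parse 'KEY = VALUE' lines into a dict.

    Assumed syntax (not confirmed by the manual): one option per line,
    '#' starts a comment, blank lines are ignored.
    """
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values such as
        # 'min_len=20,max_len=24' are kept intact.
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


# Hypothetical values, for illustration only.
example = """
# mapping
N_CPU = 2
FASTQ_FORMAT = phred33
ORGANISM = mm9
GENOME_TRACK_OPTIONS = min_len=20,max_len=24,min_copy=1,max_copy=1
"""

config = parse_config(example)
for name in sorted(config):
    print(name, "->", config[name])
```

All values are kept as strings here; converting them (for instance N_CPU to an integer) is left to the caller.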

Table 2: Options from the configuration file
Option Description
LOGFILE File that lists actions that have occurred during the analysis
N_CPU Number of CPUs used by Bowtie for mapping
FASTQ_FORMAT The quality score format of the fastq reads. Three formats are supported: phred33 (Sanger, Solexa version 1.8 or later), solexa (Solexa prior to version 1.3), solexa1.3 (Solexa version 1.3 to 1.7)
BOWTIE_GENOME_REFERENCE Basename of the Bowtie index of the reference genome (base space). See the Bowtie manual (http://bowtie-bio.sourceforge.net/manual.shtml) for additional information
BOWTIE_GENOME_REFERENCE_CS Basename of the Bowtie index of the reference genome (color space). See the Bowtie manual (http://bowtie-bio.sourceforge.net/manual.shtml) for additional information
BOWTIE_GENOME_OPTIONS_FQ Options for Bowtie to map base space reads in fastq format (Solexa)
BOWTIE_GENOME_OPTIONS_FA Options for Bowtie to map base space reads in fasta format (454)
BOWTIE_GENOME_OPTIONS_CS Options for Bowtie to map color space reads (SOLiD)
GROUP_READ Group reads by sequence. Depending on the input format, grouping is applied either to the raw reads before mapping or to the read alignments in the bam file. 1: Yes; 0: No; 2: online version, where the input files have already been grouped using the provided scripts
ORGANISM Name of the reference organism. Must match the name of an organism available in the annotation folder (e.g. mm9, hg19)
MATURE_MIRNA Annotation against miRNAs from miRBase. miRNA items with or without an extension are accepted (see 5.4.2)
PRECURSOR_MIRNA Annotation against pre-miRNAs from miRBase. miRNA items with or without an extension are accepted (see 5.4.2)
NCRNA_RFAM List of the RFAM ncRNA(s) to focus on (comma separator) - no extension parameter
NCRNA_RFAM_EX List of the RFAM ncRNA(s) to focus on (comma separator) - extension parameter (see 5.4.2)
NCRNA_RMSK List of the repetitive elements to focus on (comma separator) - no extension parameter
NCRNA_RMSK_EX List of the repetitive elements to focus on (comma separator) - extension parameter (see 5.4.2)
TRNA_UCSC Mapping against tRNA sequences. tRNA items with or without an extension are accepted (see 5.4.2)
OTHER_NCRNA_GFF List of custom gff files to intersect with the mapped reads
LOGO_DIRECTION Align the sequence on the 5' or 3' end [5/3]
IC_SCALE Use the information content scale for Logo outputs. 1: Yes; 0: No
GENOME_TRACK_OPTIONS Options to select the reads mapped on the genome that are used to generate a track file. Four comma-separated filters must be provided: min_len=N, the minimum read length; max_len=N, the maximum read length; min_copy=N, the minimum number of matches in the genome; max_copy=N, the maximum number of matches in the genome. To generate more than one type of track, separate the sets of options with a pipe (|)
SIG_READ_OPTIONS Options to select mapped reads for the enrichment analysis (see 5.5). Uses the same format as GENOME_TRACK_OPTIONS
SIG_WIN_SIZE The window size used to scan the genome (e.g. 10000) (see 5.5)
SIG_STEP_SIZE The step size (e.g. 50000) (see 5.5)
EXCLUDE_ANN_GFF List of annotation files (gff3). Only reads which are not mapped in these annotated regions are kept for enrichment analysis (see 5.5)
FIT_MODEL The model used to fit window-based read distribution. Three models can be chosen: NB.ML, NB.012, and Poisson (see 5.5)
PVAL_CUTOFF The cut-off used to get regions significantly enriched with reads
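The format shared by GENOME_TRACK_OPTIONS and SIG_READ_OPTIONS (four comma-separated filters, with sets separated by a pipe) can be sketched in Python as follows. This is only an illustration of the syntax described in the table, not the code used by ncPRO-seq; the function names `parse_track_options` and `keep_read` are hypothetical, the bounds are assumed to be inclusive, and the numeric values are made up.

```python
REQUIRED_KEYS = ("min_len", "max_len", "min_copy", "max_copy")


def parse_track_options(value):
    """Return one dict of integer filters per '|'-separated option set."""
    tracks = []
    for option_set in value.split("|"):
        filters = {}
        for item in option_set.split(","):
            key, _, number = item.strip().partition("=")
            filters[key.strip()] = int(number)
        missing = [k for k in REQUIRED_KEYS if k not in filters]
        if missing:
            raise ValueError("missing option(s): " + ", ".join(missing))
        tracks.append(filters)
    return tracks


def keep_read(length, copies, filters):
    """True if a read passes the filters (bounds assumed inclusive)."""
    return (filters["min_len"] <= length <= filters["max_len"]
            and filters["min_copy"] <= copies <= filters["max_copy"])


# Two hypothetical track definitions: short unique reads, then
# longer reads that may match the genome several times.
value = ("min_len=20,max_len=24,min_copy=1,max_copy=1"
         "|min_len=25,max_len=32,min_copy=1,max_copy=5000")
tracks = parse_track_options(value)

print(len(tracks))                   # number of track definitions
print(keep_read(22, 1, tracks[0]))   # 22 nt read, unique match
print(keep_read(22, 3, tracks[0]))   # 22 nt read, three matches
```

Rejecting an incomplete set early, as done here, reflects the requirement that all four filters be given for each track.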
 
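The three FASTQ_FORMAT values listed in the table differ in how a quality character is converted to a score. The Python sketch below illustrates the standard conventions for these encodings (ASCII offset 33 for phred33, ASCII offset 64 for solexa1.3, and ASCII offset 64 with the Solexa odds-based scale for solexa). It is meant to help identify the format of an input file and is not taken from the ncPRO-seq code; the function name `decode_quality` is hypothetical.

```python
import math


def decode_quality(char, fastq_format):
    """Convert one FASTQ quality character to a Phred-scaled score."""
    if fastq_format == "phred33":
        # Sanger-style encoding: Phred score + 33
        return float(ord(char) - 33)
    if fastq_format == "solexa1.3":
        # Phred score + 64
        return float(ord(char) - 64)
    if fastq_format == "solexa":
        # Solexa score + 64; Solexa scores are log-odds based and are
        # converted to the Phred scale here.
        solexa_score = ord(char) - 64
        return 10 * math.log10(10 ** (solexa_score / 10) + 1)
    raise ValueError("unknown FASTQ_FORMAT: " + fastq_format)


print(decode_quality("I", "phred33"))
print(decode_quality("h", "solexa1.3"))
print(round(decode_quality("h", "solexa"), 3))
```

Characters below `@` (ASCII 64) cannot occur in solexa1.3 data, so seeing them in a file is a strong hint that it is phred33 encoded.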

Nicolas Servant 2012-05-31