================================================================================
PeakSplitter Overview
================================================================================

////////////////////////////////////////////////////////////////////////////////
INSTALLATION
////////////////////////////////////////////////////////////////////////////////

This folder contains the executable jar file PeakSplitter.jar and the archive
PeakSplitter.src.zip, which contains the source directories and the ant build
file. You can move the "PeakSplitter.jar" file anywhere in your file system and
run it from that location.

////////////////////////////////////////////////////////////////////////////////
USAGE
////////////////////////////////////////////////////////////////////////////////

You should have Java 1.5 or later installed. To launch the program, open a
terminal window, go to the folder where the jar file is located, and type

    java -jar PeakSplitter.jar <-p peakfile> <-w wig file/folder> <-o output folder> [options]

Options include:

    -help,-?       displays help information
    -p,--peakFile  input peak file
    -w,--wigFile   input wig file or folder
    -o,--outDir    output folder
    -x,--prefix    string to add to output file names
    -c,--cutoff    height cutoff (default 5)
    -v,--valley    float value to determine the valley depth required for
                   peak separation (default 0.6)
    -f,--fetch     whether to fetch subpeak sequences or not (default true)
    -u,--url       DAS URL (default is for human:
                   "http://www.ensembl.org/das/Homo_sapiens.GRCh37.reference")
    -n,--numSeq    number of best peak sequences to fetch (default 300)
    -l,--length    length of sequence to fetch (default 60)

*** -p/--peakFile
This is a REQUIRED parameter for PeakSplitter. The file lists the genomic
coordinates output by a peak calling program (or obtained in some other way).
The format should be tab/space delimited, where each locus is described by its
"chromosome", "start" and "end" location.
This file should be sorted by chromosome and start position.
PLEASE REMOVE ANY HEADER LINES FROM THE FILE IF THESE ARE PRESENT.

*** -w/--wigFile
This is a REQUIRED parameter for PeakSplitter. This can be a wig file OR a wig
folder that contains one wig file for each chromosome. The wig files describe
the signal (usually the number of reads) along the genome and are created by
the peak-calling program that generated the peak file.
PeakSplitter supports wig files in VariableStep or Bedgraph formats. The wig
header lines, "track type" and "variableStep" (when using VariableStep format),
are required. The files can be zipped or gzipped, so it is not necessary to
uncompress them.
Wig file names for each chromosome (under the wig folder) should contain the
word "chr" + chromosome number, for example "my.chr12.wig".

*** -o/--outDir
This is a REQUIRED parameter for PeakSplitter. An output directory must be
specified where PeakSplitter can write the result files.

*** -x/--prefix
String to add to output file names, for example when the same peak files are to
be analyzed using different parameters.

*** -c/--cutoff
Height cutoff (default 5). Only subpeaks with at least this number of reads in
their summit region will be reported.

*** -v/--valley
Real value indicating the valley depth required for peak separation (default
0.6). Local maxima regions are found within each peak, and the heights of
neighboring local maxima are compared. The lower of the two heights is
multiplied by the valley value to yield the threshold that the valley between
them must fall below for the two peaks to be separated. For example, a value of
0.5 means that the height of the valley should be less than half the height of
the lower summit in order for the two to be separated.

*** -f/--fetch
By default, PeakSplitter will fetch the subpeak sequences near their summit
region. To turn this feature off, use -f false

*** -u/--url
The sequences are exported directly from the DAS Ensembl database.
The user has to specify the DAS URL for the organism of interest. The DAS URL
for the human database is
"http://www.ensembl.org/das/Homo_sapiens.GRCh37.reference", and URLs for other
organisms can be found at: "http://www.ensembl.org/das/dsn"

*** -n/--numSeq
Number of best subpeak sequences to fetch (those with the highest numbers of
reads in their summit region). These sequences can be used as input for motif
prediction tools such as MEME. The default number is 300. This is the maximum
number of sequences the web-based version of MEME will accept (more sequences
can be input when run locally).

*** -l/--length
Length of sequence to fetch (default 60).
The sequences are retrieved near the summit region. If the length is 60, 30 bp
will be included upstream of the peak summit position, and 30 bp downstream.
Including the summit position itself, the total sequence length will be 61.

////////////////////////////////////////////////////////////////////////////////
OUTPUT FILES
////////////////////////////////////////////////////////////////////////////////

If the -x parameter is specified, all output file names will start with the
base string following the -x flag.

1. peakFileName.subpeaks.inputFileNameSuffix
   For example, if the input peak file is "myPeaks.test", the output file will
   be "myPeaks.subpeaks.test".
   This is a tab-delimited file which contains information about subpeaks,
   including:
     a. Chromosome name
     b. Start position of the subpeak
     c. End position of the subpeak
     d. Number of reads in the peak summit position
     e. Subpeak summit position relative to the start position of the subpeak
        region.

2. peakFileName(without suffix).bestSubpeaks.fa
   For example, if the input peak file is "myPeaks.test", the output file will
   be "myPeaks.bestSubpeaks.fa".
   This is a fasta file, containing the sequences of the best subpeaks (those
   with the highest numbers of reads in their summit position).
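
As an illustration of how the subpeaks file might be post-processed, the
following Python sketch reads rows in the five-column layout listed above,
derives the sequence window that the -l description implies, and picks the
highest subpeaks. It is not part of PeakSplitter. The names Subpeak,
parse_subpeak_line, summit_window and best_subpeaks are hypothetical, and the
sketch assumes that the file has no header line, that the columns appear in
the order a to e, and that the absolute summit coordinate equals start plus
the relative summit position.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Subpeak(NamedTuple):
    chrom: str
    start: int
    end: int
    height: float        # column d: number of reads at the summit
    summit_offset: int   # column e: summit position relative to start


def parse_subpeak_line(line: str) -> Subpeak:
    """Parse one tab-delimited row of a subpeaks file (assumed columns a-e)."""
    chrom, start, end, height, offset = line.rstrip("\n").split("\t")[:5]
    return Subpeak(chrom, int(start), int(end), float(height), int(offset))


def summit_window(peak: Subpeak, length: int = 60) -> Tuple[int, int]:
    """Return the inclusive (first, last) coordinates around the summit.

    Assumption: absolute summit = start + summit_offset. Half of `length`
    is taken on each side, so length 60 spans 30 + 1 + 30 = 61 positions.
    """
    summit = peak.start + peak.summit_offset
    half = length // 2
    return summit - half, summit + half


def best_subpeaks(peaks: Iterable[Subpeak], n: int = 300) -> List[Subpeak]:
    """Return the n subpeaks with the highest summit read counts."""
    return sorted(peaks, key=lambda p: p.height, reverse=True)[:n]
```

A typical use would be to call parse_subpeak_line on every line of
"myPeaks.subpeaks.test", pass the result to best_subpeaks, and compare the
windows from summit_window with the sequences in "myPeaks.bestSubpeaks.fa".
Whether PeakSplitter uses exactly these coordinate conventions should be
checked against real output before relying on the sketch.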