================================================================================
    PeakSplitter Overview 
================================================================================

////////////////////////////////////////////////////////////////////////////////
INSTALLATION
////////////////////////////////////////////////////////////////////////////////

This folder contains the executable jar file PeakSplitter.jar and the archive
PeakSplitter.src.zip, which contains source directories and the ant build file.
You can move the "PeakSplitter.jar" file anywhere in your file system; to run it
from another folder, give the path to the jar file after "java -jar".

////////////////////////////////////////////////////////////////////////////////
USAGE
////////////////////////////////////////////////////////////////////////////////

You should have Java 1.5 or later installed.
In order to launch the program, open a terminal window, go to the folder
where the jar file is located, and type

java -jar PeakSplitter.jar <-p peakfile> <-w wig file/folder> <-o output folder> [options]

Options include:

-help,-?                        displays help information
-p,--peakFile <string>          input peak file
-w,--wigFile <string>           input wig file or folder
-o,--outDir <string>            output folder
-x,--prefix <string>            string to add to output file names
-c,--cutoff <decimal integer>   height cutoff (default 5)
-v,--valley <float>             float value to determine the valley depth required for peak separation (default 0.6)
-f,--fetch <boolean>            whether to fetch subpeaks sequences or not (default true)
-u,--url <string>               DAS URL (default is for human: "http://www.ensembl.org/das/Homo_sapiens.GRCh37.reference")
-n,--numSeq <decimal integer>   number of best peak sequences to fetch (default 300)
-l,--length <decimal integer>   length of sequence to fetch (default 60)
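
To illustrate how the required and optional flags combine, the following Python
sketch assembles a command line from the options listed above. The helper name
build_command and the file names in the usage note are hypothetical; only the
flags themselves come from this document.

```python
def build_command(peak_file, wig_path, out_dir, jar="PeakSplitter.jar", **options):
    """Return the argument list for a PeakSplitter run.

    `options` maps long option names from the list above (for example
    cutoff, valley, fetch) to values. Names and values are passed through
    unchecked, so a misspelled option is not detected here.
    """
    cmd = ["java", "-jar", jar, "-p", peak_file, "-w", wig_path, "-o", out_dir]
    for name, value in options.items():
        # Booleans are written as lowercase true/false, as in "-f false".
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd += ["--" + name, str(value)]
    return cmd
```

For example, build_command("myPeaks.test", "wigs", "out", cutoff=10, fetch=False)
returns a list that can be handed to subprocess.run.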


*** -p/--peakFile
This is a REQUIRED parameter for PeakSplitter. 
The file lists the genomic coordinates output by a peak calling program 
(or obtained in some other way). The format should be tab/space delimited, 
where each locus is described by its "chromosome", "start" and "end" location.
This file should be sorted by chromosome and start position.
PLEASE REMOVE ANY HEADER LINES FROM THE FILE IF THESE ARE PRESENT
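
The requirements above (three whitespace-delimited columns, no header lines,
sorted input) can be checked before running PeakSplitter. The Python sketch
below is one possible check and is not part of PeakSplitter. It assumes that
"sorted by chromosome" means each chromosome forms one contiguous block with
non-decreasing start positions, because this document does not state which
chromosome order is expected. The function name check_peak_lines is hypothetical.

```python
def check_peak_lines(lines):
    """Return a list of problems found in peak-file lines (empty if none)."""
    problems = []
    seen = set()        # chromosomes whose block has already ended
    current = None      # chromosome of the block being read
    last_start = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()          # accepts tabs or spaces
        if not fields:
            continue                   # ignore blank lines
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            # Header lines typically fail this test.
            problems.append(f"line {number}: expected 'chromosome start end'")
            continue
        chrom, start = fields[0], int(fields[1])
        if chrom != current:
            if chrom in seen:
                problems.append(f"line {number}: {chrom} appears in more than one block")
            if current is not None:
                seen.add(current)
            current, last_start = chrom, start
        elif start < last_start:
            problems.append(f"line {number}: start positions out of order")
        else:
            last_start = start
    return problems
```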

*** -w/--wigFile
This is a REQUIRED parameter for PeakSplitter.
This can be a wig file OR a wig folder that contains one wig file for each chromosome.
The wig files describe the signal (usually the number of reads) along the genome and
are created by the peak-calling program that generated the peak file.
PeakSplitter supports wig files in variableStep or bedGraph format.
The wig header lines, "track type" and "variableStep" (when using the variableStep
format), are required.
The files can be zipped or gzipped, so it is not necessary to uncompress them.

Wig file names for each chromosome (under the wig folder) should contain the word
"chr" + chromosome number, for example "my.chr12.wig".

*** -o/--outDir
This is a REQUIRED parameter for PeakSplitter.
An output directory must be specified where PeakSplitter can write the result files. 

*** -x/--prefix
String to add to output file names, for example when the same peak files are to be
analyzed using different parameters.

*** -c/--cutoff
Height cutoff (default 5). Only subpeaks with at least this number of reads in 
their summit region will be reported.

*** -v/--valley 
Real value indicating the valley depth required for peak separation (default 0.6).
Local maxima regions are found within each peak, and the heights of neighboring local
maxima are compared. The lower of the two heights is multiplied by the valley value
to yield the threshold below which the valley between them must fall for the two
peaks to be separated.
For example, a value of 0.5 means that the height of the valley should be less than
half the height of the lower summit in order for the two summits to be separated.
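
The separation rule described above can be written as a short Python sketch.
This is a reading of the description in this document, not code taken from
PeakSplitter; the function and argument names are hypothetical.

```python
def should_split(left_summit, right_summit, valley_height, valley=0.6):
    """Decide whether two neighboring local maxima are separate subpeaks.

    The lower summit height is scaled by the valley value; the two maxima
    are separated only if the valley between them falls below that threshold.
    """
    threshold = valley * min(left_summit, right_summit)
    return valley_height < threshold
```

With summits of 40 and 20 reads and a valley value of 0.5, the threshold is
10 reads: a valley of 9 reads separates the two summits, a valley of 12 does not.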

*** -f/--fetch
By default, PeakSplitter fetches the subpeak sequences near their summit region.
To turn this feature off, specify
-f false

*** -u/--url
The sequences are exported directly from the DAS Ensembl database.
The user has to specify the DAS URL for the organism of interest.
The DAS URL for the human database is "http://www.ensembl.org/das/Homo_sapiens.GRCh37.reference", 
and URLs for other organisms can be found at: "http://www.ensembl.org/das/dsn"

*** -n/--numSeq
Number of best subpeak sequences to fetch (those with the highest numbers of reads 
in their summit region). These sequences can be used as input for motif prediction 
tools such as MEME. 
The default number is 300. This is the maximum number of sequences the web-based 
version of MEME will accept (more sequences can be input when run locally).
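
The selection described above amounts to ranking subpeaks by the number of
reads at their summit and keeping the first N. A minimal Python sketch,
assuming each subpeak is held as a tuple (chromosome, start, end, height,
summit offset); this tuple layout and the function name are illustrative
assumptions, not PeakSplitter internals.

```python
def best_subpeaks(subpeaks, num_seq=300):
    """Return the num_seq subpeaks with the highest summit read counts."""
    # Index 3 holds the summit height; sorted() is stable, so subpeaks
    # with equal heights keep their original relative order.
    ranked = sorted(subpeaks, key=lambda s: s[3], reverse=True)
    return ranked[:num_seq]
```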

*** -l/--length
Length of sequence to fetch (default 60).
The sequences are retrieved near the summit region.
If the length is 60, 30 bp upstream of the peak summit position and 30 bp downstream
will be included. The total sequence length will be 61, because the summit position
itself is also counted.
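
The arithmetic above can be checked with a small Python sketch. It assumes an
even length that is split equally around the summit; how PeakSplitter handles
odd lengths is not stated in this document. The function name is hypothetical.

```python
def summit_window(summit, length=60):
    """Return (start, end, total) for the region fetched around a summit."""
    half = length // 2
    start = summit - half      # half the length upstream of the summit
    end = summit + half        # half the length downstream of the summit
    # Both ends are inclusive, so the summit base adds one to the total.
    return start, end, end - start + 1
```

For a summit at position 1000 and the default length of 60, the window runs
from 970 to 1030 and covers 61 positions.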

////////////////////////////////////////////////////////////////////////////////
OUTPUT FILES
////////////////////////////////////////////////////////////////////////////////

If the -x parameter is specified, all output file names will start with the 
base string following the -x flag.

1. peakFileName.subpeaks.inputFileNameSuffix
For example, if the input peak file is "myPeaks.test", 
the output file will be "myPeaks.subpeaks.test"

This is a tab-delimited file which contains information about subpeaks, including:
a. Chromosome name
b. Start position of the subpeak
c. End position of the subpeak 
d. Number of reads at the peak summit position
e. Subpeak summit position relative to the start position of the subpeak region.
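
Because column e is relative to the subpeak start, the genomic summit position
has to be derived from columns b and e. The Python sketch below parses one line
of the file under two assumptions that this document does not confirm: the
file has no header line, and the offset in column e is added directly to the
start position. The function name is hypothetical.

```python
def parse_subpeak(line):
    """Parse one tab-delimited subpeak line into a dictionary."""
    chrom, start, end, height, offset = line.rstrip("\n").split("\t")[:5]
    start, end = int(start), int(end)
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "height": float(height),          # reads at the summit
        "summit": start + int(offset),    # assumed genomic summit position
    }
```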

2. peakFileName(without suffix).bestSubpeaks.fa
For example, if the input peak file is "myPeaks.test", the output file will be
"myPeaks.bestSubpeaks.fa".

This is a FASTA file containing the sequences of the best subpeaks
(those with the highest numbers of reads at their summit position).
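
The two naming patterns above can be reproduced with a short Python sketch,
which may help when locating the result files from a script. It covers only
the case without the -x option, since the exact way the prefix is combined
with these names is not spelled out here. The function name is hypothetical.

```python
import os

def output_names(peak_file):
    """Return the expected (subpeaks file, best-subpeaks FASTA) names."""
    # Split "myPeaks.test" into the stem "myPeaks" and the suffix ".test".
    stem, suffix = os.path.splitext(os.path.basename(peak_file))
    return stem + ".subpeaks" + suffix, stem + ".bestSubpeaks.fa"
```

For the input "myPeaks.test" this gives "myPeaks.subpeaks.test" and
"myPeaks.bestSubpeaks.fa", matching the examples above.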