*********************************************************************
*********************************************************************

PoreLogo 1.0

M. Pellegrini-Calace, R. Oliva & J.M. Thornton
EMBL/EBI, The Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom

Contacts:
Dr. Marialuisa Pellegrini-Calace   marial@ebi.ac.uk
Dr. Romina Oliva                   oliva@uniparthenope.it

*********************************************************************
*********************************************************************

PoreLogo is an automated tool for visualizing the sequence and conservation of pore-lining residues of channel proteins. Sequence logos of pore-lining residues are built with Weblogo3.0 (http://weblogo.threeplusone.com/) and show amino acids ordered according to their appearance along the pore, not according to their positions in the protein sequence.

The position of each pore-lining residue along the pore is taken to be the coordinate of its beta-carbon (alpha-carbon for glycines) along the pore axis. Pore-lining residues and the corresponding pore-axis coordinates can be calculated with the program PoreWalker (http://www.ebi.ac.uk/thornton-srv/software/PoreWalker). When PoreWalker is used, the pore axis corresponds to the x-axis of the current Cartesian coordinate system.

PoreLogo can build three types of logos:

(1) PoreLogos showing pore-lining residues from ALL protein chains contributing amino acids to the channel (suitable for monomeric or hetero-oligomeric channel proteins);
(2) PoreLogos showing pore-lining residues from ONE protein chain contributing amino acids to the channel (suitable for monomeric or homo-oligomeric channel proteins);
(3) PoreLogos showing the sequence and conservation of pore-lining residues from ONE protein chain contributing amino acids to the channel (suitable for all types of channel proteins).

Logos of types (1) and (2) are generated from the uploaded list of pore-lining residues only.
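The ordering behind logos of types (1) and (2) can be illustrated with a short Python sketch. It is not PoreLogo's own code: the function names are hypothetical, the input is assumed to be a PoreWalker-like list-file (header line, then residue name, number, chain and pore-axis coordinate per line), the sort direction along the pore axis (ascending x by default) is an assumption, and mapping non-standard residue names to X is also an assumption.

```python
from typing import List, Optional, Tuple

# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# (residue name, residue number, chain, pore-axis coordinate)
Residue = Tuple[str, int, str, float]


def parse_list_file(text: str) -> List[Residue]:
    """Parse a PoreWalker-like list-file; the first (header) line is skipped."""
    residues: List[Residue] = []
    for line in text.splitlines()[1:]:
        # split() treats any run of tabs/spaces as one separator, so an
        # empty column (two consecutive tabs) is tolerated.
        fields = line.split()
        if not fields:
            continue
        name, number, chain, coord = fields[0], fields[1], fields[2], fields[-1]
        residues.append((name.upper(), int(number), chain, float(coord)))
    return residues


def pore_ordered_sequence(residues: List[Residue],
                          chain: Optional[str] = None,
                          descending: bool = False) -> str:
    """One-letter codes sorted by pore-axis coordinate, not by sequence.

    chain=None keeps every chain (as in a type (1) logo); a chain ID keeps
    only that chain (as in a type (2) logo). Unknown names become 'X'.
    """
    if len(residues) < 5:
        raise ValueError("the list-file must contain at least 5 pore-lining residues")
    selected = [r for r in residues if chain is None or r[2] == chain]
    selected.sort(key=lambda r: r[3], reverse=descending)
    return "".join(THREE_TO_ONE.get(r[0], "X") for r in selected)
```

For instance, the first five residues of the PoreWalker example given under INPUTS (GLU 51 to ARG 64) come out as "REQAG" in ascending-x order, whereas their sequence order would read "EGAQR".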
Logos of type (3) require a PDB file and, optionally, a multiple sequence alignment file. They are generated as follows:

(a) the PDB file is converted into a FASTA file containing the corresponding sequence;
(b) the FASTA file is used as the query for a strict BLAST search (E-value < 10^-6); the BLAST hits are taken as the "family" of the channel protein and aligned with ClustalW;
(c) the uploaded list of pore-lining residues and the family multiple alignment are combined to generate a multiple alignment of pore-lining positions, in which residues are ordered according to their position along the pore and not according to their position in the protein sequence;
(d) the pore-lining positions multiple alignment is used as input for Weblogo3.0, which generates the corresponding logo.

If a multiple sequence alignment is provided, only steps (c) and (d) are performed.

PoreLogo is available as a web server at the following URL:
http://www.ebi.ac.uk/thornton-srv/software/PoreLogo/

=========
 INPUTS
=========

PoreLogo accepts three types of input:

..................................................
.                                                .
.  TYPE (1). LIST-FILES OF PORE-LINING RESIDUES  .
.                                                .
..................................................

For each amino acid, the list-file must include the residue name, residue number, chain and the x-coordinate (pore-axis coordinate) of the corresponding beta-carbon, in a PoreWalker-like format. Please be aware that the list-file must contain at least 5 pore-lining residues.

******************************************************************************
How to prepare a pore-lining residues text file to use as input for PoreLogo.
(Please remember that the 3D structure of your protein must be available in PDB format.)
******************************************************************************

The list-file needs to include all the pore-lining residues in the following PoreWalker-like format.
For each amino acid, the residue name, residue number, chain and coordinate along the pore axis of the corresponding beta-carbon (alpha-carbon for glycines) should be reported on a single line, separated by tabs, as follows:

ResName(tab)ResNumber(tab)(tab)Chain(tab)Poreaxis-coord(CB)

For example:

GLU  51        C      22.229000

Please note that the pore-axis coordinate can be given with any number of decimal digits.

A full PoreWalker example is shown below (the first line is a header and is ignored by PoreLogo):

AA   AAnumber  Chain  Xcoord (CB)
GLU  51        C      22.229000
GLY  53        C      27.319000
ALA  54        C      26.131001
GLN  58        C      25.860001
ARG  64        C      21.283001
THR  72        C      12.296000
ALA  73        C      7.595000
THR  74        C      8.131000
THR  75        C      8.199000
VAL  76        C      12.149000
GLY  77        C      14.908000
TYR  78        C      18.313999
GLY  79        C      20.378000
ASP  80        C      22.013000
TYR  82        C      19.860001
VAL  84        C      23.972000
MET  96        C      8.374000
VAL  97        C      4.083000
ILE  100       C      2.856000
THR  101       C      -0.765000
PHE  103       C      0.620000
GLY  104       C      -2.898000
THR  107       C      -5.116000
ALA  108       C      -8.857000
ALA  111       C      -11.318000
THR  112       C      -14.503000
VAL  115       C      -17.386999
GLY  116       C      -19.815001
GLU  118       C      -19.476999
GLN  119       C      -23.497000
GLU  120       C      -23.979000
ARG  122       C      -25.673000
HIS  124       C      -27.759001

************************************************************************
What to do in case you don't have your own list of pore-lining residues
************************************************************************

STEP 1. Submit your PDB file to http://www.ebi.ac.uk/thornton-srv/software/cgi-bin/data/PoreWalker/. You will receive an e-mail with a link to your results page within a few minutes to a few hours, depending on the protein length.

STEP 2. Download the yourPDB-aa_list.txt file from the PoreWalker results page and use it as the input text file for PoreLogo.

******************************************************************
What to do in case you have your own list of pore-lining residues
but you don't know their pore-axis coordinates
******************************************************************

STEP 1. Submit your PDB file to http://www.ebi.ac.uk/thornton-srv/software/cgi-bin/data/PoreWalker/. You will receive an e-mail with a link to your results page within a few minutes to a few hours, depending on the protein length.

STEP 2. Download the yourPDBcode-marked-pdb2.pdb file from the PoreWalker results page and copy from it the x-coordinates of the beta-carbons (alpha-carbons for glycines) of your pore-lining residues into the pore-axis coordinate column, so as to complete the pore-lining list text file in the format described above.

.......................................................
.                                                     .
.  TYPE (2). MULTIPLE SEQUENCE ALIGNMENT (MSA) FILES  .
.                                                     .
.......................................................

Supported MSA file formats are FASTA (*.fas, *.fst, *.fsa), GDE (*.gde) and A2M (*.a2m).

****************************************************************************************
How to prepare a multiple sequence alignment (MSA) to use as input for family PoreLogo.
Please note that the first sequence in the alignment **MUST** be your reference protein, i.e. the protein for which you are also providing a pore-lining residues text file and a PDB file.
****************************************************************************************

****************
1. FASTA format
****************

In FASTA-formatted files, each sequence is preceded by a line starting with ">". The first word on this line is the name of the sequence; the rest of the line is a description of the sequence. The remaining lines contain the sequence itself. Alignments in FASTA format can be produced by sequence alignment programs such as T-Coffee and ClustalW (when run from the command line with the option '-OUTPUT=FASTA'), as well as by BioEdit and many other sequence alignment editors.
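Because the first sequence of the alignment must be the reference protein, step (c) of the type (3) workflow (combining the pore-lining list with the family alignment) can be sketched as below. This is an illustration under stated assumptions, not PoreLogo's implementation: the helper names are hypothetical, the reference residues are assumed to be numbered consecutively from `first_resnum` (real PDB numbering may have gaps or insertion codes), and `pore_order` is assumed to be already sorted by pore-axis coordinate. The reader accepts both ">" (FASTA) and "%" (GDE) headers.

```python
from typing import Dict, List, Tuple


def read_alignment(text: str) -> List[Tuple[str, str]]:
    """Read FASTA ('>') or GDE ('%') records as (name, aligned sequence)."""
    records: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in ">%":
            words = line[1:].split()
            records.append((words[0] if words else "", []))  # first word = name
        elif records:
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


def residue_columns(reference: str, first_resnum: int) -> Dict[int, int]:
    """Map reference residue numbers to 0-based alignment columns.

    Assumes consecutive numbering from first_resnum; gap characters
    ('-' or '.') do not consume a residue number.
    """
    mapping: Dict[int, int] = {}
    resnum = first_resnum
    for col, char in enumerate(reference):
        if char not in "-.":
            mapping[resnum] = col
            resnum += 1
    return mapping


def pore_alignment(records: List[Tuple[str, str]], pore_order: List[int],
                   first_resnum: int = 1) -> str:
    """FASTA text whose columns follow the pore, not the protein sequence.

    records[0] must be the reference protein; pore_order lists reference
    residue numbers already sorted along the pore axis.
    """
    columns = residue_columns(records[0][1], first_resnum)
    out: List[str] = []
    for name, seq in records:
        out.append(">" + name)
        out.append("".join(seq[columns[n]].upper() for n in pore_order))
    return "\n".join(out) + "\n"
```

The returned FASTA text has one column per pore-lining residue, in pore order, which is the kind of alignment that step (d) passes to Weblogo3.0.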
An example of MSA in FASTA format:

>1ymg_A THE CHANNEL ARCHITE
SASFWRAICAEFFASLFYVFFGLGASLRW-----AG------P---------lHVLQVAL
AFGLALATLVQAVGHISGAHVNPAVTFAFLVGSQMSLLRAICYMVAQLLGAVAGAAVLYS
VT--PPAvRGNlALNTLHPGVSVGQATIVEIFLTLQFVLCIFATYDERRNGRLGSVALAV
GFSLTLGHLFGMYYTGAGMNPARSFAPAILTR------NFTNHWVYWVGPVIGAGLGSLL
YDFLLFPRLKSVSERLSILKG
>2d57_A DOUBLE LAYERED 2D C
TQAFWKAVTAEFLAMLIFVLLSVGSTINW-----GG-SENPLP---------VDMVLISL
CFGLSIATMVQCFGHISGGHINPAVTVAMVCTRKISIAKSVFYITAQCLGAIIGAGILYL
VT--PPSVVGGLGVTTVHGNLTAGHGLLVELIITFQLVFTIFASCDSKRTDVTGSVALAI
GFSVAIGHLFAINYTGASMNPARSFGPAVIMG------NWENHWIYwVGPIIGAVLAGAL
YEYVF--------------CP
>2f2b_A CRYSTAL STRUCTURE O
MVSLTKRCIAEFIGTFILVFFGAGSAAVTLMIASGGTSPNPFNIGIGLLGGLGDWVAIGL
AFGFAIAASIYALGNISGCHINPAVTIGLWSVKKFPGREVVPYIIAQLLGAAFGSFIFLQ
CAGIGAATVGGLGATAPFPGISYWQAMLAEVVGTFLLMITIMGIAvDERAP-KGFAGIII
GLTVAGIITTLGNISGSSLNPARTFGPYLNDMifagtDlWNYYSIYvIGPIVGAVLAALT
YQYL---------------TS

**************
2. GDE format
**************

In GDE-formatted files, each sequence is preceded by a line starting with "%". The first word on this line is the name of the sequence. The remaining lines contain the sequence itself, in lower-case characters. Alignments in GDE format can be obtained, for example, from the ClustalW web server by selecting the corresponding output format.
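Before uploading a FASTA or GDE alignment, a quick local check can catch two common problems: rows of unequal length, and a first sequence that is not the reference protein. The sketch below is hypothetical (PoreLogo's own validation is not described here): the function names are illustrative, "-" and "." are assumed to be the only gap characters, letter case is ignored, and `reference_seq` is whatever sequence you supply for your reference chain. It is not intended for A2M files, where lower-case letters and "." mark insert positions.

```python
from typing import List, Optional, Tuple

GAP_CHARS = "-."


def read_records(text: str) -> List[Tuple[str, str]]:
    """Read FASTA ('>') or GDE ('%') records as (name, aligned sequence)."""
    records: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in ">%":
            words = line[1:].split()
            records.append((words[0] if words else "", []))
        elif records:
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


def check_msa(text: str, reference_seq: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    records = read_records(text)
    if not records:
        return ["no sequences found"]
    problems: List[str] = []
    length = len(records[0][1])
    # Every row of an alignment should have the same number of columns.
    for name, seq in records[1:]:
        if len(seq) != length:
            problems.append(f"{name}: length {len(seq)} differs from {length}")
    # The first row, with gaps removed, should be the reference protein.
    if reference_seq is not None:
        ungapped = "".join(c for c in records[0][1] if c not in GAP_CHARS)
        if ungapped.upper() != reference_seq.upper():
            problems.append(f"first sequence {records[0][0]} is not the reference")
    return problems
```

An empty result only means these two checks passed; it does not guarantee that PoreLogo will accept the file.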
An example of MSA in GDE format:

%1ymg_A
sasfwraicaeffaslfyvffglg----aslrwag-----plhvlq-----------val
afglalatlvqavghisgahvnpavtfaflvgsqmsllraicymvaqllgavagaavlys
vt--ppavrgnlalntlhpgvsvgqativeifltlqfvlcifatyderrngrlgsvalav
gfsltlghlfgmyytgagmnparsfapailtrnftn------hwvywvgpvigaglgsll
ydfllfprlksvserlsilkg
%2d57_A
tqafwkavtaeflamlifvllsvg----stinwggsenplpvdmvl-----------isl
cfglsiatmvqcfghisgghinpavtvamvctrkisiaksvfyitaqclgaiigagilyl
vt--ppsvvgglgvttvhgnltaghgllveliitfqlvftifascdskrtdvtgsvalai
gfsvaighlfainytgasmnparsfgpavimgnwen------hwiywvgpiigavlagal
yeyvfcp--------------
%2f2b_A
mvsltkrciaefigtfilvffgagsaavtlmiasggtspnpfnigigllgglgdwvaigl
afgfaiaasiyalgnisgchinpavtiglwsvkkfpgrevvpyiiaqllgaafgsfiflq
cagigaatvgglgatapfpgisywqamlaevvgtfllmitimgiavder-apkgfagiii
gltvagiittlgnisgsslnpartfgpylndmifagtdlwnyysiyvigpivgavlaalt
yqylts---------------

***************
3. A2M format
***************

The A2M format is also accepted. A2M is the primary format for multiple alignments in the SAM tools and can also be obtained from HMMER (by selecting '--outformat A2M'). For documentation of the A2M format, please see: http://compbio.soe.ucsc.edu/a2m-desc.html.

...............................................
.                                             .
.  TYPE (3). FILES OF PROTEIN 3D-COORDINATES  .
.                                             .
...............................................

3D structures of the channel proteins must be provided in the current wwPDB format. For format details, please visit: http://www.wwpdb.org/docs.html. Please be aware that protein structures submitted to PoreLogo must include at least 50 residues to be analysed.

=========
 OUTPUTS
=========

PoreLogo outputs are displayed on the results HTML page and archived in a downloadable ZIP file. Outputs include:

1) an image of the PoreLogo at a resolution of 600 dpi (JPEG format, filename.jpg);
2) the FASTA file used to generate the PoreLogo, i.e. the input for Weblogo3.0 (filename.fa);
3) a list of pore-lining residues ordered according to their position along the pore axis (filename.out);
4) a ready-to-run PyMOL script that displays the pore-lining residues in the corresponding 3D structure, coloured according to their physico-chemical features (filename.pml).

============
 CITATIONS
============

Please cite "..." ... when using PoreLogo.

=============
 REFERENCES
=============

1. Crooks GE, Hon G, Chandonia JM, Brenner SE, 2004, "WebLogo: A sequence logo generator", Genome Res 14(6):1188-1190.
2. Pellegrini-Calace M, Maiwald T, Thornton JM, 2009, "PoreWalker: a Novel Tool for the Identification and Characterization of Channels in Transmembrane Proteins from their Three-Dimensional Structure".