Bioinformatics

SeQuence IDentification (SQID) is a database searching algorithm for tandem mass spectrometry developed in the Wysocki group. The SQID program and source code are available under the GNU GPL license.

Download SQID (for Windows)

Usage
Open “SQID_1.0.exe”, specify the fasta database and the dta folders, then click “Index&Rundtalist”. The “Run dtalist” button is only available when an “.index” file is used instead of a “.fasta” file. Indexing any database creates an “.index” file in the db folder.

Input files

Currently SQID only accepts .dta files. Enter the folders that DIRECTLY contain the .dta files in the data field. A newer test version, which can directly accept Thermo .Raw files (collected with Xcalibur 2.0 or lower), can be downloaded here. Note that, in addition to the .out output option, this version offers an .mzid output option instead of the .txt output option.
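To check what SQID expects in the data field, the sketch below reads a .dta file and lists the .dta files sitting directly in a folder. It assumes the common Sequest DTA layout (first line: [M+H]+ precursor mass and charge; following lines: fragment m/z and intensity); the function names are hypothetical helpers, not part of SQID.

```python
from pathlib import Path


def parse_dta_text(text):
    """Parse DTA content into (MH+ mass, charge, [(m/z, intensity), ...]).

    Assumed layout (common Sequest DTA convention, not taken from SQID itself):
    line 1 holds the singly protonated precursor mass and the charge state,
    every later non-empty line holds one fragment m/z and its intensity.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty .dta content")
    mh_mass, charge = float(rows[0][0]), int(float(rows[0][1]))
    peaks = [(float(r[0]), float(r[1])) for r in rows[1:]]
    return mh_mass, charge, peaks


def read_dta(path):
    """Read and parse one .dta file from disk."""
    return parse_dta_text(Path(path).read_text())


def list_dta_files(folder):
    """Return .dta files located directly in `folder`; subfolders are not searched."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".dta"
    )
```

Because `list_dta_files` does not recurse, an empty result is a quick hint that the folder entered is a parent of the actual .dta folder.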

Database

SQID accepts .fasta databases. After indexing, a folder, a protein file (.pro) and an .index file, each with the same name as the fasta database, are created in the “db” folder. This indexed database can be reused in future searches by entering the .index file in the “database” field. Because SQID currently needs an indexed database for every search, it does not support very large databases such as NR.
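As a rough illustration of what "indexing" a protein database can involve, the sketch below parses FASTA text, digests each protein with a simple trypsin rule, and stores peptides sorted by mass so candidates for a precursor can be found by binary search. This is a generic technique chosen for illustration, assuming full tryptic cleavage with no missed cleavages and unmodified residues; it is not SQID's .index or .pro format, and all function names are hypothetical.

```python
import bisect
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def parse_fasta(text):
    """Return {accession: sequence}; the accession is the first word of each '>' header."""
    proteins, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                proteins[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        proteins[name] = "".join(chunks)
    return proteins


def tryptic_peptides(seq, min_len=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if len(p) >= min_len]


def peptide_mass(pep):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in pep) + WATER


def build_index(proteins, min_len=6):
    """Mass-sorted (mass, peptide, accession) entries; skips non-standard residues."""
    entries = sorted(
        (peptide_mass(pep), pep, acc)
        for acc, seq in proteins.items()
        for pep in tryptic_peptides(seq, min_len)
        if all(aa in RESIDUE_MASS for aa in pep)
    )
    masses = [e[0] for e in entries]
    return masses, entries


def candidates(index, neutral_mass, tol=1.0):
    """All indexed peptides within +/- tol Da of a precursor's neutral mass."""
    masses, entries = index
    lo = bisect.bisect_left(masses, neutral_mass - tol)
    hi = bisect.bisect_right(masses, neutral_mass + tol)
    return entries[lo:hi]
```

The index grows with the number of peptides generated, which hints at why a whole-database index becomes impractical for something the size of NR.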

Output files

Two output options are currently available:
1. A single tab-delimited file (.txt), which can be opened directly in Excel. This format reports only the top hit for each spectrum, with the SQID score, delta score, intensity score, matched ions and ion pairs. In Excel the results can easily be sorted and filtered by any column. This format is mainly used for testing purposes.
2. .out files that mimic Sequest output. This is currently the default format. The .out files are generated in the .dta folders. Note that in the .out file:

Xcorr = SQID score / 5;
deltCN = deltSQID = (top - second) / top;
Sp = intensity score;
Number of matched ions = number of matched ions in SQID.
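The mapping above can be written out as a small function. This is a sketch of the stated formulas only; the field names, and the choice to report deltCN as 0 when the top score is 0, are assumptions rather than SQID behavior.

```python
from dataclasses import dataclass


@dataclass
class OutFields:
    """Hypothetical container for the Sequest-style columns written to a .out file."""
    xcorr: float
    delta_cn: float
    sp: float
    matched_ions: int


def to_out_fields(top_score, second_score, intensity_score, matched_ions):
    """Map SQID scores onto .out columns using the formulas listed above."""
    xcorr = top_score / 5.0
    # deltCN is a ratio of SQID scores, so the /5 scaling would cancel anyway.
    # Assumption: report 0.0 when the top score is 0 to avoid dividing by zero.
    delta_cn = (top_score - second_score) / top_score if top_score else 0.0
    return OutFields(xcorr, delta_cn, intensity_score, matched_ions)
```

Because deltCN is a ratio, it is the same whether computed from raw SQID scores or from the scaled Xcorr values.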

SQID score / 5 is on roughly the same scale as Sequest Xcorr. Filtering the .out files by Xcorr and DeltCN (Xcorr > 1.8, 2.5, 3.5; DeltCN > 0.05) gives an FDR of about 5% for SQID. The .out files can be viewed with Scaffold or DTASelect; they can also be converted to pepXML using the Trans-Proteomic Pipeline (TPP). Note that a “Sequest.params” file may be needed for DTASelect and TPP. This file can be obtained from any Sequest search. More information about DTASelect and a sample Sequest.params file can be downloaded here.
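The threshold filter can be expressed as a short function. This sketch assumes, following the usual Sequest convention, that the three Xcorr cutoffs 1.8, 2.5 and 3.5 apply to 1+, 2+ and 3+ precursors respectively; the hit dictionary keys are hypothetical, and hits with other charge states are simply rejected.

```python
# Assumption: Xcorr cutoffs are charge-dependent (1+, 2+, 3+), as is common for Sequest filters.
XCORR_MIN = {1: 1.8, 2: 2.5, 3: 3.5}
DELTCN_MIN = 0.05


def passes_filter(xcorr, delta_cn, charge):
    """True if a hit exceeds both the charge-specific Xcorr and the DeltCN cutoff."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        # Charge states outside 1+..3+ are not covered by the quoted thresholds.
        return False
    return xcorr > threshold and delta_cn > DELTCN_MIN


def filter_hits(hits):
    """Keep hits (dicts with hypothetical keys 'xcorr', 'deltcn', 'charge') that pass."""
    return [h for h in hits if passes_filter(h["xcorr"], h["deltcn"], h["charge"])]
```

The roughly 5% FDR quoted above is a property reported for SQID on its test data; applying these cutoffs to a different dataset does not by itself guarantee the same FDR.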

Common errors

1. SQID makes use of some Win32 system commands.

If you see an error message such as “‘cmd’ is not recognized as an internal or external command”, simply go to the “Windows/system32” folder and copy “cmd.exe” into the SQID folder containing “SQID_1.0.exe”.

Work in progress

1. Incorporate more input and output file formats.

2. Enable more protease options.

3. Improve database index efficiency.

Reference
W. Li, L. Ji, J. Goya, G. Tan & V.H. Wysocki, “SQID: An Intensity-Incorporated Protein Identification Algorithm for Tandem Mass Spectrometry,” J. Proteome Res. 10(4), 1593-1602 (2011).
Please report bugs to lwz@email.arizona.edu

 


SQID-XLink (for cross-linking)

SQID-XLink is a database searching algorithm designed specifically for cross-linking studies based on tandem mass spectrometry. It automatically searches regular peptides, mono-linked peptides and cross-linked peptides, using a scoring function similar to that of SQID. Currently the BS2G, BS3 and EDC cross-linkers are supported. The program is freely available under the GNU GPL license. © 2011 Wysocki group
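Searching mono-linked and cross-linked peptides starts from their precursor masses. The sketch below computes these for the three supported reagents, assuming commonly cited monoisotopic mass shifts (BS3 +138.068 Da, BS2G +96.021 Da, EDC as a zero-length link losing one water; hydrolyzed mono-links add one water to the bridge). These values and all function names are illustrative assumptions, not read from the SQID-XLink source.

```python
PROTON = 1.007276
WATER = 18.010565

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Assumed mass shifts (Da) added when two peptides are joined by the reagent.
CROSSLINK_DELTA = {
    "BS3": 138.06808,    # C8H10O2 bridge between two amines (typically Lys)
    "BS2G": 96.02113,    # C5H4O2 bridge between two amines
    "EDC": -18.010565,   # zero-length amide bond between carboxyl and amine: loses H2O
}
# Assumed shifts for a dead-end (mono) link whose free end has been hydrolyzed.
MONOLINK_DELTA = {
    "BS3": 156.07864,
    "BS2G": 114.03170,
}


def peptide_mass(pep):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in pep) + WATER


def crosslinked_mass(pep1, pep2, linker):
    """Neutral mass of two peptides joined by one cross-link."""
    return peptide_mass(pep1) + peptide_mass(pep2) + CROSSLINK_DELTA[linker]


def monolinked_mass(pep, linker):
    """Neutral mass of one peptide carrying a hydrolyzed mono-link."""
    return peptide_mass(pep) + MONOLINK_DELTA[linker]


def precursor_mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge
```

A search tool would compare these masses against each spectrum's precursor mass within a tolerance; with cross-links the candidate space grows roughly with the square of the peptide count, which is one reason dedicated software is useful.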

Download SQID-XLink (for Windows)

For usage instructions, please read the Usage.html file included in the distribution.

Please report bugs to lwz@email.arizona.edu

Reference: W. Li, H.A. O’Neill, V.H. Wysocki. SQID-XLink: Implementation of an Intensity-Incorporated Algorithm for Cross-linked Peptide Identification. Bioinformatics, 2012, doi:10.1093/bioinformatics/bts442


Spectrum predictor

Spectrum predictor is a program that predicts ion trap CID fragmentation spectra, including peak intensities. The program is freely available under the GNU GPL license. © 2011 Wysocki group
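Any spectrum prediction needs the fragment m/z values before intensities can be assigned. The sketch below computes singly charged b and y ion m/z values for an unmodified peptide. It covers only the m/z part; the intensity model used by Spectrum predictor is not reproduced here, and the function name is hypothetical.

```python
PROTON = 1.007276
WATER = 18.010565

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_ions(pep):
    """Singly charged b and y ion m/z values; b[i-1] is b_i and y[i-1] is y_i."""
    masses = [RESIDUE_MASS[aa] for aa in pep]
    b, y = [], []
    prefix = 0.0
    suffix = 0.0
    for i in range(1, len(pep)):
        prefix += masses[i - 1]           # first i residues (N-terminal fragment)
        b.append(prefix + PROTON)
        suffix += masses[-i]              # last i residues (C-terminal fragment)
        y.append(suffix + WATER + PROTON)  # y ions keep the C-terminal water
    return b, y
```

For any cleavage site, b_i + y_(n-i) equals the peptide's neutral mass plus two protons, which is a convenient sanity check on the output.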

Download Spectrum predictor (for Windows)

The program is still in the testing stage.

Please report bugs to lwz@email.arizona.edu


PNNL dataset (28311 spectra)

The PNNL dataset contains 28311 spectra (25% singly charged, 62% doubly charged and 13% triply charged) from unmodified Deinococcus radiodurans and Shewanella oneidensis peptides, collected at Pacific Northwest National Laboratory (PNNL) on a Thermo LCQ ion trap mass spectrometer. The dataset was used to optimize and test our algorithm.

Download the dataset

References for PNNL dataset:
1. Lipton, M. S.; Pasa-Tolic, L.; Anderson, G. A.; Anderson, D. J.; Auberry, D. L.; Battista, J. R.; Daly, M. J.; Fredrickson, J.; Hixson, K. K.; Kostandarithes, H.; Masselon, C.; Markillie, L. M.; Moore, R. J.; Romine, M. F.; Shen, Y.; Stritmatter, E.; Tolic, N.; Udseth, H. R.; Venkateswaran, A.; Wong, K.; Zhao, R.; Smith, R. D., Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc Natl Acad Sci U S A, 2002, 99, (17), 11049-11054.
2. Kolker, E.; Picone, A. F.; Galperin, M. Y.; Romine, M. F.; Higdon, R.; Makarova, K. S.; Kolker, N.; Anderson, G. A.; Qiu, X.; Auberry, K. J.; Babnigg, G.; Beliaev, A. S.; Edlefsen, P.; Elias, D. A.; Gorby, Y. A.; Holzman, T.; Klappenbach, J. A.; Konstantinidis, K. T.; Land, M. L.; Lipton, M. S.; McCue, L.; Monroe, M.; Pasa-Tolic, L.; Pinchuk, G.; Purvine, S.; Serres, M. H.; Tsapin, S.; Zakrajsek, B. A.; Zhu, W.; Zhou, J.; Larimer, F. W.; Lawrence, C. E.; Riley, M.; Collart, F. R.; Yates, J. R.; Smith, R. D.; Giometti, C. S.; Nealson, K. H.; Fredrickson, J. K.; Tiedje, J. M., Global profiling of Shewanella oneidensis MR-1: expression of hypothetical genes and improved functional annotations. Proc Natl Acad Sci U S A, 2005, 102, (6), 2099-2104.


Hemoglobin data

We recently reported the successful de novo sequencing of hemoglobins from nine small mammals native to North America using LC-MS/MS combined with PepNovo. The spectra files as well as the PepNovo results for each species can be downloaded from the following links:

Microtus pennsylvanicus
Peromyscus californicus
Peromyscus crinitus
Sciurus carolinensis
Spermophilus beecheyi
Tamias merriami
Tamias striatus
Tamiasciurus hudsonicus
Blarina brevicauda

Reference: Ünige A. Laskay, Erin J. Kaleta, Inger-Marie E. Vilcins, Sam R. Telford III, Alan G. Barbour, Vicki H. Wysocki. Development of a Host Blood Meal Database: De Novo Sequencing of Hemoglobin from Nine Small Mammals Using Mass Spectrometry. Biological Chemistry, 393, pp. 195-201, 2012.