Abstract
Transitivity Clustering is a method for partitioning biological data into groups of similar objects, such as genes. It provides integrated access to functions that address each step of a typical cluster analysis. Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
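To make the partitioning idea concrete, the following minimal Python sketch scores a candidate partition under a weighted cluster editing cost, the formulation named in the references listed on this page. It is an illustration under stated assumptions, not the code of the Transitivity Clustering software: the function name `editing_cost`, the rule that missing pairs have similarity 0 and the cost |s − t| for each added or removed edge are assumptions made for this example.

```python
from itertools import combinations


def editing_cost(similarity, clusters, threshold):
    """Cost of turning the thresholded similarity graph into the given partition.

    similarity: dict mapping (u, v) pairs to a similarity value (hypothetical input format).
    clusters:   list of sets of node names; every node appears in exactly one set.
    threshold:  pairs with similarity above this value count as edges.
    """
    cluster_of = {node: i for i, members in enumerate(clusters) for node in members}
    cost = 0.0
    for u, v in combinations(sorted(cluster_of), 2):
        # Assumption: a pair that is not listed has similarity 0.
        s = similarity.get((u, v), similarity.get((v, u), 0.0))
        same_cluster = cluster_of[u] == cluster_of[v]
        if same_cluster and s <= threshold:
            cost += threshold - s  # an edge would have to be added
        elif not same_cluster and s > threshold:
            cost += s - threshold  # an edge would have to be removed
    return cost


# Toy similarities between four objects (made-up values for illustration).
toy = {("a", "b"): 9.0, ("b", "c"): 8.0, ("a", "c"): 2.0, ("c", "d"): 6.0}
candidates = [
    [{"a", "b", "c"}, {"d"}],
    [{"a", "b"}, {"c", "d"}],
    [{"a", "b", "c", "d"}],
]
costs = [editing_cost(toy, partition, threshold=5.0) for partition in candidates]
best_partition = candidates[costs.index(min(costs))]
```

A partition in which every cluster is fully connected and no edges run between clusters has cost 0. Raising the threshold generally favors smaller, tighter clusters, which is why choosing the threshold is a central step in the workflows.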
References
Enright, A.J., Kunin, V. & Ouzounis, C.A. Protein families and TRIBES in genome sequence space. Nucleic Acids Res. 31, 4632–4638 (2003).
Enright, A.J., Van Dongen, S. & Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
Krause, A., Stoye, J. & Vingron, M. Large scale hierarchical clustering of protein sequences. BMC Bioinformatics 6, 15 (2005).
Enright, A.J. & Ouzounis, C.A. GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16, 451–457 (2000).
Paccanaro, A., Casbon, J.A. & Saqi, M.A. Spectral clustering of protein sequences. Nucleic Acids Res. 34, 1571–1580 (2006).
Frey, B.J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007).
Wittkop, T. et al. Partitioning biological data with transitivity clustering. Nat. Methods 7, 419–420 (2010).
Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).
Wittkop, T. Transitivity Clustering: Clustering Biological Data by Unraveling Hidden Transitive Substructures, 148 (Suedwestdeutscher Verlag fuer Hochschulschriften, 2010).
Rahmann, S. et al. Exact and heuristic algorithms for weighted cluster editing. Comput. Syst. Bioinformatics Conf. 6, 391–401 (2007).
Wittkop, T., Baumbach, J., Lobo, F.P. & Rahmann, S. Large scale clustering of protein sequences with FORCE—a layout based heuristic for weighted cluster editing. BMC Bioinformatics 8, 396 (2007).
Böcker, S., Briesemeister, S. & Klau, G.W. Exact algorithms for cluster editing: evaluation and experiments. Algorithmica (in press) (2009).
Brown, S.D., Gerlt, J.A., Seffernick, J.L. & Babbitt, P.C. A gold standard set of mechanistically diverse enzyme superfamilies. Genome Biol. 7, R8 (2006).
Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).
Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning 52, 91–118 (2003).
Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Acknowledgements
J.B. thanks the German Academic Exchange Service (DAAD) for funding his work at ICSI, Berkeley. J.B. and M.A. are grateful for support from the German Research Foundation (DFG)-funded Cluster of Excellence for Multimodal Computing and Interaction. D.E. and M.A. received funding from the German National Genome Research Network. T.W. received financial support through NIH grant R01 LM009722 and the Buck Trust.
Author information
Contributions
T.W. and J.B. collected data, and tested and wrote Step 2A. D.E. and M.A. prepared Step 2B. A.T. and S.B. were responsible for Step 2C. All authors contributed to the preparation and proofreading of all other parts of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1: Cytoscape Plug-in Blast2SimilarityGraph
The screenshot shows the user interface of the Blast2SimilarityGraph Cytoscape plug-in together with the similarity network after importing the gold standard proteins as described in Option 2A, step v. (TIFF 1991 kb)
Supplementary Fig. 2: Cytoscape Plug-in ClusterExplorer
The screenshot shows the user interface of the ClusterExplorer Cytoscape plug-in together with the inter/intra similarity distribution for the gold standard family assignments obtained after step viii, Option 2A. (TIFF 704 kb)
Supplementary Fig. 3: Cytoscape Plug-in TransClust
The screenshot shows the user interface of the TransClust Cytoscape plug-in including the results window from the density threshold determination and the gold standard network clustered with a threshold of 57, as described in step xi, Option 2A. (TIFF 595 kb)
Supplementary Data 1: Brown et al. all-vs-all VOC subset BLAST
The result file of an all-vs-all BLAST of the 133 protein sequences of the vicinal oxygen chelate (VOC) superfamily from the Brown et al. gold standard. We used an E-value cutoff of 100 and the “-m 8” option of BLAST for table-based output. (TXT 769 kb)
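As a rough guide to what such a file contains, the sketch below reads BLAST "-m 8" tabular output, which has 12 tab-separated columns (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value and bit score). The conversion of E-values into a similarity via −log10, the choice to keep the best hit per unordered protein pair and the function name `read_blast_m8` are assumptions made for illustration; they are not taken from the plug-ins described here.

```python
import csv
import io
import math


def read_blast_m8(handle, max_evalue=100.0):
    """Return {(protein_a, protein_b): similarity} from BLAST -m 8 tabular output.

    Assumption for this sketch: similarity = -log10(E-value), and only the
    best-scoring hit is kept for each unordered pair of proteins.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue = float(row[10])  # column 11 holds the E-value
        if query == subject or evalue > max_evalue:
            continue  # ignore self-hits and hits beyond the cutoff
        pair = tuple(sorted((query, subject)))
        # Clamp very small E-values (BLAST may report 0.0) before taking the log.
        score = -math.log10(max(evalue, 1e-200))
        if pair not in best or score > best[pair]:
            best[pair] = score
    return best


# Made-up example rows in -m 8 layout; real input would be the all-vs-all result file.
sample = (
    "P1\tP2\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-20\t80.0\n"
    "P2\tP1\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-10\t60.0\n"
    "P1\tP1\t100.0\t100\t0\t0\t1\t100\t1\t100\t0.0\t200.0\n"
    "P1\tP3\t25.0\t40\t30\t0\t1\t40\t1\t40\t250\t15.0\n"
)
edges = read_blast_m8(io.StringIO(sample))
```

With a real file, the function would be called on an open file handle, for example `read_blast_m8(open(path))` with `path` pointing to the downloaded result file.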
Supplementary Data 2: Brown et al. VOC subset FASTA
The 133 protein sequences of the VOC superfamily from the Brown et al. gold standard in FASTA format. (TXT 41 kb)
Supplementary Data 3: Brown et al. VOC subset family assignment
Tab-delimited flat file containing the family assignments for the 133 proteins of the VOC superfamily from the Brown et al. gold standard. (TXT 5 kb)
Supplementary Data 6
Tab-delimited flat file containing the family assignments for the 866 proteins of the Brown et al. gold standard. (TXT 24 kb)
Supplementary Data 8: Gene expression data file
This file contains an expression matrix of 38 bone marrow samples from acute leukemia patients with 999 monitored genes (ref. 14), processed with a Human Genome HU6800 Affymetrix microarray. (TXT 448 kb)
Supplementary Data 9: Gene expression gold standard file
This file contains the gold standard for the "Leukemia" dataset of Supplementary Data 8. The 38 samples are classified into 11 cases of acute myeloid leukemia (AML), 8 of T-lineage acute lymphoblastic leukemia (T-ALL) and 19 of B-lineage ALL (B-ALL). (TXT 0 kb)
About this article
Cite this article
Wittkop, T., Emig, D., Truss, A. et al. Comprehensive cluster analysis with Transitivity Clustering. Nat Protoc 6, 285–295 (2011). https://doi.org/10.1038/nprot.2010.197
This article is cited by
- Guiding biomedical clustering with ClustEval. Nature Protocols (2018)
- Novel 9-cis/all-trans β-carotene isomerases from plastidic oil bodies in Dunaliella bardawil catalyze the conversion of all-trans to 9-cis β-carotene. Plant Cell Reports (2017)
- Comparing the performance of biomedical clustering methods. Nature Methods (2015)
- Diversity of the metal-transporting P1B-type ATPases. JBIC Journal of Biological Inorganic Chemistry (2014)
- BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data. BMC Proceedings (2013)