Comprehensive cluster analysis with Transitivity Clustering

Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

doi:10.1038/nprot.2010.197

Protocol
Published: 10 February 2011

Nature Protocols volume 6, pages 285–295 (2011)

Abstract

Transitivity Clustering is a method for partitioning biological data into groups of similar objects, such as genes. It provides integrated access to functions that address each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

**Figure 1: Overview of the Transitivity Clustering functionalities and user interfaces.**

**Figure 3: Intra- versus inter-cluster similarity distributions.**

**Figure 4: Stand-alone software user interface.**

References

1. Enright, A.J., Kunin, V. & Ouzounis, C.A. Protein families and TRIBES in genome sequence space. Nucleic Acids Res. 31, 4632–4638 (2003).
2. Enright, A.J., Van Dongen, S. & Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
3. Krause, A., Stoye, J. & Vingron, M. Large scale hierarchical clustering of protein sequences. BMC Bioinformatics 6, 15 (2005).
4. Enright, A.J. & Ouzounis, C.A. GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16, 451–457 (2000).
5. Paccanaro, A., Casbon, J.A. & Saqi, M.A. Spectral clustering of protein sequences. Nucleic Acids Res. 34, 1571–1580 (2006).
6. Frey, B.J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007).
7. Wittkop, T. et al. Partitioning biological data with transitivity clustering. Nat. Methods 7, 419–420 (2010).
8. Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).
9. Wittkop, T. Transitivity Clustering: Clustering Biological Data by Unraveling Hidden Transitive Substructures, 148 (Suedwestdeutscher Verlag fuer Hochschulschriften, 2010).
10. Rahmann, S. et al. Exact and heuristic algorithms for weighted cluster editing. Comput. Syst. Bioinformatics Conf. 6, 391–401 (2007).
11. Wittkop, T., Baumbach, J., Lobo, F.P. & Rahmann, S. Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing. BMC Bioinformatics 8, 396 (2007).
12. Böcker, S., Briesemeister, S. & Klau, G.W. Exact algorithms for cluster editing: evaluation and experiments. Algorithmica (in press) (2009).
13. Brown, S.D., Gerlt, J.A., Seffernick, J.L. & Babbitt, P.C. A gold standard set of mechanistically diverse enzyme superfamilies. Genome Biol. 7, R8 (2006).
14. Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).
15. Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning 52, 91–118 (2003).
16. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

Acknowledgements

J.B. thanks the German Academic Exchange Service (DAAD) for funding his work at ICSI, Berkeley. J.B. and M.A. are grateful for support from the German Research Foundation (DFG)-funded Cluster of Excellence for Multimodal Computing and Interaction. D.E. and M.A. received funding from the German National Genome Research Network. T.W. received financial support through NIH grant R01 LM009722 and the Buck Trust.

Author information

Authors and Affiliations

Buck Institute for Age Research, Novato, California, USA
Tobias Wittkop
Max Planck Institute for Informatics, Saarbrücken, Germany
Dorothea Emig, Mario Albrecht & Jan Baumbach
Friedrich-Schiller University Jena, Jena, Germany
Anke Truss & Sebastian Böcker
International Computer Science Institute, University of California at Berkeley, Berkeley, California, USA
Jan Baumbach
Saarland University, Cluster of Excellence for Multimodal Computing and Interaction, Saarbrücken, Germany
Jan Baumbach

Contributions

T.W. and J.B. collected data, and tested and wrote Step 2A. D.E. and M.A. prepared Step 2B. A.T. and S.B. were responsible for Step 2C. All authors contributed to the preparation and proofreading of all other parts of the manuscript.

Corresponding author

Correspondence to Jan Baumbach.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1: Cytoscape Plug-in Blast2SimilarityGraph

The screenshot shows the user interface of the Blast2SimilarityGraph Cytoscape plug-in together with the similarity network after importing the gold standard proteins as described in Option 2A, step v. (TIFF 1991 kb)

Supplementary Fig. 2: Cytoscape Plug-in ClusterExplorer

The screenshot shows the user interface of the ClusterExplorer Cytoscape plug-in together with the inter/intra similarity distribution for the gold standard family assignments obtained after step viii, Option 2A. (TIFF 704 kb)

Supplementary Fig. 3: Cytoscape Plug-in TransClust

The screenshot shows the user interface of the TransClust Cytoscape plug-in including the results window from the density threshold determination and the gold standard network clustered with a threshold of 57, as described in step xi, Option 2A. (TIFF 595 kb)

Supplementary Data 1: Brown et al. all-vs-all VOC subset BLAST

The result file of an all-vs-all BLAST of the 133 protein sequences of the vicinal oxygen chelate (VOC) superfamily from the Brown et al. gold standard. We used an E-value cutoff of 100 and the “-m 8” option of BLAST for table-based output. (TXT 769 kb)
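As an illustration of how such a table-based result file could be read, the following is a minimal Python sketch, not the authors' implementation. It relies on the standard 12-column layout of BLAST "-m 8" output (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score). The function names, the three inline sample rows, and the use of -log10(E-value) as a pairwise similarity are assumptions made for this example only.

```python
import math
from io import StringIO

# Column order of BLAST "-m 8" tabular output (12 tab-separated fields).
M8_FIELDS = ["query", "subject", "pct_identity", "aln_length", "mismatches",
             "gap_opens", "q_start", "q_end", "s_start", "s_end",
             "evalue", "bit_score"]


def parse_m8(handle, evalue_cutoff=100.0):
    """Yield (query, subject, evalue) for every hit at or below the cutoff."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != len(M8_FIELDS):
            continue  # skip malformed rows
        evalue = float(fields[10])
        if evalue <= evalue_cutoff:
            yield fields[0], fields[1], evalue


def best_similarities(hits, floor=1e-200):
    """Assumed similarity: -log10(E-value), keeping the best hit per pair."""
    sims = {}
    for query, subject, evalue in hits:
        if query == subject:
            continue  # ignore self-hits
        key = tuple(sorted((query, subject)))  # unordered protein pair
        score = -math.log10(max(evalue, floor))  # floor avoids log10(0)
        sims[key] = max(score, sims.get(key, float("-inf")))
    return sims


# Made-up rows in "-m 8" layout, used only to exercise the parser.
example = StringIO(
    "P1\tP2\t45.0\t120\t60\t2\t1\t120\t5\t124\t1e-30\t110\n"
    "P2\tP1\t45.0\t120\t60\t2\t5\t124\t1\t120\t1e-28\t105\n"
    "P1\tP3\t22.0\t40\t30\t1\t10\t50\t3\t43\t10\t15.2\n"
)
sims = best_similarities(parse_m8(example))
```

With a permissive E-value cutoff such as 100, weak hits remain in the table, so any thresholding on similarity is left to the later clustering step rather than to the parser.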

Supplementary Data 2: Brown et al. VOC subset FASTA

The 133 protein sequences of the VOC superfamily from the Brown et al. gold standard in FASTA format. (TXT 41 kb)

Supplementary Data 3: Brown et al. VOC subset family assignment

Tab-delimited flat file containing the family assignments for the 133 proteins of the VOC superfamily from the Brown et al. gold standard. (TXT 5 kb)
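A family-assignment file of this kind can be combined with pairwise similarities to obtain intra- versus inter-cluster similarity distributions like those named in Figure 3. The sketch below is a hedged illustration only: it assumes one "protein ID, tab, family" pair per line (the actual column layout of the supplementary file is not specified here), a dictionary of similarities keyed by sorted protein pairs, and a default similarity of 0.0 for pairs without a BLAST hit. All names and sample values are hypothetical.

```python
from io import StringIO
from itertools import combinations


def read_families(handle):
    """Read 'protein<TAB>family' lines (assumed layout) into a dict."""
    families = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        protein, family = line.split("\t")[:2]
        families[protein] = family
    return families


def split_intra_inter(similarities, families, missing=0.0):
    """Split pairwise similarities into same-family and cross-family lists."""
    intra, inter = [], []
    # sorted() guarantees each pair (a, b) has a < b, matching the dict keys.
    for a, b in combinations(sorted(families), 2):
        score = similarities.get((a, b), missing)
        if families[a] == families[b]:
            intra.append(score)
        else:
            inter.append(score)
    return intra, inter


# Made-up assignments and similarities, for illustration only.
assignment = StringIO("P1\tfamA\nP2\tfamA\nP3\tfamB\n")
families = read_families(assignment)
similarities = {("P1", "P2"): 30.0, ("P1", "P3"): 2.0}
intra, inter = split_intra_inter(similarities, families)
```

If the two lists separate well, a similarity threshold between them is a plausible starting point for clustering; if they overlap heavily, no single threshold will reproduce the gold standard.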

Supplementary Data 4: Brown et al. all-vs-all BLAST (TXT 8408 kb)

Supplementary Data 5: Brown et al. FASTA (TXT 365 kb)

Supplementary Data 6

Tab-delimited flat file containing the family assignments for the 866 proteins of the Brown et al. gold standard. (TXT 24 kb)

Supplementary Data 7: Brown et al. subset family pre-assignment (TXT 84 kb)

Supplementary Data 8: Gene expression data file

This file contains an expression matrix of 38 bone marrow samples from acute leukemia patients with 999 monitored genes¹⁴ processed with a Human Genome HU6800 Affymetrix microarray. (TXT 448 kb)

Supplementary Data 9: Gene expression gold standard file

This file contains the gold standard for the "Leukemia" dataset of Supplementary Data 8. The 38 samples are classified into 11 cases of acute myeloid leukemia (AML), 8 of T-lineage acute lymphoblastic leukemia (T-ALL) and 19 of B-lineage ALL (B-ALL). (TXT 0 kb)

Supplementary information (ZIP 2709 kb)

About this article

Cite this article

Wittkop, T., Emig, D., Truss, A. et al. Comprehensive cluster analysis with Transitivity Clustering. Nat Protoc 6, 285–295 (2011). https://doi.org/10.1038/nprot.2010.197

Issue Date: March 2011

This article is cited by

Guiding biomedical clustering with ClustEval
- Christian Wiwie
- Jan Baumbach
- Richard Röttger
Nature Protocols (2018)
Novel 9-cis/all-trans β-carotene isomerases from plastidic oil bodies in Dunaliella bardawil catalyze the conversion of all-trans to 9-cis β-carotene
- Lital Davidi
- Uri Pick
Plant Cell Reports (2017)
Comparing the performance of biomedical clustering methods
- Christian Wiwie
- Jan Baumbach
- Richard Röttger
Nature Methods (2015)
Diversity of the metal-transporting P1B-type ATPases
- Aaron T. Smith
- Kyle P. Smith
- Amy C. Rosenzweig
JBIC Journal of Biological Inorganic Chemistry (2014)
BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data
- Peng Sun
- Jiong Guo
- Jan Baumbach
BMC Proceedings (2013)
