Main

The emerging field of engineered living materials (ELMs) seeks to create engineered biomaterials with distinctive ‘living’ attributes such as autonomous growth, self-healing and environmental responsiveness that are only found in natural living materials1,2. Recent advances in synthetic biology and materials science, and the integration of their tools, have led to the development of a wide range of remarkable ELMs with applications in biosensors3,4, bioremediation5,6, biomedicine7, biomanufacturing8,9, wearable devices10 and electronics11. Depending on the source of their structural components, ELMs can be produced either by harnessing engineered cells to simultaneously make the material and incorporate new functionalities into it (known as self-organizing living materials or biological ELMs)1 or by embedding living cells in an organic or inorganic matrix (referred to as hybrid living materials)12. Self-organizing living materials aim to recapitulate the autonomous, adaptive and versatile properties of natural living materials, and represent opportunities to harness engineered biological systems for new capabilities1. For example, by building on endogenous biopolymers from microorganism systems, such as intracellular membraneless organelles13,14, extracellular amyloid fibers of bacterial biofilms15,16, bacterial cellulose (BC) from Komagataeibacter rhaeticus3 and fungal mycelium17, a diversity of self-organizing living materials have been created, with emerging functionalities ranging from acoustic properties8 to underwater adhesion18 and mechanical strengthening19.

Despite the advances in ELMs, further development and application of self-organizing living materials face severe challenges due to the lack of engineerable chassis and corresponding programmable endogenous biopolymers in microorganisms, particularly the nonpathogens. At present, only model microbial systems, such as Escherichia coli and Bacillus subtilis along with their extracellular amyloid fibers1, and several nonmodel systems including BC-producing K. rhaeticus20, the surface-layer protein-containing Caulobacter crescentus21 and the dominant bacterial component of Pantoea agglomerans in native feedstocks of fungus17 have been successfully harnessed in ELM design.

Here, we develop an integrative technological workflow for ELMs by rationally combining tools from bioinformatics, structural biology and synthetic biology (Fig. 1). Building on those tools, our technological workflow facilitates mining, understanding and harnessing of bacterial biopolymers of interest and appropriate biopolymer-producing bacteria, as structural building blocks and hosts, for living material design in a systematic and sequential manner (Fig. 1). Our workflow was initially motivated by the state-of-the-art methodologies applied to search for metabolic bioactive molecules in microorganisms, for which the combination of genome mining, structure elucidation and pathway engineering has greatly accelerated the discovery and engineering of bioactive chemical compounds in diverse microorganisms22. We set out to develop bioinformatics software, which we termed Bacteria Biopolymer Sniffer (BBSniffer), by harnessing diverse bioinformatics tools. From a large library of bacterial genome sequences, BBSniffer can assist in searching for a specific functional biopolymer of interest (including proteins, polysaccharides and other biopolymers), and correspondingly generate a screened list of biopolymer-producing bacteria, including functionally useful, experimentally tenable nonpathogens (Fig. 1a).

Fig. 1: An integrative technological workflow for mining, understanding, designing and engineering structural building blocks for living materials by combining tools from bioinformatics, structural biology and synthetic biology.

a, Natural microorganisms produce a wide spectrum of functional bacterial biopolymers that are ripe for use in diverse applications. Building on decades of microbiology, molecular biology and functional genomic research, a bioinformatics-enabled software, referred to as BBSniffer, was developed to mine specific functional biopolymers and corresponding biopolymer-producing bacteria in nature. b, For a target biopolymer (for example, a proteinaceous polymer produced by a specific microorganism) mined by BBSniffer, its structural features and assembly mechanism are further explored through genetic manipulation, structural verification and morphology characterization. c, Building on the structure and assembly information, an in situ modification of the building block is enabled through strain engineering on the host strain, allowing for rational design of ELMs for various application scenarios.

As a proof-of-principle study, using the well-studied Gram-positive pilus structure widely found in pathogenic bacteria as a reference23, we uncovered the biosynthetic gene cluster (BGC) of the covalently linked pili (CLP) fiber in the BBSniffer-screened industrial workhorse Corynebacterium glutamicum. Through genetic manipulation, bioimaging and structural characterization, we identified the Spa2 protein as the major component of the CLP fiber structure and gained insights into the molecular mechanism of CLP assembly (Fig. 1b). Using structure-guided design, we ultimately developed a type of engineerable extracellular protein scaffold that can be genetically appended with diverse functional peptides or proteins at multiple sites of the Spa2 protein (Fig. 1c). Finally, we rationally engineered the CLP fibers as programmable living materials in the industrial C. glutamicum ATCC 14067 strain with synthetic biology tools, enabling efficient use of cellulose biomass for lycopene production by coupling extracellular enzymatic degradation capacity with intracellular bioconversion ability. Leveraging the synergy of bioinformatics, structural biology and synthetic biology, we demonstrate the rational design of pili-enabled living materials from scratch. Our systematic approach opens up new opportunities for designing self-organizing living materials with tailored functionalities.

Results

Mining biopolymer-producing bacteria through BBSniffer

Through billions of years, a large and diverse library of natural microbial biopolymers has evolved, such as extracellular polysaccharides, proteinaceous fibers and intracellular membraneless organelles24. These biopolymers possess outstanding physicochemical and/or mechanical properties24 and are ripe for harvesting as functional building blocks for biomaterials and living materials design. However, information about the producers, synthetic pathways and assembly mechanisms for most biopolymers, particularly those in nonpathogens, remains elusive, making it difficult to fully exploit them24. Genome mining by various software tools such as antiSMASH25, PRISM26 and CLUSEAN27 has been demonstrated as a powerful approach to detect and characterize BGCs of bioactive chemical compounds in microorganisms. However, these software programs cannot be used to mine BGCs of biopolymers in microorganisms because the rule-based screening algorithm in these tools is limited to only identifying microbial secondary metabolites.

To facilitate efficient mining of functional biopolymer-producing chassis, we set out to develop software, BBSniffer, by integrating several bioinformatics tools (Fig. 2a). The BBSniffer software, which couples the rule-based BGC identification algorithm of antiSMASH with an additional rule specified to define the BGC of interest, is specifically designed to detect BGCs of biopolymers in bacteria (for details, see the Methods section), and can automatically classify the strains based on the internal bacterial database into pathogens, industrial microorganisms and other nonpathogens. In the classification, a given strain, usually containing well-known information about the BGC and assembly mechanism of a specific biopolymer of interest, is used as a reference. Based on the defined reference, BBSniffer can further build a distance-based phylogenetic tree using JolyTree and generate a list of distance scores for all mined industrial microorganisms using the reference strain as a benchmark (Fig. 2a). Finally, BBSniffer uses the list of distance scores as a ranking reference to recommend candidate strains with uncovered BGCs for biopolymers of interest in genomes. Additionally, the software can output useful information for all the candidate strains such as anaerobes and/or aerobes, condition of growth and availability of genetic manipulation tools, therefore providing guidance for users in choosing strains for further engineering of living materials.
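The classification and ranking steps described above can be illustrated with a minimal Python sketch. It is not the BBSniffer implementation: the `Strain` record, the category labels, the strain names and the example distance values are all hypothetical, and the sketch assumes that a smaller distance score means a genome closer to the reference strain and therefore a higher rank.

```python
from dataclasses import dataclass


@dataclass
class Strain:
    name: str
    category: str    # hypothetical labels: "pathogen", "industrial", "other_nonpathogen"
    distance: float  # distance score relative to the reference strain (illustrative values)


def rank_candidates(strains, top_n=5):
    """Keep industrial strains only and sort them by ascending distance score."""
    industrial = [s for s in strains if s.category == "industrial"]
    return sorted(industrial, key=lambda s: s.distance)[:top_n]


# Made-up example input; names and scores are placeholders, not mined results.
example = [
    Strain("strain_A", "industrial", 0.31),
    Strain("strain_B", "pathogen", 0.05),
    Strain("strain_C", "industrial", 0.12),
    Strain("strain_D", "other_nonpathogen", 0.20),
    Strain("strain_E", "industrial", 0.27),
]

ranked_names = [s.name for s in rank_candidates(example, top_n=2)]
print(ranked_names)
```

In this sketch the pathogen is excluded even though it is the closest genome, which mirrors the idea of using the reference only as a benchmark for ranking nonpathogenic candidates.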

Fig. 2: Mining functional biopolymer-producing bacteria in nature through BBSniffer.

a, A complete computational pipeline of the BBSniffer software shows the required input information, algorithms and databases used for mining the biopolymer-producing strain and uncovering the corresponding BGC in specific bacteria. Details of each step are provided in the Methods. b–e, An example illustrating the use of BBSniffer software for mining sortase-assembled CLP bacterial producers and generating the candidate strains for further CLP engineering. Mining results enabled by BBSniffer software, including 1,162 strains (among the 446,308 total sequenced bacterial genomes) containing the CLP-BGC, among which 102 strains are identified as nonpathogenic industrial microorganisms (b). A distance-based phylogenetic tree, generated using JolyTree, applied for scoring the distance between the mined CLP-containing industrial microorganisms and the reference strain (the pathogen C. diphtheriae NCTC 13129) (c). Heat map showing the distance score of each CLP-containing industrial microorganism relative to the reference strain, which is used for ranking all candidate strains (d). The top five finally mined nonpathogenic and engineerable industrial workhorses C. glutamicum containing the CLP-BGC in the genomes, sorted by the distance score relative to C. diphtheriae NCTC 13129, were selected as the candidate strains (e). Note that information regarding culture conditions and the availability of genetic manipulation tools for specifically mined strains can also be outputted, providing useful information for further engineering.

To demonstrate the usefulness of BBSniffer, we initially applied this software to mine ideal producers of the sortase-assembled CLP, which has frequently been found to be associated with Gram-positive pathogenic bacteria23. The CLP is well known for its robust structure with strong tensile strength28, providing a promising building block for living materials design. To search for potential nonpathogenic industrial strains that might contain similar pili structures, we used the technical terms ‘pilin’ and ‘sortase’ as input for initial exploration (Supplementary Fig. 1). We found 2,665 probable bacterial genomes associated with CLP in the UniProt database and downloaded them from the National Center for Biotechnology Information (NCBI). This number was further reduced to 1,162 through a one-step screening process with modified antiSMASH v.6.0, which uses specific rules to define the identification of a CLP-BGC based on the presence of both pilin and sortase proteins in a region of specified length in a genome (Supplementary Fig. 1). After all CLP-containing bacteria were classified, 102 industrial microorganisms distributed in Bacillus, Bifidobacterium, Corynebacterium, Lacticaseibacillus and Lactococcus were identified (Fig. 2b and Supplementary Data 1). These industrial microorganisms, and the reference strain of the pathogenic Corynebacterium diphtheriae NCTC 13129 (ref. 23), were used to build a distance-based phylogenetic tree using JolyTree (Fig. 2c) and then to generate a list of distance scores (Fig. 2d and Supplementary Data 1). Finally, the top five scored strains including C. glutamicum BE, ATCC 14067, YI, ATCC 13869 and AJ1511 (with their genomes closest to that of the reference strain), along with other corresponding information about culture conditions and the availability of genetic manipulation tools (for example, CRISPR–Cas system), were outputted as candidates for further CLP engineering (Fig. 2e).
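The co-localization rule used in the screening step (both a pilin and a sortase gene within a region of specified length) can be sketched in Python as follows. This is only an illustration of the idea, not the modified antiSMASH rule itself: the gene tuple layout, the function names and the 20-kb window are assumptions, since the actual window length and rule syntax are given in the Methods rather than here.

```python
def find_clp_like_regions(genes, max_span=20_000):
    """Return (start, end) regions where a pilin and a sortase gene co-occur.

    genes: list of (start, end, label) tuples on one contig; labels other than
    "pilin" and "sortase" are ignored. max_span is a hypothetical window length.
    """
    relevant = sorted(g for g in genes if g[2] in {"pilin", "sortase"})
    regions = []
    for i, (start_i, end_i, label_i) in enumerate(relevant):
        for start_j, end_j, label_j in relevant[i + 1:]:
            if start_j - start_i > max_span:
                break  # later genes start even further away
            region_end = max(end_i, end_j)
            if label_j != label_i and region_end - start_i <= max_span:
                regions.append((start_i, region_end))
                break
    return regions


def has_clp_like_bgc(genes, max_span=20_000):
    """A genome passes the rule if at least one qualifying region exists."""
    return bool(find_clp_like_regions(genes, max_span))


# Illustrative annotations (coordinates are invented).
toy_genes = [
    (1_000, 2_500, "pilin"),
    (3_000, 3_900, "sortase"),
    (50_000, 51_000, "pilin"),
    (60_000, 61_000, "transporter"),
]
toy_regions = find_clp_like_regions(toy_genes)
print(toy_regions)
```

The isolated pilin gene at 50 kb does not trigger a hit on its own, which reflects why requiring both gene families within one window narrows the candidate list.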

To demonstrate the generalizability of BBSniffer, we next explored the bacterial producers of three typical biopolymers (BC, gas vesicle (GV) and bacterial microcompartment (BMC)). Depending on the input technical terms, BBSniffer recommended the industrial strains, Zymomonas mobilis ATCC 29192, ATCC 10988, ATCC 31821 and several others, from 7,109 mined strains for engineering bacterial cellulose (Extended Data Fig. 1a and Supplementary Data 2); the antibiotic producers of Streptomyces venezuelae ATCC 14585, Streptomyces lydicus 103, Streptomyces lavendulae subsp. lavendulae Del-LP and several others, from 489 mined strains for engineering GVs (Extended Data Fig. 1b and Supplementary Data 3) and the nonpathogenic E. coli M70, M24, M11957 and several others, from 4,241 mined strains for engineering BMCs (Extended Data Fig. 1c and Supplementary Data 4), according to the list of distance scores relative to the input reference strain of Komagataeibacter xylinus E25, Halobacterium salinarum 91-R6 and Salmonella typhimurium LT2, respectively.

To validate the performance of our BBSniffer software for mining the aforementioned four different types of biopolymer, we considered two cases of the search results (for details, see the Methods section). First, we checked the detection accuracy of the BBSniffer software in searching all the literature-reported and experimentally well-characterized strains (usually containing well-defined information about the specific biopolymers and corresponding BGCs) by calculating their coverage rate. Second, for those search results that have not yet been identified or reported in the literature, we applied the existing knowledge and features of biopolymer BGCs as the major reference and then manually inspected the annotation results of those predicted biopolymer BGC strains. According to these cases, the success rate of the BBSniffer software for detecting strains containing the BGC of CLP, BC, GV and BMC is 93.6, 80.5, 93.3 and 93.8%, respectively (Supplementary Data 1–4 and Supplementary Table 1). These results demonstrated that the BBSniffer software is applicable for mining various biopolymer-producing bacteria of specific interest.

Probing the CLP assembly in C. glutamicum

As a proof of concept, we next investigated the CLP assembly in the industrial workhorse C. glutamicum ATCC 14067 (referred to as CgCLP) by combining the approaches of genetic manipulation, morphological characterization, mass spectrometry analysis and X-ray crystallography (Supplementary Fig. 2). The industrial workhorse C. glutamicum is a ‘generally recognized as safe’ strain with well-established gene editing tools that is widely used for the industrial-scale production of valued products such as amino acids, diamines, terpenoids and other chemicals29. In C. glutamicum, the CLP-BGC contains three predicted pilin-encoding genes, spa1 (NCBI locus tag: CEY17_01465), spa2 (NCBI locus tag: CEY17_01470) and spa3 (NCBI locus tag: CEY17_01485), as well as two sortase-encoding genes, srtC1 (NCBI locus tag: CEY17_01475) and srtC2 (NCBI locus tag: CEY17_01480) (Fig. 3a), which is similar to the SpaH-type (a relatively less well-studied pili type) CLP gene cluster in the pathogenic C. diphtheriae30. Confirming that the CgCLP-BGC is responsible for fiber formation, we observed no filamentous structures at the C. glutamicum cell surface on deletion of the CLP-BGC, while the filamentous structure phenotype was rescued on complementing the BGC of CLP (Fig. 3a).

Fig. 3: Probing the composition and molecular assembly of the CLP in C. glutamicum.

a, The biogenesis of CLP in C. glutamicum (CgCLP). The cartoon shows the CgCLP-BGC encoding the sortase genes srtC1 and srtC2, and the sortase-catalyzed pilin genes spa1, spa2 and spa3. The TEM and AFM images show that the major pilin Spa2 is indispensable for CgCLP fiber structure formation. WT, wild type. b, Identification of intermolecular isopeptide bonds for the polymerization of Spa2 monomers in CgCLP. Fragmentation spectra of the parent ion at m/z 832.92+ containing the intermolecular isopeptide bond (green font) between Spa2i Lys194 (blue font) and Spa2i+1 Thr477 (red font) are shown. The scale bars in the TEM and AFM images are 200 and 400 nm, respectively.

We next generated polyclonal antibodies against recombinant Spa1, Spa2 and Spa3 to determine the composition of CgCLP. Transmission electron microscopy (TEM) images of the CgCLP with immunogold labeling showed that the CgCLP fibers comprise two minor pilins of Spa1 and Spa3 and a major pilin of Spa2 (Supplementary Fig. 3). A whole-cell filtration enzyme-linked immunosorbent assay (ELISA)15, TEM and atomic force microscopy (AFM) imaging used to assess the specific roles of the three pilins in the CgCLP assembly showed that cells defective for Spa1 (Δspa1), Spa3 (Δspa3) or both (Δspa1Δspa3) could still produce fibers (Fig. 3a and Extended Data Fig. 2a). By contrast, cells lacking Spa2 (Δspa2) could not produce any fiber, and overexpression of Spa2 (Spa2) promoted the formation of abundant long fibers surrounding the cell surface (Fig. 3a and Extended Data Fig. 2a). TEM and AFM images also showed that fiber formation was completely blocked in cells lacking both SrtC1 and SrtC2 (ΔsrtC1ΔsrtC2) (Supplementary Fig. 4). Collectively, these findings verified that the major pilin of Spa2 protein is an indispensable building block for the sortase-catalyzed CgCLP assembly and production, similar to the role of the most highly studied SpaA in pili assembly in the pathogenic C. diphtheriae31. Despite this similarity, the wide variation in the size and sequences of major pilin proteins from diverse Gram-positive pathogens32 makes it challenging to predict whether the structural principles characterized for the CLP of other hosts are also applicable to CgCLP.

Unlike the non-CLP produced in Gram-negative bacteria23, the CLP monomer subunits are typically joined via intermolecular isopeptide bonds catalyzed by sortase, conferring enormous tensile strength28. Furthermore, the CLP subunits contain auto-catalyzed intramolecular isopeptide bonds that are less susceptible to proteolytic cleavage and can dissipate mechanical energy23, imparting robustness to the CLP. In addition, several pilin proteins in the CLP structure of different strains contain additional disulfide bonds that further enhance stability32. Having identified the Spa2 major pilin as the essential building block for CgCLP fiber production, we next explored whether any intermolecular isopeptide bond, disulfide bond or intramolecular isopeptide bond forms during the CgCLP assembly.

First, the purified CgCLP polymers were excised from Coomassie blue-stained SDS–PAGE gels (Supplementary Fig. 5) and then digested in-gel with trypsin and AspN endoproteinase. Liquid chromatography with tandem mass spectrometry was used to analyze the digestion products, and verify the presence of the intermolecular isopeptide bond, as indicated by the elimination of a water molecule and thus a slight decrease of molecular weight as a result of isopeptide bond formation. Specifically, the peptide peak with m/z 832.92+ (Fig. 3b and Supplementary Table 2) suggested that the major pilin of Spa2 was cross-linked between K194 in the N terminus of Spa2i and T477 in the C terminus of Spa2i+1 (Lys194-Thr477).

Quadrupole time-of-flight mass spectrometry analysis of a recombinant variant of Spa2 (Spa2cut) (Supplementary Fig. 6) secreted by C. glutamicum cells indicated a molecular weight of 46,504.6 Da (Supplementary Fig. 7), which is about 54.7 Da less than the expected value calculated from the secreted Spa2cut amino acid sequence. This detected mass is consistent with the loss of three NH3 units and two 2H units, which can be explained by the formation of three intramolecular isopeptide bonds (each with loss of one molecule of ammonia, 17 Da) and two disulfide bonds (each with loss of two hydrogen atoms, 2 Da) in Spa2.
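The mass bookkeeping behind this interpretation can be checked with a short Python calculation. It is a worked-arithmetic sketch only: the function name is illustrative, standard average atomic masses for N and H are assumed, and the observed deficit of 54.7 Da is taken directly from the text.

```python
# Average atomic masses (Da); standard values, assumed for this sketch.
MASS_H = 1.008
MASS_N = 14.007

MASS_NH3 = MASS_N + 3 * MASS_H  # lost per Lys-Asn isopeptide bond
MASS_2H = 2 * MASS_H            # lost per disulfide bond


def expected_mass_loss(n_isopeptide, n_disulfide):
    """Mass lost (Da) on forming the given numbers of intramolecular bonds."""
    return n_isopeptide * MASS_NH3 + n_disulfide * MASS_2H


predicted_loss = expected_mass_loss(n_isopeptide=3, n_disulfide=2)
observed_deficit = 54.7  # Da, as reported in the text

print(f"predicted loss: {predicted_loss:.3f} Da")
print(f"difference from observed: {abs(predicted_loss - observed_deficit):.3f} Da")
```

Three NH3 units (3 × 17.031 Da) plus two 2H units (2 × 2.016 Da) give about 55.1 Da, which agrees with the reported deficit to within roughly 0.4 Da.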

To probe the structural features of major pilin Spa2, we solved its X-ray crystal structure at 2.73 Å resolution (Protein Data Bank (PDB) ID 7WOI) (Extended Data Fig. 3a and Supplementary Table 3) by the molecular replacement method using PHASER with the coordinates predicted by AlphaFold Colab as a template (Supplementary Fig. 8a). Spa2 is arranged in three tandem Ig-like domains, including the N-domain (residues 36–197, pink), M-domain (residues 198–343, blue) and C-domain (residues 344–469, green), giving an elongated molecule of roughly 125 Å in length (Extended Data Fig. 3a). These three tandem Ig-like domains of Spa2 are similar to those of the major pilins SpaA (PDB 3HR6, root-mean-square deviation (r.m.s.d.) 6.5 Å over 270 alpha-carbon (Cα) atoms, Supplementary Fig. 8b) and SpaD (PDB 4HSS, r.m.s.d. 4.0 Å over 311 Cα atoms, Supplementary Fig. 8c) from the human pathogen C. diphtheriae32,33. The crystals of Spa2 adopt head-to-tail stacking such that the N-domain in Spa2i abuts against the C-domain in Spa2i+1 (Extended Data Fig. 3a and Supplementary Fig. 9), which is consistent with the result that the Spa2 monomers are connected via the intermolecular isopeptide bond between K194 in the N terminus of Spa2i and T477 in the C terminus of Spa2i+1 (Fig. 3b). Together, these results indicate that the biological assembly of CgCLP fiber occurs via the head-to-tail polymerization of Spa2 monomers.

Furthermore, interpretation of electron-density maps clearly showed three common isopeptide bonds and two unique disulfide bonds in the structure of Spa2 (Extended Data Fig. 3b,c and Supplementary Fig. 10). Formation of multiple covalent bonds was also verified by liquid chromatography–tandem mass spectrometry analysis of the pepsin-digested Spa2cut products (Extended Data Fig. 4). The isopeptide bonds linked Lys57 and Asn195 with catalytic Glu158 in the N-domain; Lys203 and Asn318 with catalytic Asp246 in the M-domain and Lys355 and Asn466 with catalytic Glu435 in the C-domain (Extended Data Fig. 3b and Supplementary Fig. 10a). Notably, the presence of three intramolecular isopeptide bonds distributed in three domains of major pilin Spa2 in C. glutamicum is similar to the feature of the major pilin SpaD from the pathogenic C. diphtheriae33, but is different from the major pilin SpaA from the pathogenic C. diphtheriae, which lacks isopeptide bonds in the N-terminal domain32. In addition, two disulfide bonds were formed, one in the N-domain between Cys97 and Cys128 and one in the C-domain between Cys380 and Cys432 (Extended Data Fig. 3c and Supplementary Fig. 10b). Notably, the presence of two disulfide bonds in Spa2 is unusual in comparison with other major pilins in human pathogens, such as Spy0128 (PDB 3B2M) from Streptococcus pyogenes34 and BcpA (PDB 3KPT) from Bacillus cereus35, which lack a disulfide bond, and SpaA and SpaD from C. diphtheriae, which contain only one disulfide bond in the C-terminal domain32,33.

To explore the intermolecular polymerization between Spa2 monomers, we next conducted functional assays with various Spa2 mutant variants to explore their roles for CgCLP formation in vivo. Indeed, mutagenesis experiments with K194A and LPLTG474LALAA478 variants blocked CgCLP production, confirming that both Lys194 in the N-domain and LPLTG474-478 in the C-domain participate in Spa2 monomer polymerization (Extended Data Fig. 3d,e). To further test how the intramolecular isopeptide bond and disulfide bond in Spa2 monomer contribute to the formation and stabilization of CgCLP, a series of Spa2 variants were generated. Substitutions of Glu158, Asp246 and Glu435 (E158A, D246A, E435A) with alanine that abolished one or two intramolecular isopeptide bonds (Supplementary Fig. 11a–c) had no notable impact on CgCLP production (Extended Data Fig. 3d,e). Only the double mutation variants of D246A/E435A abolished all three intramolecular isopeptide bonds in Spa2 (Supplementary Fig. 11d), and produced only 44.9% of CgCLP compared to Spa2 cells (Extended Data Fig. 3e). Abrogation of the disulfide bonds in the N and C domains of Spa2 with C97A and C380A variants, respectively, dramatically reduced the extent of CgCLP formation (Extended Data Fig. 3d,e). The C97A/C380A double mutant variant completely blocked CgCLP formation (Extended Data Fig. 3d,e). Taken together, these results indicate that both isopeptide and disulfide bonds contribute to the formation of CLP in C. glutamicum, with the disulfide bond appearing as the most important element for its stabilization.

Programming CgCLP as an extracellular protein scaffold

As an extracellular matrix, the CLP fibers can be conveniently and reliably positioned directly outside cells (Fig. 4a). Because these extracellular fibers not only possess extraordinarily high tensile strength owing to their extensive inter- and intramolecular isopeptide bonds28, but also contain unique amino acids such as cysteine residues, the CLP structure may serve as an attractive building block for various applications. For example, our preliminary study revealed that the Spa2 protein of the CgCLP, containing four cysteine residues, could automatically promote local mineralization of CdS on the fibers (Extended Data Fig. 5), potentially resulting in photocatalytic applications similar to previous work based on photocatalyst-mineralized living biofilms36.

Fig. 4: In situ functionalization of CLP as a programmable extracellular protein scaffold.

a–c, Rational engineering of the CgCLP protein scaffold through a modular genetic design strategy: the cartoon shows a polymerized Spa2 major pilin functionalized by incorporating a POI (for example, mCherry, a fluorescent reporter protein) at candidate insertion sites (including Q35 (E1) at the N terminus, and G215 (E2), G236 (E3) and G336 (E4) in the M-domain lacking a disulfide bond) based on structural verification (a); fluorescence intensity and quantitative analysis of the amount of CgCLP fiber by whole-cell filtration ELISA (detection by anti-Spa2 antibody) (b) and confocal microscopy imaging (c) of engineered cells containing Spa2-mCherry fusion proteins inserted at different sites. d,e, Extracellular secretion and assembly of R-Spa2 pilins into CgCLP fiber at the cell surfaces of engineered C. glutamicum cells: a series of R-Spa2 fusion protein constructs comprising functional R peptides and/or proteins with different amino acid (aa) sequences (d) and morphologies of assembled R-Spa2 CgCLP on the extracellular scaffold based on immunogold labeling and TEM imaging (e). Scale bar, 200 nm. f–h, Coassembly of split-Venus components into the CgCLP fibers leading to increased fluorescence intensity. Schematic showing simultaneous expression of the two Spa2 pilin fusion proteins, N-Ven-Spa2 and C-Ven-Spa2 (N-Ven-Spa2+C-Ven-Spa2 strain), containing the N terminus (N-Ven) and C terminus (C-Ven) module of the split-Venus system40, resulting in coassembly of the split-Venus components into the final functional CgCLP structures (f). The engineered C. glutamicum cells show greater fluorescence intensity only in the N-Ven-Spa2+C-Ven-Spa2 strain (g). Confocal microscopy of C. glutamicum cells showing that the strongest Venus fluorescence signal appeared at the extracellular sites of the N-Ven-Spa2+C-Ven-Spa2 strain (h). All P values of Δspa2 strain, N-Ven+C-Ven strain, N-Ven-Spa2 strain and C-Ven-Spa2 strain versus the N-Ven-Spa2+C-Ven-Spa2 strain in g are P < 0.0001.
****P < 0.0001. The samples were collected from the Δspa2 strain harboring various plasmids that express different R-fusion proteins. Statistically significant differences were calculated by using a two-tailed t-test. Mean ± standard deviation (s.d.), n = 3 biological replicates in b and g. Scale bar, 2 μm in c and h.

In addition, the proteinaceous nature of the CLP fibers makes them potentially amenable to elaboration using genetic engineering. To determine suitable fusion sites to append peptides and/or proteins to Spa2, guided by both the Spa2 crystal structure and our characterization of specific functional domains within Spa2, we selected four different positions to test the fusion of a protein of interest (POI), with one site at the N terminus of Spa2 and three sites in the M-domain lacking a disulfide bond (Fig. 4a). The Δspa2 strain, in which extracellular CgCLP formation is abrogated, was used to harbor the exogenous plasmid expressing each Spa2 fusion protein, so that restored CgCLP fiber production could be tested (Fig. 4a).

Using the fluorescent reporter protein mCherry, we set out to identify positions that generate functional fusion proteins while retaining the sortase-catalyzed CLP formation capacity of Spa2. We explored four sites for mCherry insertion, including at Q35 (E1) in the N terminus of Spa2 (after the cleavage site of peptidase, Supplementary Fig. 6c), G215 in loop 1 of the M-domain (E2), G236 in loop 2 of the M-domain (E3) and G336 in the β23-sheet of the M-domain (E4) (Fig. 4a). Fluorescence intensity and quantitative ELISA both showed that the cells expressing the fusion protein in each site fluoresced and produced corresponding fibers at varied levels (Fig. 4b). Confocal microscopy showed that mCherry fluorescence was detected for all engineered variants, with fluorescence evident at extracellular sites on the C. glutamicum cells (Fig. 4c), consistent with TEM imaging results showing that the mCherry-functionalized CgCLP proteins formed extracellular fibers surrounding the cells (Supplementary Fig. 12). Combining the results of fluorescence intensity, ELISA quantification analysis, confocal microscopy and TEM imaging, we concluded that E1 and E2 are the most suitable sites for fusion of a functional POI, yielding abundant amounts of functionalized CgCLP fibers.

To explore how the functionalized peptides and/or proteins of varied sequence length affect the secretion and assembly of CgCLP, we next assessed the expression of a variety of Spa2 fusion proteins (six POIs, each fused at the E1 position) (Fig. 4d), including one with a 6-His tag; one containing a SpyCatcher protein and one with its partner, SpyTag37; one with Mfp3S peptide that can promote interfacial adhesion19; one with Venus fluorescence reporter protein38 and one with catalytic protein of endo-1,4-β-glucanase from Clostridium cellulolyticum (CcEgl)39. All of these fusion proteins were successfully expressed, secreted and formed CgCLP (Fig. 4e and Extended Data Fig. 2b). Appropriate assays, such as imaging and enzyme activity assays, also confirmed that each of the POIs was functional even after CLP fiber formation (Extended Data Fig. 6).

To further quantify how the fused polypeptides and/or proteins of varied sequence length and different features affect the amount of the CLP biopolymers produced by the strains, we applied a whole-cell filtration ELISA15. The analysis showed that strains of 6His-Spa2, SpyTag-Spa2, Mfp3Spep-Spa2 and SpyCatcher-Spa2 produced 1.53-, 1.35-, 1.45- and 1.27-fold as many CLP fibers, respectively, as the wild type (Extended Data Fig. 2b), whereas strains of Venus-Spa2 and CcEgl-Spa2 produced only 22.2 and 34.7% of the wild-type amount of CLP fibers, respectively (Extended Data Fig. 2b). These results indicate that polypeptides with longer sequence length or certain secondary structures may affect the conformation of Spa2 and decrease the efficiency of sortase-catalyzed polymerization. However, the sortase-mediated polymerization is not completely abolished by fusion of POIs to Spa2 monomers, indicating that various types and sizes of proteins can be engineered into a generally programmable extracellular protein scaffold of CgCLP.

To assess whether our programmable CgCLP extracellular protein scaffold can support the coassembly of multiple heterologous proteins, we conducted experiments in the Δspa2 strain with the well-established split-Venus system40 (Fig. 4f). Coassembly of two distinct proteins did not disturb CgCLP assembly as indicated by TEM images (Supplementary Fig. 13) and the results of ELISA quantification analysis (Extended Data Fig. 2c). The highest fluorescence intensity was observed in cells where the split-Venus components were simultaneously fused with Spa2 (Fig. 4g,h). Almost no fluorescence was detected when only N-Ven and C-Ven were simultaneously secreted without anchoring to the CgCLP scaffold (Fig. 4g,h). These results indicated that the split components can be coassembled in the extracellular CgCLP scaffold. This programmable CLP in C. glutamicum thus opens up a new possibility for the design of pili-enabled living materials.

Pili-enabled ELMs for biomass-to-chemical conversion

Our positive results building on the split-Venus system showing successful coassembly of two types of fusion protein within Spa2 suggested the potential for metabolic channeling applications. Cellulosic biomass is an abundant source of fixed, renewable carbon that represents a promising alternative to fossil petroleum as a feedstock for producing a wide range of chemicals. C. glutamicum cells do not have an endogenous capacity to degrade cellulose chains into sugar monomers41. We next turned to coassemble multiple cellulases into a catalytic cascade for extracellular degradation of cellulose into glucose to support production of specific chemicals of interest (for example, lycopene) in C. glutamicum (Fig. 5a). We tested this hypothesis by coassembly of the endo-1,4-β-glucanase from Trichoderma reesei (TrEgl)39 and β-glucosidase from Saccharophagus degradans (SdBgl)42 in the CgCLP fiber, as these two enzymes are known to work together to degrade cellulose into glucose via enzyme cascade reactions (Fig. 5a).

Fig. 5: ELMs based on the programmable C. glutamicum pilus structure for lycopene production from biowastes.

a, Schematic illustration of engineered C. glutamicum living materials transforming cellulosic biomass into the value-added product lycopene by combining extracellular cellulose degradation capacity with intracellular bioconversion ability. Specifically, for extracellular cellulose degradation (step 1), the endo-1,4-β-glucanase TrEgl and the β-glucosidase SdBgl were simultaneously fused with Spa2 pilin (TrEgl-Spa2+SdBgl-Spa2) and coassembled into a CgCLP structure, potentially forming a catalytic cascade for the extracellular degradation of cellulose into glucose. For intracellular transformation (step 2), the glucose was used for lycopene production in the pathway-engineered C. glutamicum C003 strain upon IPTG induction. G3P, glyceraldehyde-3-phosphate; IPP, isopentenyl pyrophosphate. b, ELMs degraded CMC-Na in the medium from a viscous gel to a thin solution only when both TrEgl and SdBgl were coassembled into the CgCLP structure (TrEgl-Spa2+SdBgl-Spa2, C003 strain), outperforming the secreted free enzymes (TrEgl+SdBgl, C004 strain). Δspa2Δdec (C001 strain) is the negative control strain. c, Degradation assays using CMC-Na as the substrate. The C003 strain showed fourfold higher enzyme activity than the C004 strain. P values are indicated above the bars (NS, not significant, P > 0.05; ****P < 0.0001; C003 strain and C004 strain versus C001 strain using a two-tailed t-test; three biologically independent samples; mean ± s.d.). d, HPLC assay of lycopene production by the C003 strain cultured in M63 medium in which the glucose carbon source was replaced by CMC-Na, with lycopene production induced by the addition of IPTG. Mean ± s.d., n = 3 biological replicates in d.


We next set out to create a catalytic cascade of multiple cellulases that degrades cellulose into glucose to support lycopene production in C. glutamicum. We first constructed a C001 chassis (Δspa2Δdec) with deletion of both the spa2 gene (Δspa2, for the abrogation of CgCLP formation) and a 43,702 base pair (bp) region between CEY17_RS03380 and CEY17_RS03560 (Δdec, for accumulation of the precursor)43 (Extended Data Fig. 7a). To construct the basal lycopene-producing strain C002, we introduced a plasmid, P1, for isopropyl-β-d-thiogalactoside (IPTG)-inducible expression of the dxs gene and the crtEBI gene cluster (Extended Data Fig. 7b,c). We then constructed the C003 strain for cellulose degradation by simultaneous expression of the TrEgl-spa2 and SdBgl-spa2 genes in the C002 strain transformed with a plasmid, P2, encoding the two genes (Extended Data Fig. 7b,c). The C003 strain coassembled TrEgl and SdBgl in CgCLP fibers on the extracellular scaffold (Extended Data Figs. 2d and 7d) and enabled the degradation of carboxymethylcellulose sodium (CMC-Na, an ether derivative of cellulose) in the medium, as indicated by the medium turning from a viscous gel to a thin solution (Fig. 5b). By contrast, the C004 strain, which secreted both TrEgl and SdBgl simultaneously without anchoring them to the CgCLP scaffold, did not show similar behavior (Fig. 5b and Extended Data Figs. 2d and 7d).

By using the CLP-based extracellular protein scaffold for multi-enzyme colocalization in the C003 strain (Fig. 5c), the ELMs drastically improved the degradation efficiency and produced a fourfold higher yield of glucose than the simultaneously secreted cellulases (C004 strain, Fig. 5c). As shown in Fig. 5d, the lycopene titer in the C003 strain reached 0.83 mg g−1 dry cell weight after 36 h of culture in M63 medium with CMC-Na as the sole carbon source. Admittedly, this yield is substantially lower than that achieved by C. glutamicum using a combined pathway engineering approach44. Nevertheless, our work illustrates an example of applying ELMs for lycopene production by coupling extracellular enzymatic degradation capacities with intracellular bioconversion ability. In addition, the C. glutamicum-based ELMs could simultaneously sustain self-growth and lycopene production using cellulose biomass waste as the sole carbon source, therefore expanding both the functionalities and application scenarios of existing engineered strain systems15,16. Future improvement of the lycopene yield of our constructed ELMs can be achieved by integrating state-of-the-art directed enzyme evolution45 and machine-learning approaches46 to enhance the performance of the coassembled cellulases in the CgCLP fiber and thereby improve the conversion efficiency of cellulose to glucose. In addition, further efforts in pathway optimization44 are expected to enhance the metabolic flux from glucose to lycopene.

Collectively, we here demonstrate a new type of ELM by programming our engineerable CgCLP biopolymer for anchoring multiple cellulases on the extracellular scaffold. Our work thus offers an example of direct ‘upcycling’ of a renewable waste resource into a value-added chemical by combining extracellular and intracellular bioconversion abilities. This programmable CLP-enabled living material would push the boundary of self-organizing living biofilm materials, which were mainly based on the programmable amyloid fibers of E. coli and B. subtilis biofilms15,16. By reengineering the CLP-based extracellular protein scaffold with other types of enzyme, or by rewiring other intracellular synthesis pathways in our engineered strain, one can envision that our ELMs would exhibit more tunable functionalities (such as degrading polyethylene terephthalate with PETase47 and degrading chitin with chitinases48) or enable direct ‘upcycling’ of the renewable waste resource cellulose into other valuable chemicals.

Discussion

Recent studies have harnessed both model and nonmodel microorganisms with their endogenous biopolymers to create self-organizing living materials with diverse functionalities. Despite those advances, previous efforts were mainly based on existing molecular biology knowledge of a limited number of endogenous biopolymers in certain bacteria. Further bottom-up design and application of self-organizing living materials has been hampered by the lack of engineerable chassis and limited access to programmable building blocks in nonpathogens.

In this work, we have established an integrative technological workflow for efficient design of living materials by rationally combining tools from bioinformatics, structural biology and synthetic biology. The workflow enables rapid mining of biopolymer-producing nonpathogenic chassis, understanding of the molecular assembly mechanism of biopolymers and engineering of biopolymer building blocks for living materials in a systematic and sequential manner. By using this approach, one can search for biopolymer-producing strains in nonpathogenic industrial workhorses among a large library of bacteria and accordingly perform new ELMs design from scratch, regardless of the current knowledge of endogenous biopolymers. For example, uncovering the biosynthesis pathway of polysaccharides, such as surface capsular polysaccharides in probiotics, can be harnessed to engineer bacteria for living therapeutics, while mining the biosynthesis pathway of the highly negatively charged polyamides, such as poly-γ-d-glutamic acid, in nonpathogens may enable bioremediation of heavy metal ion-contaminated rivers.

Our BBSniffer software, based on bacterial databases, is currently limited to searching for biopolymer-producing bacterial strains; however, it can be extended to the design of fungal and mammalian cell-enabled living materials by incorporating more relevant databases. In addition, although our BBSniffer software proves useful in recommending various biopolymer-producing strains with output information such as culture conditions and genetic manipulation tools, its performance in searching for different types of biopolymer BGC may vary to a certain degree, because the biopolymer BGCs and assembly mechanisms of different biopolymer systems have been deciphered to different extents. In the future, to improve the performance of the BBSniffer software, machine-learning approaches can be applied to refine the characteristics of a specific biopolymer BGC and its assembly mechanisms to generate more tailored and accurate detection rules for each type of biopolymer.

To further improve the design of ELMs in the future, our established workflow can be coupled with machine-learning methods and protein computation tools, such as AlphaFold2, which excels in protein structure prediction from amino acid sequences46, and trRosetta, which enables the de novo design of proteins with new functions49. For example, using such combined approaches, various proteinaceous building blocks with predictable behaviors can be designed to generate ELMs with tailored properties. We envision that our approach will guide future efforts in mining and deciphering diverse biopolymers of interest in nonpathogens or even industrial workhorse bacteria, and eventually accelerate the design of living materials with tailored functionalities.

Methods

BBSniffer software and associated workflow

Open-source BBSniffer software was developed to uncover target biopolymer biosynthesis in nature and generate a screened list of the biopolymer-producing bacteria. Specifically, the complete workflow of BBSniffer is provided below.

Genome and protein profile extraction

Input protein family IDs of related structural materials, obtained from InterPro database and Pfam database, were searched against the UniProt database to obtain proteins of interest and the genomes where proteins were located. Query results were saved in table format under the following columns: protein family ID, entry name, reviewed, protein names, genes, organism, organism ID, length and sequence of amino acids.

For genome extraction, genome sequences and GenBank files were downloaded based on organism IDs (NCBI taxa IDs) in the table using the NCBI-genome-download tool (https://github.com/kblin/ncbi-genome-download). UniProt entries were accessed via API queries (https://www.uniprot.org/help/api_queries). In UniProt query results, the organism ID can be either a species-level ID or a strain-level ID. If an organism ID from the results was given at the species level, all complete and chromosome-level genomes of the strains under the same species ID were downloaded. All the downloaded genomes were saved in .fasta format.
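The species-versus-strain decision described for genome extraction can be sketched as a small command builder. This is an illustrative sketch and not the BBSniffer source: the function name `build_download_command` and the `is_species_level` argument are hypothetical, the options shown (`--taxids`, `--species-taxids`, `--assembly-levels`, `--formats`) are taken from the ncbi-genome-download documentation and should be checked against the installed version, and leaving the assembly level unrestricted for strain-level IDs is an assumption.

```python
from typing import List


def build_download_command(organism_id: int, is_species_level: bool) -> List[str]:
    """Build an ncbi-genome-download argument list for one organism ID.

    Hypothetical helper: it only assembles the command; running it (for
    example with subprocess.run) is left to the caller.
    """
    # Request both sequence (FASTA) and annotation (GenBank) files.
    cmd = ["ncbi-genome-download", "--formats", "fasta,genbank"]
    if is_species_level:
        # Species-level ID: fetch every complete or chromosome-level genome
        # of the strains that fall under this species.
        cmd += ["--species-taxids", str(organism_id),
                "--assembly-levels", "complete,chromosome"]
    else:
        # Strain-level ID: fetch the genome(s) registered for that exact ID
        # (assumption: no assembly-level filter is applied here).
        cmd += ["--taxids", str(organism_id)]
    cmd.append("bacteria")  # taxonomic group to search
    return cmd


# NCBI species-level taxonomy ID of Corynebacterium glutamicum.
print(" ".join(build_download_command(1718, is_species_level=True)))
```

Building the argument list separately from executing it keeps the decision logic easy to inspect before any download is started.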

For protein profile extraction, proteins sharing the same input protein family IDs were aligned with Clustal Omega, then the related hidden Markov model (HMM) profiles were generated by HMMER with default parameters. FASTA and GenBank files were processed with Biopython.

Biopolymer BGC detection

AntiSMASH25 was modified to identify the biopolymer BGC of interest in the downloaded genomes. The modified antiSMASH algorithm uses a new rule to identify the target biopolymer cluster type. To qualify as a target biopolymer BGC, the genes encoding functional proteins for biopolymer biosynthesis (detected using the ‘HMM-profiles’) in the downloaded genomes (using the ‘genome sequences file’) should be present within a region of specified length in a genome, as defined by a ‘rule file’. The rule file stipulates that the maximum DNA length between the coding sequences of the protein families in a genome should be 20 kbp and that the DNA length of the extra region outside the coding sequences of the protein families should be 8 kbp. In addition, the rule file also contains a description file specifying the path of the profile HMMs. After that, all downloaded genome sequences were analyzed for the target biopolymer BGC by the modified antiSMASH. Finally, the detected genomes with biopolymer BGC information were integrated into table format by a Python script.
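The distance rule can be illustrated with a minimal sketch. This is not the modified antiSMASH code: the `Hit` record, the function names and the example coordinates are hypothetical, and two points are assumptions made for illustration, namely that every input protein family must be present in a cluster and that the 8 kbp extra region is added on both sides of the outermost coding sequences.

```python
from typing import List, NamedTuple, Set, Tuple

CUTOFF_BP = 20_000        # maximum gap between neighbouring core genes
NEIGHBOURHOOD_BP = 8_000  # extra region added outside the core genes


class Hit(NamedTuple):
    family: str  # protein family matched by a profile HMM
    start: int   # gene start on the contig (bp)
    end: int     # gene end on the contig (bp)


def _close_group(group: List[Hit], required: Set[str],
                 contig_length: int) -> List[Tuple[int, int]]:
    """Return the extended region for a group if it holds all required families."""
    if not required <= {h.family for h in group}:
        return []
    start = max(0, min(h.start for h in group) - NEIGHBOURHOOD_BP)
    end = min(contig_length, max(h.end for h in group) + NEIGHBOURHOOD_BP)
    return [(start, end)]


def find_candidate_regions(hits: List[Hit], required: Set[str],
                           contig_length: int) -> List[Tuple[int, int]]:
    """Group hits separated by at most CUTOFF_BP and keep complete groups."""
    regions: List[Tuple[int, int]] = []
    group: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.start):
        # A gap larger than the cutoff closes the current group.
        if group and hit.start - max(h.end for h in group) > CUTOFF_BP:
            regions.extend(_close_group(group, required, contig_length))
            group = []
        group.append(hit)
    if group:
        regions.extend(_close_group(group, required, contig_length))
    return regions


hits = [Hit("pilin", 10_000, 11_500), Hit("sortase", 15_000, 16_000),
        Hit("pilin", 100_000, 101_000)]
print(find_candidate_regions(hits, {"pilin", "sortase"}, 200_000))
```

In this toy input the lone pilin gene at 100 kbp is discarded because no sortase lies within 20 kbp of it, whereas the first two genes form one candidate region.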

Building an internal bacterial database

Next, identified strains containing the target biopolymer BGC were classified into ‘pathogens’, ‘industrial microorganisms’ and ‘other nonpathogens’ based on the annotation of an integrated bacterial database. The bacterial database was created in the following manner. First, all bacteria information was downloaded from the PATRIC database using PATRIC’s P3 scripts (https://docs.patricbrc.org/cli_tutorial/index.html). Second, strains were classified as ‘pathogens’ if they were associated with any disease information in the PATRIC database. Third, the label ‘industrial microorganisms’ was assigned to bacteria included in PROBIO (http://bidd.group/probio/download.htm), MicrobiomePost.com (https://microbiomepost.com/probiotic-strain-database/), Generally Recognized as Safe organisms (https://www.cfsanappsexternal.fda.gov/scripts/fdcc/index.cfm?set=GRASNotices) or the strains or species used for chemical or biomaterial production (extracted from several publications). Finally, strains that were neither pathogens nor industrial microorganisms were classified as ‘other nonpathogens’. In addition, we used the BacDive database (https://bacdive.dsmz.de/), a free database that is the largest worldwide for standardized bacterial information, to annotate the strains with additional information including oxygen requirement (anaerobe and/or aerobe) and growth conditions (for example, medium and growth conditions). Following a strategy similar to that of the literature-reported SynBioStrainFinder database50 (a microbial strain database of manually curated CRISPR–Cas genetic manipulation system information), we annotated our own bacterial database with additional information regarding the available genetic manipulation tools (such as the CRISPR–Cas system) of the industrial microorganisms. The internal bacterial database we built for BBSniffer is available at GitHub (https://github.com/xbiome/BBSniffer/blob/publish/Database/Bacteria_database.xlsx).
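The three-way labelling can be summarized in a short sketch. The function name, the set-based inputs and the strain names are hypothetical placeholders rather than entries from the actual database, and giving the ‘pathogens’ label precedence when a strain appears in both lists is an assumption based on the order in which the steps are described.

```python
from typing import Set


def classify_strain(strain: str, disease_associated: Set[str],
                    industrial: Set[str]) -> str:
    """Assign one of the three database labels to a strain."""
    # Any disease association recorded for the strain marks it as a pathogen.
    if strain in disease_associated:
        return "pathogens"
    # Strains listed in probiotic, GRAS or production-host resources.
    if strain in industrial:
        return "industrial microorganisms"
    # Everything else is neither a pathogen nor an industrial microorganism.
    return "other nonpathogens"


# Placeholder names used only to exercise the function.
disease_associated = {"strain_P"}
industrial = {"strain_I", "strain_P"}
for name in ("strain_P", "strain_I", "strain_X"):
    print(name, "->", classify_strain(name, disease_associated, industrial))
```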

Phylogenetic tree construction

After classification, genomes of industrial microorganisms and the reference strain were used to construct a distance-based phylogenetic tree using JolyTree51. The generated tree was saved in Newick format by default and then visualized using the ggtree R package52.

Candidate strain generation

The candidate strains were ranked with two rules: (1) whether they belong to the ‘industrial microorganisms’ class and (2) the distance to the selected reference strain, which can be extracted from the distance matrix of the JolyTree results. The lower the distance of a strain, the closer it is to the reference strain and, thus, the higher its ranking in the recommendation list. Finally, the top five strains (distance ranked from low to high) are recommended as the candidate strains for further engineering of targeted biopolymers.
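One possible reading of the two ranking rules is sketched below. The function name, the dictionary-based inputs and the strain names and distances are hypothetical; treating rule (1) as the primary sort key and the distance as the secondary key is an assumption, since the text does not state how the two rules are combined.

```python
from typing import Dict, List


def recommend_candidates(distances: Dict[str, float], classes: Dict[str, str],
                         top_n: int = 5) -> List[str]:
    """Rank strains: industrial microorganisms first, then by increasing distance."""
    def sort_key(strain: str):
        # False sorts before True, so industrial strains come first.
        not_industrial = classes.get(strain) != "industrial microorganisms"
        return (not_industrial, distances[strain])

    return sorted(distances, key=sort_key)[:top_n]


# Placeholder distances to a reference strain.
distances = {"strain_A": 0.30, "strain_B": 0.10, "strain_C": 0.20}
classes = {"strain_A": "industrial microorganisms",
           "strain_B": "other nonpathogens",
           "strain_C": "industrial microorganisms"}
print(recommend_candidates(distances, classes, top_n=2))
```

With these placeholder values, strain_B is the closest to the reference but is ranked after the two industrial strains.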

All the steps mentioned above were wrapped as a standalone Python program named BBSniffer, accessible at GitHub (https://github.com/xbiome/BBSniffer/).

Assessment of the performance of BBSniffer

We performed additional bioinformatics analyses for all four types of biopolymer (sortase-assembled CLP, BC, GV and BMC) and assessed the success rate of the BBSniffer software to illustrate the power of our established BBSniffer workflow. Specifically, we assessed the final success rate of the BBSniffer software by taking two general cases into account. First, we checked the detection accuracy of the BBSniffer software in searching for all the literature-reported and experimentally well-characterized strains (usually containing well-defined information about the specific biopolymers and corresponding BGCs) by calculating their coverage rate. Second, for those search results that have not yet been identified or reported in the literature, we applied the existing knowledge and features of biopolymer BGCs as the main reference and then manually inspected the annotation results of those predicted biopolymer BGC strains. We eventually assessed the overall performance of our established workflow based on the results for the aforementioned four types of biopolymer.

We defined the final success rate of the BBSniffer software for searching the target biopolymer containing strains based on the equation:

$$\begin{array}{rcl}\eta & = & \frac{\mathrm{A1}' + \mathrm{A2}'}{\mathrm{A1} + \mathrm{A2}}\\ \mathrm{R1} & = & \frac{\text{the coverage number based on BBSniffer detection results}}{\text{the number of BGC-containing strains reported in literature}} = \frac{\mathrm{A1}'}{\mathrm{A1}}\\ \mathrm{R2} & = & \frac{\text{the number of verified strains that contain the BGC based on manual inspection}}{\text{the number of randomly picked strains based on BBSniffer detection results}} = \frac{\mathrm{A2}'}{\mathrm{A2}}\end{array}$$

where η is the success rate of the BBSniffer software in searching for strains containing the target biopolymer. R1 is defined as the detection accuracy rate of the BBSniffer software using the BGC-containing strains reported in literature as references. We first searched published papers that contain clear information about strains containing the target biopolymer BGC. Using this information as a reference, we calculated the coverage rate of the BBSniffer software by comparing the experimentally characterized biopolymer BGCs with the BBSniffer detection results.

R2 is defined as the detection accuracy rate of the BBSniffer software in predicting those software-mined but unidentified strains (corresponding information about the strains containing target biopolymer BGC is unknown). Several software-mined strains were randomly chosen and, for each selected strain, we performed more bioinformatics experiments to analyze and annotate the functional genes in biopolymer BGC of BBSniffer-mined strains. Then, using the well-defined biopolymer BGC features as a reference, we determined whether the strain indeed contains the related biopolymer BGC. Note that the BBSniffer-detected strains with target BGC were functionally annotated through emapper53 and NCBI blast.
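The three quantities can be computed directly from the four counts. The sketch below is illustrative only: the function name and argument names are hypothetical, and the counts passed in are made-up numbers rather than results from this study. Here A1′ and A1 are the covered and total literature-reported strains, and A2′ and A2 are the verified and total randomly picked software-mined strains.

```python
from typing import Tuple


def success_rates(a1_covered: int, a1_total: int,
                  a2_verified: int, a2_total: int) -> Tuple[float, float, float]:
    """Return (R1, R2, eta) from the four strain counts."""
    r1 = a1_covered / a1_total    # coverage of literature-reported strains
    r2 = a2_verified / a2_total   # manually verified fraction of mined strains
    # eta pools both cases: (A1' + A2') / (A1 + A2)
    eta = (a1_covered + a2_verified) / (a1_total + a2_total)
    return r1, r2, eta


# Made-up counts: 8 of 10 reported strains covered, 19 of 20 picked strains verified.
r1, r2, eta = success_rates(8, 10, 19, 20)
print(f"R1 = {r1:.2f}, R2 = {r2:.2f}, eta = {eta:.2f}")
```

Because η pools the counts, it is a weighted combination of R1 and R2 rather than their simple average.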

To determine whether the predicted strains contain the target CLP-BGC, we used the well-defined CLP-BGC characteristics as a reference: at least one pilin and one sortase must be present in the detected BGC23. We determined whether the predicted strains possess the capability for the biosynthesis of BC by following the well-studied rule that BcsA and BcsB are necessary and sufficient for forming the BC polysaccharide chain in vitro54. GV formation involves the primary structural protein GvpA and additional required Gvp proteins, all of which are encoded in the gvp gene clusters14. Therefore, we determined whether the predicted strains have the capability for the formation of GV with the following rule: GvpA and at least one other Gvp protein must exist in the detected GV-BGC. BMCs comprise multiple shell proteins surrounding enzyme cargos, typically encoded in a single gene cluster55; we used these characteristics as a reference to determine whether the predicted strains contain the BMC-BGC.
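The explicitly stated verification rules for CLP, BC and GV can be written as simple predicates. This is a hedged sketch: it assumes each gene in a detected BGC has already been reduced to a short annotation label (such as "pilin", "sortase", "BcsA" or "GvpA"), which is a hypothetical simplification of the emapper and BLAST output, and the function names are illustrative. The BMC check is omitted because the text describes it through general characteristics rather than a precise rule.

```python
from typing import Iterable


def is_clp_bgc(labels: Iterable[str]) -> bool:
    """CLP rule: at least one pilin and one sortase in the detected BGC."""
    found = set(labels)
    return "pilin" in found and "sortase" in found


def is_bc_bgc(labels: Iterable[str]) -> bool:
    """BC rule: both BcsA and BcsB are present."""
    found = set(labels)
    return {"BcsA", "BcsB"} <= found


def is_gv_bgc(labels: Iterable[str]) -> bool:
    """GV rule: GvpA plus at least one other Gvp protein."""
    gvp = {label for label in labels if label.startswith("Gvp")}
    return "GvpA" in gvp and len(gvp - {"GvpA"}) >= 1


# Placeholder annotations for three detected clusters.
print(is_clp_bgc(["pilin", "pilin", "sortase"]))
print(is_bc_bgc(["BcsA", "BcsC"]))
print(is_gv_bgc(["GvpA", "GvpN", "hypothetical"]))
```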

General methods

The original DNA sequence was fully synthesized (GENEWIZ) or PCR generated. All PCR products were generated by KOD DNA polymerase (Toyobo). All plasmid construction was performed using T4 DNA ligase (New England BioLabs) for ligations or the NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs) for assembly. All plasmids or markerless strains were confirmed by DNA sequencing (GENEWIZ). Primers and protein sequences are listed in Supplementary Tables 4 and 5.

Growth media

C. glutamicum ATCC 14067 was provided by S. Zheng’s research group at the South China University of Technology. C. glutamicum ATCC 14067 was grown in brain heart infusion (BHI) liquid medium for recovery (37 g l−1 BHI, Becton, Dickinson and Company) at 30 °C, 250 rpm, overnight. For CgCLP formation, C. glutamicum ATCC 14067 was inoculated into M63 liquid medium (15.6 g l−1 M63 Broth (Sangon Biotech, Guangzhou, China), supplemented with 1 mM MgSO4, 0.2% (wt/vol) glucose) and cultivated in an incubator at 30 °C without shaking for 2–3 days. Antibiotics for C. glutamicum culture were kanamycin (25 μg ml−1) and chloramphenicol (7.5 μg ml−1).

IPTG at 1 mM or theophylline at 1 mM was used to induce gene expression. Trans1-T1 (TransGen Biotech) was used as the cloning host for plasmid manipulation, and E. coli BL21 (DE3) (New England BioLabs) was used for protein expression. E. coli was cultured in Luria-Bertani medium (10 g l−1 peptone, 5 g l−1 yeast extract, 10 g l−1 NaCl) at 37 °C or 16 °C when applicable for protein expression. Antibiotics for E. coli culture were kanamycin (50 μg ml−1) and chloramphenicol (30 μg ml−1). IPTG at 0.5 mM was used to induce gene expression. All bacterial strains used in this study are listed in Supplementary Table 6.

TEM and immunogold labeling

TEM imaging

C. glutamicum cells cultured for 2–3 days in M63 medium were collected and washed twice in PBS buffer, and 20 μl of M63 liquid culture at an optical density (OD600) of roughly 1 was deposited onto carbon-coated TEM grids for 5–10 min. The samples were washed two times with 50 μl of PBS buffer and three times with 20 μl of water, and then the excess solution was quickly wicked away with filter paper. The fixed cells were negatively stained with 15 μl of 2% (w/v) uranyl acetate solution for 1 min and dried for 10 min under an infrared lamp. Samples were examined in a JEOL JEM-1400 TEM at an accelerating voltage of 120 kV.

Immunogold labeling

Partial coding sequences of the CgCLP pilins Spa1 (residues 44–473 of Spa1, Spa1ab), Spa2 (residues 35–469 of Spa2, Spa2ab) and Spa3 (residues 31–235 of Spa3, Spa3ab) were expressed in E. coli, purified and injected into rabbits to prepare the specific polyclonal antibodies α-Spa1, α-Spa2 and α-Spa3 (ChinaPeptides), respectively. For immunogold labeling, 20 μl of M63 liquid culture at an OD600 of roughly 1 was placed on carbon-coated grids for 10 min, washed twice with PBS buffer and three times with water. The samples were blocked with PBS with 1% bovine serum albumin (BSA) for 30 min. The solution was wicked off with filter paper and the fixed cells were stained with a pilin primary antibody diluted 1:200 in PBS with 1% BSA for 1 h, followed by washing and blocking. Samples were stained with 10 nm gold-decorated goat antirabbit IgG (Bioss) diluted 1:50 in PBS with 1% BSA for 45 min, followed by washing three times with PBS and five times with water. Then, negative staining, drying and imaging were performed. Double immunogold labeling experiments were performed according to a previous publication with some modifications56. Briefly, after the primary antibody incubation, samples were incubated with PBS containing 3% paraformaldehyde and 2% glutaraldehyde for 2 h. Samples were washed three times with PBS and incubated with 0.02 M glycine in PBS for 10 min. The immunogold labeling process was then performed with the second pilin antibody and gold-decorated goat antirabbit IgG of a different size (5, 15 or 30 nm), followed by negative staining, drying and imaging.

Quantitative assay of CLP via whole-cell filtration ELISA

A whole-cell filtration ELISA method for detecting the presence of extracellular amyloids15 was adopted for quantitative analysis of CLP. Briefly, C. glutamicum strains were cultured for 48 h in M63 liquid medium, and cultures were collected, washed and diluted to an OD600 of 0.1 in Tris-buffered saline with 0.1% Proclin 300 (TBS + 0.1% Proclin 300) on ice. Then, 25 μl of the diluted culture was loaded in a MultiScreen-GV 96-well filter plate, followed by washing, blocking and incubating with α-Spa2 (diluted to 1:5,000), and then washing, blocking and incubating with goat antirabbit horseradish peroxidase-conjugated secondary antibody (diluted to 1:5,000) (Sangon Biotech). Subsequently, a chromogenic reaction was performed with ultra-3,3′,5,5′-tetramethylbenzidine and terminated by the addition of 2 M H2SO4. Finally, the absorbance of the product was measured at 450 nm (with a reference wavelength of 650 nm) with a Cytation reader (BioTek).
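A possible way to turn such plate readings into fold values relative to the wild type is sketched below. The processing shown is an assumption for illustration: subtracting the 650 nm reference reading from the 450 nm reading and normalizing to the mean wild-type signal are common practice but are not spelled out in the protocol, and the function names and absorbance values are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Reading = Tuple[float, float]  # (absorbance at 450 nm, absorbance at 650 nm)


def corrected(readings: List[Reading]) -> List[float]:
    """Subtract the reference-wavelength reading from each 450 nm reading."""
    return [a450 - a650 for a450, a650 in readings]


def fold_vs_wild_type(sample: List[Reading], wild_type: List[Reading]) -> float:
    """Mean corrected signal of a strain divided by that of the wild type."""
    return mean(corrected(sample)) / mean(corrected(wild_type))


# Made-up triplicate readings for the wild type and one fusion strain.
wild_type = [(0.62, 0.05), (0.60, 0.04), (0.64, 0.06)]
fusion = [(0.90, 0.05), (0.94, 0.06), (0.92, 0.05)]
print(f"fold relative to wild type: {fold_vs_wild_type(fusion, wild_type):.2f}")
```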

Protein crystallization and structure determination

The final purified protein was concentrated to 20 mg ml−1 in 10 mM Tris-HCl pH 8.0 and 50 mM NaCl for crystallization. The sitting drop vapor diffusion technique was used to crystallize the Spa2 protein. Crystals were obtained by mixing 4 μl of Spa2 protein with 4 μl of reservoir solution (0.2 M sodium sulfate, 0.1 M Bis-Tris propane pH 7.5, 20% (w/v) polyethylene glycol 3350) and incubating the mixture at 18 °C for 1–2 weeks (Supplementary Fig. 14a). The crystals were soaked in a cryoprotectant solution consisting of the reservoir solution and 20% (v/v) glycerol, and then quickly frozen in liquid nitrogen. Diffraction data were collected on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility with flash-frozen crystals (at 100 K in a stream of nitrogen gas). The data were processed by XDS software and then further processed using STARANISO (a server of Global Phasing Company).

The recombinant Spa2 crystal form diffracted to 2.73 Å resolution (Supplementary Fig. 14b) and belongs to the space group P2₁2₁2₁, with unit-cell parameters a = 45.7 Å, b = 64.1 Å, c = 442.0 Å, α = β = γ = 90.0° and two molecules in the asymmetric unit. The structure was solved by the molecular replacement method using PHASER57, with the Spa2 coordinates predicted by AlphaFold Colab46 as the template. Further manual model building was carried out using COOT. The model was refined by PHENIX. Data collection, phasing and refinement statistics are given in Supplementary Table 3. Structure figures were prepared using PyMOL v.2.3.4 (https://pymol.org/2/).

Enzymatic activity assay

The enzyme activity of cellulases against carboxymethylcellulose sodium salt (CMC-Na, Sigma) was detected using a 3,5-dinitrosalicylic acid (DNS) assay58. Cells of TrEgl-Spa2_SdBgl-Spa2 (C003 strain) and TrEgl_SdBgl (C004 strain) at an OD of 10 were concentrated to 500 μl and incubated in 2 ml of 50 mM acetic acid (pH 4.8) with 1% (w/v) CMC-Na substrate at 50 °C for 30 min. The reaction was stopped by adding DNS and boiling for 10 min; reducing sugars were detected at 540 nm. One unit of enzyme activity was defined as the amount of cells that released 1 μmol of glucose from cellulose at 50 °C in 1 min. The enzyme activity of endo-1,4-β-glucanase was determined using the manual assay kit (K-CellG5-2V, Megazyme).

Quantitative analysis of lycopene by HPLC

The lycopene-producing plasmid of pZ9-dxs_crtEBI was transferred into strain TrEgl_SdBgl to construct the recombinant strains of C003 and C004 for the use of cellulose to produce lycopene. C003 and C004 strains were inoculated into 10 ml of BHI with 25 μg ml−1 kanamycin and 7.5 μg ml−1 chloramphenicol, and cultured for 12 h at 30 °C with shaking at 200 rpm. Then cells were transferred into 50 ml of modified M63 medium (15.6 g l−1 M63 broth, supplemented with 1 mM MgSO4, 2% (wt/vol) CMC-Na) at an initial OD600 of 3 and cultured for 2 days at 30 °C, with or without the addition of 1 mM IPTG.

A previous approach was adopted for the quantitative analysis of lycopene production44. IPTG-induced and uninduced cells (1 ml) were separately collected into 2 ml tubes of lysing matrix Y (M.P. Biomedicals) by centrifugation at 13,523g for 5 min. The pellets were resuspended in a 60% hexane and 40% acetone mixture and lysed using the FastPrep-24 5G bead beating grinder and lysis system (M.P. Biomedicals) for lycopene extraction. Lysis was performed in six cycles of 30 s each, with a 1 min interval between cycles.

The samples were centrifuged at 18,406g for 10 min at 4 °C, and the resulting supernatant was then transferred to brown 2 ml screw cap glass vials (Agilent Technologies) and directly subjected to high-performance liquid chromatography (HPLC) analysis. The quantification of lycopene was performed on an Agilent 1260 series HPLC system (Agilent Technologies) using a YMC Carotenoid column (250 × 4.6 mm I.D., YMC), with detection via a diode array detector at 450 nm. For separation, binary gradient elution was applied to change the eluent from 100% eluent A of methanol:methyl tert-butyl ether:water (81:15:4) to 100% eluent B of methanol:methyl tert-butyl ether:water (7:90:3) over 90 min at a flow rate of 1.0 ml min−1 at 20 °C with an injection volume of 10 μl.

Statistics and reproducibility

All results presented in graphs show the mean data ± s.d. using at least three technical replicates. A two-tailed Student’s t-test was used to calculate P values. Sample sizes for all the micrograph data (confocal images, TEM and AFM) were at least three independent biological replicates, and the replicate experiments yielded similar results.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.