Introduction

Microbial survival is largely determined by the ability to adapt to fluctuating nutrient supplies in natural environments. Carbohydrate utilization is one of the most important nutrient acquisition functions to microorganisms. The phosphoenolpyruvate (PEP):carbohydrate phosphotransferase system (PTS) is involved in the uptake of diverse carbohydrates and regulation of microbial metabolism, which has been extensively characterized in a few model microorganisms [1,2,3,4]. Its function has been expanded to nitrogen and phosphate metabolisms [5], antibiotic resistance [6, 7], biofilm formation [8, 9], expression of virulence genes [10, 11], endotoxin production [12], and regulation of quorum sensing [13]. Therefore, the PTS is important to biogeochemical carbon cycling, ecological homeostasis, and health status of higher life forms.

The complete PTS has three core components [2], including Enzyme I (EI), histidine-phosphorylatable phosphocarrier protein (HPr), and Enzyme II complex (EII) made up of EIIA, EIIB, and EIIC/D. To transport sugars, the high-energy phosphoryl group donated by PEP is transferred from EI to HPr, then to EIIA/B, and finally to sugars during transport by EIIC [2, 4, 14]. To regulate carbon metabolism, the core PTS components can either transfer the high-energy phosphoryl group to downstream target proteins or interact with target proteins to form a complex, which results in activation/inactivation of those target proteins. For example, LevR in Bacillus subtilis can be activated by HPr (P~15His-HPr) through phosphorylation to regulate levan degradation [15] and the P-46Ser-HPr phosphorylated by HPr kinase/phosphorylase (HprK) can biochemically interact with a LacI-family repressor, catabolite control protein A (CcpA), to cause carbon catabolite repression (CCR) in B. subtilis [16, 17]. However, these canonical mechanisms have only been characterized in a small set of model bacteria (e.g., B. subtilis) which all contain a complete set of the three core PTS components. Previous studies have found that many proteobacteria do not have EIIB or EIIC components, which might affect PTS-mediated carbohydrate metabolism [2, 3, 18]. With the advancements of microbial genome sequencing and extensive culturing studies of environmental and host-associated microorganisms [19, 20], it raises questions on the overall prevalence of incomplete PTSs in microbes and carbon regulatory mechanisms for those incomplete PTSs [2, 21].

Carbohydrate availability either in natural or artificial biosystems, such as gastrointestinal tracts [22,23,24] or microbial consortia-based bioprocessing systems [25], determines microbial diversity and functional outcomes. Understanding carbon utilization mechanisms would facilitate microbe-specific or microbiome-oriented modulation towards improved functions [24]. To do so, we made use of a large number of fully assembled prokaryotic genomes and performed computational and experimental investigation into PTS components. Distinct from canonical PTSs in bacteria [2,3,4], here we describe the architecture and distribution of incomplete PTSs with loss of conserved components through massive genome mining in prokaryotic genomes, and revealed novel functional diversification of incomplete PTS components by integrated multi-omics analyzes in a panel of PTS mutants, and solved the crystal structure of key components that are pivotal to carbohydrate metabolism in incomplete PTSs.

Results

High prevalence and phylogeny-independent distribution of incomplete PTSs in prokaryotes

To systematically understand the distribution and configuration of PTSs, we performed genome mining for PTS core and associated components in 14,998 sequenced prokaryotic genomes from 4,293 species classified by GTDB-Tk [26], including archaea and bacteria (Fig. 1A). In total, 23 HMM profiles of PTS relevant components (Table S1), including HPr, EI, EIIA, EIIB, EIIC/D, HprK, and LacI-family regulators, were used to search for homologs in each genome through hmm search against our inhouse database containing 56,869,107 proteins built from these genomes. HPr, EI, EIIA, EIIB, EIIC/D are considered as core components of PTS, and HprK and LacI regulators are considered as associated components of PTS. Homologs are identified to be a single or fused architecture depending on the number of modules predicted. We then cataloged the copy number of each hmm module, the occurrence of each PTS component, and the completeness of a PTS in each genome along with corresponding phylogenetic information. PTSs consisting of at least one set of core components (EI, HPr, EIIA, EIIB, and EIIC/D) with singular architecture are considered as complete PTS, with the potential of direct sugar transport; in contrast, those lacking at least one of above core components are named incomplete PTS presumably incapable of sugar transport [2, 4].

Fig. 1: Architectural and compositional diversification of phosphotransferase systems (PTS) in prokaryotes.
figure 1

A Schematic workflow for genome mining of PTS systems in 14, 998 complete genomes of 4,293 species, including archaea and bacteria. B Venn diagram showing the number of microbial species that miss different components (na). Among 3164 species with incomplete PTS systems, 3008 species have an incomplete EII complex for sugar transport due to the absence of either EIIA, EIIB, or EIIC. C The distribution of PTS relevant components across the microbial phylogenetic tree. The copy number or completeness of PTS components are shown in different tracks. D Heatmap of Jaccard coefficients to correlate the co-occurrence of PTS components. E The averaged copy number (upper) and proportion (lower) of single or fused PTS components in prokaryotic genomes. F The probability of amino acids at positions 11–21 in HPr homologs. All sequences were aligned and split to three categories, H15-carrying HPr, H15-free HPr, and S16-carrying HPr. G The phylogenetic tree of microbial species that only carry H15-free HPr homologs and presumably lost the function of PTS-mediated sugar transport. The presence of HprK and EI and the completeness of EIIABC (EII) were labeled blue for each species. A group of species in clostridia have been reported to be cellulose degraders. The genetically tractable R. cellulolyticum H10 was selected as a representative to dissect the function of non-canonical PTS.

At the genome level, we observed 3,164 prokaryotic species have an incomplete PTS due to the lack of one or more PTS core components (Table S1), accounting for 73.70% of species analyzed in this study. In particular, 981 species do not have an HPr homolog (Fig. 1B), 840 species without EI, 3,008 species without a complete EIIABC complex (Figure S1A). It is notable that 5 of 243 archaeal species carry all core components of PTS (HPr, EI, EIIA, EIIB, and EIIC) that are required for sugar transport. Among them, Haloferax mediterranei [27] and Haloferax volcanii DS2 [28, 29], have been demonstrated to use the PTS to transport fructose. Moreover, extensively sequenced genomes of some bacterial species allowed us to dissect inter-strain variation of the configuration of PTS, which has been previously overlooked. For example, among 27 B. subtilis strains, 12 strains have all PTS components whereas the others are missing EI or EIIC. Inter-strain variations were also observed in Escherichia coli and Clostridioides difficile (Table S1). All these observations revealed novel insights into the unrecognized prevalence of incomplete PTSs in prokaryotes and the possibility of inter-strain variations in PTS configuration.

We reassociated the distribution of each PTS core or associated component to microbial phylogeny (Fig. 1C and Table S1). As EIIA, EIIB, and EIIC often form a protein complex that interacts with sugars that will be transported [2, 4, 30,31,32], we evaluated and grouped the overall completeness of EIIABC into three categories (absent, partial or complete). In general, PTS components have little shared patterns for their occurrence across the phylogenetic tree. For example, most HPr-carrying species, especially in members of Chlamydia, Actinobacteria and Gamma-proteobacteria, lack HprK homologs; species with EI may not have the potential to assemble a complete EIIABC complex. One exception is that HPr and EIIABC completeness tend to have a similar distribution pattern (Fig. S1B). The Jaccard correlation analysis that only distinguishes the occurrence of the PTS components rather than the copy number in a given genome, revealed that HPr and EIIA have a high correlation coefficient (Figs. 1D;  S1C), indicating they are likely to co-occur in the genome. Both HPr and EI, which are usually encoded by the ptsHI operon in several well-studied bacteria [2], have very low correlation in their occurrence in this large dataset. Although LacI-family regulators, such as CcpA, were reported to interact with HPr to regulate catabolic genes in Firmicutes [2, 4, 16, 17], we observed a large variation in the copy number of LacI-family members across the phylogeny but there is no clear association with the presence of HPr (Fig. 1D). These observations elicit new questions about the evolution and functional relevance of PTS components.

The presence of each PTS core or associated component has no clear distribution pattern across the phylogeny of its origin (Figs. 1C;  S1D). For example, in the phylum of Firmicutes, HprK mainly exists in bacilli and clostridia rather than negativicutes. This variable inter-class distribution was also observed in Proteobacteria, where HprK is mostly absent in Gamma-Proteobacteria and deltaproteobacteria but present in betaproteobacteria. It is also notable that nearly all PTS components, except EI, are missing in Campylobacterota and a few clusters of species in other Classes of bacteria, including Bacteriodia, Actinobacteria, Clostridia, and Gamma-Proteobacteria; however, these absences are distributed with no clear patterns across the tree. The large compositional deficiency in Campylobacterota is consistent with its most likely autotrophic lifestyle [33]. This unpredictable distribution suggests the overall evolutionary relationship cannot reflect presence or absence of PTS components.

Compositional and motif-level features confine the potential role of PTSs in sugar transport

The genetic and molecular features of PTS components were further investigated to determine sugar transport. EI and EII components vary in the copy number of their homologs and their protein architecture (Fig. 1E). For instance, EI PEP_utilisers_N, and EIIB_GUT_C have a low copy number in genomes and are often fused with other modules to form multi-modular proteins; PTS_EIIC have multiple copies and are also mainly present in a fused architecture. Compared to other components, EI most widely exists in both archaea and bacteria, except flavobacteria, cytophagia, and sphingobacteria in the phylum of Bacteroidetes. By examining the distribution of EIIC transporters (Fig S1D), we found that Actinobacteria, Clostridia, Bacilli, Negativicutes, Gamma-Proteobacteria, Alpha-Proteobacteria, and Fusobacteria tend to have more diverse sugar transporters than other classes, which may indicate their genetic potential of transporting and utilizing different carbohydrates.

In canonical PTSs, the H15 in HPr is an indispensable amino acid for transferring the high-energy phosphate moiety from EI to the sugar transporting EIIABC complex or other regulatory proteins [6, 16, 17]. Analysis of HPr homologs in 4293 prokaryotic species revealed a very conserved motif sequence around H15 (Fig. 1F). However, 10% of HPr homologs have H15 substituted by other amino acids, such as Q, and 85% of these come from 53 species with an incomplete PTS (Fig. 1G). Most HPr homologs lacking H15 have a phosphorylatable Serine at site 16 and presents a very distinct motif sequence (Fig. 1F). It indicates the substitution of H15 may disable PTS-controlled sugar transport and affect the regulatory role mediated by H15 phosphorylation. It is notable that a cluster of H15-free HPr carrying species in clostridia have been shown to degrade cellulose (Fig. 1G), including a model organism of mesophilic cellulolytic clostridia, Ruminiclostridium cellulolyticum [34]. It is known that lignocellulose biodegradation is essential to microbial homeostasis in animal guts and carbon recycling in the global ecosystem [35, 36]. These results raise questions regarding the role of the incomplete PTS in carbohydrate utilization, for instance, microbial degradation of plant fibers, and the mechanism of catabolite regulation in these heterotrophic microbes.

Novel metabolic profiles of R. cellulolyticum dependent of the mini “PTS” components

R. cellulolyticum is a natural producer of lignocellulosic biofuels and chemicals via consolidated bioprocessing [34]. It carries an incomplete PTS with the substitution of H15 in HPr and the absence of EI, EIIA, EIIB, and EIIC (Fig. 2A). We named this incomplete PTS a mini PTS and sought to understand its regulatory role in carbohydrate utilization. Most microbes exhibit CCR—preferring glucose over other sugars with a diauxic growth when multiple sugars are available [16]. There are three major mechanisms of CCR characterized in bacteria, including PTS-mediated cAMP-CRP mechanism in enteric bacteria, RNA-binding protein mediated mechanism in pseudomonads, PTS-mediated CcpA/HprK mechanism in Firmicutes [16]. When grown on a carbon mixture, R. cellulolyticum utilized cellobiose, glucose, and pentoses simultaneously (Figure S2A), at a > 3-fold higher consumption rate for soluble sugars other than glucose (Fig. S2A, S2B; qcellobiose = 27.7, qxylose = 27.6, qarabinose = 19.3, qglucose = 5.8 mmol/g/h). Compared to the utilization of singular sugars, the presence of glucose increased, rather than decreased, the consumption rate of cellobiose and xylose (Fig. S2C, S2D). Similar reverse catabolite repression—preferring other organics over glucose, has been described in a few bacterial species, such as probiotic Streptococcus thermophilus [37] Bifidobacterium longum [38] and pathogenic Pseudomonas aeruginosa/putida [39, 40].

Fig. 2: The mini “PTS” of R. cellulolyticum is essential for microbial metabolism.
figure 2

A A hypothetical model of non-canonical PTS in absence of sugar transport components in R. cellulolyticum H10. B Comparison of physiological and catabolic features, including the maximal cell biomass (Maxbiomass), the doubling time of growth (Tdoubling), the utilization efficiency of cellobiose (qcellobiose), arabinose (qarabinose), xylose (qxylose), and glucose (qglucose), across mutants and control strains, when grown on a carbon mix consisting of cellobiose, arabinose, xylose, and glucose. Data represent the fold change of mean values between mutants and controls indicated in the plot, those of which present statistical significances are marked. C Growth of bacterial strains on 10 g/L Avicel cellulose (n = 3). D Determination of residual cellulose after 300 h fermentation with bacterial strains on 10 g/L Avicel cellulose (n = 3). Unpaired two-tailed t test, *p < 0.05, ***p < 0.001. Data (C and D) are represented as mean ± sd.

We performed genetic manipulation to interrogate the role of mini PTS components in R. cellulolyticum (Fig. S2E). We applied Cas9n-based genome editing [41] to precisely inactivate the hpr ortholog (named hereafter catabolite promotion HPr, or cph, the mutant of which is Δcph) and two CcpA-like transcriptional regulators, ccpA (ccpA ortholog; ΔccpA) and ccpB (ccpA paralog; ΔccpB), and generate a site-specific cph mutant with the serine at 46 substituted by aspartate to mimic the phosphorylated form (cph_S46D). Since a hprK knockout was not successfully obtained, we knocked down the activity of hprK by antisense RNA molecules (hprKi). The functional relevance of each PTS component was evaluated by comparing carbohydrate utilization and cell growth of these mutants on different carbon sources. Detailed interpretation to phenotypic changes is shown in the supplementary text (Supplementary text A). On soluble carbon mix (CM, consisting of cellobiose, glucose, arabinose, and xylose), which mimics the co-presence of diverse sugars released during lignocellulose biodegradation, we observed that cph knockout and cph_S46D reduced the consumption rate of most sugars (Figs. 2B, S2B). The positive role of S46-carrying Cph in using simple sugars simultaneously is opposite to catabolite repression as described in model microbes [16, 21]. Since HPr proteins are cognate targets of HprK for phosphorylation [42], it is consistent that hprK knockdown also decreases sugar utilization (Figs. 2B, S2B). For the ccpA-like genes, loss of ccpB rather than ccpA affected the consumption of soluble sugars, suggesting functional divergence of these two LacI regulators in R. cellulolyticum. On cellulose, we observed that loss of cph and ccpA, as well as cph_S46D, abolished bacterial growth (Fig. 2C); ccpB disruption had both a reduced growth rate and cellulose degradation (Fig. 2C, D), whereas hprK knockdown had no significant impact on cellulose degradation at final time point. The indispensable role for Cph and CcpA in cellulose metabolism, instead of catabolite repression, has not been reported before. Compared to canonical PTS, these metabolic profiles together demonstrated unique characteristics of this non-canonical incomplete PTS, such as Cph-dependent catabolite promotion and functional specialization of CcpA on cellulose.

Functional diversification of the mini PTS components at the transcriptional, biochemical and structural level

To systematically understand the mechanism of how this mini PTS regulates catabolite metabolism, we investigated into transcriptomic changes in PTS mutants, examined the functional relevance and interactions of PTS components by correlating transcriptomic, DNA-binding, and protein-binding profiles, and finally performed protein crystallization for key players of this incomplete PTS to decipher the molecular-level mechanism.

Cph, CcpA, and CcpB mediates distinct transcriptomic profiles

On carbon mix, Δcph had broad transcriptional changes in 286 genes that were enriched for diverse basic cellular functions and carbohydrate metabolism (Table S2); whereas cph_S46D presented very little impact on transcriptome (Fig. 3B). These results suggest that R. cellulolyticum H15-free Cph is associated with pleiotropic functions, not limited to carbohydrate utilization, which largely depend on the phosphorylation of Cph at S46. ΔccpB presented 239 differentially expressed genes (DEGs) on carbon mix versus 10 DEGs in ΔccpA (Fig. 3B; Table S2). DEGs in ΔccpB were involved in a variety of basic biological processes (Table S2), consistent with its wide impact on the consumption of soluble sugars (Figs. 2, S2D). As ΔccpA had a limited impact on transcriptome (Fig. 3B), irrespective of soluble sugars, R. cellulolyticum CcpB is more functionally similar to reported CcpA in B. subtilis and Staphylococcus mutans [43, 44]. It is known that HPr interacts with CcpA and thereby modulates the DNA binding affinity of CcpA [2, 16, 17, 45, 46]. Among more than 200 DEGs in Δcph and ΔccpB, respectively (Table S2), only 71 genes were shared and enriched for some biological processes (e.g., monocarboxylic acid biosynthetic process, ATPase-coupled transmembrane transporter activity and quorum sensing) (Fig. S3B). Plus differential catabolic and phenotypic features as shown above (Fig. 2B), it seems that Cph and CcpB regulate transcription mainly through independent pathways.

Fig. 3: Distinct roles of the mini “PTS” components in carbohydrate dependent transcriptional regulation.
figure 3

A Experimental design for functional characterization of the mini“PTS” genes in R. cellulolyticum H10 by genetic disruption, physiological and transcriptomic profiling under different carbon sources, as well as CHIP-Seq and IP-MS. B The number of DEGs across mutants grown with either cellobiose (CB) or carbon mix (CM). All mutants were compared to the parental strain to count DEGs (p < 0.01 and log2 |FC| > 2, n = 3). C Changes in the number of DEGs in mutants versus the parental strain at 0, 1, 3, 6, 12 h of cellulose incubation experiments (n = 3). D GO enrichment analysis of DEGs in Δcph, ΔccpA, and ΔccpB. Δcph, ΔccpA, and the parent strain after 12 h cellulose incubation were compared. ΔccpB and the parental strain at the mid log phase of growth on cellulose were compared. E Heatmap of transcriptional abundance of select genes in mutants during cellulose incubation. The transcript abundance in each genetic background was normalized to the mean of non-incubated (0 h) conditions. Select genes were shared DEGs in Δcph vs parent and ΔccpA vs parent after 12 h cellulose incubation. The p_perm values indicate whether gene-specific response patterns between mutants and parental strains on cellulose are significantly similar. Genes associated carbohydrate transport and metabolism (Carb. met.) are annotated. Transcriptomic responses of select genes in key gene clusters related to cellulose degradation (F), cellobiose catabolism (G), xylan degradation (H), and a cellulose-responsive TCS (I). Data represents mean ± se (n = 3).

Considering the essentiality of Cph and CcpA on cellulose, we performed incubation experiments in which cellobiose grown mutants were incubated in cellulose medium for longitudinal profiling of transcriptomes (Fig. S3C). Δcph and ΔccpA quickly responded to cellulose (Fig. S3A) but exhibited a unique set of DEGs at distinct emerging rates across time points (Fig. 3C). In terms of GO enrichment, ΔccpA are highly associated with carbohydrate/cellulose metabolism (Fig. 3D); whereas ΔccpB and Δcph have broad impacts on basic biological processes and molecular functions. These results further support functional divergence of CcpA and CcpB, where CcpA exclusively regulates cellulose metabolism. Analysis of the dynamic response of 88 shared DEGs between Δcph and ΔccpA (Table S2 and Figure S3B) revealed that most DEGs were quickly suppressed but then restored or even increased in ΔccpA and the parent strain, whereas strong suppression persisted in Δcph (Fig. 3E). We observed that the sole promoter-driven cip-cel gene cluster, which is indispensable for cellulose metabolism [47,48,49], was able to quickly respond to cellulose incubation in Δcph, ΔccpA, and the parent strain (Fig. 3F). However, the enhancement decayed soon in both mutants. Loss of sustainable transcriptional enhancement in the cip-cel gene cluster could lead to a lower expression of pivotal cellulose degrading enzymes for the formation of cellulosomal complexes to hydrolyze cellulose efficiently. Other gene clusters involved in cellobiose utilization (Fig. 3G), xylan degradation (Fig. 3H), and a cellulose-responsive TCS (Fig. 3I), did not show substantial differences across these strains in response to cellulose. Therefore, Cph and CcpA are important to sustain a long-term high expression of pivotal cellulose degrading genes, rather than responding to cellulose.

CcpA and CcpB from R. cellulolyticum diverged from reported CcpAs with distinct DNA binding motifs

We then performed ChIP-seq and biochemical assays to understand the regulatory mechanisms of CcpA and CcpB. ChIP-seq analysis revealed that both regulators could not bind to the upstream of the cip-cel operon, indicating PTS-mediated regulation of cellulose degradation in R. cellulolyticum is not through direct CcpA binding of the canonical CCR mechanism [49, 50] (Supplementary text B). They also had distinct binding spectra across the genome (Fig. 4A; Table S3), irrespective of carbon sources. By integrating transcriptomic and ChIP-seq data, we found that CcpA may directly regulate itself and a neighbor gene (Ccel_1006, a FGGY-family carbohydrate kinase), by which cellulose degradation could be regulated indirectly (Supplementary text B). In comparison, there are more binding sites identified in CcpB and more genes potentially directly regulated by CcpB. They were significantly enriched for transporter activity when grown on carbon mixture, and enriched for carbohydrate metabolic and amino acid biosynthetic processes when grown on cellulose (Fig. S4A and Table S3), which are consistent with observed metabolic phenotypes in ΔccpB - an increase in arabinose consumption rate when grown on carbon mixture, and a reduced growth rate and cellulose degradation when grown on cellulose (Fig. 2). Changing the carbon source from cellobiose to cellulose slightly broadened the binding of CcpA and CcpB (Figs. 4AS4A). To validate the binding, we purified His-tagged CcpA and CcpB and performed electrophoretic mobility shift assays (Fig. 4B, Figs. S4B, C). Results showed that CcpA and CcpB efficiently bound corresponding DNA fragments at 0.2 and 0.4 μM, respectively. MEME-ChIP analysis indicated that binding sites of CcpA and CcpB have different palindromic consensus motifs mostly differing in terminal nucleic acids (Fig. 4C), where CcpA tends to have 5′-GGN but CcpB prefers 5′-CGA. Compared to previously characterized CcpA proteins, CcpA and CcpB from R. cellulolyticum have diverged and have distinct binding motifs from those previously described.

Fig. 4: Distinct DNA binding profiles of R. cellulolyticum CcpA homologs that are independent of Cph and its phosphorylation status.
figure 4

A Genome-wide binding sites of CcpA and CcpB in corresponding complementation strains (ΔccpA with pEr-CcpA-FLAG and ΔccpB with pEr-CcpB-FLAG) grown on carbon mix (CM) or cellulose (CL). Each condition has two biological replicates for chromatin immunoprecipitation sequencing (CHIP-Seq), which were combined for binding site calling (fold change>2, p adj < 0.05). B Electrophoretic mobility shift assays with purified proteins and select DNA sequences identified by CHIP-Seq. C Phylogeny of previously characterized CcpA proteins and CcpA-like homologs in R. cellulolyticum (RcCcpA and RcCcpB). Their DNA recognition profiles are shown on the right. D Interactors of Cph identified by immunoprecipitation-mass spectrometry (IP-MS). The significant bindings were indicated by the fold change of identified abundance (red, FC > 2, p adj < 0.0001) between the complementation (Δcph with pEr-0806) and control (Δcph with pEr-empty) strains. E Examination of protein-protein interactions by gel filtration. Purified CcpA (upper) and CcpB (lower) were incubated with Cph (black), Cph_S16D (blue), Cph_S46D (orange), and Cph_S16&46D (red).

Phosphorylatable Cph does not interact with either CcpA or CcpB

As HprK modulates the phosphorylation status of HPr and our genetic studies have demonstrated the involvement of HprK, Cph, and cph_S46D in carbohydrate metabolism (Fig. 2), we investigated the biochemical properties of Cph during transcriptional and post-translational regulation. By performing immunoprecipitation-mass spectrometry (IP-MS), we found both Ser16 and Ser46 of Cph were phosphorylated, with phosphorylation ratios of 28% and 38%, respectively (Table S4). To confirm if both sites could be phosphorylated by HprK, in vitro incubation assays were carried out using purified HprK, Cph, Cph-S46D, and Cph-S16A/S46A mutant proteins, with supplemented ATP. Most Cph proteins were phosphorylated by HprK whereas the two mutant proteins were not (Fig. S4D), indicating that HprK only targets the Ser46 site of Cph. The in vitro phosphorylated Cph was also subjected to MS/MS analysis and ~93% Ser46 and only ~1% Ser16 was phosphorylated, further confirming that HprK specifically targets Ser46 rather than Ser16 (Table S4). Apart from using ATP as the phosphate donor, Cph phosphorylation at S46 also occurred when pyrophosphate (PPi) and fructose-1,6-bisphosphate (FBP) were co-provided for HprK (Fig. S4D and Table S4), which is different from using either ATP when FBP is co-present or just PPi to phosphorylate HPr in B. subtilis [42]. Therefore, Cph in R. cellulolyticum can be phosphorylated by HprK upon sensing a high energy status (Supplementary text C).

To explore potential interactors of Cph, we performed IP-MS analysis and identified 60 proteins that were significantly enriched (FDR-adjusted p < 0.05 and fold change >2, Fig. 4D). These proteins have diverse functions, including pyrimidine ribonucleotide biosynthetic process, cellulose catabolic process, signal transduction, and ribosome biogenesis (Table S4 and Supplementary text B). Although HPr (P-Ser46) interacts with CcpA in the canonical PTS in Firmicutes [2, 4, 16, 17], both CcpA and CcpB were identified to have no interaction with Cph (Table S4). Undetectable interactions between unphosphorylated/phosphorylated Cph and CcpA or CcpB were further supported by isothermal titration calorimetry (Fig. S4E), gel filtration (Fig. 4E), and IP-MS analyses of CcpA and CcpB (Table S4). These biochemical observations together support the unique and independent functionalization of these miniPTS components in transcriptional regulation and microbial metabolism.

Structural basis of Cph-independent DNA binding for CcpA and CcpB

To unravel the Cph-independent DNA binding mechanism of CcpA homologs in R. cellulolyticum, we solved the crystal structures of CcpB (Fig. S5A) and Cph (Fig. S5B). CcpA crystallization was not successful. The 2.2 Å and 2.4 Å resolution structures of CcpB and Cph allowed structural comparisons with previously characterized apo-CcpA and HPr-46P-CcpA-DNA complex from Bacillus megaterium (Fig. 5). R. cellulolyticum CcpB adopted a dimeric structure, with the N-terminal DNA binding domain (1–58) missing (Fig. S5C). The absence of DNA binding domains also occurred in apo form structures of other LacI regulators [17, 51, 52], probably due to the highly dynamic nature of DNA binding domains of the LacI regulators in the absence of bound DNA. The CcpB dimer was optimally superimposed onto the B. megaterium CcpA (BmCcpA) dimer with a root-mean-square deviation (RMSD) of 2.7 Å for 449 corresponding Cα atoms. However, the conserved residues in BmCcpA that are involved in the binding of classic HPr (eg. R303, K307 and D296) were mostly changed in CcpB (Fig. 5A). Most prominently, the R303 and K307 residues in BmCcpA, which are essential to interact with phosphorylated S46 in HPr [17], were replaced by Q305 and A309 in RcCcpB (Fig. 5A). A structural model of RcCcpA was also predicted using AlphaFold2 and there was a similar change at the interface, where the basic R303 in BmCcpA was substituted by the neutral Gln residue (Fig. S5C, S5D). As a result, the surface potential of RcCcpB/RcCcpA is mostly neutral at the HPr-46P binding interface, rather than a basic interface in BmCcpA (Figs. 5A, 5B, S5D). Therefore, RcCcpB/RcCcpA presented a distinct structural interface to interact with potential ligands (Figs. 5A, 5B, S5D). Comparison of the DNA binding domains of RcCcpA with BmCcpA revealed residues I16-N24 constitute the DNA-recognizing helix, which is well conserved in CcpA orthologous proteins(Fig. S5C). On the other hand, overlay of RcCph onto BmHPr-46P revealed very small structural variations (RMSD of 1.2 Å for 66 Cα atoms) and similar residues at the CcpA binding interface (Fig. S5B). These structural comparisons suggested that the neutrally charged interface on RcCcpB and RcCcpB would disengage the binding of negatively charged RcCph-Ser46-P and supports our in vitro biochemical results that RcCcpA/RcCcpB do not interact with phosphorylated Cph in R. cellulolyticum.

Fig. 5: Structural basis of Cph-independent regulation by CcpA and CcpB in R. cellulolyticum.
figure 5

A Surface potential of the CcpB dimer from R. cellulolyticum (RcCcpB without the DNA binding domain) and CcpA from B. megaterium (BmCcpA). The RcCcpB dimer was superimposed to the BmCcpA dimer of BmCcpA-HprSer46P complex and the structure of Hpr and Cph was shown in a cyan cartoon representation, with the Ser46P shown in sticks. The BmCcpA residues responsible for interacting with HprSer46P and the equivalent residues in RcCcpB were shown in sticks and labeled. B Multiple sequence alignment showing residues involved in conformational switching are different between BmCcpA and R. cellulolyticum CcpA homologs (RcCcpA and RcCcpB). The sequence logo of the conserved segment (294–310) in CcpA homologs (n = 8651) depicts the usage frequency of amino acids, in reference to the relative position in BmCcpA. C Hydrogen bonding network essential for CcpA conformational switching in B. megaterium apo-CcpA (white), CcpA-HprSer46P complex (yellow) and R. cellulolyticum apo-CcpB (magenta). The residue labels for BmCcpA were shown in parentheses. Effect of site-specific mutations in RcCcpA (D) and RcCcpB (E) on DNA binding (n = 2–3 EMSA gels). The significance was determined by ANOVA. F Distribution of 8651 CcpA orthologs predicted in 14,998 bacterial genomes. The classic HPr-dependent CcpA orthologs (with R/H/K at 303 and T at 306) and RcCcpA/RcCcpB-like orthologs (with Q/N at 303 and V/L at 306) were colored in blue and red, respectively.

We then compared the structure of RcCcpB/RcCcpA with apo-BmCcpA and CcpA-HPr-Ser46-P-DNA ternary complex, which represent DNA-free and DNA-bound conformations, respectively (Fig. 5C,  S5E). Two pairs of hydrogen bonds, namely Y89-R303 and Y91-T306, which were uniquely formed in each of the two distinct conformations, could serve as the potential allosteric switch of BmCcpA. In the DNA-bound conformation, HPr-Ser46-P drew the sidechain of R303 close to Y89, resulting in the formation of a hydrogen bond between Y89 and R303, and this conformational change propagates to Y91, leading to the disruption of the hydrogen bond formed between Y91 and T306 [17]. In the apo-RcCcpB (Fig. 5C), the R303 equivalent residue Q305 hydrogen-bonds to Y88 (equivalent to the Y89 residue in BmCcpA) in the absence of HPr-Ser46-P and the T306 equivalent site was replaced by Val308, making it impossible to form the hydrogen bond with Y90 (equivalent to the Y91 residue in BsCcpA). Similar situation also applied to the predicted RcCcpA (Fig. S5E). These results corroborate that the solved structure of apo-RcCcpB or predicted structure of RcCcpA is in a DNA-binding competent conformation and suggest that Q305 (N301 in RcCcpA) and V308 (L304 in RcCcpA) sites are critical for maintaining HPr/Cph-independent DNA binding conformation of RcCcpB/RcCcpA. Single site mutations of Q305 and V308 in RcCcpB, N301 and L304 in RcCcpA, resulted in diminished DNA binding capacities, confirming the role of these residues in supporting the DNA-binding conformations of RcCcpB and RcCcpA (Figs. 5D, 5E, S5F). Examination of CcpA orthologs from 14,998 microbial genomes revealed that residues R303 and T306 critical for the HPr/Cph-dependent allosteric switch, are not conserved in the majority of species except for most bacilli genomes (Fig. 5F). This suggests that HPr/Cph-dependent allosteric switching of CcpA orthologs in the canonical CCR is not a universal mechanism across the bacterial kingdom but mainly evolved in certain bacilli species.

Discussion

The PTS is prominently known for its role in microbial carbon metabolism through direct sugar transport and regulatory functions. Since PTS-mediated sugar uptake has only been characterized in a few microbial representatives, our study provides, for the first time to our knowledge, a systematic view of its prevalence and distribution in nearly 15,000 genomes from 4293 prokaryotic species. We revealed that the canonical PTS is present in only 26.3% of sequenced microbial species; the incomplete PTS is prevalent and distributed in a phylogeny-independent manner. These data suggest that the PTS is neither a dominant nor conserved mechanism for prokaryotes to transport carbohydrates, supporting the hypothesis that PTS-mediated sugar transport evolved recently [2, 16, 53]. Key players of PTS-mediated CCR in Enterobacteria and Firmicutes are EIIA and HPr, respectively [2, 16, 21]. However, 27% and 23% of microbial species in our large data set have no predictable homologs of EIIA and HPr, suggesting the well-known CCR mechanism mediated by PTS is not applicable to most prokaryotes.

This study raises a question about the role of incomplete PTSs in CCR, particularly the relationship between CCR and the completeness of PTS in carbohydrate utilization. It is known that canonical PTS-mediated CCR allows microbes to efficiently use sugars, grow fast and remain competitive in natural environments [2, 16]. However, our results indicated that bacterial strains carrying incomplete PTS have a different strategy for carbon utilization, for example, co-consumption of multiple carbohydrates in R. cellulolyticum. If all bacteria in a microbial community only contain canonical PTS components and strictly execute PTS-mediated CCR, competition for preferred carbon could happen between bacteria, leading to inefficient carbon utilization and inhibition of carbon degradation. However, with diversified carbon utilization strategies depending on PTS regulatory machineries, the cooperation and efficiency of carbon degradation can be greatly improved within a microbial community that constantly faces complex and dynamic supply of carbon sources. On the other hand, given that PTS-mediated CCR presumably improves bacterial competitiveness, bacteria with canonical CCR may be more competitive when facing stresses such as warming or drying. For example, most species in Bacilli and Clostridia are endospore-forming and can remain in the dormant state for years [54, 55], which could offer competitive advantages for both bacilli and clostridia under selective pressure from increased temperature or drying. However, our previous study found that warming can only increase the relative abundance of Bacilli rather than Clostridia and the warming-enhanced homogeneous selection was mainly attributed to the positive responses of Bacilli [56]. This differential impact on Bacilli and Clostridia in the warming field can be explained by the bacterial PTS completeness because the completeness of core PTS components in bacilli is much higher than clostridia (Fig. 1C), and the HPr/Cph-dependent allosteric switching mainly exists in bacilli (Fig. 5F). These indicate that more species in Bacilli may execute canonical PTS-mediated CCR, which can finally enhance the competitiveness of Bacilli in response to warming stress compared to Clostridia.

Our study in R. cellulolyticum, which was selected as a representative of incomplete PTS carriers in cellulose-degrading clostridia, demonstrated a mild reverse CCR and catabolite promotion by incomplete PTS components, such as Cph and CcpA. The incomplete PTS regulates metabolism through molecular mechanisms distinct from the long-standing model established in B. subtilis, where the CcpA-HPr-S46-P complex represses the catabolism of unfavorable sugars. First, R. cellulolyticum CcpA plays a counterintuitive role in metabolic regulation since it is not involved in the utilization of co-existing soluble sugars both metabolically and transcriptionally, but it is essential for cellulose metabolism by maintaining a sustainable high expression of the cellulose degrading gene cluster through yet to be characterized mechanisms (Supplementary text B). Second, CcpA and CcpB in R. cellulolyticum have mechanistically diverged from previously characterized CcpA orthologs because of distinct DNA binding motifs identified experimentally and changes in ligand-dependent allosteric regulation. In this incomplete PTS model, CcpA and CcpB also have been functionally specified, differing in metabolic regulation from the aspect of carbohydrate consumption, transcriptional association, and DNA binding profiles. Third, H15-free Cph in R. cellulolyticum has a similar structure to other HPr but does not interact with either CcpA or CcpB to alter DNA binding, regardless of its phosphorylation status mediated by HprK using either ATP or newly discovered PPi as the energy donor (Supplementary text C). Fourth, Cph and its phosphorylated form (e.g., cph_S46D) regulate carbohydrate metabolism and affect broad transcriptional profiles, which are distinct from CcpA-/CcpB-associated metabolic and transcriptional regulation. Instead of interacting with CcpA regulators, we found Cph binds to a variety of proteins, such as ribosomal components, anti-sigma factor antagonists, and transcriptional termination factors, which could link to transcriptional regulation and physiological adaptation to changing carbon sources (Supplementary text B).

In terms of sugar uptake in anaerobic cellulose degraders with an incomplete PTS, we would consider the incomplete PTS as a beneficial evolutionary outcome for economical and efficient carbohydrate transport for cellulose-degrading bacteria. First, inducer exclusion led by PTS sugar transport would not exist such that these microbes can consume multiple sugars simultaneously as shown in R. cellulolyticum. Second, ABC transporters that preferably pump disaccharides or oligosaccharides, such as the preference for cellobiose over glucose in R. cellulolyticum, stoichiometrically consume less energy to internalize sugar units for central metabolism, when compared to monosaccharide ABC transporters. This is of more significance to cellulose degraders that require energy-demanding synthesis and secretion of cellulose degrading enzymes.

Taken together, these genetic, metabolic, transcriptional, and biochemical data concordantly support that incomplete PTS components independently regulate carbohydrate metabolism. Given that lignocellulose utilization is critical to carbon cycling in the global ecosystem, our findings can also provide new insights into selections of functional gene information for incorporating into ecosystem models for improving accuracy in carbon decomposition and reducing model uncertainty (e.g., selecting both incomplete PTS genes and carbon degradation genes instead of only choosing carbon degradation genes).

Conclusion

Spanning from massive genome mining to mechanistic dissection, this study brings novel understanding of the limited applicability of the long-standing canonical PTS in prokaryotes and the diversified regulatory mechanisms of PTS components for carbohydrate metabolism. Distinct from catabolite repression of canonical PTS, the representative incomplete PTS in R. cellulolyticum promotes carbohydrate utilization, especially the indispensable role for Cph and CcpA in cellulose metabolism. Systematic investigation into R. cellulolyticum PTS components revealed independent functioning of Cph and CcpA, which is determined by structural changes in CcpA. These fundamental discoveries allow us to rethink the role of PTS in microbial competitiveness in microbiomes and will guide the design of microbiome-oriented nutrient intervention for improved functions in different settings.

Methods

Bacterial strains and plasmid construction

Strains, plasmids, and primers used in this study are listed in the Table S5. Bacterial mutant strains of R. cellulolyticum were constructed in the ΔmspI background using an all-in-one CRISPR-Cas9 nickase editing system [41]. By inserting a 7-bp DNA fragment to cause frameshift and introduction of early stop codons, the knockout mutants of Ccel_0806, Ccel_1005, and Ccel_1438 were generated and named Δcph, ΔccpA, and ΔccpB, respectively. We also performed point mutagenesis in Ccel_0806, yielding cph_S46D in which Ser at 46 was substituted by Asp to simulate phosphorylation [57]. However, we failed to create a S46A mutant in which Ser at 46 would have been substituted by Ala to mimic loss of phosphorylation. The genome editing plasmid in R. cellulolyticum mutants was cured as previously reported [58]. As we failed to knock out the hprK gene using either Clostron or Cas9 nickase genome editing, RNA interference (RNAi) was applied to downregulate the expression of the hprK gene by plasmid-based overexpression of antisense RNAs that are complementary to hprK transcripts (pRNAi-hprK). The empty plasmid transformant (pRNAi) was used as an experimental control.

Measurement of bacterial growth and carbohydrate utilization

All R. cellulolyticum mutants and control strains were revived in complex VM medium [59] with 5 g/L cellobiose. Unless otherwise indicated, all strains were cultured anaerobically at 34 °C. To test microbial physiology and metabolism, a carbon mixture or Avicel PH101 crystalline cellulose was used as the carbon source. The carbon mixture consisted of 3 g/L cellobiose, 2.5 g/L glucose, 2 g/L xylose, and 1 g/L arabinose to mimic the complexity of released sugars during lignocellulose degradation. When using carbon mixture as a carbon source, we cultured bacterial strains to OD600 0.5–0.6 in the defined VM medium, and then transferred into the same fresh medium for sample collection and growth monitoring. Cellobiose-grown cultures at the exponential growth phase (OD600 0.5–0.6) were inoculated into 50 ml of defined VM medium with 10 g/L Avicel PH101 crystalline cellulose. Unless otherwise indicated, 15 μg/ml thiamphenicol was added to maintain the plasmid in pRNAi-hprK and pRNAi strains. Three biological replicates of each strain were used for analyses.

Bacterial growth on carbon mixture and cellulose was estimated by optical density and total protein measurement, respectively [58]. High-performance liquid chromatography was used to measure cellobiose, glucose, xylose, and arabinose in the supernatant of spent medium [58]. The specific rate of sugar consumption (q substrate) in mmol per gram of cells per hour was the derivative of the time course plots [60]. The remaining cellulose during fermentation was measured by the phenol-sulfuric acid method [61].

Microarray analysis

R. cellulolyticum strains (i.e., the parent strain, ΔccpA, ΔccpB, Δcph, and cph_S46D) were cultivated in the defined VM medium with either carbon mixture or cellulose as mentioned above. On carbon mixture, each strain had three biological replicates and 10 ml cell samples were collected at three time points (16, 20, and 24 h) during the exponential phase. On 10 g/L cellulose, ΔccpB and the parent strain were cultivated with six biological replicates and collected at mid-exponential growth phase. Since ΔccpA and Δcph could not grow on cellulose, we performed an incubation experiment (Fig. S3C) to understand how ΔccpA and Δcph respond to cellulose. Specifically, the parent strain, ΔccpA, and Δcph were grown to an OD600 of 0.5–0.6 in 50 ml defined VM media with cellobiose as the carbon source. Each strain had three biological replicates. Bacterial cells from each biological replicate were then collected by centrifugation at 4000 g and washed twice with the defined VM medium (no carbon added). Finally, washed cells from each biological replicate were inoculated into the defined VM medium with 10 g/L cellulose. During shaking incubation at 34 °C, samples were collected at five time points (0,1, 3, 6, 12 h). After centrifugation at 4 °C, 5000 × g for 10 min, cell pellets were immediately flash frozen with liquid nitrogen and then stored at −80 °C for further use.

Total RNA extraction, microarray hybridization, and data analysis were performed as described previously [62]. Venn diagram graphs were generated with the online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/). ClueGO v 2.5.5 and CluePedia v 1.5.5 were used for functional enrichment analysis with a two-sided minimal-likelihood test, and p values were corrected using the Benjamini-Hochberg method [63]. PCA analysis was conducted with the “prcomp” function in R using the normalized data.

ChIP-seq

ΔccpA and ΔccpB mutants were transformed with pEr-CcpA-FLAG and pEr-CcpB-FLAG respectively. The parent strain carrying pEr-Empty-FLAG was used as the negative control. The ChIP-seq experiment was performed as reported previously with some minor modifications [64, 65]. In brief, bacterial cells grown on carbon mixture in 50 ml of the defined VM medium were harvested at OD600 0.5; to keep consistent with transcriptomic analysis on cellulose, bacterial cells grown on cellobiose were incubated with 10 g/L cellulose for 8 h (Fig. S3C). Each strain has two biological replicates under each condition. Cell pellets from 50 ml of these cultures were treated with 1% formaldehyde for 10 min at 34 °C, then quenched by 250 mM glycine. Cells were then collected by centrifugation, washed twice with the PBS buffer (pH 7.4) and resuspended in 750 μL of the IP-buffer (100 mM Tris-HCl pH 8.0, 1% Triton X-100, 500 mM NaCl, and 1 mM PMSF). Cells were disrupted by bead beating and then sonicated to shear DNA/protein to an average size of 300–500 bp. After centrifugation, the supernatant was incubated with anti-FLAG M2 Magnetic Beads at 4 °C overnight. In the next day, the anti-FLAG M2 Magnetic Beads were washed twice with wash buffer 1 (100 mM Tris-HCl pH 8.0 and 600 mM NaCl), wash buffer 2 (100 mM Tris-HCl pH 8.0 and 300 mM NaCl) and wash buffer 3 (100 mM Tris-HCl pH 8.0 and 100 mM NaCl), and finally eluted by the ChIP elution buffer (100 mM Tris-HCl pH 8.0, 10 mM EDTA, 1% SDS) with 50 μg proteinase K at 56 °C for 2 h and then at 65 °C overnight. Immunoprecipitated DNA samples were purified using the ChIP DNA Clean & Concentrator (ZYMO). DNA was used for the construction of sequencing libraries using a KAPA DNA Library Preparation kit (Kapa Biosystems) and then sequenced on MiSeq (Illumina) with the Miseq Reagent Kit V2.

ChIP-seq reads were mapped to the genome of R. cellulolyticum H10, using BWA-MEM (Galaxy Version 0.7.17.1) with Illumina mode [66], and then filtered (MAPQ quality score > 20) by Filter SAM or BAM (Galaxy Version 1.8+) [67]. The enriched peaks were identified using MACS2 callpeak (Galaxy Version 2.1.1) [68]. The consensus motifs for Ccel_1438 and Ccel_1005 were predicted by MEME-ChIP (version 5.3.3) [69].

Immunoprecipitation-mass spectrometry (IP-MS)

The parent strain w/pEr-Empty-FLAG (negative control), ΔccpA w/pEr-CcpA-FLAG, ΔccpB w/pEr-CcpB-FLAG, and Δcph w/pEr-Cph-FLAG were used for the IP-MS experiment to detect potential proteins that interact with CcpA, CcpB, and Cph. The IP-MS was performed using two independent cultures for each strain. All IP-MS experiments were conducted in medium with 10 g/L cellulose using the same incubation experiment design (Fig. S3C). All strains were incubated with cellulose for 8 h and then were collected by centrifugation, washed twice with PBS buffer and resuspended in 750 μL IP-buffer (100 mM Tris-HCl pH 8.0, 1% Triton X-100, 500 mM NaCl, and 1 mM PMSF). The cells were disrupted by bead-beating and then centrifugation at 13,000 g for 10 min. The supernatant was incubated with Anti-FLAG M2 Magnetic Beads (Sigma Aldrich) at 4 °C overnight. The next day, the Anti-FLAG M2 Magnetic Beads were washed twice with wash buffer 1(100 mM Tris-HCl pH 8.0 and 600 mM NaCl), wash buffer 2 (100 mM Tris-HCl pH 8.0 and 300 mM NaCl) and wash buffer 3 (100 mM Tris-HCl pH 8.0 and 100 mM NaCl), and finally target proteins were eluted from the antibody using 100 μl elution buffer (100 mM Tris–HCl pH 8.0, 10 mM EDTA, 1% SDS) at 65 °C for 2 h. The eluted proteins were prepared by filter aided sample preparation for mass spectrometry as previously reported [70]. Tryptic peptides were separated by reverse phase XSelect CSH C18 2.5 um resin (Waters) on an in-line 150 ×0.075 mm column using an UltiMate 3000 RSLCnano system (Thermo). Peptides were eluted using a 90 min gradient from 98:2 to 65:35 buffer A:B ratio. Eluted peptides were ionized by electrospray (2.2 kV) followed by mass spectrometric analysis on an Orbitrap Fusion™ Tribrid™ Mass Spectrometer (Thermo). MS data were acquired using the FTMS analyzer in profile mode at a resolution of 240,000 over a range of 375–1500 m/z. Following HCD activation, MS/MS data were acquired using the ion trap analyzer in centroid mode and normal mass range with normalized collision energy of 28–31% depending on charge state and precursor selection range. The acquired MS/MS data were searched using Sipros algorithm against the R. cellulolyticum H10 protein database with the FDR < 0.01 at protein level [71].

Protein expression and purification

The Cph, CcpA, CcpB, and HprK genes were amplified by PCR from R. cellulolyticum H10 genomic DNA, and cloned into the modified pET28a vector (Novagene) which contains a C-terminal 6×His tag. The cloning junctions were confirmed by DNA sequencing. Each recombinant plasmid was transformed into the E. coli Rosetta (DE3) strain (Novagen). Cells were grown at 37 °C in Luria Bertani (LB) medium containing 100 μg/ml kanamycin and 40 μg/mL chloramphenicol. Expression of Cph, CcpA, CcpB, and HprK were induced at an OD600 nm of 0.8 by adding 0.2 mM isopropyl-β-D-1-thiogalactopyranoside followed by incubation at 16 °C for 20 h. The cells encoding Cph or HprK were collected and lysed by sonication in 20 ml buffer A (20 mM Tris-HCl buffer pH 8.0, 100 mM NaCl) in an ice-water bath. After centrifugation, the His-tagged fusion protein Cph or HprK was isolated with Ni-NTA affinity column (GE Healthcare) and further purified by gel filtration (Superdex 75, GE Healthcare) in buffer A. The cells encoding CcpA were sonicated in 20 ml buffer B (20 mM Tris-HCl buffer pH 8.0 containing 300 mM NaCl) in an ice-water bath. After lysis by sonication, CcpA were isolated with Ni-NTA affinity column (GE Healthcare) and further purified by gel filtration (Superdex 200, GE Healthcare) in buffer B. The cells expressing CcpB were lysed by sonication in 20 ml buffer C (20 mM Tris-HCl buffer pH 7.0 containing 300 mM NaCl) in an ice-water bath. After centrifugation, the CcpB were isolated with Ni-NTA affinity column (GE Healthcare) and further purified by gel filtration (Superdex 200, GE Healthcare) in buffer C. The Cph, CcpA, and CcpB mutants were obtained by site-directed mutagenesis using the plasmid encoding the wild-type proteins as a template, overexpressed, and purified the same way as the wild-type protein. All eluted proteins were collected, concentrated using centrifugal ultrafiltration respectively for further study. The purity of protein samples was assessed by denaturing polyacrylamide gel electrophoresis (SDS-PAGE). Purified CcpA (ε280 = 15,000 M−1 cm−1) and CcpB (ε280 = 24,000 M−1 cm−1) were incubated with Cph, Cph_S16D, Cph_S46D, and Cph_S16&46D (ε280 = 8500 M−1 cm−1). CcpA/CcpB with Cph and Cph mutant mixtures (1:10 molar ratio) were analyzed on analytical gel filtration chromatography (Superdex 75 HR 10/300 GL) in buffer B or buffer C, respectively. The elution volumes for CcpA, CcpB, and Cph (Cph mutants) were 8.5 ml,11 ml, and 15 ml, respectively.

Isothermal titration calorimetry (ITC) assay

The ITC experiments were performed with a MicroCal PEAQ-ITC instrument (Malvern, USA) and all statistical analyses were performed with the origin ITC analysis software package (MicroCal). All samples were centrifuged and degassed extensively prior to the experiments. Cph (400 µM; Syringe) was firstly titrated into CcpA (20 µM; sample cell) and CcpB (20 µM; sample cell), respectively. Then, the Tris solution (20 mM Tris-HCl, pH 7.5, and 100 mM NaCl) was also titrated into the CcpA sample or CcpB sample, and the resulting heat was subtracted from the experimental curve. The titration was performed with a total of 19 injections with constant stirring. The data was finally fitted to the one-site binding model to generate thermodynamic parameters.

In vitro phosphorylation of Cph

The phosphorylation assays were performed in the presence of His6-Cph (1 mM) and His6-HprK/P (500 µM) in phosphorylation buffer (10 mM Tris-HCl, pH 7.4, 100 mM NaCl, 5 mM MgCl2, 1 mM dithiothreitol [DTT], and 10% glycerol, 20 mM ATP or pyrophosphate) with or without FBP (10 mM). The reactions were carried out at 25 °C for 60 min. The samples were loaded onto either nondenaturing 17% polyacrylamide gels for separation of unphosphorylated Cph and phosphorylated Cph or 12% Native-PAGE for further mass spectrometry analysis of phosphorylated Cph. The phosphorylation ratios were estimated using the formula:

$$ Phosphorylation\,ratio \\ = \frac{{spectral\,counting\,(SC)\,of\,the\,phosphorylated\,peptide}}{{SC\,of\,the\,phosphorylated\,peptide + SC\,of\,the\,non - phosphorylated\,counterparts}} \times 100{{{{{{{\mathrm{\% }}}}}}}}$$

Phosphorylation site determination by liquid chromatography-tandem mass spectrometry (LC-MS/MS)

The In-gel digestion of the phosphorylated Cph were performed as described previously [72]. Briefly, the band slices were reduced with 10 mM Tris (2-carboxyethyl) phosphine in 100 mM NH4HCO3, alkylated with 40 mM 2-chloroacetamide in 100 mM NH4HCO3, and digested with trypsin (Sigma) at a 1:50 enzyme-to-substrate ratio. Digested peptides were purified via homemade C18 Spin Tips, lyophilized, dissolved in 0.1% formic acid (10 μl), and measured by LC-MS/MS. 2 μg of each sample peptides were loaded onto an Easy-nL 1200 ultra-high-performance liquid chromatography (UHPLC) instrument (Thermo Fisher Scientific) and transferred into an online coupled Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific) by nanoelectrospray ionization (nanoESI). Peptides were eluted from a 20-cm-long analytical column packed with 1.9-μm reverse-phase C18 particles using a 60-min segmented gradient of 5% to 45% solvent B (90% [vol/vol] acetonitrile, 0.1% [vol/vol] formic acid) at a constant flow rate of 300 nl/min. For the MS/MS measurement, a data dependent Top Speed (3 s) method was used to collect spectra: high resolution MS1 spectra (Orbitrap resolution: 120,000; mass range: 350–1400 Th; and automatic gain control (AGC) target: 4 × 105; maximum injection time 50 ms) and high resolution MS2 spectra (Quadrupole isolation window: 1.6 Th; Orbitrap resolution: 30,000; HCD energy: 30%; AGC target: 5 × 105; maximum injection time: 22 ms). Dynamic exclusion was enabled with a duration time of 120 s. Acquired raw data were processed using MaxQuant software [73] with Methionine oxidation and serine, threonine, and tyrosine residues phosphorylation set as variable modifications, and cysteine carbamidomethylation set as fixed modification. All the other parameters were set as default. The raw data file was searched against a reference E. coli K-12 proteome and the protein sequences of Cph and HprK from R. cellulolyticum H10 using a built-in Andromeda search engine [74]. The maximum number of missed cleavages allowed was set to 3 for trypsin. Phosphopeptides with an Andromeda score of >70 and a localization probability of >0.75 were considered, and their MS/MS spectra were then analyzed manually.

Electrophoretic mobility shift assay

6-carboxyfluorescein (FAM)-labeled DNA fragments were obtained by annealing oligos (Table S5). The DNA fragments (potential binding motif for CcpA) were added at a concentration of 0.05 μM and incubated at room temperature for 10 min with an excess of CcpA, CcpAN301R, or CcpAL304T in a total volume of 20 μl mixture. The mixture contained 20 mM Tris pH 7.5, 500 mM KCl, 10 mM MgCl2, 10% (vol/vol) glycerol, 2 mM DTT, and 2 mM EDTA. Bovine serum albumin (BSA) was added at 3 μM to prevent nonspecific binding. The DNA fragments (potential binding motif for CcpB) were added at a concentration of 0.1 μM and incubated at room temperature for 10 min with an excess of CcpB, CcpBQ305R, and CcpBV308T in a total volume of 20 μl mixture. The mixture contained 20 mM Tris pH 7.5, 250 mM KCl, 10 mM MgCl2, 10% (vol/vol) glycerol, 2 mM DTT, and 2 mM EDTA. BSA was added at 3 μM to prevent nonspecific binding. After incubation for 30 min at 25 °C, the protein-DNA mixtures were directly subjected to 6% native PAGE with 1×Tris acetate-EDTA buffer. For FAM-labeled probes, the images were collected and analyzed on a Typhoon FLA-9500 imaging system (GE Healthcare).

Crystallization and data collection

The purified Cph and CcpB were concentrated by ultrafiltration (Millipore Amicon) to 10 mg/ml and 3 mg/ml, respectively. Initial crystal screening was set up with Crystal Screen, Crystal Screen 2, and PEG/Ion Screen reagent kits (Hampton Research) at 16 °C by using the hanging-drop vapor-diffusion method. Crystals appeared within 2 weeks. Further crystal optimization experiments were performed by systematic variation of the precipitant concentration and protein concentration. The Cph crystals were produced by mixing 1 μl protein solution (~10 mg/ml) and an equal volume of reservoir solution containing 0.2 M sodium sulfate, 20% w/v PEG 3350, 10% v/v ethylene glycol, and incubating at 15 °C. Crystallization of CcpB was performed at 16 °C using the hanging-drop vapor diffusion method. The appropriate crystals of CcpB were obtained from the reservoir solution of 0.1 M Bis Tris propane pH 6.5, 0.2 M trisodium citrate dihydrate,20% w/v PEG 3350,12% v/v ethylene glycerol after 3 weeks.

The crystals were harvested using cryo-loops and immersed briefly in a cryoprotectant solution consisting of 80% (v/v) reservoir solution and 20% (v/v) glycerol. The crystals were subsequently flash-cooled and stored in liquid nitrogen and transferred to beamline BL18U1 of the SSRF (Shanghai Synchrotron Radiation Facility) for X-ray diffraction analysis and data collection. The data set of Cph and CcpB were processed using HKL3000. Crystals of Cph and CcpB belong to P41212 and H3, respectively. Table S5 gives a summary of the data-collection statistics.

Structure determination, refinement, and prediction

The structures of Cph and CcpB were determined by the Molecular Replacement method using the structures of Crh (PDB, 1mo1) from B. subtilis and CcpA (PDB, 1rzr) from Priestia megaterium as the search models, respectively. The final structures of Cph and CcpB were solved at a resolution of 2.4 and 2.2 Å, respectively. The resultant model was refined using the maximum likelihood method implemented in REFMAC5 as part of the CCP4i program suite and the PHENIX program refinement and rebuilt interactively using the program COOT. Both the structures were assessed with MolProbity, and the Ramachandran outliers were less than 1% in the final refined structures. Crystallographic parameters are listed in Table S5. The predicted structure of CcpA was obtained by submitting its sequence to the AlphaFold2 server at https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb using the default settings [75]. All structure figures were prepared with PyMOL.

Genome mining

We downloaded both genomic sequences and protein annotation files (*_genomic.fna.gz and *_protein.faa.gz) of 14,998 prokaryotic genomes from the NCBI database, which were labeled as complete genomes in December 2019. 99.37% of their genomes have completeness reported to be >90% in the GTDB database (https://data.gtdb.ecogenomic.org/releases/release207/207.0/) [76]. Each genome was then taxonomically classified by GTDB-Tk v2.1.0 and R207v2 reference database using the classify_wf function with default setting [76]. All protein sequences were indexed by adding corresponding GenBank accession and assembly name to their headers, then pooled together to generate a single inhouse protein database with 56,869,107 sequences, which is publicly available (http://www.ou.edu/ieg/publications/datasets). All hmm files of reported PTS components were downloaded from the Pfam database (Table S1). The number of homologs in each genome was examined by hmm search from the hmmer-3.3 package against the protein database. The E-value threshold we applied for hmm search was at 0.01, which was the gathering threshold for building Pfam families [77]. Protein homologs were grouped into two categories based on protein architecture, including singular or modulary/fused forms. Counts of each architecture-specific homolog in each genome, along with corresponding genome identification and taxonomic information, were summarized in a single master file for downstream analysis (Table S1).

For each hmm profile of PTS components, we calculated its averaged copy number per genome with protein architecture features considered, and further examined the constitution of singular or fused architecture for each hmm profile across genomes. To understand the presence of PTS components, including HPr, EI, EIIA, EIIB, EIIC, HprK, and LacI family regulators, we summed the count number of hmm profiles of each PTS component in each genome. Since the EII complex consists of EIIA, EIIB, and EIIC, the completeness of EII is determined by the co-presence of all three components in the same genome. We then determined microbial species with unique PTS configurations according to their taxonomic information. This also allowed us to interrogate the possibility of inter-strain variations in PTS configuration in some relevant microbes. A Venn diagram was used to illustrate the number of microbial species with some unique or shared PTS configurations.

To understand the association between microbial phylogeny and PTS configuration, we generated a phylogenetic tree for all species at the class level with at least 50 genomes included. The taxid information of downloaded genomes was applied to infer the phylogenetic relationship between microbial strains using the online phyloT tree generator (https://phylot.biobyte.de/). For each class on the phylogenetic tree, we calculated the percentage of genomes that carry a certain copy number of PTS components or different completeness of EII (including absent, partial, and complete). The phylogenetic tree from phyloT was visualized by iTOL (https://itol.embl.de/). The integrated visualization of a heatmap showing averaged counts of each hmm profiles in genomes of all classes, and their phylogeny displayed by cladogram, was plotted by complex heatmap [78]. The Shannon index of PTS configuration was calculated for each genome by the “diversity()” function of the vegan package [79] in R and then statistically summarized to class levels to indicate the diversity of PTS configuration within each class. The higher Shannon index indicates likely coexistence and similar copy number of homologs in a genome, thereby reflecting the completeness of all hmm profiles. The variation of Shannon index indicates the consistency of PTS configuration within each class.

To evaluate the co-occurrence of PTS components, we computed Jaccard coefficients in R. At the module level, the copy number of each hmm profile was converted to a binary data before computing; at the protein level, copy numbers of PTS components, or the total copy number of hmm profiles belonging to specific PTS components was converted binarily for computing.

All HPr homologs predicted in 8651 genomes by hmm search were pooled together for motif analyses. Sequences were grouped into three HPr categories (H15, H15-free, and S16), by the amino acid at the pseudo position 15, which was determined by searching for similar conserved motifs. Since Crh and Npr are too similar to be distinguished from canonical HPr, here we named all of them as HPr homologs. The probability of amino acids used in the conserved motif was plotted by ggseqlog [80]. For those H15-free HPr homologs, we further examined the presence of other PTS components, including HprK, EI, and completeness of EII components, then projected the PTS configuration data to the phylogenetic tree of these H15-free carrying microbes. Cellulose degraders were manually curated.