Introduction

The health risks of silica (silicon dioxide, SiO2) primarily responsible for silicosis, one of the most frequent pneumoconiosis are highly documented as an occupational disease in mining workers and others with silica exposures1,2,3,4,5. Silicosis is characterized by injury to the alveolar cells preceded by an initial immune response and followed by expansion and activation of fibroblasts, and finally the deposition of the extracellular matrix (ECM). Exposure dose, both duration and concentration plays an important role in the development of silicosis types6,7. Exposure to crystalline silica (particles < 10 μm in diameter), amorphous silica (non-crystalline) and nano-silica (particles < 100 nm in diameter) exhibit a different effect on the development of silicosis, pulmonary fibrosis, and inflammation and cytotoxicity, respectively8. Silicosis is broadly categorized into three forms namely, Chronic silicosis, (develops due to exposure at low-moderate exposure for 10 or more years), Accelerated silicosis (develops from moderate to high-level exposure within 10 years) and Acute silicosis (results from intense exposure for a few weeks or to 5 years from the time of initial exposure).

Silicosis is illustrious irreversible lung fibrosis simultaneously coupled with many other diseases, such as pulmonary tuberculosis , lung cancer, renal failure, systemic sclerosis, rheumatoid arthritis, systemiclupus erythematosus, gastrointestinal problem, and autoimmune conditions, etc.8,9. Silica-induced lung damage occurs by several mechanisms including cell death by apoptosis, fibrosis and production of cytokines10. Due to the graveness of the element, International Agency for Research on Cancer (IARC) enlisted crystalline silica as a carcinogen11. There is possibility of no apparent symptoms at the initial stage, however, silicosis can continue even after cease of the silica exposure making the situation critical, non-curable, irreversible and uncontrolled immune processes12. Scientist and Medicine personnel recommend identification of workers at risk, prevention of further exposure to silica dust by job rotation and use of personal protective equipment, etc. to control the issue enhanced with early diagnosisand prediction9,13. Bandyopadhyay et al.14 stated that diagnostic challenges arise as silicosis shows resemblance in radiological and clinical overlap with pulmonary tuberculosis and neoplastic lesions.Therefore, it is necessary to investigate the complex molecular mechanisms that underlie the disease. Identification of differentially expressed genes (DEGs) in response to silica exposure and detail examination of the DEGs using the suitable statistical and computational approach may provide valuable information about the molecular mechanism(s) underlying the toxicity of crystalline silica15. The DEGs (upregulation/ downregulation) and/or their products, regulators, mostly transcription factors (TFs) and micro RNA (miRNA) following appropriate validation to recognize as a suitable biomarker(s) for silicosis is under investigation. A thorough investigation of possible molecular targets and mechanisms can make a way to achieve successful treatment options as well as prevention of potential adverse effects of silica exposure. The behavior of genes and genetic components due to silica exposure and toxicity are still rarely explained and are a vast area of scientific research16. Zhang et al.17. attempted to identify genome-wide aberrant DNA methylation profiling in lung tissues from silicosis patients using Illumina Human Methylation 450 K bead chip arrays. In the past, microarray-based transcriptomics studies have been successfully employed to gain insights into the molecular mechanisms underlying the toxicity of chemicals7 as well as to identify molecular markers for their toxicities18. Advances in high throughput gene expression profiling, such as transcriptomics, microarray analysis can enlighten in a better understanding of the effects of toxic agents in biological systems.

Wang et al.19 reported phagocytosis of SiO2 into the lung causes inflammatory on the cascade resulting in fibroblast proliferation and migration followed by fibrosis associated with monocyte chemotactic protein 1 and sustaine increase in p53 and PUMA protein levels. Wang et al.19 interpreted involvement of MAPK and PI3K pathways in the SiO2 induced alteration of p53 and PUMA expression and to have possibility of link between SiO2-induced p53/PUMA expression in fibroblasts and cell migration on the basis of miRNAs that can interference with p53 and PUMA and prevent SiO2-induced fibroblast activation as well as migration. The study provide insight into the potential use of p53/PUMA in the development of novel therapeutic strategies for silicosis treatment. Therefore, in the current study we target to investigate the molecular mechanism underlying lung cell differentiation by identifying DEGs involved in silicosis, their expression (up/down-regulation), produce the possible network and gene ontology term analysis namely biological pathway and biological process for differentially expressed genes. We have also considered for identification of transcription factors (TFs) and microRNA (miRNA) associated with these DEGs for the regulation of the disease gene. A common network construction combining the relevant of TFs, miRNA, gene network, and small molecules for regulation of the disease progress is targeted.

Methodology

Microarray datasets

In this study, we have considered the work “Mechanisms of crystalline silica-induced pulmonary toxicity revealed by global gene expression profiling” and Dataset GSE30180 from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo)16. Datasets GSE30180 contains information regarding clinical tissues of human lung epithelial cells (A549 cells) considering 10 samples (5 control and 5 crystalline silica exposed to 800 ug/ml for 6 h using differential gene expression profile induced by silica conducted by National Institute for Occupational Safety and Health USA16). We have considered the available maximum concentration (800 ug/ml) from the study considering general worker’s shift exposure time period to have an idea at extreme conditions. Quality Control assessment of data quality is a major concern in microarray analysis. arrayQualityMetrics20 is from Bioconductor package that provides a report with diagnostic plots for one or two colour microarray data are used.

We have used arrayQualityMetrics for data quality assessment of microarray data and Quality Control is performed to identify potential low-quality arrays. The removal of low-quality arrays is desirable to avoid negative impact in downstream analysis procedures, by introducing invalid information and ultimately impairing statistical and biological significance. Array quality is assessed through the computation of commonly used statistical measures arrayQualityMetrics, an R package for quality control and quality assessment analysis that supports different types of microarrays in R. Here, we use an additional QC/QA that provides an HTML report with interactive plots. The instensity distribution of the arrarys of each box correspeonds to one array which indicated the over all quatity of each array with other corresponds are goods (Figure S1, S2).

Screening of differentially expressed genes

To screen DEGs between control and crystalline silica exposed cell, differential expression analysis is conducted using Bioconductor. Bioconductor operates in R (a statistical computing environment) and is applied for genomic data analysis and comprehension. The normalized data is analysed for identification of DEGs by limma package 3.26.8 in R (following adjust = “fdr” sort by B”, number 250). The robust MultiArray average method21 is applied to perform background correction and data normalization using default parameters in the limma package22. Subsequently, a differential analysis between silica exposed and the no silica exposed is performed using the limma package22, a modified version of the standard t-test incorporating the Benjamini-Hochberg (BH) multiple hypotheses correction technique23. DEGs are defined as the false discovery rate (FDR) set as the cut‑off parameters to screen out 250 significant increases or decreases in gene expression levels24.

Identification of transcription factor

IRegulon plugin25 in Cytoscape (version 3.8.0)26 is used to detect transcription factors using motif2TF and their optimal sets of direct targets for a set of genes Chip-seq data. The minimum identity between orthologous genes is 0.05%, while the maximum FDR on motif similarity is 0.001. The normalized enrichment score (NES) > 5 is considered as a threshold value for the selection of potential relationships. The NES for a given motif/track is computed as the Area Under the Recovery Curve (AUC) value of the motif/track minus the mean of all AUCs for all motifs (or tracks), and divided by the standard deviation of all AUCs. When the distribution of AUCs follows a normal distribution then the NES score is a z-score indicative of the significance. To maintain high accuracy of network inference large motif collections are collected from various species, and linking these to candidate human TFs via a motif2TF 25.

Identification of miRNA (microRNA) for regulation Silicosis

To identify targets, regulators and interactions of the molecular factors included in the deferential gene expressed network, we search for gene-miRNA target and cross-validation using microRNA Data Integration Portal (mirDIP)27 in 30 database sources such as BCmicrO, BiTargeting, CoMeTa, Cupid, DIANA, ElMMo3, GenMir +  + , MAMI, MBStar, MirAncesTar, MirMAP, MirSNP, MirTar, Mirza-G, MultiMiTar, PACCMIT, PITA, PicTar, RNA22, RNAhybrid, RepTar, TargetRank, TargetScan, TargetSpy, miRDB, miRTar2GO, miRcode, microrna.org, mirCoX and miRbase for regulating the expressions using both unidirectional and bidirectional search method. All the database considered in the present study is from authenticated and publically available sites27.

Functional enrichments analysis

ClueGO and CluePedia28 plug-in of Cytoscape is used for functional enrichment analysis. ClueGO plug-in translates functionally grouped Gene Ontology (GO) and pathway annotation networks with a hypergeometric test along with the kappa coefficient28 of pathways as well as functional correlations among pathways. ClueGO provides enrichment scores for selected gene sets against the user-provided gene list. CluePedia29 finds new markers that are potentially related with pathways and extend ClueGO functionality with other biological data deriving screened results. Default parameters are used, and only GO terms with P < 0.05 are selected in ClueGO with a Benjamini–Hochberg correction and a kappa score of 0.5 (medium).

Network construction and analysis of clusters

Finally, regulatory networks are constructed for silicosis by merging selected DEGs and TFs-DEGs pairs using Cytoscape. Thereafter, the Molecular Complex Detection (MCODE) plug-in23 is used to screen Clusters of hub genes from the network with degree cut-off = 10, haircut on, node score cut-off = 0.2, k-core = 2, and max. depth = 100.

Prediction of small-molecule signatures considering DEGs of silicosis

L1000CDS2 web tool is used for prediction of the potential small-molecule signature that matches user input signature genes expressions based on characteristic direction method in the underlying dataset. It is an ultra-fast LINCS L1000 Characteristic Direction Search Engine for prediction the potential small-molecule signature30 and DEGs are pasted into up/down text box and the top 50 signatures are considered.

Results and discussion

Identification of differentially expressed genes

The considered dataset is applied for significant DEGs identification by Bioconductor and 235 DEGs are found to be significantly associated with silicosis disease on the basis of the considered criteria. Table 1a, b enlist total of 235 DEGs, where down-regulated genes and up-regulated genes are 150 and 85 nos, respectively. On the basis of important pathway analysis and GO term Sellamathu et al.16 identified 60 DEGs for crystalline silica exposure in their study.

Table 1 DEGs gene in Silicosis.

Identification of transcription factors (TFs) for differentially expressed genes

Adopting the iRegulon plugin in Cytoscape, 24 TFs are identified from publicly available database signatures/genesets (GeneSigDB, Ganesh clusters or/and MSigDB) that are significantly associated with the DEGs involved in the silicosis disease. It is found that 20 TFs influence both up and down-expressed genes whereas four TFs are solely involved in controlling only the down-regulated genes (Table 2).

Table 2 Identified TFs for regulation DEGs genes in Silicosis.

Identification of miRNA for DEGs from database-driven expansion of the network

Using CyTargetLinker31 authors observed initially only 2716 miRNAs responsible for silicosis affected gene regulation from databases as mentioned in the methodology. Therefore, the authors further explored a few other databases as mentioned in the methodology (public databases around 30 nos). We applied mirDIP and observed a total of 2586 miRNAs associated with targeted DEGs responsible for silicosis and categorized as “very high” (top 1%) and “high” (top 5% (excluding top 1%))27. A total of 1105 miRNA is categorized as very high where 846 miRNAs influence upregulated genes, 1055 miRNAs regulate down-regulated genes and 801 miRNAs regulates both (Supplementary file 1). For example, among identified miRNA, miRNA- 29 influences epithelial-mesenchymal transition32, and miRNA-489 can repress its target genes MyD88 and Smad3 responsible for silicosis33. Chen et al.34 reported the involvement of IL-10-producing B cells in the development of silica-induced lung inflammation and fibrosis of mice and many such in the list. Yang et al.35 observed significant genetic heterogeneity involved in the origin and development of silicosis from research data and recommended relevant miRNAs as biomarkers having a role in regulating pulmonary fibrosis. The research group reported differential miRNAs in leukocytes as up-regulated (18 nos.) and down-regulated (20 nos.) during silicosis, compared with the control group miR-19a in peripheral blood leukocyte35. Faxuanet al.36 observed 39 differential expression miRNAs (14 up-regulated and 25 down-regulated in silicosis sample) between silicosis and normal lung tissues. Zhang et al.37 performed genome-wide miRNAs expression profiling in BALF cell fraction of 3 silicosis observation stages simultaneously with 6 silicosis patients. Among the identified 110 dysregulated miRNAs having down-regulation trend, 23 miRNAs abundantly expressed in stage I and Stage II silicosis suggesting different stages of silicosis are associated with distinct changes in miRNAs expression.

Regulatory network construction

We identified targets, regulators and integrators of the molecular factors included in the DEG Network. Gene expressed network is a node and edge interaction between gene–gene or gene-miRNA and regulatory transcription factors. Precisely, we searched for gene regulatory interaction network as a) TF-target gene interactions and b) miRNA–gene interactions. We merged all the extracted interactions with the gene expressed Network using the same annotation principles as above.

TFs-DEGs network analysis

The interaction network is derived using plugin Cytoscape with 147 Nodes and 769 edges (Fig. 1). The blue nodes indicate here are the genes namely, RELA, JUND, and CEBPB which act simultaneously as TF, so function as both regulator and regulated gene.

Figure 1
figure 1

The gene-transcription factor regulation network. Notes: a round node represents a gene (yellow node) with differencial expresstion, a triangle nodes (green node) represents the TFs, and a rectangular rectangular nodes (pink node) represent the TFs that plays a role as regulator and regulated.

The most interacting TFs are CHD1, CEBPG, FOXF1, CREB3, RELA, and FOXP2, etc. as given in Table 2. The TF CHD1 is a gene similar to various other genes that act as TF and can regulate the activity of certain genes providing instructions to make epithelialcadherin or E-cadherin (a protein) having influence in cell adhesion, the transmission of chemical signals within cells, control of cell maturation and movement38. ATF4 TF is known to regulate memory, metabolism, and adaptation of cells to stress factors such as anoxic insult, endoplasmic reticulum stress, and oxidative stress39. In normal bronchial epithelial cells, CEBPG TF correlates with antioxidant and DNA to repair genes which is absent for individuals with bronchogenic carcinoma40. Expression of KAT2A(Tyr645Ala), reduces gene expression and inhibits tumor cell proliferation as well as tumor growth41. CREB3 influences leukocyte migration, tumor suppression, and endoplasmic reticulum stress-associated protein degradation42,43. PAX3 plays critical roles during fetal development as well as neural crest44,45. Mutations in paired boxgene 3 are associated with Waardenburg syndrome, craniofacial-deafness-handsyndrome, and alveolar rhabdomyosarcoma46. FOXF1 promote slung regeneration after partial pneumonectomy and involved in murine vasculogenesis, lung, and foregut development. Mice with reduced levels of pulmonary FOXF1 may face death due to pulmonary hemorrhages with deficient alveolarization and vasculogenesis47,48. POLR2A geneis associated with poor overall and disease-free survival of patients with early-stage non-small cell lung cancer49. FOXP2 is responsible for defective postnatal lung alveolarization resulting in postnatal lethality in mice. T1 alpha, a lungalveolar epithelial type 1 cell-restricted gene crucial for lung developmentand function, is a direct target of FOXP2 and FOXP0. Both FOXP2 and FOXP1 are crucial regulators of lung and oesophageal development50. E2F1significantly influences S phase progression and apoptosis as well FOXM1 expression, cell survival, epirubicin resistance51 and its low level in primary lung adenocarcinoma may lead to cancer52. In small cell lung cancer, activation of oncogene EZH2 is often triggered by genomic deregulation of the E2F/Rb pathway53. Alveolar macrophages from RELA deficient animals are significantly less capable to involve in the canonical NF-κB pathway (a prototypical immune transcription pathway) and stimulate epithelial cells54.

miRNA regulatory network analysis

In present study, from the dataset GEO30180 we enlisted significant DEGs and investigated associated miRNAs using CyTargetLinker and in-silico validation for miRNA through cross validation with the multiple database mirDIP is carrird out for creation of Gene-TFs-miRNAs regulatory network. We have chosen the miRNA and mRNA from microRNA Data Integration Portal (mirDIP) web tools. In present manuscript, authors tried to identify the significant DEs, regulatory TFs and relevant miRNAs and predict the small molecules for drug combination of silicosis. Interaction network for genes and miRNAs is derived with 2461 Nodes and 13,343 edges as given in Fig. 2 using a plug-in CyTranslinker. An enlarged part is showing GPRY gene with its regulating miRNAs as an example. A grand total of 1100 number of very high target miRNA is identified from the considered database mirDIP using score class “very high”. For many upregulated genes, miRNA (numbers are given in parenthesis) is associated in several hundred; for example, CDK19 (617), OGT (552), LGR4 (426), MAT2A (409),NREP (408), TIA1 (387), and TMEM2 (202).

Figure 2
figure 2

Gene–miRNA interactions network; index red node (miRNA); grey node (gene).

Similarly, for down-regulated genes, a grand total of 1055 very high target miRNA is identified to be associated with theconsidered database mirDIP using score class “very high”. For example, RYBP (899), SERTAD2 (666), CREBRF (675), NAV3 (629), PPP3R1 (560), TSC22D2 (555), STC1 (523), MCL1 (463), EEA1(414), and ZFP36L1(387).

Integrating TF and miRNA regulatory networks

TFs induce micro RNAs (miRNAs) transcription and miRNAs influence mRNA translation as well as transcript degradation for the regulation of gene expression and all these results in complex relationships feedback or feed-forward loops54,55,56.

Furthermore, miRNAs and TFs are capable to alter each other's expression which results in difficulties for ascertaining the effect either one has on the target gene (TG) expression. The Integrated network (Fig. 3) consists of a total of 1396 nodes and edges 17,248 is constructed using Cytoscape from 235 (85 + 150) DE TGs, 24 regulating TFs and 1100 (846 up regulators + 1055 down regulators; 801 are common for both) very high target miRNA. In the network analysis, 14 clusters are achieved. The highest score of the cluster 1 is found to be 5.22 as given in Table 3. The present Gene-TFs-miRNAs regulatory network to find out the potential regulatory and regulated nodes provide detail insight for the gene associations, regulations by relevant TFs, and controlling the miRNAs.

Figure 3
figure 3

Gene–TF-miRNA regulatory network for Silicosis disease.

Table 3 GO: biological process of Gene of all the clusters.

Extraction of the cluster network

Using MCODE we extracted total 14 clusters according to the score computed along with nodes and edges, in which only six possesses Gene-TFs-miRNA interactions. In cluster 1 with maximum score, there are mainly Gene–Gene-miRNA interactions. However, gene FOXQ1 is observed in the clusters which is also identified as TF in literature. Few clusters are found to have only gene–gene or gene-miRNA interaction over Gene-TFs or Gene-TFs-miRNA interaction and is given in Supplementary Table S1.

From Fig. 4 in cluster 3 (Nodes-312, Edges-621, Score = 3.968), for example, TF JUND is associated with various genes as well as numerous miRNAs altogether. A similar type of network is observed for TFs RELA and PEBPB. In present study, considering the data with crystalline silica exposure of 0 and 800 µg/mL for 6 h exposure, 235 DEGs are enlisted on the basis of FDR set as the cut‑off parameters to screen out 250 significant increases or decreases in gene expression levels. Previous study by Sellamatu et al.16 considered a set of exposure (0, 15, 30, 120, 240 µg/cm2 for 0–6 h) for their study and on the basis of potential pathway and gene ontology term identified significantly expressed 60 DEGs. Current study emphasise on identification and enlist of the biological pathway and process individually where identified DEGs are involved identifying potential regulating TFs and target miRNA according to the DEGs.

Figure 4
figure 4

Cluster 3 for Gene–TFs-miRNA regulatory network in Silicosis disease.

Functional enrichments analysis

In present study, biological processes and pathways involoving the 235 DEGs with silica exposure are evaluated and significant biological process with DEGs involved are given in Table 3. A maximum of 11 genes is responsible for Rab proteins signal transduction followed by 9 genes for DNA integrity checkpoint (Fig. 5a). Various researchers reported that overexpression of Rab GTPases have a striking relationship with carcinogenesis and dysregulation of Rab proteins can be linked to the progression of already existent tumors contributing to their malignancy57,58.

Figure 5
figure 5

(a) GO: Biological process of Gene involved in Silicosis disease. (b) GO: Biological process of Gene of all the clusters involved in Silicosis disease.

For Cluster 1, the biological process identified and the genes involved are biological adhesion (TNS3, CD274, and CDH1), biological phase (TOP2A), biological regulation (TOP2A, CD274), cellular process (TN53, IDH1, TOP2A, UBAP1, CD274, and CDH1), immune system process (TN53, CD274), metabolic process (IDH1, and UBAP1) and response to a stimulus (CD274) (Fig. 5b). The biological pathways observed in the study is shown in Fig. 6a,b. Major pathways with higher involvement of genes (numbers given in parenthesis) are Cytokines and Inflammatory Response (6), Rheumatoid arthritis (11), Signaling by Nuclear Receptors (14), NF-kappa B signaling pathway (10), Cellular responses to stress (21), Interleukin-4 and Interleukin-13 signaling (10), Kaposi sarcoma-associated herpes virus infection (13), Interleukin-10 signaling (8), TNF signaling pathway (13), Spinal Cord Injury (13), Circadian rhythm related genes (15), Nuclear Receptors Meta-Pathway (19), IL-17 signaling pathway (16), Cellular Senescence (20), Adipogenesis (11), ESR-mediated signaling (13), Photodynamic therapy-induced NF-kB survival signaling (9), Osteoclast differentiation (11), Senescence-Associated Secretory Phenotype (SASP) (13), and Estrogen-dependent gene expression (13).

Figure 6
figure 6figure 6

(a) GO: Biological pathways of Gene involved in Silicosis disease. (b) GO: Biological pathways of Gene from six clusters involved in Silicosis disease.

Biological pathway analysis of the genes from the six clusters having Gene-TF-miRNA interactions and found the maximum genes are involved in Human T-cell leukemia virus 1 infection as well as Kaposi sarcoma-associated herpesvirus infection (14 nos) followed by Hepatitis B (11 nos) [Supplementary Table (S2)].

Prediction Small molecule signatures

Table 4 displays the Rank (based on the synergy score), the perturbed molecule (names of the chemical perturbations), Dose, Cell-line, time, the direction of regulation, and the GEO ID from which thesignature is extracted. We next sought to predict the drug combination with the ranked small molecule drug signature list (Table 5). From L1000CDS2 search engine forgene-set search, for differentially expressed genes provided nearly fifty significant combinations. These help infinding reverse and mimic combinations of an input gene expression signature for controlling Silicosis. The maximum scoring small-molecule combinations are CGP-60774 (total 20 combinations) followed by alvocidib (total 15 combinations) and with AZD-7762 (total 24 combinations) as can be seen from Table 5 with a few other drugs having a high probability of success.

Table 4 Drug signature predicted at each time point.
Table 5 Small molecule drug signature and combination.

Conclusions

Our study targets to provide detailed guidance for future fundamental researches along with some key genes, TFs and miRNAs, which are potential biomarkers for silicosis. The approach used in the manuscript allows us to incorporate various data sources into an integrated network, analysis of network parameters in order to find key network elements. Using various data sources, we found the relationships between different molecular components to support our comprehension of how silicosis progresses. In this study, 235 differentially expressed genes (150 down-expressed and 85 up-expressed) are identified as affected by exposure to crystalline silica. These genes are regulated by 24 TFs and 1100 very high target miRNAs. The network between DEGs, TFs and miRNAs are constructed using the various plug-in of Bioconductor, Cytoscape and MCODE to find the associateship of their various aspects. Total of 14 clusters of the network is achieved where only six clusters there is Gene-TFs-miRNA interaction and other eight clusters posess DEGs–miRNAs interactions or DEGs–DEGs–miRNAs interactions The most targeted genes and TFs as well as miRNAs are observed in cluster 3 to cluster 5 in the network analysis that may help in providing a detailed diagnosis of the disease for its cure. Maximum interacted DEGs with miRNA in terms of category are CDK19 (up-regulated) and ARID5B (down-regulated). Maximum interacted DEGs with DEGs and TFs are CEBPG, RELA, BACH1, FOXF1 and CHD1 wheras maximum interacted TFs are NR4A2, CDKN1A, ATF3, ERRFI1, FOSB and EGR1. Functional analysis of the DEGs given that the highest number (11) genes are responsible for Rab proteins signal transduction (Biological Process). Also, Cellular Senescence (20), IL-17 signaling pathway (16) and Signalling by Nuclear Receptors (14) is the dominant Biological Pathway among others. Maximum scoring small-molecule combinations are CGP-60774 (total 20 combinations) followed by alvocidib (total 15 combinations) and with AZD-7762 (total 24 combinations) is found with few other drugs having a high probability of success.