Introduction

Long non-coding RNAs (lncRNAs) are a subgroup of non-coding RNAs longer than 200 nucleotides [1]. LncRNAs were previously believed to be transcriptional noise, but accumulating evidence suggests that they play a regulatory role in a wide range of biological processes [2]. Accordingly, dysfunction of lncRNAs is associated with various human diseases, including cancers [3,4,5].

Long intergenic non-coding RNAs (lincRNAs), which are located in the intervals between protein-coding genes, are emerging as key regulators of diverse cellular processes [6]. A growing number of studies have demonstrated that lincRNAs are regulatory elements in both oncogenic and tumor-suppressive pathways [7,8,9,10]. The expression of lincRNAs can be affected by inherent genetic factors such as single-nucleotide polymorphisms (SNPs) in the promoter region, which are likely to disrupt transcription factor binding sites (TFBS). For example, one study reported that a functional SNP, rs944289, in the promoter region of the lincRNA named papillary thyroid carcinoma susceptibility candidate 3 (PTCSC3) predisposed to papillary thyroid carcinoma (PTC) by dysregulating the expression of PTCSC3 through suppression of the binding activity of both C/EBPα and C/EBPβ [11].

To date, genome-wide association studies (GWAS) have successfully identified a large number of SNPs that are associated with colorectal cancer (CRC) [12,13,14,15], whereas many SNPs located in non-coding regions have still not been well explained. Thus, further characterization of lncRNA-related SNPs may open a new avenue for functional analysis of susceptibility to CRC. Moreover, the GWAS database developed by the National Human Genome Research Institute (NHGRI) in the USA (http://www.ebi.ac.uk/gwas/) indicates that disease-associated SNPs are largely located in intergenic regions. We hypothesized that SNPs located in the promoter regions of potentially related lincRNAs may contribute to the occurrence of CRC. To test this hypothesis, we focused on the 10 lincRNAs (CTD-2147F2.1, RP11-384P7.7, RP11-3N2.1, RP3-525N10.2, RP11-89K21.1, CCAT1, AK055145, RP11-58A12.3, UCA1 and AY343891) with the top fold-change values in the lncRNA microarray data from our previous study [16], and assessed whether promoter SNPs in these lincRNAs were associated with the risk of CRC.

In this study, we conducted a two-stage case–control study to identify possible biomarkers for predicting the risk of CRC. Furthermore, we evaluated gene-by-environment (G × E) interactions between the significant SNPs and smoking or drinking in relation to the risk of CRC.

Materials and methods

Subjects

The source population was previously enrolled from Jiashan County, and the study was approved by the Medical Ethical Committee of Zhejiang University School of Medicine. The recruitment details have been described previously [17]. Briefly, CRC cases were identified from the local Cancer Surveillance and Registry System; eligible cases were mentally competent to complete an interview and had no previous history of familial adenomatous polyposis, ulcerative colitis, or Crohn’s disease. Healthy controls were recruited in parallel from the same population and were matched to cases by age (±5 years), gender and residential area. All participants were unrelated ethnic Han Chinese. Ultimately, a total of 821 cases and 857 controls were included in the two-stage case–control study (test set: 320 cases and 319 controls; validation set: 501 cases and 538 controls).

Data collection

A face-to-face interview was conducted using a structured questionnaire covering demographic characteristics (e.g., age, gender, and body mass index), family history of cancer and lifestyle (e.g., smoking and drinking). Smokers were defined as those who smoked at least 1 cigarette per day for more than 1 year, or more than 300 cigarettes in less than 3 months. An alcohol drinker was defined as someone who consumed at least one drink per day for more than 3 months. Subsequently, ~5 ml of venous blood was collected from each participant and stored at −80 °C until DNA isolation. This study was approved by the Medical Ethical Committee of Zhejiang University School of Medicine.

SNP selection and genotyping

SNPs located in the promoter region (the 2000 bp region upstream of the transcription start site) of each lincRNA, together with their minor allele frequency (MAF) values in the Han Chinese in Beijing (CHB) population, were extracted from the 1000 Genomes data (http://www.1000genomes.org/). Tag SNPs representing SNPs with pairwise correlation r² ≥ 0.8 and MAF ≥ 0.1 were selected.
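
The tagging step can be illustrated with a minimal greedy sketch. The selection algorithm and software are not specified above, so the greedy strategy, the function name `select_tag_snps`, and the SNP identifiers and values in the example are assumptions for illustration only; only the thresholds (MAF ≥ 0.1, r² ≥ 0.8) come from the text.

```python
from typing import Dict, List, Tuple


def select_tag_snps(
    maf: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
    maf_min: float = 0.1,
    r2_min: float = 0.8,
) -> List[str]:
    """Greedy tag-SNP selection (illustrative, not the authors' tool).

    maf: SNP id -> minor allele frequency.
    r2:  unordered SNP pair -> pairwise r^2; missing pairs are treated as 0.
    """
    # Keep only common SNPs.
    candidates = sorted(s for s, f in maf.items() if f >= maf_min)

    def linked(a: str, b: str) -> bool:
        if a == b:
            return True
        return r2.get((a, b), r2.get((b, a), 0.0)) >= r2_min

    untagged = set(candidates)
    tags: List[str] = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs;
        # ties are broken alphabetically because of the sorted iteration.
        best = max(sorted(untagged),
                   key=lambda s: sum(linked(s, t) for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if linked(best, t)}
    return tags


# Hypothetical promoter SNPs (made-up ids and values).
example_maf = {"snpA": 0.25, "snpB": 0.30, "snpC": 0.15,
               "snpD": 0.05, "snpE": 0.20}
example_r2 = {("snpA", "snpB"): 0.95, ("snpB", "snpC"): 0.85,
              ("snpA", "snpC"): 0.60, ("snpC", "snpE"): 0.10}
example_tags = select_tag_snps(example_maf, example_r2)
print(example_tags)
```

In this toy input, snpD is dropped for low MAF, snpB captures snpA and snpC, and snpE is kept because it is not in strong linkage disequilibrium with any other SNP.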

Genomic DNA was isolated from peripheral blood samples using a modified salting-out procedure [18]. Genotyping of all SNPs was performed with the MassARRAY molecular weight array analysis system (BioMiao Biological Technology Co., Beijing, China).

Statistical analysis

Student’s t-test and Pearson’s χ² test were applied to evaluate differences in continuous and categorical variables, respectively, between cases and controls. Hardy–Weinberg equilibrium (HWE) for all SNPs was evaluated by the goodness-of-fit χ² test. The associations between SNPs and CRC were estimated by calculating adjusted odds ratios (ORs) and the corresponding 95% confidence intervals (95% CIs) in multivariate logistic regression models. The Benjamini–Hochberg multiple testing correction was applied; however, few or no SNPs remained significant after correction. Therefore, uncorrected P values were used to select significant SNPs between cases and controls. Pooled ORs in the combined set were estimated with a random-effect model when I² > 50%; otherwise, a fixed-effect model was used. Crossover analysis was used to explore the effects of the genetic factor only, the environmental factor only, and both factors combined on CRC. A multiplicative model was used to evaluate the influence of gene–environment interactions on the risk of CRC. Statistical analyses were performed using SAS software, version 9.2 (SAS Institute Inc., Cary, NC, USA) and Stata software, version 11.2 (StataCorp LP, College Station, TX, USA). P values < 0.05 were considered significant.
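
To make the multiple-testing step concrete, the following is a minimal sketch of the Benjamini–Hochberg adjustment in plain Python. The analyses in this study were run in SAS and Stata, so this is not the authors' code; the function name and the P values in the example are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four SNPs (not study data).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
for raw, adj in zip(raw_p, adj_p):
    print(f"raw P = {raw:.3f}, BH-adjusted P = {adj:.3f}")
```

Because each adjusted value scales with the number of tests divided by the rank, a nominally significant P value can exceed 0.05 once more SNPs are tested, which is the situation described above.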

Results

Of the 30,586 lncRNAs in the previous genome-wide expression profile, which was generated with the Arraystar Human LncRNA Microarray v3.0 in six paired CRC and adjacent normal tissues, 556 lncRNAs were significantly upregulated and 1040 lncRNAs were downregulated in tumor tissues. Among the aberrantly expressed lncRNAs, 795 were lincRNAs, and the 10 lincRNAs with the top fold-change values and their corresponding –log10 P values are presented in Fig. 1.

Fig. 1 Top 10 aberrantly expressed lincRNAs with their fold-change and –log10 P values

There were no significant differences in demographic characteristics or lifestyle factors between cases and controls in either the test set or the validation set. However, CRC patients were more likely than controls to have a family history of cancer (P < 0.001 in the test set; P = 0.039 in the validation set) (Table 1).

Table 1 Distributions of basic characteristics of colorectal cancer cases and controls

Nineteen SNPs in the ten lincRNAs were selected and genotyped. The call rate for each SNP was >98%. The chromosomal region of each lincRNA, the ID (rs number) of each candidate SNP and the allele (major/minor) frequencies are listed in Table 2. RP11-384P7.7 rs146223339, RP3-525N10.2 rs2256337, and RP11-58A12.3 rs147827294, which deviated significantly from HWE among controls in the test set, were excluded from further analysis.
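
The HWE check used for this exclusion is a goodness-of-fit χ² test with one degree of freedom on the genotype counts of controls. A minimal sketch follows; the function name and the genotype counts are hypothetical, and the sketch assumes a polymorphic SNP (both alleles observed), which holds for SNPs selected with MAF ≥ 0.1.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Takes the counts of the three genotypes and returns (chi2, P value)
    with one degree of freedom. Assumes both alleles are observed.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts (not study data).
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)     # matches HWE expectations
chi2_dev, p_dev = hwe_chi_square(40, 20, 40)   # strong heterozygote deficit
print(f"in HWE:   chi2 = {chi2_ok:.2f}, P = {p_ok:.3g}")
print(f"deviates: chi2 = {chi2_dev:.2f}, P = {p_dev:.3g}")
```

A SNP whose controls give P < 0.05 in this test would be treated as deviating from HWE and dropped before the association analysis.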

Table 2 The basic information of genotyped SNPs

As shown in Table 3, the RP11-3N2.1 rs13230517 and RP11-89K21.1 rs741813 polymorphisms were significantly associated with the risk of CRC in the test set after adjustment for age, gender, BMI, family history of cancer, smoking, and drinking. Under the additive model, RP11-89K21.1 rs741813 had an adjusted OR of 1.90 (95% CI = 1.19–3.04). Under the dominant model, individuals with RP11-3N2.1 rs13230517 GA/AA genotypes had a reduced risk of CRC (OR = 0.70, 95% CI = 0.51–0.98) and RP11-89K21.1 rs741813 AT/TT carriers had an increased risk of CRC (OR = 1.86, 95% CI = 1.14–3.02). In the validation set, the RP11-3N2.1 rs13230517 polymorphism remained significantly associated with a decreased risk of CRC under the dominant model (OR = 0.76, 95% CI = 0.59–0.98). However, no statistical evidence of an association between RP11-89K21.1 rs741813 and CRC was observed (Table 4). Pooled analysis revealed that individuals carrying RP11-3N2.1 rs13230517 variants were less susceptible to CRC under the additive model (OR = 0.85, 95% CI = 0.74–0.97) and the dominant model (OR = 0.74, 95% CI = 0.60–0.90) in the combined set (Table 4).
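
As a consistency check on the pooled dominant-model estimate, the two stage-specific ORs quoted above can be combined with a fixed-effect model. The sketch below assumes inverse-variance weighting and Wald-type confidence intervals, with standard errors back-calculated from the rounded published CIs; the exact estimator used in Stata is not stated in the text, so this is an approximation rather than a reproduction of the authors' analysis.

```python
import math
from typing import List, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_with_se(or_value: float, ci_low: float, ci_high: float):
    """Return (log OR, SE), with the SE back-calculated from a Wald 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(or_value), se


def pool_fixed_effect(studies: List[Tuple[float, float, float]]):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) tuples.

    Returns (pooled OR, lower 95% limit, upper 95% limit, I^2 in percent).
    """
    effects = [log_or_with_se(*s) for s in studies]
    weights = [1.0 / se ** 2 for _, se in effects]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, effects)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Cochran's Q and the I^2 heterogeneity statistic.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, effects))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * se_pooled),
            math.exp(pooled + Z_95 * se_pooled),
            i2)


# rs13230517, dominant model: test set and validation set (values from the text).
DOMINANT_MODEL = [(0.70, 0.51, 0.98), (0.76, 0.59, 0.98)]
pooled_or, low, high, i2 = pool_fixed_effect(DOMINANT_MODEL)
print(f"pooled OR = {pooled_or:.2f} (95% CI {low:.2f}-{high:.2f}), I2 = {i2:.0f}%")
```

With these rounded inputs the sketch gives approximately 0.74 (0.60–0.90) with I² = 0%, in line with the combined-set estimate and with the use of a fixed-effect model when I² does not exceed 50%.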

Table 3 Associations between individual SNPs and colorectal cancer in test set
Table 4 Associations of rs13230517 and rs741813 with colorectal cancer in the validation and combined sets

Moreover, crossover analysis and gene–environment interaction analysis were conducted for RP11-3N2.1 rs13230517. As shown in Table 5, neither an independent effect of smoking nor a combined effect of smoking and the RP11-3N2.1 rs13230517 polymorphism on the risk of CRC was observed. In addition, non-drinkers with GA/AA genotypes had a decreased risk of CRC compared with those with the GG genotype (OR = 0.68, 95% CI = 0.47–0.99) in the test set, and a similar result was observed in the combined set (OR = 0.74, 95% CI = 0.59–0.93). However, no evidence of multiplicative interactions between smoking or drinking and the RP11-3N2.1 rs13230517 polymorphism on CRC was observed.
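
The logic of the crossover analysis can be sketched with a 2 × 4 table in which subjects exposed to neither factor (here, GG genotype and no exposure) form the reference group. The example below is unadjusted and uses made-up counts, whereas the ORs in Table 5 are adjusted estimates from logistic regression; the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio(cases: int, controls: int, ref_cases: int, ref_controls: int):
    """Unadjusted OR against the reference group with a Woolf-type 95% CI."""
    or_value = (cases * ref_controls) / (controls * ref_cases)
    se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    low = math.exp(math.log(or_value) - Z_95 * se)
    high = math.exp(math.log(or_value) + Z_95 * se)
    return or_value, low, high


def crossover_analysis(counts: Dict[Tuple[int, int], Tuple[int, int]]):
    """counts maps (gene, environment) in {0, 1}^2 to (cases, controls).

    Returns the three ORs relative to the (0, 0) group and the ratio
    OR11 / (OR10 * OR01); a ratio of 1 means no multiplicative interaction.
    """
    ref_cases, ref_controls = counts[(0, 0)]
    ors = {
        key: odds_ratio(*counts[key], ref_cases, ref_controls)
        for key in [(1, 0), (0, 1), (1, 1)]
    }
    ratio = ors[(1, 1)][0] / (ors[(1, 0)][0] * ors[(0, 1)][0])
    return ors, ratio


# Hypothetical counts (not study data): gene = GA/AA carrier, environment = drinker.
toy_counts = {(0, 0): (100, 100), (1, 0): (50, 100),
              (0, 1): (150, 100), (1, 1): (75, 100)}
toy_ors, toy_ratio = crossover_analysis(toy_counts)
for key, (or_value, low, high) in toy_ors.items():
    print(f"G={key[0]}, E={key[1]}: OR = {or_value:.2f} ({low:.2f}-{high:.2f})")
print(f"multiplicative interaction ratio = {toy_ratio:.2f}")
```

In this toy table the joint OR equals the product of the two single-factor ORs, so the interaction ratio is 1, which corresponds to the absence of multiplicative interaction.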

Table 5 Gene–environment interactions between rs13230517 and smoking and alcohol drinking on CRC

Discussion

CRC is the third most common cancer and the fourth leading cause of cancer death worldwide [19]. The prevalence of CRC is increasing in China, but the causes of this increase are complicated and not entirely clear, which poses a challenge to its prevention and control. In the current study, we found that a novel SNP, rs13230517, which lies in the promoter region of the lincRNA RP11-3N2.1, is associated with CRC.

Recently, lncRNAs have rapidly been recognized as new players in tumorigenesis and tumor suppression. They can regulate target genes through lncRNA–miRNA and lncRNA–protein interactions, or by acting as miRNA precursors [20]. As a class of lncRNAs, a growing number of lincRNAs have been reported to participate in the complicated process of colorectal carcinogenesis. Zhai et al. [21] reported that the expression level of lincRNA-p21 was significantly lower in CRC tumor tissues than in paired normal tissues, and that the lincRNA-p21 level was linked to CRC stage, vascular invasion and tumor invasion. Among the 10 lincRNAs selected from the microarray data, CCAT1 has been shown to have promising diagnostic value in detecting CRC, because its expression is remarkably elevated in colorectal tumor tissues and adenomatous polyp tissues [22,23,24]. In addition, Han et al. [25] reported that UCA1 could affect the proliferation, apoptosis and cell cycle progression of CRC cells.

RP11-3N2.1 was found to be downregulated in CRC tumor tissues relative to paired normal tissues, and its fold-change value ranked third in the microarray data. It is a noncoding transcript of the gene encoding zinc-finger protein 727 (ZNF727). The ZNF family has been recognized as a source of potential B-cell epitopes eliciting the production of autoantibodies in cancer [26]. A ZNF panel (ZNF346, ZNF638, ZNF700, and ZNF768) was reported to be overexpressed at the mRNA level in at least 20% of investigated tumors compared with adjacent normal colorectal mucosa [27]. In addition, CRC patients with higher expression of ZNF545 were found to have more favorable relapse-free survival than those with lower ZNF545 levels [28]. However, there are no studies to date on the functional similarity or association between RP11-3N2.1 and the ZNF family. In a genome-wide analysis of lncRNA expression from the publicly available microarray database Gene Expression Omnibus (GEO) under the accession number GSE95423, RP11-3N2.1 (probe name: ASHGA5P016427 in the Arraystar Human LncRNA Microarray V3.0) was found to be downregulated (fold-change = 0.487, P = 0.00706) in CRC tissues with liver metastasis compared with those without metastasis, suggesting that RP11-3N2.1 might be implicated in the metastasis of CRC. RP11-3N2.1 has not been characterized in previous studies, and only limited information about it has been documented. It is a lincRNA with a length of 748 bp located at chr7q11.21. Previous reports have described that lncRNAs can decrease the expression of miRNA target genes by inactivating miRNAs [29]. DIANA-LncBase (www.microrna.gr/LncBase) computationally predicted that miRNA-135b may target RP11-3N2.1. He et al. [30] indicated that miRNA-135b contributed to anti-apoptosis and chemoresistance in CRC. In addition, several lines of evidence indicate that miRNA-135b is related to the prognosis of CRC [31,32,33].

On the other hand, lncRNAs can also act as co-activators of transcription factors by interacting with RNA binding proteins (RBPs), and this interaction ultimately alters the localization and activity of the proteins [34]. The prediction tool starBase v2.0 (http://starbase.sysu.edu.cn/) suggested that the interacting RBPs might be FUS or SFRS1. However, the functional mechanisms of RP11-3N2.1 need to be further explored and confirmed at the molecular level on the basis of the above findings.

Current data indicate that a number of disease-associated SNPs reside in lincRNAs [35]. Because the binding of a transcription factor (TF) to a promoter region can result in local structural modification, leading to the removal of a preexisting component or the recruitment of a new component [36], SNPs located in the promoter region are regarded as functional SNPs that influence gene expression and, subsequently, disease involvement. In our study, we found that a promising SNP, RP11-3N2.1 rs13230517, was implicated in CRC. This SNP is about 1842 bp upstream of RP11-3N2.1 and was selected as one of the tag SNPs using a candidate SNP strategy. It is plausible that the RP11-3N2.1 rs13230517 variant may affect transcription by altering the binding of a certain TF, and that the resulting abnormal transcription inhibits the expression of RP11-3N2.1. According to HaploReg v4.1 (http://compbio.mit.edu/HaploReg), the rs13230517 polymorphism is predicted to alter seven regulatory motifs, including TFII-I, T3R, and MAZR. Furthermore, the SNP lies within a region with promoter histone marks (H3K4me3) in colonic and rectal mucosa. However, in the absence of experimental data, further investigation of the functional relevance of rs13230517 is required to determine the nature of the conferred phenotypic effects.

As for environmental factors, we focused on CRC-related lifestyle factors (smoking and drinking), which have been identified and established in epidemiological studies [37, 38]. Crossover analysis suggested that the RP11-3N2.1 rs13230517 polymorphism was negatively associated with CRC among non-drinkers, which further supports a protective role of the rs13230517 polymorphism in CRC. Even though the other associations did not reach statistical significance, their directions (positive/negative) were in accordance with previous studies, such as the deleterious effects of smoking and drinking on CRC. Nevertheless, we found no multiplicative gene–environment interactions in relation to CRC. Accordingly, a larger sample size is still needed to explore these interactions.

Although the findings from our genetic association analysis suggest a link between the RP11-3N2.1 rs13230517 polymorphism and CRC, several limitations of our study should be noted. First, the participants in the case–control study were ethnic Han Chinese living in a small rural county, which limits their representativeness of the Chinese population. Second, because of the relatively small sample size of our study, we might not have been able to detect weak gene–disease associations and gene–environment interactions. However, this is less of a concern because we combined the results from the test and validation sets to compensate for this shortcoming. Third, owing to the lack of staging information for CRC patients in the current study, we could not evaluate the association between the RP11-3N2.1 rs13230517 polymorphism and clinical features among CRC patients. Finally, no functional data were obtained for the novel SNP, so the mechanism at the molecular level remains largely unknown and needs to be elucidated.

In conclusion, the present study provides the first evidence of an association between the RP11-3N2.1 rs13230517 polymorphism and colorectal carcinogenesis in a Han Chinese population. The findings might provide an alternative biomarker for the risk prediction, prevention and early diagnosis of CRC. Additional epidemiological studies with large sample sizes and further mechanistic investigations into the function of this biomarker are warranted to improve our understanding of its role in colorectal tumorigenesis.