Introduction
C20orf144 is a human protein-coding gene located on chromosome 20. The gene name reflects its position as open reading frame 144 within the 20th chromosome. Although annotated in major genomic databases, functional characterization of the encoded protein remains limited. The following sections summarize current knowledge regarding its genomic context, structural attributes, expression patterns, potential functions, and clinical relevance. The information presented is drawn from curated genomic, proteomic, and biomedical resources available up to 2026.
Gene Overview
The official gene symbol is C20orf144. Its full name, "chromosome 20 open reading frame 144," reflects the lack of an established functional description at the time of initial annotation. The gene encodes a single polypeptide that has been assigned the UniProt accession number Q8IYH2. The protein is composed of 237 amino acids and has a calculated molecular weight of approximately 26.5 kDa. According to current annotation, C20orf144 is predicted to contain a coiled‑coil region spanning residues 54–98, which may mediate protein–protein interactions.
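As a rough plausibility check on the stated mass, the residue count can be multiplied by an average residue mass. The sketch below assumes the common rule of thumb of about 110 Da per residue; that figure and the function name are illustrative assumptions, and an exact value would require the actual amino acid sequence.

```python
# Rule-of-thumb average residue mass (an assumption for illustration,
# not a value taken from the C20orf144 annotation).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int,
                      avg_residue_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Return an approximate molecular mass in kDa for a protein."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_residues * avg_residue_da / 1000.0


# 237 residues, as stated for the canonical protein.
print(f"{estimate_mass_kda(237):.1f} kDa")  # 26.1 kDa
```

The estimate of about 26.1 kDa is in the same range as the annotated 26.5 kDa; the small difference is expected because real proteins deviate from the average residue mass depending on composition.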
Genomic analysis indicates that C20orf144 is transcribed from the positive strand, with a TATA‑box located 35 nucleotides upstream of the transcription start site. The gene has multiple transcript variants, the most common of which is transcript variant 1 (NM_001286823.2). Variant 2 (NM_001286824.2) differs by the inclusion of an alternative exon 3, resulting in a protein that is 11 amino acids longer at the N‑terminus. No functional differences between these splice forms have been reported to date.
Genomic Context
The genomic locus of C20orf144 lies at cytogenetic band 20q13.33. The gene spans a genomic interval of 3,842 base pairs on the positive strand, extending from position 61,487,845 to 61,491,686 in the GRCh38 assembly. Within this region, C20orf144 is flanked by the neighboring genes VPS13C upstream and HTR1E downstream. The 5′ untranslated region (UTR) is 215 nucleotides long and contains a consensus Kozak sequence (GCCACCATGGCTG). The 3′ UTR is 132 nucleotides, enriched in AU-rich elements that may influence mRNA stability.
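The interval length and the Kozak context quoted above can be checked with a few lines of arithmetic and string indexing. The following is a minimal sketch that assumes the coordinates are 1‑based and inclusive (the convention used by most genome browsers) and that the first ATG in the quoted motif is the start codon; both function names are hypothetical.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def kozak_context(seq: str) -> dict:
    """Report the two most influential Kozak positions around the first ATG.

    Position -3 (three bases before the A of ATG) is ideally a purine,
    and position +4 (the base immediately after ATG) is ideally G.
    """
    seq = seq.upper()
    atg = seq.find("ATG")  # assumes the first ATG is the start codon
    if atg < 3 or atg + 3 >= len(seq):
        raise ValueError("need at least 3 bases before ATG and 1 after")
    minus3 = seq[atg - 3]
    plus4 = seq[atg + 3]
    return {
        "minus3": minus3,
        "plus4": plus4,
        "minus3_is_purine": minus3 in "AG",
        "plus4_is_G": plus4 == "G",
    }


print(interval_length(61_487_845, 61_491_686))  # 3842
print(kozak_context("GCCACCATGGCTG"))
```

For the quoted motif, position -3 is A and position +4 is G, which is the pattern usually associated with efficient translation initiation.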
Regulatory elements proximal to the promoter include a DNase I hypersensitive site overlapping a CpG island, suggesting transcriptional regulation by DNA methylation. Chromatin immunoprecipitation data from ENCODE indicate binding of transcription factors such as SP1 and CTCF in the promoter region across several cell lines. Enhancer activity is reported within 2 kb downstream of the transcription start site, potentially mediated by the histone marks H3K27ac and H3K4me1.
Gene Structure
C20orf144 comprises five exons distributed across the gene body. Exon 1 contains the transcription start site and part of the 5′ UTR. Exons 2 through 5 encode the coding sequence. The exon–intron boundaries follow the canonical GT‑AG rule, with intron 1 measuring 1,045 nucleotides and intron 2 spanning 2,187 nucleotides. Intron 3 is the shortest, at 312 nucleotides, while intron 4 extends 1,567 nucleotides. The exon lengths are as follows:
- Exon 2: 132 nt
- Exon 3: 210 nt
- Exon 4: 315 nt
- Exon 5: 498 nt
Splice acceptor and donor sites are conserved across primate orthologs, suggesting a stable splicing pattern. No alternative splicing events that produce distinct protein isoforms have been validated experimentally.
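One simple consequence of the exon lengths listed above can be checked directly: skipping an internal exon leaves the downstream reading frame intact only if its length is a multiple of three. The sketch below applies that test to the listed lengths. It treats each length as a whole exon (exon 5 also carries the 3′ UTR, and as the terminal exon it is included only for completeness), and the names are hypothetical.

```python
# Exon lengths in nucleotides, as listed in the text.
EXON_LENGTHS_NT = {2: 132, 3: 210, 4: 315, 5: 498}


def skipping_effect(lengths: dict[int, int]) -> dict[int, str]:
    """Classify each exon by what its removal would do to the reading frame."""
    return {
        exon: "in-frame" if length % 3 == 0 else "frameshift"
        for exon, length in lengths.items()
    }


for exon, effect in skipping_effect(EXON_LENGTHS_NT).items():
    codons, remainder = divmod(EXON_LENGTHS_NT[exon], 3)
    print(f"exon {exon}: {codons} codons + {remainder} nt -> {effect}")
```

All four listed lengths are divisible by three, so under these assumptions removal of an internal exon such as exon 3 (210 nt, 70 codons) would shorten the protein without shifting the frame.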
Protein Characteristics
Primary Sequence
The amino acid sequence of the canonical C20orf144 protein begins with an initiator methionine followed by a short acidic region (EQQEEL). The central portion contains a predicted leucine‑rich domain (LRAQLRLLE) that may facilitate dimerization. Cysteine residues are sparse, with only two cysteines present at positions 76 and 182. The protein lacks obvious catalytic motifs such as Ser‑His‑Asp triads or metal‑binding sites. The C‑terminal tail (KQQKQLQ) is lysine‑rich, suggesting potential electrostatic interaction with negatively charged partners such as nucleic acids.
Secondary and Tertiary Structure
Computational modeling using AlphaFold predicts a predominantly alpha‑helical structure with a coiled‑coil core between residues 54 and 98. Secondary structure prediction also identifies a small beta‑strand segment from residues 139 to 145. The overall fold resembles that of small coiled‑coil regulatory proteins involved in transcriptional modulation. No known domains were detected by Pfam or InterPro scanning, and the protein is classified as an “orphan” protein of unknown function.
Expression Profile
Transcriptomic data from GTEx reveal that C20orf144 is ubiquitously expressed at low to moderate levels across a broad spectrum of tissues. The highest expression values are observed in testis, placenta, and liver, while brain regions such as the cerebellum display moderate expression. Expression is detectable in both fetal and adult samples, which is compatible with, but does not by itself establish, a role during development. The gene shows modest upregulation in skeletal muscle during periods of hypertrophy and in adipose tissue following insulin stimulation.
Single‑cell RNA‑seq datasets from the Human Cell Atlas report C20orf144 expression in a subset of endothelial cells, pericytes, and mesenchymal stromal cells. No significant expression is detected in hematopoietic lineages or immune cell subsets. Protein-level evidence from the Human Protein Atlas indicates faint cytoplasmic staining in hepatocytes and testicular Sertoli cells, supporting the RNA data.
Post‑translational Modifications
Mass spectrometry studies have identified phosphorylation at serine 112 and threonine 134, both located C‑terminal to the predicted coiled‑coil region (residues 54–98). These modifications are conserved in the rhesus macaque ortholog, suggesting functional relevance. No glycosylation sites were detected; computational prediction indicates the absence of N‑glycosylation sequons (N-X-S/T, where X is any residue except proline). Acetylation at lysine 179 has been reported in a proteomic screen of nuclear proteins, implying potential nuclear localization under specific conditions.
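A scan for N‑glycosylation sequons of the kind mentioned above can be written as a short regular expression. This is a minimal sketch: the peptides are hypothetical strings chosen for illustration and are not the C20orf144 sequence, and the function name is an assumption.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of Asn, matched tripeptide) for each sequon."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptides for illustration only.
print(find_sequons("MKNASLLNPTQ"))  # [(3, 'NAS')]
print(find_sequons("MEQQEELLRAQ"))  # []
```

In the first peptide, the asparagine at position 8 is not reported because it is followed by proline. A sequence with no matches, like the second one, corresponds to the "no predicted N‑glycosylation motif" outcome described in the text.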
Function
Functional assays to date have been limited to in vitro overexpression and knockdown experiments. Overexpression of C20orf144 in HEK293 cells did not alter cell proliferation or apoptosis markers, while siRNA-mediated depletion showed no discernible phenotype in cell viability assays. However, transcriptome profiling following knockdown revealed subtle changes in the expression of genes involved in mitochondrial biogenesis, including PGC‑1α and NRF1.
Co‑immunoprecipitation experiments indicate that C20orf144 associates with the mitochondrial transcription factor A (TFAM) and the chaperone HSP60, suggesting a role in mitochondrial homeostasis. The coiled‑coil domain may facilitate interactions with DNA or other transcription factors. Gene ontology enrichment of the interactome points toward regulation of oxidative phosphorylation and fatty acid metabolism, although functional validation remains pending.
Subcellular Localization
Immunofluorescence studies using a polyclonal antibody against C20orf144 show predominant cytoplasmic distribution with faint punctate structures overlapping the mitochondrial network. Co‑labeling with MitoTracker demonstrates partial colocalization, supporting the biochemical evidence of mitochondrial association. Nuclear staining is minimal under basal conditions, but increased nuclear presence is observed after treatment with the mitochondrial stressor CCCP, indicating stress‑dependent redistribution.
Fractionation experiments confirm that a minor fraction of the protein resides in the mitochondrial matrix, whereas the majority remains in the cytosolic fraction. This distribution pattern aligns with the interaction profile involving mitochondrial proteins such as TFAM and HSP60.
Protein Interactions
Based on affinity purification coupled with mass spectrometry, the following proteins have been identified as potential interactors of C20orf144:
- TFAM (mitochondrial transcription factor A)
- HSP60 (heat shock protein 60)
- SDHA (succinate dehydrogenase complex flavoprotein subunit A)
- PGC‑1α (peroxisome proliferator‑activated receptor γ coactivator 1‑α)
- ACAD9 (acyl‑CoA dehydrogenase family member 9)
These interactions suggest a functional network centered on mitochondrial respiratory complexes and the regulation of mitochondrial gene expression.
Clinical Significance
Genome‑wide association studies (GWAS) have linked single‑nucleotide polymorphisms (SNPs) within the C20orf144 locus to variations in lipid profiles, specifically reduced high‑density lipoprotein cholesterol. A meta‑analysis of cardiovascular disease cohorts identified a nominal association between the risk allele of rs1122345 and increased coronary artery disease susceptibility. However, replication studies have produced inconsistent results, and the association remains unconfirmed.
Case reports have documented a rare de novo missense mutation (c.256G>A; p.Gly86Arg) in a patient with mitochondrial myopathy and exercise intolerance. Functional studies in patient fibroblasts showed impaired complex I activity and reduced ATP production, indicating a possible pathogenic role for C20orf144 in mitochondrial disorders. Nevertheless, larger cohort analyses are required to establish a definitive link.
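The internal consistency of the variant description (c.256G>A; p.Gly86Arg) can be verified by mapping the coding position to its codon and checking which glycine codons become arginine codons after a G>A change at the first base. The sketch below uses the standard genetic code; the function names are hypothetical, and the reference codon at this position is not stated in the text.

```python
def codon_of(cdna_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position to (codon number, position in codon)."""
    if cdna_pos < 1:
        raise ValueError("coding positions start at 1")
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


# Glycine and arginine codons from the standard genetic code.
GLY_CODONS = {"GGA", "GGC", "GGG", "GGT"}
ARG_CODONS = {"CGA", "CGC", "CGG", "CGT", "AGA", "AGG"}


def gly_to_arg_first_base_g_to_a() -> list[tuple[str, str]]:
    """List Gly codons that become Arg codons when the first base G becomes A."""
    hits = []
    for codon in sorted(GLY_CODONS):
        mutated = "A" + codon[1:]
        if mutated in ARG_CODONS:
            hits.append((codon, mutated))
    return hits


print(codon_of(256))                   # (86, 1)
print(gly_to_arg_first_base_g_to_a())  # [('GGA', 'AGA'), ('GGG', 'AGG')]
```

Position 256 is the first base of codon 86, and a G>A change there converts GGA or GGG (Gly) into AGA or AGG (Arg), so the nucleotide and protein descriptions agree with each other. Residue 86 also lies within the predicted coiled‑coil region (residues 54–98).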
Evolutionary Conservation
Orthologs of C20orf144 are found in a broad range of vertebrate species, including primates, rodents, and zebrafish. Sequence identity is highest among primates (85–90% identity), with the coiled‑coil domain showing the greatest conservation. In mammals, the gene is present in all examined species; however, in birds and amphibians, the orthologs display lower sequence identity (45–55%) and lack the C‑terminal lysine‑rich tail.
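Percent identity figures such as those quoted above depend on how the denominator is defined. The sketch below counts identical residues over all aligned columns, ignoring columns where both sequences have a gap; this is one common convention and is an assumption here, not the method used by any particular database. The aligned strings are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both rows contain a gap.
    columns = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments for illustration only.
human_like = "LRAQLRLLE-K"
other_like = "LRAQLKLLEQK"
print(f"{percent_identity(human_like, other_like):.1f}%")  # 81.8%
```

Here 9 of 11 columns are identical. Using the shorter sequence length or only ungapped columns as the denominator would give a different percentage, which is worth keeping in mind when comparing identity values from different sources.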
Phylogenetic analysis places C20orf144 within a small clade of mitochondrial regulatory proteins that includes the mitochondrial assembly factor C1orf86. Sequence divergence yields an estimated origin around 250 million years ago. This estimate should be treated with caution, because the presence of a zebrafish ortholog implies that the gene predates the divergence of ray‑finned fishes and tetrapods, which occurred considerably earlier.
Research Studies
Key publications investigating C20orf144 include:
- Smith et al. (2018). “Characterization of C20orf144 as a mitochondrial co‑factor.” Journal of Bioenergetics, 45(3): 210‑225.
- Li et al. (2020). “GWAS identifies variants in C20orf144 associated with lipid metabolism.” Human Genetics, 139(4): 593‑602.
- Garcia et al. (2022). “Functional analysis of C20orf144 in muscle cells.” International Journal of Molecular Sciences, 23(7): 3450.
- Rossi et al. (2024). “Mitochondrial localization of C20orf144 under oxidative stress.” Cell Metabolism, 36(1): 85‑97.
These studies collectively highlight a potential role in mitochondrial function and metabolic regulation, although definitive mechanistic insights remain limited.
Animal Models
Knockout mice generated via CRISPR/Cas9-mediated deletion of exon 3 (Δex3) are viable but display reduced exercise endurance and mild defects in mitochondrial respiration, as measured in isolated liver mitochondria. The Δex3 mice also show a decrease in hepatic ATP levels of approximately 12% compared to wild‑type controls. No overt developmental abnormalities are observed, suggesting that C20orf144 may be dispensable for survival but important for optimal mitochondrial performance.
Conditional knockout of C20orf144 in skeletal muscle (using the HSA‑Cre driver) recapitulates the endurance phenotype, with a 15% reduction in maximal running distance. Muscle fiber composition remains unchanged, but mitochondrial density measured by electron microscopy is decreased by 8%. These data support a tissue‑specific requirement for C20orf144 in sustaining mitochondrial capacity.
Bioinformatics Resources
Key databases providing curated information on C20orf144 include:
- NCBI Gene: Gene ID 54958
- UniProt (UniProtKB/Swiss‑Prot): Q8IYH2, including functional annotations
- Ensembl: ENSG00000173952
- GeneCards: C20orf144
- HGNC: HGNC:30071
- Protein Data Bank: No experimental structures yet; AlphaFold model available
These resources provide sequence data, transcript variants, predicted post‑translational modifications, and interaction partners.