DNA methylation, RdDM machinery, and genetic structure across 20 Zostera marina Nanopore methylomes — the analytical foundation for an RNA-directed methylation strategy.
This page reports the computational analyses performed on 20 Zostera marina (eelgrass) samples sequenced by Decibel Bio (Oxford Nanopore, 5-methylcytosine). It is the technical companion to the project overview. Every figure and table here is reproducible from the data bundle linked at the bottom.
RNA-directed DNA methylation (RdDM) is the plant's natural, non-transgenic route to targeted gene silencing. The whole strategy depends on that pathway being present and complete in Zostera marina, so we inventoried all 20 of its core genes.
Twenty reviewed Arabidopsis thaliana RdDM / DNA-methylation reference proteins were searched (DIAMOND blastp, --very-sensitive, e<1e-5) against the Zostera marina proteome (20,648 proteins). Each was scored by amino-acid identity and query coverage into Strong / Moderate / Weak ortholog-confidence calls.
The complete pathway is present: Pol IV (NRPD1) and Pol V (NRPE1) as distinct genes, the shared second subunit NRPD2, the siRNA-biogenesis arm (RDR2, DCL3), the AGO4/AGO6 effectors, the recruitment/chromatin module (SHH1, CLSY1, DRD1, DMS3, RDM1, SUVH2/9), the de novo methyltransferase DRM2, the maintenance methyltransferases (MET1, CMT3, CMT2), and — relevant for epiallele stability — the demethylases ROS1 and DME.
All 20 components, grouped by pathway module, with the best ortholog hit and confidence call in each species. Both seagrasses carry every component; Thalassia values come from the comparative scan in §2.
| Gene | Module | Role | Zostera hit | Z. %id | Z. call | Thalassia hit | T. %id | T. call |
|---|---|---|---|---|---|---|---|---|
| NRPD1 | Pol IV/V | Pol IV largest subunit | KMZ70232.1 | 33% | Moderate | Thate01g01330 | 36% | Moderate |
| NRPE1 | Pol IV/V | Pol V largest subunit | KMZ58649.1 | 49% | Strong | Thate01g10480 | 41% | Moderate |
| NRPD2 | Pol IV/V | Pol IV/V 2nd subunit | KMZ68176.1 | 57% | Strong | Thate05g12060 | 58% | Strong |
| CLSY1 | Recruitment | Chromatin remodeler | KMZ57147.1 | 36% | Moderate | Thate01g02240 | 37% | Moderate |
| SHH1 | Recruitment | Pol IV recruitment | KMZ64381.1 | 39% | Moderate | Thate09g03450 | 42% | Moderate |
| RDR2 | siRNA biogenesis | dsRNA synthesis | KMZ58731.1 | 50% | Strong | Thate08g04430 | 54% | Strong |
| DCL3 | siRNA biogenesis | 24-nt siRNA dicer | KMZ59536.1 | 42% | Moderate | Thate02g35430 | 43% | Moderate |
| AGO4 | Effector | siRNA effector | KMZ71001.1 | 65% | Strong | Thate01g03750 | 68% | Strong |
| AGO6 | Effector | siRNA effector | KMZ69032.1 | 58% | Strong | Thate03g26310 | 61% | Strong |
| DRD1 | DDR / recruitment | DDR complex helicase | KMZ74703.1 | 56% | Strong | Thate02g12400 | 58% | Strong |
| DMS3 | DDR / recruitment | DDR complex | KMZ73474.1 | 49% | Weak/divergent | Thate09g08050 | 51% | Strong |
| RDM1 | DDR / recruitment | DDR complex | KMZ64644.1 | 60% | Strong | Thate03g04120 | 57% | Strong |
| SUVH2 | Pol V recruitment | Pol V recruitment | KMZ64142.1 | 48% | Strong | Thate03g07440 | 46% | Strong |
| SUVH9 | Pol V recruitment | Pol V recruitment | KMZ64142.1 | 53% | Strong | Thate04g01200 | 58% | Strong |
| DRM2 | Methyltransferase | de novo (CHH) | KMZ59545.1 | 52% | Strong | Thate02g36290 | 53% | Strong |
| MET1 | Methyltransferase | CG maintenance | KMZ71047.1 | 73% | Strong | Thate08g10510 | 65% | Strong |
| CMT3 | Methyltransferase | CHG maintenance | KMZ58195.1 | 51% | Strong | Thate05g07140 | 48% | Strong |
| CMT2 | Methyltransferase | CHH maintenance | KMZ58195.1 | 52% | Moderate | Thate05g07140 | 57% | Moderate |
| ROS1 | Demethylase | active demethylation | KMZ69251.1 | 48% | Strong | Thate08g07090 | 45% | Moderate |
| DME | Demethylase | active demethylation | KMZ69251.1 | 45% | Moderate | Thate07g01050 | 48% | Moderate |
Confidence calls: Strong ≥45% identity & ≥70% query coverage; Moderate ≥30% & ≥50%; Weak/divergent below that but a clear reciprocal match. Hit IDs are the best-scoring protein in each proteome (Zostera KMZ from GenBank; Thalassia Thate from the annotated proteome).
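The calling rule above can be written as a small function. This is a sketch that assumes each hit has already been reduced to percent identity, percent query coverage and a reciprocal-best-hit flag. The `Hit` record and the "No call" label for hits that fail every tier are illustrative and do not come from the pipeline. Coverage is not reported in the table, so the coverage values in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Best hit of one Arabidopsis query in one proteome (illustrative record)."""
    gene: str
    identity: float    # percent amino-acid identity
    coverage: float    # percent of the query covered by the alignment
    reciprocal: bool   # True if the hit maps back to the same Arabidopsis query


def confidence_call(hit: Hit) -> str:
    """Apply the Strong / Moderate / Weak thresholds stated in the text."""
    if hit.identity >= 45 and hit.coverage >= 70:
        return "Strong"
    if hit.identity >= 30 and hit.coverage >= 50:
        return "Moderate"
    if hit.reciprocal:
        return "Weak/divergent"
    return "No call"  # hypothetical label: below every tier and not reciprocal


# Identities are from the table; coverages are hypothetical placeholders.
for h in [Hit("NRPD2", 57, 92, True), Hit("CMT2", 52, 61, True), Hit("DMS3", 49, 38, True)]:
    print(h.gene, confidence_call(h))
```

The thresholds also let you read coverage off the table. A row with at least 45% identity that is not called Strong must fall short on coverage. CMT2 in Zostera (52%, Moderate) therefore has coverage between 50% and 70%, and DMS3 in Zostera (49%, Weak/divergent) has coverage below 50%. This fits the incomplete DMS3 gene model noted in the cross-species comparison.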
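Two structural notes: AGO4 has expanded into a small clade in Zostera (distinct AGO4A/AGO4B genes), while several Arabidopsis paralog pairs (SUVH2/9, CMT2/3, ROS1/DME) share a single best-hit Zostera gene. That pattern suggests one Zostera copy where Arabidopsis has two. A best-hit search alone, however, cannot distinguish an ancestral single-copy state (with the duplications specific to the Arabidopsis lineage) from secondary loss in the seagrass lineage.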
Finding the pathway complete in Zostera does not by itself show it is a general seagrass feature. To test that, we searched for the same 20 genes in five more seagrasses covering all four seagrass families.
The same 20 Arabidopsis queries were searched against five additional seagrasses: Thalassia testudinum and Halophila stipulacea (Hydrocharitaceae), Cymodocea nodosa and Amphibolis antarctica (Cymodoceaceae), and Posidonia oceanica (Posidoniaceae). Together with Zostera marina (Zosteraceae), this covers all four seagrass families. Thalassia was searched against its annotated proteome (25,744 proteins). The annotations of the other four are not openly available, so they were searched directly against their genome assemblies by splice-aware protein-to-genome alignment (miniprot). Two of the six species, Amphibolis and Halophila, were added in a second pass so that two of the four families are each represented by two species. Both were scanned with the identical query set, aligner, and confidence thresholds.
Because the same genes match in the same way across six separate genomes, these are very likely the true seagrass versions of the RdDM genes rather than lookalikes. One gene, DMS3, looked weak in Zostera, most likely because its gene model there is incomplete. It is a strong match in all five other seagrasses, so the pathway is complete across the group. Adding Amphibolis and Halophila as second members of families already in the set lets us check that the pathway holds up within a family, not just between families. Amphibolis matched the same pattern as Cymodocea (its own family) and Posidonia (a close relative), including the same two hard-to-detect genes. This indicates that those detection gaps are a feature of that lineage rather than a failure of the search. The two genes that remove methylation (ROS1 and DME) are present in every species. That matters because any mark we add can then also be undone by the plant itself. Because the machinery is shared across all four seagrass families, a method that works in Zostera is a strong candidate to transfer to the others. The most immediate targets are Thalassia, whose genome assembly is about sixteen times larger (4.26 Gb versus 260.5 Mb), and the heat-tolerant Cymodoceaceae.
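The argument that DMS3's weak Zostera call reflects the gene model rather than the gene itself can be expressed as a simple cross-species check. The check flags any gene that is weak in one species but Strong in several others. This is a sketch: the `flag_isolated_weak_calls` helper and its `min_strong` cut-off are hypothetical. Only the DMS3 calls below come from the text (Weak/divergent in Zostera, Strong in the five other seagrasses).

```python
from collections import Counter


def flag_isolated_weak_calls(calls, min_strong=2):
    """Return (species, gene) pairs where a gene is Weak/divergent in one
    species but Strong in at least `min_strong` species: candidates for an
    incomplete gene model rather than a genuinely divergent gene."""
    strong = Counter(
        gene
        for per_species in calls.values()
        for gene, call in per_species.items()
        if call == "Strong"
    )
    return sorted(
        (species, gene)
        for species, per_species in calls.items()
        for gene, call in per_species.items()
        if call == "Weak/divergent" and strong[gene] >= min_strong
    )


others = ["Thalassia testudinum", "Halophila stipulacea", "Cymodocea nodosa",
          "Amphibolis antarctica", "Posidonia oceanica"]
calls = {"Zostera marina": {"DMS3": "Weak/divergent"}}
calls.update({sp: {"DMS3": "Strong"} for sp in others})
print(flag_isolated_weak_calls(calls))  # [('Zostera marina', 'DMS3')]
```

A flag from this check is a prompt to inspect the gene model. It does not prove the model is incomplete.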
The table below is the assembly picture from NCBI GenBank at the time of writing. Two things stand out: the best Zostera marina assembly is not on GenBank at all (the fragmented 2015 scaffold set is; the chromosome-scale v3.1 this project uses lives at JGI/Phytozome), and Halodule wrightii — the most heat-tolerant species — has no public assembly of any kind. Where good genomes do exist (Thalassia, Cymodocea, Posidonia, all assembled 2024), there is still no base-resolution methylome for any seagrass beyond the Zostera data generated here.
| Species | Family | Best public genome | Level | Contiguity | Note |
|---|---|---|---|---|---|
| Zostera marina | Zosteraceae | JGI/Phytozome v3.1 (used here) | Chromosome | 6 pseudo-chromosomes, 260.5 Mb | GenBank copy (GCA_001185155.1, 2015) is fragmented (scaffold N50 0.49 Mb) |
| Halodule wrightii | Cymodoceaceae | none | — | — | No public assembly — highest-value new genome |
| Thalassia testudinum | Hydrocharitaceae | GCA_037157565.1 (2024) | Scaffold | contig N50 ~112 Mb, 4.26 Gb | Good genome; no methylome |
| Cymodocea nodosa | Cymodoceaceae | GCA_036874045.1 (2024) | Chromosome | N50 ~21 Mb, 375 Mb | Good genome; no methylome |
| Posidonia oceanica | Posidoniaceae | GCA_037176725.1 (2024) | Scaffold | contig N50 ~68 Mb, 2.96 Gb | Good genome; no methylome |
| Amphibolis antarctica | Cymodoceaceae | GCA_024699675.1 (2022) | Contig | N50 ~40 kb, 245 Mb | Fragmented — presence checks only |
| Halophila stipulacea | Hydrocharitaceae | GCA_047748335.1 (2025) | Scaffold | N50 ~8 kb, 3.2 Gb | Fragmented — presence checks only |
This is the evidence base for the collection priorities in §4: generate a genome where none exists (Halodule), and generate methylomes broadly, since whole-genome base-resolution methylation maps are effectively absent from the public record for every seagrass.
With the machinery confirmed, this section lays out how it would be put to use: which traits methylation could plausibly change, how the change would be delivered, and how each trait would be measured. Specific target genes await phenotype-linked methylation data.
RdDM is the sequence-targetable arm of the plant methylation system. A 24-nt small-RNA trigger complementary to a locus recruits the pathway to methylate that locus, biased toward the CHH context. Because targeting derives from an RNA sequence rather than a nuclease or an inserted gene, RdDM is compatible with a non-transgenic strategy: the machinery a plant already uses to silence its transposons can be directed to a chosen regulatory region.
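To make the targeting idea concrete, the sketch below works through three steps. It tiles a promoter sequence into overlapping 24-nt windows, writes each window's antisense strand as RNA (the strand complementary to the top-strand DNA), and ranks windows by how many top-strand CHH cytosines they cover. It is an illustration under stated assumptions only. The 12-nt step, counting CHH on one strand, and ranking by CHH count are hypothetical choices, not Decibel's trigger design. A real design would also need off-target screening against the genome.

```python
import re

# DNA -> complementary RNA bases (A pairs with U in RNA)
_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def chh_positions(seq):
    """0-based positions of top-strand cytosines in CHH context (H = A, C or T)."""
    return [m.start() for m in re.finditer(r"(?=C[ACT][ACT])", seq.upper())]


def antisense_rna(dna):
    """Reverse complement of a DNA window, written as RNA."""
    return dna.upper().translate(_TO_RNA_COMPLEMENT)[::-1]


def tile_triggers(promoter, k=24, step=12):
    """Tile `promoter` into k-nt windows every `step` nt and return
    (start, antisense RNA, number of top-strand CHH cytosines in the window),
    ranked by CHH count (ties keep positional order)."""
    chh = chh_positions(promoter)
    tiles = []
    for start in range(0, len(promoter) - k + 1, step):
        end = start + k
        n_chh = sum(start <= p < end for p in chh)
        tiles.append((start, antisense_rna(promoter[start:end]), n_chh))
    return sorted(tiles, key=lambda t: -t[2])


promoter = "CAA" * 16  # toy 48-nt sequence, not a real Zostera promoter
for start, trigger, n_chh in tile_triggers(promoter):
    print(start, trigger, n_chh)
```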
The feature-level data locate the natural handle. Promoters carry the highest non-CG methylation of any genic feature (CHH 3.3%; §5). CHH methylation in particular is the mark most directly maintained by RdDM, so the pathway is already active at the regulatory regions relevant to gene control. Prior seagrass work supports a methylation–phenotype link. In a clonal Zostera marina meadow, gene-body CG methylation covaried with photosynthetic performance independent of genetic background, and the associated genes were enriched for light-harvesting and protein-folding functions (Jueterbock et al. 2020). In Posidonia oceanica and Cymodocea nodosa, warm-origin plants carry higher global methylation than cold-origin plants, and heat-responsive genes are concentrated in the low-gene-body-methylation, high-plasticity fraction of the genome (Entrambasaguas et al. 2021).
The traits below are the current candidates for engineering or selection. The 2024 USDA/NOAA report to Congress catalogues why restorations fail (Appendix 4, Orth, Lefcheck and colleagues), and every trait here except carbon fixation addresses one of those failure modes. Targeting a known failure mode makes it more likely that a success in the lab translates into better establishment and survival in the field, which is the main bottleneck in restoration. Each trait needs a quantitative assay before it can be used as a selection endpoint; the proposed measurements are listed alongside.
| Trait | Rationale | Restoration failure mode addressed (federal report) | Proposed quantitative readout |
|---|---|---|---|
| Pathogen resistance | Labyrinthula zosterae (wasting disease) is a primary threat to Zostera meadows | "Disease (e.g., fungal attack on seeds or seedlings)" | Lesion area / infection score; challenge assay survival |
| Temperature tolerance | Central climate-resilience goal | Marine heatwaves named the emerging primary threat; assisted gene flow from warm-adapted stock is the stated strategy | Biomass retention and leaf coloration under heat ramp (multi-spectral imaging) |
| Photosensitivity | Light regime varies across restoration sites | "Too shallow (desiccation) or too deep (insufficient light)"; poor water clarity | Biomass and coloration under controlled irradiance |
| Salinity tolerance | Estuarine and dredging-affected sites vary widely | Site-selection failures at variable-salinity estuarine and placed-sediment sites | Growth vs. water conductivity gradient |
| Carbon fixation | Directly tied to blue-carbon value | (value-enhancing rather than failure-linked) | Leaf/biomass accrual plus sediment contribution modeled from root oxygenation |
| Seed success ratio | Restoration throughput from seed | "Lack of donor material or seed stock (e.g., no flowering)" | Germination and establishment fraction |
| Sulfide (H₂S) tolerance | Sulfidic sediments limit survival | "Sediment instability … smothering and burial of seedlings" | Growth/survival across H₂S dosing |
| Hypoxic-sediment durability | Low-oxygen sediments common at degraded sites | Sediment/anchorage failures; night-time hypoxia in dense beds | Survival and rhizome growth under sediment hypoxia |
| Epiphyte-overgrowth resistance | Epiphyte load reduces photosynthesis; grazers (nudibranchs) mediate cleaning | "Algal blooms and/or excessive epiphyte growth"; "(over)grazing of transplants" | Epiphyte cover; grazer preference assay (harder to standardize) |
Non-CG methylation status varies by tissue and ramet age, so assays and sampling must be standardized: roots are methylomically distinct from leaves and rhizomes (thousands of differentially methylated sites, versus ~18 between leaves and rhizomes), and young versus old ramets separate in methylation ordination (Nilsen et al. 2025). Leaves and rhizomes are interchangeable sampling proxies; roots are not.
Naming a trait is not the same as knowing which cytosines to change to affect it; getting from one to the other is a separate step. Much of its groundwork comes from the analyses in this document, and the final target ranking is where Decibel's platform comes in.
The target-identification logic runs in five stages:
The output is a short list of candidate sites: a specific promoter or region, in the facultative fraction, on a gene with a plausible tie to the trait. Ranking those candidates, deciding which to write, and designing the small-RNA trigger that writes them is the step a trait-prediction platform such as Decibel's is expected to perform. Decibel's internal methods are proprietary, so this states where their platform fits rather than how it works. In this project's terms: the analyses here produce the reference methylomes, the genome coordinates, the separation of inherited from facultative marks, and the trait-contrasting source plants; the platform uses those to name and prioritize the specific targets to write.
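The three criteria that define a candidate site can be read as a filter that a site must pass on all counts. The sketch below assumes each site has already been annotated with three things: its genomic feature, a facultative flag (from a genotype-controlled comparison like the Mantel analysis later in this document), and a trait-link flag. The `CandidateSite` record and its field names are hypothetical. The ranking that follows the filter is the platform step described above and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSite:
    gene: str
    feature: str        # e.g. "promoter", "exon", "intron", "intergenic", "repeat"
    facultative: bool   # methylation variation not explained by genotype
    trait_linked: bool  # gene has a plausible functional tie to the trait


def shortlist(sites, regulatory=("promoter",)):
    """Keep only sites that satisfy all three criteria at once."""
    return [
        s for s in sites
        if s.feature in regulatory and s.facultative and s.trait_linked
    ]


sites = [
    CandidateSite("geneA", "promoter", True, True),    # passes all three
    CandidateSite("geneB", "promoter", False, True),   # genotype-linked mark
    CandidateSite("geneC", "exon", True, True),        # not a regulatory region
]
print([s.gene for s in shortlist(sites)])  # ['geneA']
```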
The current gap is stage 1. We have the methylomes (stage 2) and the analytical methods for stages 3–5, but not yet the phenotype-linked collection stage 1 requires. Assembling that trait contrast is the near-term experimental priority, and it is what the collection strategy in §4 is designed to produce.
There are two fundamentally different ways to get a plant to carry a useful methylation mark, and they lead to three practical routes. The first is to add the mark ourselves by delivering a small-RNA trigger that directs the RdDM pathway to methylate a chosen gene. The second is to skip induction entirely: find plants that already carry the mark naturally, then propagate them. The three routes follow that split. Routes A and B are two ways of adding the mark and differ only in how the trigger reaches the plant; Route C is the no-induction alternative. All three are non-transgenic, and each has been shown to work in other plants but not yet in a seagrass.
Route A is the one Decibel Bio's platform performs: it reads the methylome and then writes a new mark using an externally applied spray or seed treatment. Routes B (letting the plant move the trigger internally) and C (selecting plants that already carry the mark) are alternatives that would not rely on Decibel's write step.
Candidate traits would be screened in aquarium trials using a multi-spectral camera to quantify biomass and coloration. Two starting materials are available on different timelines: eelgrass grown from seed reaches assay size in approximately six months, while live plants collected from multiple geographic locations provide immediate material together with the environmental and trait data of their source sites, supporting comparison of natural variation before any manipulation. Once RdDM material is in hand, water-dispersal and pelletized-dispersal methods can be tested on live plants alongside RdDM pre-treated seed, with parallel trials of co-inoculation with beneficial bacterial colonies dispersed simultaneously.
The near-term priorities are: (1) generate phenotype-linked methylomes to identify specific target loci for the traits above; (2) pilot one delivery route against a reporter locus to confirm that the trigger-to-methylation step operates in Zostera, starting with spray-induced gene silencing (SIGS), which has the lowest barrier; and (3) screen warm-adapted ecotypes for pre-existing favourable epialleles that require no induction.
Route A is the path we expect to use. We plan to make the methylation change using technology from Decibel Bio, a Berkeley company that has already sequenced the 20 Zostera marina genomes and methylomes described in these findings. Decibel's platform has two parts. It reads a plant's methylation pattern and predicts traits from it, and it writes new methylation using a spray or seed treatment that does not alter the plant's DNA. The change is reversible, keeps the plant's own genetic diversity, and can be applied at planting or later in the season, which matches the conditions this project requires. Decibel's published example treats a crop to protect its photosynthesis under extreme heat, the trait we lead with in eelgrass. One caution: Decibel's public materials describe the platform only in general terms and do not spell out the molecular chemistry. The exact design of the trigger should therefore be confirmed with the company rather than assumed from press coverage.
Decibel's platform was built for land crops, where a spray lands on leaves in the air. Eelgrass grows underwater, so the main thing to work out is how to apply the treatment to a submerged plant. There are three ways to do this. The simplest is to treat the seed before we plant it: this uses Decibel's existing seed treatment with little change, and since we plant from seed anyway it needs no new underwater step. Treated seed alone is enough to move the project forward, so the underwater methods below would add reach rather than being required. The second way is to release the treatment into the water or sediment around established plants. The third is to deliver it in pellets — a technology being developed by Chris Oakes (former CEO and founder of ReefGen) that would protect the treatment and release it in a controlled way at the seafloor, which helps with the dilution and wash-away problems of loose release. The second and third methods would let us treat existing meadows, not just newly planted seed; they are more valuable but less proven, and are the ones to test next.
The underlying method is known to work in a land plant: spraying short RNA molecules onto tobacco leaves added methylation to a target gene and switched it off (Walz et al., "Induction of promoter methylation and transcriptional gene silencing upon high-pressure spraying of 24-nt small RNAs in Nicotiana benthamiana," bioRxiv 2022, doi:10.1101/2022.03.14.484340). So the biology is sound; what remains is adapting the delivery for use underwater, which is the practical problem to work through with Decibel and the pelletization partner. A fuller write-up of Decibel's public technology is in decibel_bio_research_brief.md (Downloads).
To breed or select for a trait like heat tolerance, we need starting plants that already vary in that trait. This section maps where each project species lives, the environmental conditions each population experiences, and which populations are practically collectable — with emphasis on temperature and on sites shippable to the Bay-Area sequencing location. Data: 65,932 georeferenced records from OBIS (marine, with per-record sea-surface temperature, salinity and depth) and GBIF (freshwater).
Among the six project species, Zostera marina occupies by far the widest thermal niche: a 24 °C span, from 5 °C in Arctic Norway, Alaska and the White Sea to 29 °C at its warm range edge. It also has the widest salinity range, from full marine to the near-fresh Baltic. Warm- and cold-adapted ecotypes of the same species are therefore available to collect and contrast, which is exactly the raw material an epigenetic temperature program requires. The strictly warm species (Thalassia median 26 °C, Halodule 25 °C) and the thermally narrow Mediterranean endemic Posidonia (15–22 °C) offer much less internal contrast.
Each occurrence carries its in-situ SST, salinity and depth, so every collected population arrives with an environmental label already attached — the substrate for associating methylation marks with thermal origin (cf. Jueterbock et al. 2020; Entrambasaguas et al. 2021, §3).
The Northeast Pacific is the most practical first collection axis: a single coastline carrying a strong thermal gradient, reachable from the Bay Area by cold-chain courier. Occurrences of Zostera marina were binned by latitude and ranked by distance from Berkeley.
| Role | Candidate zone | Median SST | Nearest distance from Berkeley |
|---|---|---|---|
| Local reference | SF Bay / Monterey | 11.5 °C | ~5 km |
| Warm-edge | S California (Channel Is. / San Diego) | 17–17.5 °C | ~590–730 km |
| Warm-edge (extreme) | N Baja California | 25 °C | ~1,770 km |
| Cold-edge | BC / N Washington | 9.8–10.9 °C | ~1,100–1,600 km |
| Cold-edge (extreme) | SE Alaska | 8.5 °C | ~2,100 km |
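The ranking above can be reproduced in outline from the occurrence records in three steps. Bin records by latitude, take the median in-situ SST per band, and measure the great-circle distance from Berkeley to the nearest record in each band. This is a sketch under several assumptions: simplified record fields (`lat`, `lon`, `sst`) rather than the raw OBIS column names, a 2° band width, and approximate Berkeley coordinates. The binning actually used for the table is not specified.

```python
import math
import statistics
from collections import defaultdict

BERKELEY = (37.87, -122.27)  # approximate (lat, lon); assumption
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def summarize_bands(records, width=2.0, origin=BERKELEY):
    """Group records into latitude bands; report median SST, nearest
    distance from `origin`, and record count, sorted by distance."""
    bands = defaultdict(list)
    for r in records:
        bands[math.floor(r["lat"] / width) * width].append(r)
    rows = [
        {
            "band_start": band,
            "median_sst": statistics.median(r["sst"] for r in recs),
            "nearest_km": min(haversine_km(origin, (r["lat"], r["lon"])) for r in recs),
            "n": len(recs),
        }
        for band, recs in bands.items()
    ]
    return sorted(rows, key=lambda row: row["nearest_km"])


records = [  # toy values, not OBIS data
    {"lat": 37.80, "lon": -122.40, "sst": 11.5},
    {"lat": 32.70, "lon": -117.20, "sst": 17.5},
    {"lat": 32.90, "lon": -117.30, "sst": 17.0},
]
rows = summarize_bands(records)
for row in rows:
    print(row)
```

Ranking by the nearest record rather than the band centroid matches the table's "nearest distance" column. A collection trip only needs to reach one accessible meadow in each band.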
A minimal design takes three points on one coast: San Diego/Channel Islands (warm), SF Bay (local reference), and BC/Puget Sound or SE Alaska (cold). Together these bracket a ~6–9 °C thermal contrast (median SST 17–17.5 °C at the warm edge versus 8.5–10.9 °C at the cold sites) within a single genetic lineage and a single shipping corridor. That is enough to build the first phenotype-linked methylome contrast the trait program needs. The Kuroshio Current populations (S Japan, median 19 °C) and Mediterranean populations extend the warm end further if a wider gradient is wanted later.
The eelgrass population-genetics literature sets the resolution at which provenance must be tracked. Zostera marina produces dormant, negatively buoyant seeds that disperse only a few metres, so meadows differentiate genetically on a scale below 100 km, and local adaptation between meadows separated by as little as 10 km can determine transplant survival (DuBois et al. 2022). Genome-wide association work in adjacent bays has resolved specific loci underlying eelgrass thermal tolerance (Schiebelhut et al. 2023), so a heritable, mappable warm/cold signal already exists in this species. Collecting the warm and cold ecotypes above with both genotype and methylome in hand lets the plastic (methylation) and heritable (sequence) components of thermal response be separated in the same material — a stronger design than either layer alone, and one that stays within locally-appropriate genetic backgrounds rather than moving fixed alleles across the range.
Vallisneria americana, the project's one freshwater species, lives in interior lakes and rivers of North America and is included as a heat-tolerant aquatic-plant comparator (Chesapeake ecotypes tolerate 33–36 °C); it is treated separately from the marine collection plan. Full distribution maps, per-species ecosystem rankings, and data caveats (survey bias, undersampling of the immediate California coast) are in the biogeography report under Downloads.
The 20 Zostera marina methylomes analyzed throughout this document come from a structured collection: five collection groups of four samples each (sample codes 1.1–5.4, where the code is group.replicate). The group axis is collection site/population; the four replicates capture within-site variation. The full sample-to-group key is provided as a downloadable resource — sample_metadata.xlsx — so any per-sample result in the sections below (methylation landscape, genetic structure, clustering) can be traced back to its collection group.
Splitting the methylation by genomic feature — promoters, exons, introns, gene bodies, intergenic regions, and repeats — shows which parts of the genome are methylated and, in particular, where RdDM is most active.
Using the v3.1 gene models and a repeat annotation, coverage-≥5 sites in all 20 samples were assigned to promoters (2 kb upstream of the TSS), exons, introns, gene bodies, intergenic regions, and repeats/TEs, and weighted methylation was computed per feature × context.
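The partition can be sketched for a single chromosome in two steps. First build each gene's 2 kb promoter interval on the correct strand. Then pool modified and canonical read counts per feature × context, counting a site in every feature whose interval contains it (gene bodies overlap exons and introns). The site tuple layout, the half-open 0-based intervals and the `promoter_interval` helper are assumptions for illustration, not the project's pipeline.

```python
from collections import defaultdict


def promoter_interval(tss, strand, length=2000):
    """Half-open, 0-based [start, end) interval of `length` bp upstream of a TSS."""
    if strand == "+":
        return max(0, tss - length), tss
    return tss + 1, tss + 1 + length  # minus strand: upstream lies at higher coordinates


def weighted_by_feature(sites, features, min_cov=5):
    """sites: iterable of (pos, context, n_mod, n_canonical) on one chromosome.
    features: dict mapping feature name -> list of [start, end) intervals.
    Returns {(feature, context): pooled modified / total reads}."""
    mod = defaultdict(int)
    total = defaultdict(int)
    for pos, context, n_mod, n_can in sites:
        cov = n_mod + n_can
        if cov < min_cov:
            continue
        for name, intervals in features.items():
            if any(s <= pos < e for s, e in intervals):  # a site may fall in several features
                mod[(name, context)] += n_mod
                total[(name, context)] += cov
    return {key: mod[key] / total[key] for key in total}


features = {"promoter": [promoter_interval(3000, "+")], "repeat": [(0, 500)]}
sites = [(1500, "CHH", 1, 9), (2500, "CHH", 0, 10), (100, "CG", 9, 1), (2000, "CG", 1, 2)]
print(weighted_by_feature(sites, features))
```

The linear scan over intervals keeps the sketch short. A genome-wide run over millions of sites would use an interval index or a tool such as `bedtools intersect` instead.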
| Feature | CG | CHG | CHH |
|---|---|---|---|
| Promoter (2 kb up) | 43.9% | 18.7% | 3.3% |
| Exon | 31.3% | 5.2% | 0.3% |
| Intron | 74.7% | 18.1% | 0.6% |
| Gene body | 49.9% | 12.0% | 0.5% |
| Intergenic | 80.6% | 35.7% | 2.2% |
| Repeat / TE | 90.1% | 40.2% | 2.2% |
Two features of the distribution are relevant here. First, repeats/TEs are hypermethylated in all three contexts (CG 90%, CHG 40%, CHH 2.2%). This is the expected transposon-silencing signature and a check that the methylation calls behave as they should. Second, promoters carry the highest CHH methylation of any feature in the table (3.3%), above both intergenic and repeat regions. Near genes, CHH methylation is maintained largely by ongoing RdDM (CMT2 acts mainly within large heterochromatic TEs). Promoters are therefore where the pathway concentrates its activity, and they are the region an RNA-directed strategy would engage to modulate a target gene without altering its coding sequence.
A de novo transposable-element annotation of the v3.1 assembly (RepeatModeler2 built a library of 1,463 families; RepeatMasker applied it) resolves the "repeat/TE" category above into its component classes. 68.9% of the eelgrass genome is repetitive — and the composition is lopsided in a way that matters directly for RdDM.
LTR retrotransposons alone occupy 45.5% of the genome — Gypsy/DIRS1 at 31.0%, Ty1/Copia at 11.6%, and a further 2.9% LTR-classified but not assigned to either superfamily — making them by far the largest sequence class in the eelgrass genome. DNA transposons contribute 5.0%, LINEs 3.6%, and Helitrons a negligible 0.03%; a further 13.2% is repetitive but unclassifiable against known families. This is precisely the substrate RdDM evolved to silence: the CHH/CHG hypermethylation of the "repeat/TE" row in the table above is, in the main, methylation of these LTR elements. An assembly this LTR-heavy is a genome under sustained transposon pressure, and a large, intact RdDM inventory (§2) is what keeps that pressure contained. It also delimits where an RNA-directed intervention has natural traction — the pathway is already engaged across nearly half the genome.
The masked fraction (68.9%) is slightly higher than the 66.5% softmasked repeat annotation used for the feature partition above, because the de novo library recovers eelgrass-specific families a homology-only mask misses; the feature-level methylation percentages are unaffected (they were computed against the softmasked BED, and the two annotations agree on which regions are repeat-dense). A parallel EDTA annotation is running as an independent cross-check and will be noted here if it materially revises these class fractions.
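Class fractions like those quoted above can be tallied from RepeatMasker's `.out` table. Sum hit lengths per class/family column, then roll them up to the top-level class. This is a rough sketch with three limitations. It does not merge overlapping hits, so it can slightly overcount relative to RepeatMasker's own summary. It reads only the position and class/family columns. The toy lines below are invented for illustration.

```python
from collections import defaultdict


def class_fractions(out_lines, genome_size):
    """Fraction of the genome annotated per class/family from RepeatMasker
    .out lines. Overlapping hits are not merged (approximation)."""
    bp = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # header rows and blank lines
        begin, end = int(fields[5]), int(fields[6])  # 1-based, inclusive
        bp[fields[10]] += end - begin + 1            # e.g. "LTR/Gypsy"
    return {cls: n / genome_size for cls, n in bp.items()}


def rollup(fractions):
    """Sum class/family fractions into top-level classes (LTR, DNA, LINE, ...)."""
    totals = defaultdict(float)
    for cls, frac in fractions.items():
        totals[cls.split("/")[0]] += frac
    return dict(totals)


toy = [
    "   SW   perc perc perc  query   position in query    matching  repeat",
    "score   div. del. ins.  sequence  begin  end (left)  repeat    class/family",
    "",
    " 1000  10.0  1.0  1.0  Chr01  1    100  (900)  +  fam-1  LTR/Gypsy  1  100  (0)  1",
    "  500  12.0  0.5  0.5  Chr01  201  250  (750)  C  fam-2  LTR/Copia  (0) 50  1   2",
    "  300   5.0  0.0  0.0  Chr01  301  320  (680)  +  fam-3  DNA/hAT    1  20   (0)  3",
]
print(rollup(class_fractions(toy, genome_size=1000)))
```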
Genome-wide DNA methylation across the 20 samples, by sequence context. Weighted methylation = (modified reads) / (modified + canonical reads), pooled over all sites with coverage ≥5.
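The pooled definition weights each site by its coverage, which is not the same as averaging per-site fractions: high-coverage sites contribute more reads and so count for more. A minimal sketch, assuming each site is a (modified reads, canonical reads) pair and applying the coverage ≥5 filter from the text:

```python
def weighted_methylation(sites, min_cov=5):
    """Pooled methylation: total modified reads / total reads over sites
    with coverage >= min_cov. Each site is (n_modified, n_canonical)."""
    n_mod = n_total = 0
    for m, c in sites:
        if m + c >= min_cov:
            n_mod += m
            n_total += m + c
    return n_mod / n_total if n_total else float("nan")


def mean_site_fraction(sites, min_cov=5):
    """Unweighted alternative: average of per-site methylation fractions."""
    fractions = [m / (m + c) for m, c in sites if m + c >= min_cov]
    return sum(fractions) / len(fractions) if fractions else float("nan")


sites = [(9, 1), (0, 40), (2, 2)]  # third site fails the coverage filter
print(weighted_methylation(sites))  # 0.18
print(mean_site_fraction(sites))    # 0.45
```

In this toy case the deeply covered unmethylated site pulls the pooled value well below the simple mean.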
Zostera marina shows the canonical plant methylation hierarchy: high symmetric CG methylation, moderate CHG, and low but non-zero asymmetric CHH. The three contexts are strikingly reproducible across all 20 samples; CHH in particular varies by less than 0.1 percentage point between individuals.
The per-site distribution is bimodal: most CG sites are either near-fully methylated or near-fully unmethylated, while CHH sites are overwhelmingly low-methylated with a small hypermethylated tail — the fraction of the genome under active RdDM/CMT2 control.
Methylation is not uniform across the six chromosomes. Chr02 and Chr04 are consistently hypermethylated in every context and in every one of the 20 samples, consistent with higher transposable-element / repeat density on those chromosomes.
| Context | Genome-wide mean | Chr02 (high) | Chr04 (high) | Other chromosomes |
|---|---|---|---|---|
| CG | 70.5% | 81.2% | 78.8% | ~66–68% |
| CHG | 27.0% | 33.3% | 29.7% | ~24–26% |
| CHH | 1.44% | 1.74% | 1.71% | ~1.3–1.4% |
Seagrasses reproduce both sexually and clonally, so before interpreting any epigenetic differences we established the genetic relationships among the 20 samples from their SNPs.
Per-sample SNP calls were merged into a union of biallelic sites on the six chromosomes (3,983,427 SNPs; 3,741,213 segregating). Pairwise identity-by-state (IBS) distances were computed across all 20 individuals.
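One common definition of IBS distance for biallelic SNPs is the mean of |g_i − g_j| / 2 over sites genotyped in both samples, where g counts alternate alleles (0/1/2). Under this definition, identical genotypes contribute 0, a homozygote versus a heterozygote 0.5, and opposite homozygotes 1. The sketch below assumes that definition and a dense genotype matrix with −1 for missing calls. The exact formula behind the distances reported here is not stated in the text.

```python
import numpy as np


def ibs_distance_matrix(genotypes):
    """Pairwise IBS distance from an (n_samples, n_sites) matrix of
    alt-allele counts (0, 1, 2), with -1 marking a missing call."""
    g = np.asarray(genotypes, dtype=np.int8)
    n = g.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = (g[i] >= 0) & (g[j] >= 0)        # sites called in both samples
            diff = np.abs(g[i, both] - g[j, both])   # 0, 1 or 2 allele differences
            d[i, j] = d[j, i] = diff.mean() / 2
    return d


toy = [[0, 1, 2, -1],
       [0, 1, 2,  2],
       [2, 1, 0,  0]]
d = ibs_distance_matrix(toy)
min_offdiag = d[np.triu_indices(len(toy), k=1)].min()
print(min_offdiag)  # 0.0: samples 0 and 1 are identical at every shared site
```

The clone check in the next paragraph amounts to inspecting `min_offdiag`. A value near zero would indicate ramets of one genet, whereas the collection here has a minimum well above zero.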
All 20 samples are genetically distinct — the minimum pairwise distance (0.124) is far from zero, so there are no clones in this collection. Samples S7 and S11 have the lowest variant yields (1.42M and 1.46M vs. a ~2.1M median), consistent with the note that some samples are more degraded; they appear as apparent outliers in distance space for technical, not biological, reasons.
Some of the methylation differences between samples are inherited with the genome, and some are environmental responses. Separating the two matters for target choice, because only the environmental part is a useful handle — so we tested how closely methylation similarity tracks genetic similarity.
A per-sample methylation feature matrix (per-chromosome × context weighted methylation plus genome-wide context means) was clustered and compared against the genetic distance matrix.
A Mantel test shows methylation distance and genetic distance are significantly correlated (r = 0.77, p = 0.0001; and still r = 0.41 after excluding the low-yield S7/S11). Much of the between-individual methylation variation is therefore genotype-associated. For an RdDM engineering program this is the central practical result: to find facultative, environmentally responsive marks (the useful targets), one must first control for the substantial genotype-linked component measured here.
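A Mantel test correlates the upper triangles of two distance matrices and assesses significance by permuting the sample labels of one matrix. The sketch below is a simple one-sided Pearson version written with NumPy. The permutation count and seed are assumptions, since the text does not state them. With 9,999 permutations the smallest attainable p-value is 1/10,000 = 0.0001.

```python
import numpy as np


def mantel(d1, d2, n_perm=9999, seed=0):
    """One-sided Pearson Mantel test between two square distance matrices.
    Returns (r, p) with p = (count of permuted r >= observed r, plus 1) / (n_perm + 1)."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)                  # each pair once, no diagonal
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                    # relabel the samples of d2
        r_perm = np.corrcoef(x, d2[np.ix_(p, p)][iu])[0, 1]
        exceed += int(r_perm >= r_obs)
    return r_obs, (exceed + 1) / (n_perm + 1)


coords = np.array([0, 1, 3, 6, 10, 15, 21], dtype=float)  # toy 1-D positions
dist = np.abs(coords[:, None] - coords[None, :])
r, p = mantel(dist, dist * 2 + 1, n_perm=999)
print(r, p)
```

Because the second toy matrix is a linear transform of the first, r is 1 up to floating-point rounding. Permuting whole rows and columns together, rather than individual cells, preserves the dependence among distances that share a sample. This is why a Mantel test is used here instead of an ordinary correlation test on the pairwise values.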
For the improved plants to have an effect, they have to be planted at scale into a use that is wanted and paid for. Seagrass planting has now been mechanized, and the treated seed this project would produce fits the machines that already exist. The work also sits inside an active, funded U.S. federal restoration agenda. This section covers the planting hardware, the restoration market, the carbon value, and the federal funding.
The delivery routes in §3 end with seed reaching the seafloor, and there is now hardware that plants it. ReefGen — the restoration-robotics company founded by our delivery partner Chris Oakes — builds seafloor robots that inject seagrass seed and shoots directly into sediment, a mechanized version of the manual seed-injection methods used in the field. Its eelgrass planter (nicknamed Grasshopper) weighs about 23 kg and can hold up to 20,000 seeds in a 20-litre bag, planting up to roughly 60 seeds per minute — on the order of ten times a diver's rate, with little disturbance to the surrounding sediment. In its first pilot in 2022 the machine-planted plots grew eelgrass of the same quality and density as hand-planted controls, which established that mechanized planting is feasible at speed. The scale gap this addresses is large: the largest manual seagrass restoration (Chesapeake Bay) plants under 55 acres a year, and the 2024 federal report projects that mechanized planting can increase the area covered roughly a hundredfold. (Robot specifications from ReefGen and CNN reporting, 2024; scale figures from the 2024 USDA/NOAA report.)
The robot plants seed into sediment, and the seed can be delivered mixed into a matrix — the same kind of carrier that can hold an RdDM seed pre-treatment (Route A, §3). The environmentally-matched, treated seed this project would produce therefore fits an existing, already-deployed planting system, and reaching established meadows or new sites does not require building new hardware.
Replanting is one use of an improved plant; protecting a meadow that is still there is the other, and for climate it is the more valuable of the two. An intact meadow already holds decades to centuries of carbon in its sediment, and if it dies that carbon is released and the die-off stimulates methane production (§ Blue carbon, below). Preventing a loss therefore avoids an emission, on top of preserving the fisheries, water-quality and storm-buffering services the meadow provides — whereas replanting a meadow that has already been lost starts the carbon clock over from zero.
Existing meadows can be given adaptive traits in place. A vulnerable meadow (one sitting at the warm edge of its thermal range, or under recurring heat stress) can be interplanted with RdDM-treated, heat-resilient seed or shoots of the same locally sourced genotype, raising the resilience of the stand without removing or replacing the plants already growing there. Because RdDM changes methylation state rather than inserting genes, and uses local genetic backgrounds (§3), this stays within the same non-transgenic, native-range constraints as new-site planting. In practice the program has two deployment modes on the same delivery hardware: replanting cleared or degraded sites with treated seed, and reinforcing at-risk standing meadows in place so they survive the heatwaves that would otherwise convert them from a carbon sink into a carbon and methane source.
Seagrass restoration is a growing activity, and several programs have set specific acreage goals that create demand for planting stock:
Washington State has adopted a kelp and eelgrass restoration target for Puget Sound (reported at 10,000 acres), and the seafloor planting robots described above have been deployed there in support of it (press reporting; exact statutory figures should be confirmed against the state program before citation).
The National Park Service is funding eelgrass restoration across five National Seashores (North Carolina to Massachusetts), with the stated aim of moving heat-tolerant seed north by assisted gene flow — the same approach the collection design in §4 follows.
The U.S. Army Corps of Engineers aims to put 70% of dredged sediment to beneficial use by 2030 (the "70/30 goal"), up from roughly 30–40% historically. Seagrass establishment is a qualifying use, which creates standing demand for planting stock suited to placed sediment.
The demand exists because seagrass is being lost. Global seagrass area has declined by about 29% since records began, with losses of roughly 110 km² per year since 1980, and marine heatwaves are now the leading emerging threat. Each of the common reasons a restoration fails and must be replanted corresponds to one of the trait targets in §3 (see the failure-mode table there).
Seagrass meadows store carbon more efficiently per unit area than many temperate forests. Two figures, both calculated with region- and species-specific (IPCC Tier-2) methods, give a sense of the scale:
Two caveats go with these numbers. First, a valuation of restored meadows on Virginia's Eastern Shore found that carbon removal was less than half of the total ecosystem-service value; the rest came from fisheries, water quality, storm buffering and biodiversity. The case for seagrass is therefore stronger when it rests on the full set of services rather than carbon alone. Second, when seagrass dies it releases its stored carbon and stimulates methane production (per tonne, methane traps roughly 30× more heat than CO₂ over a 100-year horizon). A meadow that is restored but then fails can reverse its own carbon gain, so a durable, heat-resilient meadow (the goal of this program) is worth considerably more for climate than one that does not persist.
Federal money is already committed to this area. Federal R&D funding for farmed seagrass and seaweed since 2014 exceeds US$325 million across seven agencies (in US$ millions: Dept. of Commerce ≈121.5, USDA ≈90.4, NSF ≈41, Dept. of the Interior ≈35, DOE ≈23, DHHS ≈14, EPA ≈5). Three points connect that funding to the science in this document:
Full extraction of the federal report is in federal_aquaculture_report_digest.md (Downloads).
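The per-agency figures in the funding paragraph can be cross-checked against the headline total. This is a small sketch that uses only the approximate values quoted there; the dictionary and function names are illustrative.

```python
from typing import Dict

# Approximate federal R&D funding for farmed seagrass and seaweed since 2014,
# in US$ millions, as quoted in the text (values are rounded by the source).
AGENCY_FUNDING_MUSD: Dict[str, float] = {
    "Dept. of Commerce": 121.5,
    "USDA": 90.4,
    "NSF": 41.0,
    "Dept. of the Interior": 35.0,
    "DOE": 23.0,
    "DHHS": 14.0,
    "EPA": 5.0,
}


def total_funding(funding: Dict[str, float]) -> float:
    """Sum of agency allocations, rounded to 0.1 M to suppress float noise."""
    return round(sum(funding.values()), 1)


def funding_shares(funding: Dict[str, float]) -> Dict[str, float]:
    """Each agency's fraction of the summed total."""
    total = sum(funding.values())
    return {agency: amount / total for agency, amount in funding.items()}


if __name__ == "__main__":
    print(f"Total: US${total_funding(AGENCY_FUNDING_MUSD)} M")
    shares = funding_shares(AGENCY_FUNDING_MUSD)
    for agency, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{agency:<22} {share:6.1%}")
```

The components sum to about US$329.9 million, which is consistent with the "exceeds US$325 million" headline. Commerce and USDA together account for roughly 64% of it.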
Every figure, summary table, and report from these analyses. All files are in the accompanying data bundle; links below are relative to the downloads/ folder.
Full detail in methods_reproducibility.md (download above). Summary of pipeline and provenance.
20 samples, Oxford Nanopore 5mC calls (modkit bedMethyl, Decibel Bio). Reference: Zostera marina v3.1 (Ma et al. 2021, Phytozome) — 6 pseudo-chromosomes, 260.5 Mb, 21,483 genes. Sequence context (CG/CHG/CHH) is pre-annotated in the bedMethyl calls.
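The CG/CHG/CHH labels already come with the bedMethyl calls, so the pipeline did not have to derive them. The rule behind the labels is sketched below for reference, where H stands for A, C or T. This is an illustration, not the caller's code: it assumes a 0-based position into an uppercase or softmasked reference string. For a minus-strand call, the position is assumed to be that of the reference G opposite the cytosine.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cytosine_context(ref: str, pos: int, strand: str) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for a cytosine at 0-based `pos` on `strand`.

    Returns None if the base is not a C on that strand or if the context runs
    off the sequence / contains N. H = A, C or T.
    """
    ref = ref.upper()  # softmasked (lowercase) bases count the same
    if strand == "+":
        window = ref[pos:pos + 3]
    elif strand == "-":
        # Read the minus strand 5'->3': reverse-complement bases pos-2..pos.
        window = revcomp(ref[max(pos - 2, 0):pos + 1])
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if not window or window[0] != "C":
        return None
    if len(window) >= 2 and window[1] == "G":
        return "CG"
    if len(window) < 3 or "N" in window[1:3]:
        return None
    return "CHG" if window[2] == "G" else "CHH"
```

For example, on the reference `ACGT`, both the `+`-strand C at position 1 and the `-`-strand call at position 2 are CG, because CG is palindromic.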
Weighted methylation per context and per feature at coverage ≥5. Feature intervals from v3.1 gene_exons GFF3 (promoters = 2 kb upstream of TSS by strand) and softmasked-repeat BED (66.5% of genome). Aggregation with awk + bedtools.
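The aggregation itself was done with awk and bedtools. The Python sketch below restates the same logic under the usual definition of weighted methylation, which is assumed here: total 5mC calls divided by total valid calls, pooled over sites that pass the coverage filter. It also shows how a strand-aware 2 kb promoter and softmasked repeat intervals can be derived. The `Site` record and all function names are hypothetical and are not modkit's column names. Intersecting sites with features, which bedtools handled in the real pipeline, is left out.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

MIN_COVERAGE = 5        # coverage threshold stated in the text
PROMOTER_FLANK = 2000   # "promoters = 2 kb upstream of TSS"


class Site(NamedTuple):
    """One cytosine call (hypothetical record, already assigned to a feature)."""
    context: str   # "CG", "CHG" or "CHH"
    n_valid: int   # reads with a valid modified/unmodified call
    n_mod: int     # reads called 5mC


def weighted_methylation(sites: Iterable[Site],
                         min_cov: int = MIN_COVERAGE) -> Dict[str, float]:
    """Per-context sum(n_mod) / sum(n_valid) over sites with n_valid >= min_cov."""
    mod: Dict[str, int] = defaultdict(int)
    valid: Dict[str, int] = defaultdict(int)
    for site in sites:
        if site.n_valid >= min_cov:
            mod[site.context] += site.n_mod
            valid[site.context] += site.n_valid
    return {ctx: mod[ctx] / valid[ctx] for ctx in valid if valid[ctx] > 0}


def promoter_interval(gene_start: int, gene_end: int, strand: str, chrom_len: int,
                      flank: int = PROMOTER_FLANK) -> Tuple[int, int]:
    """0-based half-open promoter upstream of the TSS, clipped to the chromosome.

    Gene coordinates are 0-based half-open (GFF3 start - 1, GFF3 end).
    """
    if strand == "+":
        return max(0, gene_start - flank), gene_start
    if strand == "-":
        return gene_end, min(chrom_len, gene_end + flank)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def softmasked_intervals(seq: str) -> List[Tuple[int, int]]:
    """0-based half-open runs of lowercase (softmasked repeat) bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]
```

Pooling read counts, instead of averaging per-site fractions, gives well-covered sites proportionally more weight. That is why the coverage filter and the weighting are applied together.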
DIAMOND blastp (--very-sensitive, e<1e-5) of 20 reviewed Arabidopsis RdDM proteins vs. each seagrass proteome. Calls: Strong ≥45% id & ≥70% cov; Moderate ≥30% & ≥50%; Weak ≥25% & ≥30%.
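The confidence tiers above map onto a short classification rule. The sketch below applies them strongest-first to a best hit per query. It makes two assumptions that the text does not confirm: the DIAMOND tabular output holds the columns `qseqid sseqid pident qcovhsp evalue bitscore` in that order, and the best hit is the one with the highest bitscore. Note that `qcovhsp` is per-HSP coverage and can understate coverage for hits split across several HSPs.

```python
from typing import Dict, Iterable, Optional, Tuple

# (label, min % identity, min % query coverage), checked strongest first.
THRESHOLDS = [
    ("Strong", 45.0, 70.0),
    ("Moderate", 30.0, 50.0),
    ("Weak", 25.0, 30.0),
]

Hit = Tuple[str, float, float, float, float]  # sseqid, pident, qcov, evalue, bitscore


def ortholog_call(pident: float, qcov: float) -> Optional[str]:
    """Highest tier whose identity and coverage thresholds are both met, else None."""
    for label, min_id, min_cov in THRESHOLDS:
        if pident >= min_id and qcov >= min_cov:
            return label
    return None


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Best hit per query by bitscore from tab-separated DIAMOND output (assumed column order)."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip():
            continue
        q, s, pident, qcov, evalue, bits = line.rstrip("\n").split("\t")[:6]
        hit: Hit = (s, float(pident), float(qcov), float(evalue), float(bits))
        if q not in best or hit[4] > best[q][4]:
            best[q] = hit
    return best
```

A hit at 49% identity with 60% coverage falls to Moderate under this rule, so both conditions of a tier must hold.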
Per-sample VCFs merged (bcftools), restricted to biallelic SNPs on Chr01–06. IBS distances and clustering via scikit-allel; methylation–genetics coupling by Mantel test. Note: merge fills absent genotypes as reference, so IBS is a relatedness proxy, not a formal population-genetic estimate.
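The pipeline computed IBS distances with scikit-allel. The plain-Python sketch below does not use scikit-allel's API; it only illustrates what IBS distance and a Mantel test compute. It takes a variants × samples matrix of alt-allele counts (0/1/2) and assumes no missing calls, which matches the reference-filled merge described above. The Mantel test uses the Pearson correlation of upper-triangle entries and a two-sided permutation p-value. The text does not say how the methylation distance matrix (`d2`) was built, so any symmetric per-sample methylation distance is assumed.

```python
import random
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ibs_distance(gn: Sequence[Sequence[int]]) -> Matrix:
    """Mean per-site |a_i - a_j| / 2 between samples (0 = identical, 1 = opposite homozygotes)."""
    n_sites, n = len(gn), len(gn[0])
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff = sum(abs(row[i] - row[j]) for row in gn)
        d[i][j] = d[j][i] = diff / (2 * n_sites)
    return d


def _upper(m: Matrix) -> List[float]:
    """Upper-triangle entries (i < j): the distinct pairs of a distance matrix."""
    return [m[i][j] for i, j in combinations(range(len(m)), 2)]


def _pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either matrix is constant


def mantel(d1: Matrix, d2: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Observed Pearson r between two distance matrices and a two-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(d1)
    x = _upper(d1)
    r_obs = _pearson(x, _upper(d2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel samples: rows and columns of d2 move together
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(_pearson(x, _upper(permuted))) >= abs(r_obs):
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting sample labels, rather than individual matrix cells, keeps each permuted matrix a valid distance matrix. This matters because the pairwise entries are not independent of one another.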
Field collection and computational analysis credits for the work reported here.
Field collection of the seagrass samples was performed by Cameron Colby Thomson, Morgan Peterson, Chris Oakes, and Parker Bonnell.
Genome, methylome, and comparative analyses reported here were performed by Cameron Colby Thomson (Allied Strategy LLC).
Sequencing of the 20 Zostera marina genomes and methylomes (Oxford Nanopore, 5mC) and the RdDM methylation-writing platform are provided by Decibel Bio — Brandon Pfannenstiel, Travis Bayer, and Jack Colicchio.
Seagrass epigenetics literature underlying the interpretation above. Full annotated list in seagrass_rddm_literature_review.md (download above).
Federal report digest: federal_aquaculture_report_digest.md (download above).