Aikium Inc. · mAbs (submitted, 2026)

Aiki-GeNano

A three-stage language-model alignment pipeline (SFT → DPO → GDPO) for target-conditioned nanobody design. Submit a target sequence — a linear peptide, intrinsically-disordered region, or whole soluble domain (e.g. the extracellular domain of a single-pass membrane protein); the model returns ten ranked candidates with developability scores, a multi-sequence alignment against the NBv1 scaffold, and an ESMFold structure prediction for the top-ranked design.

+6.6 °C mean Tm vs SFT · 65 training targets · 6 GDPO rewards · 738 M ProtGPT2 parameters · 126-AA NBv1 backbone · 1st on Tm + isomerization vs 5 VHH generators


All generated sequences share a single fixed 126-residue scaffold: the camelid VHH backbone cAbBCII10 introduced by Conrath et al. 2001 (referred to throughout as NBv1; appears as the NBv1 row in the alignment grid below). Training used the synthetic VHH library of Contreras et al. 2023, which randomises only the three complementarity-determining-region loops of NBv1 (8 positions in CDR1, 4 in CDR2, 10 in CDR3, totalling 22 of 126 positions). Each generation request therefore returns CDR designs against the submitted target sequence; the framework is not redesigned. Sapiens "humanness" scores consequently reflect the camelid origin of the scaffold rather than a training deficiency. Downstream humanisation (e.g. via AbNatiV) is treated as a separate step.

Live demo

Generate nanobody candidates for your epitope

Submit a target sequence of 4–244 amino acids. Linear peptides, disordered regions, and whole soluble domains are accepted, provided the full construct (epitope + 30-AA linker + 126-AA binder) fits ESMFold's 400-AA cap. Each request returns ten candidates ranked by the GDPO composite reward (Σ wᵢ·rᵢ), a multi-sequence alignment versus NBv1, and the ESMFold structure of the highest-ranked candidate. Structures for the remaining nine candidates are computed on demand via the "Compute structures for all 10" control below.
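The 244-residue upper bound follows from the arithmetic 400 − 30 − 126. A minimal sketch of that check and of assembling the single-chain construct is given here. The function name, the example target, the placeholder binder, and the assumption that the 30-AA G4S linker is six GGGGS repeats are illustrative and are not taken from the Aiki-GeNano code.

```python
# Illustrative sketch only: names and sequences below are hypothetical.
LINKER = "GGGGS" * 6      # assumed form of the 30-AA G4S linker
BINDER_LEN = 126          # fixed NBv1 scaffold length stated on this page
ESMFOLD_CAP = 400         # construct length cap stated on this page
MIN_TARGET = 4
MAX_TARGET = ESMFOLD_CAP - len(LINKER) - BINDER_LEN  # 400 - 30 - 126 = 244


def build_construct(target: str, binder: str) -> str:
    """Return epitope + linker + binder, or raise if a length limit is violated."""
    target = target.strip().upper()
    if not MIN_TARGET <= len(target) <= MAX_TARGET:
        raise ValueError(
            f"target must be {MIN_TARGET}-{MAX_TARGET} residues, got {len(target)}"
        )
    if len(binder) != BINDER_LEN:
        raise ValueError(f"binder must be {BINDER_LEN} residues, got {len(binder)}")
    return target + LINKER + binder


# Stand-in sequences; neither is a real target or the real NBv1 scaffold.
example_target = "ACDEFGHIKLMNPQRSTVWY"
placeholder_binder = "A" * BINDER_LEN
construct = build_construct(example_target, placeholder_binder)
print(len(construct), "residues in the folded construct")
```

A 245-residue target would push the construct to 401 residues, which is why the sketch rejects it before any folding call is made.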

Two example targets, neither in the paper's evaluation cohort: a 31-AA peptide hormone (GLP-1) and a 200-AA soluble extracellular domain (CD38 ECD). Click either to pre-fill the form.

Target · Candidates · Model · Temp

The six GDPO reward signals (each in [0, 1]; ↑ better):

r1 FR2 hydropathy — Kyte–Doolittle aggregation risk over framework-2 (positions 36–53). Weight 0.15.

r2 Hydrophobic-patch — penalises consecutive hydrophobic stretches that drive aggregation. Weight 0.20.

r3 Chemical liability — deamidation (NG/NS/NT/NN), isomerization (DG/DS/DT/DH), N-glycosylation, CDR-Met oxidation. Weight 0.25.

r4 Expression — Wilkinson–Harrison E. coli soluble-expression probability. Weight 0.15.

r5 VHH hallmark — Kabat-position FR2 tetrad characteristic of camelid heavy-chain antibodies (Muyldermans 2013). Weight 0.15.

r6 Scaffold integrity — length + cysteine count + C-terminal linker; the only ungated reward, defines validity. Weight 0.10.
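To make the r3 motif classes concrete, the following sketch counts them in a sequence. It only counts motifs; how the counts are converted into an r3 score in [0, 1] is not described on this page and is not attempted here. The N-glycosylation sequon is assumed to be the standard N-X-S/T pattern with X ≠ P, the CDR ranges are the ones quoted on this page, and all function names are hypothetical.

```python
import re

DEAMIDATION = ("NG", "NS", "NT", "NN")
ISOMERIZATION = ("DG", "DS", "DT", "DH")
# Assumed sequon definition: N, any residue except P, then S or T.
# The lookahead lets overlapping sequons be counted separately.
SEQUON = re.compile(r"(?=N[^P][ST])")
# CDR ranges quoted on this page (1-based, inclusive).
CDR_RANGES = ((26, 35), (54, 60), (100, 110))


def count_dipeptides(seq: str, motifs: tuple) -> int:
    """Count every occurrence (overlaps included) of the given dipeptides."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] in motifs)


def liability_counts(seq: str) -> dict:
    """Return raw motif counts for the four liability classes named in r3."""
    cdr_met = sum(seq[start - 1:end].count("M") for start, end in CDR_RANGES)
    return {
        "deamidation": count_dipeptides(seq, DEAMIDATION),
        "isomerization": count_dipeptides(seq, ISOMERIZATION),
        "n_glycosylation": len(SEQUON.findall(seq)),
        "cdr_met_oxidation": cdr_met,
    }


# Hypothetical toy fragment, far shorter than a real 126-AA nanobody.
print(liability_counts("ANGSDGNNTM"))
```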

Composite = 0.15·r1 + 0.20·r2 + 0.25·r3 + 0.15·r4 + 0.15·r5 + 0.10·r6 (weights from configs/gdpo/final_sft_gdpo_gated.yaml).
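The weighted sum can be sketched in a few lines. The weights are the ones listed above; the gating rule (r1–r5 set to 0 for an invalid sequence, r6 left untouched) is one reading of the text on this page, and the function names and candidate records are hypothetical.

```python
WEIGHTS = {"r1": 0.15, "r2": 0.20, "r3": 0.25, "r4": 0.15, "r5": 0.15, "r6": 0.10}
UNGATED = {"r6"}  # scaffold integrity is described as the only ungated reward


def composite(rewards: dict, valid: bool) -> float:
    """Weighted sum of the six rewards, zeroing gated rewards when invalid."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = rewards[name]
        if not valid and name not in UNGATED:
            value = 0.0  # assumed gating behaviour
        total += weight * value
    return total


def rank_candidates(candidates: list) -> list:
    """Sort candidate records by composite reward, best first."""
    return sorted(
        candidates,
        key=lambda c: composite(c["rewards"], c["valid"]),
        reverse=True,
    )


# Hypothetical reward values for two candidates.
batch = [
    {"id": "a", "valid": True,
     "rewards": {"r1": 0.8, "r2": 0.9, "r3": 0.6, "r4": 0.7, "r5": 1.0, "r6": 1.0}},
    {"id": "b", "valid": True,
     "rewards": {"r1": 0.8, "r2": 0.9, "r3": 1.0, "r4": 0.7, "r5": 1.0, "r6": 1.0}},
]
print([c["id"] for c in rank_candidates(batch)])
```

Because the weights sum to 1 and each reward lies in [0, 1], the composite also lies in [0, 1].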

rank	Sequence	Valid	composite ↑	r1 FR2↑	r2 hyd-patch↑	r3 liability↑	r4 expr↑	r5 hallmark↑	r6 scaffold↑	muts vs NBv1

Multi-sequence alignment of the 10 candidates

Legend: identical to NBv1 · 1 mutation · 2–3 mutations · 4+ mutations · CDR1 (26–35) · CDR2 (54–60) · CDR3 (100–110)
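The "muts vs NBv1" column and the legend bands can be illustrated with a position-wise comparison. This sketch assumes the bands refer to the number of positions at which a candidate differs from NBv1, and that candidate and reference have equal length so no gapped alignment is needed. The function names and toy sequences are hypothetical.

```python
# CDR ranges quoted in the legend (1-based, inclusive).
CDRS = {"CDR1": (26, 35), "CDR2": (54, 60), "CDR3": (100, 110)}


def mutation_positions(candidate: str, reference: str) -> list:
    """Return 1-based positions where candidate differs from reference."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length for this comparison")
    return [i + 1 for i, (a, b) in enumerate(zip(candidate, reference)) if a != b]


def legend_band(n_mutations: int) -> str:
    """Map a mutation count onto the four legend bands."""
    if n_mutations == 0:
        return "identical to NBv1"
    if n_mutations == 1:
        return "1 mutation"
    if n_mutations <= 3:
        return "2–3 mutations"
    return "4+ mutations"


def region_of(position: int) -> str:
    """Name the region containing a 1-based position."""
    for name, (start, end) in CDRS.items():
        if start <= position <= end:
            return name
    return "framework"


# Toy 10-residue strings, not real nanobody sequences.
reference = "ACDEFGHIKL"
candidate = "ACDKFGHIKL"
positions = mutation_positions(candidate, reference)
print(positions, legend_band(len(positions)))
```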

At-a-glance scorecards

Candidates are sorted by composite reward, the exact weighted objective optimised during the GDPO stage (Σ wᵢ·rᵢ, with weights 0.15, 0.20, 0.25, 0.15, 0.15, 0.10). Candidate #1 has the highest composite in the batch. Click any card to load its structure into the main viewer above; once the deep-dive grid is rendered, its panels are also clickable.

3D structure of the top-ranked candidate

ESMFold v1 prediction of the concatenated epitope + 30-AA G4S linker + binder construct. Folding runs on the same Modal A10G GPU as the GDPO model (~30–60 s on cold start, ~2 s warm); ESMFold Atlas is used as fallback if the local container is unavailable. The epitope is rendered in muted grey, the linker as a faint line, and the binder coloured by pLDDT (muted, gold-free palette). Every position differing from NBv1 is highlighted with a gold cartoon overlay, side-chain sticks, and a small gold sphere on the Cα atom. The viewer auto-loads the highest-composite candidate; any candidate can be loaded by selecting its row in the table, its card above, or its panel in the deep-dive grid below.

Legend: epitope · linker · pLDDT ≥ 90 · ≥ 70 · ≥ 50 · < 50 · mutation (cartoon + Cα + side-chain stick)

Note: ESMFold predicts a single chain rather than performing protein–protein docking. Folding the concatenated epitope + 30-AA G4S linker + binder construct yields an informative low-resolution view of the co-folded complex. The binder fold, CDR-loop geometry, and CDR-to-epitope proximity can be assessed within ESMFold's known accuracy limits. The predicted relative orientation of binder and epitope across the flexible linker carries additional uncertainty and should not be over-interpreted as a fixed bound geometry.

Methodology

Three stages of language-model alignment

Built on ProtGPT2 (738 M parameters), each stage tightens the output distribution toward the developability frontier.

SFT

Supervised fine-tuning on 1.35 M nanobody–epitope pairs across 65 targets establishes the binding-compatible sequence prior.

DPO

Direct Preference Optimisation on 522,800 preference pairs ranked by a composite of selectivity, predicted Tm, solubility, and humanness.
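For readers unfamiliar with DPO, the standard per-pair objective is −log σ(β·[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]), where y_w is the preferred and y_l the dispreferred sequence. The sketch below evaluates that textbook loss for one pair. The value β = 0.1 and the log-probabilities are assumptions for illustration; the settings used to train Aiki-GeNano are not stated on this page.

```python
import math


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from this page
) -> float:
    """Standard DPO loss for a single (preferred, dispreferred) pair."""
    chosen_shift = policy_logp_chosen - ref_logp_chosen
    rejected_shift = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical sequence log-probabilities.
print(dpo_loss(-120.0, -125.0, -122.0, -123.0))
```

The loss falls as the policy raises the preferred sequence's likelihood relative to the reference model and lowers the dispreferred one's.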

GDPO

Group Reward-Decoupled Policy Optimisation against six mechanistic rewards (FR2 hydrophobicity, hydrophobic patches, chemical-liability motifs, expression, VHH hallmark, scaffold integrity).
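The practical difference between learning from per-attribute rewards and learning from a single scalar can be sketched as follows. This is one reading of the decoupling idea, in which each reward is normalised within a group of samples before the weighted sum is taken, contrasted with normalising the already-summed scalar as in GRPO. It is not the training code, later steps such as any batch-level normalisation are omitted, and the function names and reward values are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values, eps: float = 1e-6) -> list:
    """Standardise values within one group; eps avoids division by zero."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def decoupled_advantages(group_rewards: list, weights: list) -> list:
    """Normalise each reward column separately, then take the weighted sum."""
    columns = list(zip(*group_rewards))            # K columns of G values
    normalised = [zscores(col) for col in columns]
    return [
        sum(w * normalised[k][i] for k, w in enumerate(weights))
        for i in range(len(group_rewards))
    ]


def coupled_advantages(group_rewards: list, weights: list) -> list:
    """GRPO-style baseline: sum the rewards first, then normalise the scalar."""
    totals = [sum(w * r for w, r in zip(weights, row)) for row in group_rewards]
    return zscores(totals)


# Hypothetical rewards (r1..r6) for three samples generated for one target.
weights = [0.15, 0.20, 0.25, 0.15, 0.15, 0.10]
group = [
    [0.9, 0.8, 0.50, 0.7, 1.0, 1.0],
    [0.9, 0.8, 0.75, 0.7, 1.0, 1.0],
    [0.9, 0.6, 1.00, 0.7, 1.0, 1.0],
]
print(decoupled_advantages(group, weights))
print(coupled_advantages(group, weights))
```

In the decoupled form, a reward that is constant across the group contributes nothing, and a reward with a small spread is not drowned out by one with a large spread.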

Where it works

Known limits of this release

Read this before use: where this release is appropriate and where it is not.

Works well

Linear peptides, intrinsically-disordered regions, and soluble domains resembling the 65 training targets. The training set spans both short peptide windows (e.g. multi-pass receptor N-termini) and whole soluble extracellular domains of single-pass membrane proteins. On a shared 10-target GPCR cohort, Aiki-GeNano achieves the highest predicted Tm and the lowest isomerization severity among five contemporary VHH generators (nanoBERT, IgLM, NanoAbLLaMA, ProteinDPO, IgGM). Tm, solubility, and chemical-liability scores are predicted, not assayed.

Use with care

Targets that diverge in length or composition from the 65-target cohort, very long domains approaching ESMFold's 400-AA cap, and targets dominated by transmembrane segments. Generated sequences stay close to the training data: on average they differ from the nearest training sequence by 8.1–9.0 of 126 amino acids (BLAST 92.8–93.6 % identity). Sampling temperature is constrained to the range evaluated in the paper, [0.7, 1.5]; lower temperatures concentrate probability on training-resembling outputs and are not exposed by this demo.
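The relationship between the mutation distance and the identity range can be checked with simple arithmetic, using the mean distance of 8.1–9.0 residues out of 126 reported in the abstract. The sketch assumes identity is computed over the full 126-residue scaffold with no gaps; BLAST reports identity over the aligned region, so small discrepancies are expected. The function name is hypothetical.

```python
SCAFFOLD_LEN = 126  # NBv1 length stated on this page


def percent_identity(n_differences: float, length: int = SCAFFOLD_LEN) -> float:
    """Ungapped percent identity for a given number of differing positions."""
    return 100.0 * (1.0 - n_differences / length)


for distance in (8.1, 9.0):
    print(f"{distance} differences -> {percent_identity(distance):.1f} % identity")
```

This gives roughly 93.6 % and 92.9 %, in line with the BLAST range quoted above.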

Will not work

Generated sequences have not been wet-lab validated for binding, expression yield, or developability. Use as a candidate-ranking tool, not a clinical pipeline.

From the paper

Abstract

From the manuscript, verbatim. Preprint: bioRxiv 2026.04.28.721526 (submitted to mAbs).

Therapeutic nanobodies must combine target binding with biophysical and chemical properties that determine manufacturability, stability, and clinical viability, collectively termed developability, yet most computational design pipelines still treat developability as a post-hoc filter rather than an integrated training objective.

We present Aiki-GeNano, a three-stage language-model alignment pipeline for epitope-conditioned nanobody generation that integrates multiple developability signals directly into training, using only sequence information and previously published predictors. Starting from ProtGPT2, we perform supervised fine-tuning on 1.35 million nanobody–epitope pairs across 65 target epitopes generated on an mRNA-display platform, apply Direct Preference Optimization on 522,800 pairs ranked by a composite of selectivity, predicted thermal stability, solubility, and humanness, and apply Group Reward-Decoupled Policy Optimization (GDPO) against six sequence-based rewards covering FR2 hydrophobicity, hydrophobic-patch coverage, chemical-liability motifs, Wilkinson–Harrison expression probability, VHH hallmark residues, and scaffold integrity.

Across 65 targets and relative to the supervised baseline, the combined pipeline increased predicted mean melting temperature by 6.6 °C, reduced deamidation and isomerization motif severity, decreased the occurrence of N-glycosylation sequons and CDR methionine-oxidation motifs, and preserved predicted humanness and solubility. Generated sequences differed from the nearest training sequence by a mean of 8.1–9.0 amino acids out of 126. Two alternative training trajectories produced distinct amino-acid-composition strategies with similar liability outcomes but different thermal-stability gains, indicating initialization-dependent convergence of the reward-optimized policy. On a shared 10-target GPCR benchmark, the pipeline achieved the highest predicted melting temperature and lowest isomerization severity among five contemporary VHH generators.

Glossary of terms used on this page

VHH / Nanobody: The antigen-binding variable domain of a heavy-chain-only antibody, found naturally in camelids (llamas, alpacas) and sharks. ~13 kDa, single-domain, three CDR loops, no light chain. Used as a small, stable binding scaffold.
NBv1 (cAbBCII10): The 126-residue camelid VHH backbone of Conrath et al. 2001. The fixed scaffold this model varies CDRs within. The row labelled NBv1 in the alignment grid is this exact 126-AA reference.
CDR1 / CDR2 / CDR3: The three Complementarity-Determining Regions: the short loops at the tip of the antibody that contact the antigen. In NBv1 they occupy positions 26–35, 54–60, and 100–110 (8, 4, and 10 randomised residues in the Contreras library, respectively); everything else is framework.
Framework / FR2: The non-loop residues that hold the CDRs in position. FR2 is the second framework segment (positions 36–53 in NBv1). VHHs have a characteristic FR2 tetrad of hydrophilic residues that compensates for the absent light chain (the "VHH hallmark").
SFT / DPO / GDPO: SFT = supervised fine-tuning on epitope→nanobody pairs. DPO = Direct Preference Optimization on (preferred, dispreferred) pairs ranked by a developability composite. GDPO = NVIDIA's Group Reward-Decoupled Policy Optimization, an extension of GRPO that learns from per-attribute rewards rather than a single scalar.
Composite reward: The weighted sum the GDPO stage was trained to maximise: 0.15·r1 + 0.20·r2 + 0.25·r3 + 0.15·r4 + 0.15·r5 + 0.10·r6, with the weights fixed in configs/gdpo/final_sft_gdpo_gated.yaml. Each rᵢ is in [0,1] (gated to 0 for invalid sequences), so the composite is also in [0,1].
r1–r6: r1 FR2 aggregation (Kyte–Doolittle hydrophobicity over FR2). r2 Hydrophobic-patch coverage. r3 Chemical-liability motifs (deamidation, isomerization, N-glycosylation, oxidation, etc.). r4 Wilkinson–Harrison E. coli expression probability. r5 VHH hallmark tetrad (Muyldermans 2013). r6 Scaffold integrity, defined as length + cysteine count + linker; the only ungated reward.
pLDDT: Per-residue confidence score from ESMFold/AlphaFold-style models, 0–100. Values ≥ 90 = very high confidence in the local structure; 70–89 = confident; 50–69 = low; < 50 = very low (typically flexible/disordered regions or termini).
ESMFold + the linker caveat: ESMFold v1 predicts the structure of a single polypeptide chain from sequence alone. To visualise the binder in the context of the epitope, the concatenated construct (epitope + 30-AA G4S linker + binder) is submitted as a single chain. The model runs locally on the same Modal A10G GPU that hosts the GDPO checkpoint; ESMFold Atlas is used as a fallback. The result is a low-resolution co-folded view, informative for the binder's intrinsic fold quality, CDR-loop geometry, and CDR-to-epitope proximity, within ESMFold's known accuracy limits (accuracy is higher for single domains than for engineered fusions). The flexible 30-residue G4S linker adds further uncertainty to the predicted relative orientation of binder and epitope, so this view should be read as a qualitative assessment rather than as a definitive docked complex.
NetSolP / TEMPRO / Sapiens / Biopython: The four external predictors used as ground-truth signals during training and as evaluation tools. TEMPRO = nanobody Tm. NetSolP-1.0 = solubility. Sapiens = humanness (this scaffold is camelid, so values reflect that; see the scaffold callout above). Biopython ProteinAnalysis = pI, instability, GRAVY, etc.
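The pLDDT bands from the glossary can be written as a small lookup, which is handy when summarising a predicted structure outside the viewer. The band names follow the glossary entry; the function names and the per-residue scores are hypothetical.

```python
def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT (0-100) onto the four bands used on this page."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def band_fractions(scores: list) -> dict:
    """Fraction of residues falling into each band."""
    counts = {"very high": 0, "confident": 0, "low": 0, "very low": 0}
    for score in scores:
        counts[plddt_band(score)] += 1
    return {band: n / len(scores) for band, n in counts.items()}


# Hypothetical per-residue scores for a short stretch.
print(band_fractions([95.2, 91.0, 88.4, 72.3, 55.0, 41.7, 30.1, 93.8]))
```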

Reproducibility

Paper, code, data, and container image

Everything needed to verify the paper's numerical claims and to run the same pipeline locally on a GPU.

Paper

Meda et al., bioRxiv preprint (DOI 10.64898/2026.04.28.721526). Submitted to mAbs (2026); preprint posted 2026-04-28.

Source code

github.com/aikium-public/aiki-genano · MIT licence. Training, inference, evaluation, analysis notebooks.

Docker image

ghcr.io/aikium-public/aiki-genano:1.0.0 · CUDA 12.1 + PyTorch 2.2 + TRL-GDPO. Runs end-to-end with --gpus all.

Figure data + property tables

Zenodo · 10.5281/zenodo.19757842 · CC-BY-NC-4.0. All numerical figure data; per-sequence properties for the 10 disclosed targets (sequences stripped).

Acknowledgements

Foundation models, predictors, and infrastructure

Aiki-GeNano would not exist without the models, datasets, and tools released by these teams. If you build on top of Aiki-GeNano, please cite them too.

Foundation models

ProtGPT2 — Ferruz et al. (Apache 2.0)
ESM-2 650M — Lin et al. (MIT)
Sapiens — Prihoda et al. (MIT)

Predictors & structure

TEMPRO — Alvarez 2024 (MIT)
NetSolP-1.0 — Thumuluri 2022 (academic)
Therapeutic Nanobody Profiler — Gordon et al. 2026 (Oxford OPIG)
ESMFold v1 — Lin et al. (Meta FAIR / EvolutionaryScale); local on Modal A10G, Atlas fallback

Training framework

TRL + NVIDIA GDPO fork (Apache 2.0)
transformers, peft, accelerate (Apache 2.0)
Google for AI Startups Cloud Program — GCP credits partly enabled foundation-model embedding extraction and model development
3Dmol.js (BSD-3) · Modal (hosting)

Each upstream model and dataset retains its own licence. Users deploying Aiki-GeNano, its outputs, or derived sequences in their own workflows are responsible for complying with the respective upstream terms; follow the links above for each. The Aiki-GeNano source code is MIT-licensed; the figure-data Zenodo deposit is CC-BY-NC-4.0; the trained model checkpoints are proprietary to Aikium Inc. and available under NDA via partnerships@aikium.com.

Custom design campaigns and licensed deployments

This public demonstrator runs the GDPO (DPO-initialised) checkpoint at 10 requests per hour per IP. Aikium offers higher-throughput inference, target-specific design campaigns, evaluation-only deployments, and licensed access to the trained checkpoints under non-disclosure agreement. Inquiries are welcome.

Contact partnerships@aikium.com