
A three-stage language-model alignment pipeline (SFT → DPO → GDPO) for target-conditioned nanobody design. Submit a target sequence (a linear peptide, an intrinsically disordered region, or a whole soluble domain such as the extracellular domain of a single-pass membrane protein), and the model returns ten ranked candidates with developability scores, a multi-sequence alignment against the NBv1 scaffold, and an ESMFold structure prediction for the top-ranked design.
All candidates share a single, fixed camelid framework, NBv1 (the NBv1 row in the alignment grid below). Training used the synthetic VHH library of Contreras et al. 2023, which randomises only the three complementarity-determining-region (CDR) loops of NBv1 (8 positions in CDR1, 4 in CDR2 and 10 in CDR3, i.e. 22 of 126 positions). Each generation request therefore returns CDR designs for the submitted target sequence; the framework is not redesigned. Sapiens "humanness" scores consequently reflect the camelid origin of the scaffold rather than a training deficiency. Downstream humanisation (e.g. via AbNatiV) is treated as a separate step.
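To make "the framework is not redesigned" concrete, the sketch below checks that a candidate differs from its scaffold only at designable (CDR) positions. It is an illustration under stated assumptions, not the demo's code: the page gives only the counts 8 + 4 + 10 = 22, so the position indices used here are hypothetical, and the 10-residue toy scaffold is a stand-in rather than NBv1.

```python
# Counts from this page; the actual CDR coordinates on NBv1 are not listed,
# so any index set passed to framework_unchanged below is hypothetical.
CDR_LENGTHS = {"CDR1": 8, "CDR2": 4, "CDR3": 10}
SCAFFOLD_LENGTH = 126

designable = sum(CDR_LENGTHS.values())   # 22 randomised positions
fraction = designable / SCAFFOLD_LENGTH  # roughly 17.5 % of the chain


def framework_unchanged(candidate: str, scaffold: str, designable_idx: set[int]) -> bool:
    """True if candidate matches scaffold at every position outside designable_idx."""
    if len(candidate) != len(scaffold):
        return False
    return all(
        c == s
        for i, (c, s) in enumerate(zip(candidate, scaffold))
        if i not in designable_idx
    )


toy_scaffold = "QVQLVESGGG"  # 10-residue toy stand-in, not NBv1
print(framework_unchanged("QVQAAESGGG", toy_scaffold, {3, 4}))  # True
```

Only the two hypothetical "CDR" positions 3 and 4 change in the usage line, so the framework check passes; a substitution anywhere else would make it return False.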
Submit a target sequence of 4–244 amino acids (linear peptides, intrinsically disordered regions, or whole soluble domains are accepted). The upper limit follows from ESMFold's 400-AA cap: the folded construct of epitope + 30-AA linker + 126-AA binder must fit within it (244 + 30 + 126 = 400). Each request returns ten candidates ranked by the GDPO composite reward (Σ wᵢ·rᵢ), a multi-sequence alignment against NBv1, and the ESMFold structure of the highest-ranked candidate. Structures for the remaining nine candidates are computed on demand via the "Compute structures for all 10" control below.
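The ranking step can be sketched as a gated weighted sum followed by a sort. This is a minimal sketch, assuming the weight order w₁…w₆ = (0.15, 0.20, 0.25, 0.15, 0.15, 0.10) given on this page. Which mechanistic reward each rᵢ denotes is not spelled out here, so rewards are only indexed. The validity gate `is_valid` is a hypothetical stand-in for whatever check the demo applies.

```python
from collections.abc import Sequence

# Weights w1..w6 as listed on this page; the reward behind each index is not stated.
WEIGHTS = (0.15, 0.20, 0.25, 0.15, 0.15, 0.10)
CANONICAL_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid(seq: str, length: int = 126) -> bool:
    """Hypothetical validity gate: full-length chain of canonical residues only."""
    return len(seq) == length and set(seq) <= CANONICAL_AA


def composite(seq: str, rewards: Sequence[float]) -> float:
    """Weighted sum of six rewards in [0, 1]; invalid sequences are gated to 0."""
    if len(rewards) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} rewards, got {len(rewards)}")
    if any(not 0.0 <= r <= 1.0 for r in rewards):
        raise ValueError("each reward must lie in [0, 1]")
    if not is_valid(seq):
        return 0.0
    return sum(w * r for w, r in zip(WEIGHTS, rewards))


def rank_candidates(
    candidates: Sequence[tuple[str, Sequence[float]]], k: int = 10
) -> list[tuple[str, float]]:
    """Score every candidate and return the k highest-scoring ones, best first."""
    scored = [(seq, composite(seq, rewards)) for seq, rewards in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Because the weights are non-negative and sum to 1, the composite of rewards in [0, 1] also stays in [0, 1], which keeps scores comparable across requests.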
Two example targets, neither in the paper's evaluation cohort: a 31-AA peptide hormone (GLP-1) and a 200-AA soluble extracellular domain (CD38 ECD). Click either to pre-fill the form.
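Both examples satisfy the length constraint on the folded construct. The sketch below assembles epitope + linker + binder and checks the bounds. It assumes the 30-residue G4S linker mentioned on this page is realised as (GGGGS)×6. The example sequences are placeholders of the right length, not the real GLP-1 or CD38 ECD sequences.

```python
LINKER = "GGGGS" * 6      # assumed (G4S)x6 realisation of the 30-AA G4S linker
BINDER_LENGTH = 126       # NBv1-length binder
ESMFOLD_CAP = 400         # single-chain length cap quoted on this page
MIN_EPITOPE, MAX_EPITOPE = 4, 244


def build_construct(epitope: str, binder: str) -> str:
    """Concatenate epitope + linker + binder into one chain for co-folding."""
    if not MIN_EPITOPE <= len(epitope) <= MAX_EPITOPE:
        raise ValueError(f"epitope length {len(epitope)} outside {MIN_EPITOPE}-{MAX_EPITOPE}")
    if len(binder) != BINDER_LENGTH:
        raise ValueError(f"binder length {len(binder)} != {BINDER_LENGTH}")
    construct = epitope + LINKER + binder
    if len(construct) > ESMFOLD_CAP:  # cannot trigger given the bounds above
        raise ValueError("construct exceeds the ESMFold cap")
    return construct


# Placeholder sequences with the example lengths only.
for name, n in (("GLP-1", 31), ("CD38 ECD", 200)):
    print(name, len(build_construct("A" * n, "Q" * BINDER_LENGTH)))
# GLP-1 187
# CD38 ECD 356
```

The 244-residue upper bound is exactly the largest epitope for which the construct still fits: 244 + 30 + 126 = 400.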
The pipeline is built on ProtGPT2 (738 M parameters); each successive stage tightens the output distribution toward the developability frontier.
Supervised fine-tuning on 1.35 M nanobody–epitope pairs across 65 targets establishes the binding-compatible sequence prior.
Direct Preference Optimisation on 522,800 preference pairs ranked by a composite of selectivity, predicted thermal stability (Tm), solubility, and humanness.
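For reference, the standard DPO objective (Rafailov et al.) for a single preference pair can be written in a few lines. This is a generic sketch, not the pipeline's training code. It assumes the reference model is the frozen SFT checkpoint, which is the usual choice in an SFT → DPO pipeline. The value β = 0.1 is a common default; the page does not state it.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,  # assumed default, not taken from the paper
) -> float:
    """Standard DPO loss for one (preferred, dispreferred) pair.

    Arguments are sequence log-probabilities (token log-probs summed) under
    the policy being trained and under the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return neg_log_sigmoid(beta * margin)
```

When the policy equals the reference the loss is log 2. It falls below log 2 as the policy raises the likelihood of the preferred (higher-composite) sequence relative to the dispreferred one.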
Group Reward-Decoupled Policy Optimisation against six mechanistic rewards (FR2 hydrophobicity, hydrophobic patches, chemical-liability motifs, expression, VHH hallmark, scaffold integrity).
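The page does not spell out GDPO's update rule. The sketch below assumes the idea suggested by the name: each reward is z-scored within the group of samples drawn for one target *before* the rewards are weighted and summed. This contrasts with a GRPO-style baseline that normalises the already-summed reward. The function names are illustrative, and any further normalisation and the policy-gradient step itself are omitted.

```python
from statistics import fmean, pstdev


def decoupled_advantages(
    group_rewards: list[list[float]], weights: list[float], eps: float = 1e-6
) -> list[float]:
    """Per-sample advantages for one group of samples for the same target.

    group_rewards[g][k] is reward k for sample g. Each reward is z-scored
    within the group on its own, and only then combined with the weights.
    """
    advantages = [0.0] * len(group_rewards)
    for k, w in enumerate(weights):
        column = [row[k] for row in group_rewards]
        mu, sigma = fmean(column), pstdev(column)
        for g, value in enumerate(column):
            advantages[g] += w * (value - mu) / (sigma + eps)
    return advantages


def coupled_advantages(
    group_rewards: list[list[float]], weights: list[float], eps: float = 1e-6
) -> list[float]:
    """GRPO-style baseline for comparison: z-score the weighted reward sum."""
    totals = [sum(w * r for w, r in zip(weights, row)) for row in group_rewards]
    mu, sigma = fmean(totals), pstdev(totals)
    return [(t - mu) / (sigma + eps) for t in totals]
```

In a toy group `[[1.0, 0.50], [0.0, 0.52]]` with equal weights, the coupled baseline gives roughly [+1, −1] because the high-spread first reward dominates the sum. The decoupled version gives roughly [0, 0], since the small but consistent preference on the second reward is rescaled to the same footing as the first.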
Before using this release, note where it is and is not appropriate.
Appropriate for: linear peptides, intrinsically disordered regions, and soluble domains resembling the 65 training targets. The training set spans both short peptide windows (e.g. multi-pass receptor N-termini) and whole soluble extracellular domains of single-pass membrane proteins. On a shared 10-target GPCR cohort, Aiki-GeNano achieves the highest predicted Tm and the lowest isomerization severity when compared with five contemporary VHH generators (nanoBERT, IgLM, NanoAbLLaMA, ProteinDPO, IgGM). Tm, solubility, and chemical-liability scores are predicted, not assayed.
Less suitable for: targets that diverge in length or composition from the 65-target cohort, very long domains approaching ESMFold's 400-AA cap, and targets dominated by transmembrane segments. Generated sequences differ from the closest training sequence by a mean of 8.1–9.0 of 126 residues (BLAST identity 92.8–93.6 %). Sampling temperature is restricted to the range evaluated in the paper, [0.7, 1.5]; lower temperatures concentrate probability on outputs that resemble the training data and are not exposed in this demo.
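The temperature behaviour described above corresponds to standard temperature-scaled softmax sampling. The sketch below is hedged on two points. Clamping out-of-range requests to [0.7, 1.5] is an assumption, since the page says only that other temperatures are not exposed and the demo may simply restrict its input control instead. The logits are arbitrary toy values, not model outputs.

```python
import math
import random

T_MIN, T_MAX = 0.7, 1.5  # range evaluated in the paper, per this page


def clamp_temperature(t: float) -> float:
    """Assumed behaviour: pull any requested temperature into [T_MIN, T_MAX]."""
    return min(max(t, T_MIN), T_MAX)


def softmax_with_temperature(logits: list[float], t: float) -> list[float]:
    """Softmax of logits / t; lower t sharpens the distribution."""
    t = clamp_temperature(t)
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], t: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, t)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

With toy logits `[2.0, 1.0, 0.0]`, the top token's probability is higher at T = 0.7 than at T = 1.5, which is the concentration effect that motivates the lower bound.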
Generated sequences have not been wet-lab validated for binding, expression yield, or developability. Use as a candidate-ranking tool, not a clinical pipeline.
From the manuscript, verbatim. Preprint: bioRxiv 2026.04.28.721526 (submitted to mAbs).
Therapeutic nanobodies must combine target binding with biophysical and chemical properties that determine manufacturability, stability, and clinical viability, collectively termed developability, yet most computational design pipelines still treat developability as a post-hoc filter rather than an integrated training objective.
We present Aiki-GeNano, a three-stage language-model alignment pipeline for epitope-conditioned nanobody generation that integrates multiple developability signals directly into training, using only sequence information and previously published predictors. Starting from ProtGPT2, we perform supervised fine-tuning on 1.35 million nanobody–epitope pairs across 65 target epitopes generated on an mRNA-display platform, apply Direct Preference Optimization on 522,800 pairs ranked by a composite of selectivity, predicted thermal stability, solubility, and humanness, and apply Group Reward-Decoupled Policy Optimization (GDPO) against six sequence-based rewards covering FR2 hydrophobicity, hydrophobic-patch coverage, chemical-liability motifs, Wilkinson–Harrison expression probability, VHH hallmark residues, and scaffold integrity.
Across 65 targets and relative to the supervised baseline, the combined pipeline increased predicted mean melting temperature by 6.6 °C, reduced deamidation and isomerization motif severity, decreased the occurrence of N-glycosylation sequons and CDR methionine-oxidation motifs, and preserved predicted humanness and solubility. Generated sequences differed from the nearest training sequence by a mean of 8.1–9.0 amino acids out of 126. Two alternative training trajectories produced distinct amino-acid-composition strategies with similar liability outcomes but different thermal-stability gains, indicating initialization-dependent convergence of the reward-optimized policy. On a shared 10-target GPCR benchmark, the pipeline achieved the highest predicted melting temperature and lowest isomerization severity among five contemporary VHH generators.
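The mean distance of 8.1–9.0 residues and the BLAST identity of 92.8–93.6 % quoted on this page describe the same quantity from two sides. The quick check below assumes an ungapped comparison over the full 126-residue chain. BLAST scores identity over its local alignment, which can include gaps, so the two figures need not match exactly.

```python
SCAFFOLD_LENGTH = 126


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ungapped_identity(mismatches: float, length: int = SCAFFOLD_LENGTH) -> float:
    """Fractional identity implied by a mismatch count over a fixed length."""
    return 1.0 - mismatches / length


for d in (8.1, 9.0):
    print(f"{d} mismatches -> {ungapped_identity(d):.1%} identity")
# 8.1 mismatches -> 93.6% identity
# 9.0 mismatches -> 92.9% identity
```

The ungapped estimate of 92.9–93.6 % agrees closely with the reported BLAST range.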
NBv1 in the alignment grid is the 126-AA reference scaffold itself.

The GDPO composite reward is Σ wᵢ·rᵢ = 0.15·r₁ + 0.20·r₂ + 0.25·r₃ + 0.15·r₄ + 0.15·r₅ + 0.10·r₆, with the weights fixed in configs/gdpo/final_sft_gdpo_gated.yaml. Each rᵢ lies in [0, 1] (gated to 0 for invalid sequences) and the weights sum to 1, so the composite also lies in [0, 1].

For structure prediction, the construct epitope + 30-AA G4S linker + binder is submitted as a single chain. ESMFold runs locally on the same Modal A10G GPU that hosts the GDPO checkpoint, with ESMFold Atlas used as a fallback. The result is a low-resolution co-folded view. It is informative about the binder's intrinsic fold quality, CDR-loop geometry, and CDR-to-epitope proximity, within ESMFold's known accuracy limits (accuracy is generally higher for single domains than for engineered fusions). The flexible 30-residue G4S linker adds further uncertainty to the predicted relative orientation of binder and epitope, so this view should be read as a qualitative assessment rather than as a definitive docked complex.

Everything needed to verify the paper's numerical claims and to run the same pipeline locally on a GPU.
Container image: ghcr.io/aikium-public/aiki-genano:1.0.0 (CUDA 12.1 + PyTorch 2.2 + TRL-GDPO). Runs end-to-end when started with --gpus all.

Aiki-GeNano would not exist without the models, datasets, and tools released by these teams. If you build on top of Aiki-GeNano, please cite them too.
Each upstream model and dataset retains its own licence. Users deploying Aiki-GeNano, its outputs, or derived sequences in their own workflows are responsible for complying with the respective upstream terms; follow the links above for each. The Aiki-GeNano source code is MIT-licensed; the figure-data Zenodo deposit is CC-BY-NC-4.0; the trained model checkpoints are proprietary to Aikium Inc. and available under NDA via partnerships@aikium.com.
This public demonstrator runs the GDPO (DPO-initialised) checkpoint and is rate-limited to 10 requests per hour per IP address. Aikium offers higher-throughput inference, target-specific design campaigns, evaluation-only deployments, and licensed access to the trained checkpoints under a non-disclosure agreement. Inquiries are welcome.
Contact partnerships@aikium.com