Primary navigation
Codex

Codex use case

Discover protein folding architectures

Turn protein-folding hypotheses into benchmarked experiment loops.

Difficulty Advanced
Time horizon Long-running

Use Codex with Goal Mode to research and implement novel architectural modifications to AlphaFold2 for improved protein folding performance.

Best for

  • Computational biologists exploring architecture, loss, or curriculum changes against an automatically scorable benchmark.
  • Researchers who have a scientifically motivated hypothesis and want to compress the path from idea to working experimental fork.
  • ML engineers running long-lived autoresearch loops that require persistent experiment tracking and iterative debugging.

Contents

    ← All use cases

    Discover protein folding architectures

    Turn protein-folding hypotheses into benchmarked experiment loops.

    Use Codex with Goal Mode to research and implement novel architectural modifications to AlphaFold2 for improved protein folding performance.

    Advanced
    Long-running

    Use Codex with Goal Mode to research and implement novel architectural modifications to AlphaFold2 for improved protein folding performance.

    Advanced
    Long-running

    Best for

    • Computational biologists exploring architecture, loss, or curriculum changes against an automatically scorable benchmark.
    • Researchers who have a scientifically motivated hypothesis and want to compress the path from idea to working experimental fork.
    • ML engineers running long-lived autoresearch loops that require persistent experiment tracking and iterative debugging.

    Starter prompt

    Use Goal Mode to improve the validation lDDT-Cα score of this AlphaFold2-style protein-structure model on the NanoFold public benchmark. The scientific hypothesis is that persistent higher-order geometric states may help the model learn protein geometry more efficiently from limited data: - retain the standard MSA and pairwise representations; - add sparse learned 2-simplex face states for selected residue triplets; - add sparse learned 3-simplex tetrahedral states for selected residue quadruplets; - construct topology only from official benchmark inputs and model-generated recycled geometry; - keep the implementation computationally practical under NanoFold constraints. Maintain durable tracking files for: 1. The current strategy, status, and proposed next steps in PLAN.md 2. A structured log of experiments and results in EXPERIMENTS.md 3. A running scratchpad of notes and thoughts in EXPERIMENT_NOTES.md For each iteration: 1. state the hypothesis being tested; 2. make the smallest coherent code or configuration change; 3. run the relevant tests and benchmark slice; 4. record metrics, latency, memory, and failure modes; 5. decide whether to keep, revert, or refine the change; 6. periodically reassess the architecture-level search direction rather than only tuning local hyperparameters. Do not claim generalization gains from smoke tests or single-chain overfit diagnostics. Prefer matched comparisons and preserve the evidence boundary.
    Use Goal Mode to improve the validation lDDT-Cα score of this AlphaFold2-style protein-structure model on the NanoFold public benchmark. The scientific hypothesis is that persistent higher-order geometric states may help the model learn protein geometry more efficiently from limited data: - retain the standard MSA and pairwise representations; - add sparse learned 2-simplex face states for selected residue triplets; - add sparse learned 3-simplex tetrahedral states for selected residue quadruplets; - construct topology only from official benchmark inputs and model-generated recycled geometry; - keep the implementation computationally practical under NanoFold constraints. Maintain durable tracking files for: 1. The current strategy, status, and proposed next steps in PLAN.md 2. A structured log of experiments and results in EXPERIMENTS.md 3. A running scratchpad of notes and thoughts in EXPERIMENT_NOTES.md For each iteration: 1. state the hypothesis being tested; 2. make the smallest coherent code or configuration change; 3. run the relevant tests and benchmark slice; 4. record metrics, latency, memory, and failure modes; 5. decide whether to keep, revert, or refine the change; 6. periodically reassess the architecture-level search direction rather than only tuning local hyperparameters. Do not claim generalization gains from smoke tests or single-chain overfit diagnostics. Prefer matched comparisons and preserve the evidence boundary.

    Explore a protein-folding architecture hypothesis

    Use Codex Goal Mode when you have a protein-folding hypothesis that needs more than one implementation pass. Give Codex a bounded scientific direction, a working baseline, and an automatically scorable benchmark. Codex can implement the architecture fork, track experiments, diagnose failures, and continue iterating while you review the evidence.

    This example started with a specific question: could an AlphaFold2-style model learn useful protein geometry more efficiently if its trunk represented not only residues and residue pairs, but also explicit higher-order topological objects?

    Define a bounded experiment

    AlphaFold2 already uses powerful pairwise and triangle-style reasoning inside the Evoformer. Its triangle operations improve edge representations, but still write back into a pair tensor. The scientist proposed testing whether persistent learned representations for triangular faces and tetrahedral cells could provide a useful inductive bias in a data-limited setting.

    The resulting public repository, SimplexFold, adds sparse face states F_ijk and tetrahedral states U_ijkl alongside the conventional pair representation Z_ij.

    MSA representation M
            <-> pair / edge tensor Z_ij
            <-> sparse face tensor F_ijk
            <-> sparse tetra tensor U_ijkl
            -> structure module
            -> recycled geometry
            loops back into the next pass

    Start with the starter prompt on this page, a minimal AlphaFold2-style baseline, and the public NanoFold benchmark. The benchmark provides a small, curated fixed-data and automatically scorable substrate for structural-biology experimentation. Keep the first implementation small enough to test with targeted unit tests and microbenchmarks before launching expensive training runs.

    Run the search with Goal Mode

    1. Supply a falsifiable, high-level scientific hypothesis instead of asking the model to invent an entire research agenda from scratch.
    2. Use GPT-5.5 Pro in ChatGPT to convert that direction into an implementation plan with explicit constraints and ablations.
    3. Ask Codex to implement the smallest runnable SimplexFold baseline, then verify it with targeted unit tests and microbenchmarks.
    4. Give the resulting repository to Codex Goal Mode and instruct it to hill-climb validation lDDT-Cα on the NanoFold benchmark while preserving experiment logs, plans, and artifact references.
    5. Run Goal Mode continuously while it uses benchmark feedback to iterate on the architecture, training recipe, and experimental harness. In this example, the loop ran for more than 150 hours.

    Use PLAN.md for the current strategy and next steps, EXPERIMENTS.md for a structured log of results, and EXPERIMENT_NOTES.md for the running scratchpad. These artifacts make a long-running search auditable and give you a stable place to steer the next iteration.

    Goal Mode is useful here because the search requires repeated implementation, testing, experiment tracking, failure diagnosis, and benchmark-driven iteration. Unguided autoresearch often drifted toward familiar local changes such as losses, optimizers, and hyperparameters. A compact scientist-supplied architecture hypothesis gave Codex a more meaningful search space while still leaving room to test, diagnose, and refine the implementation.

    This workflow is also useful for teams evaluating how scientist-in-the-loop steering changes the quality of agentic scientific search.

    Example result

    The result of this workflow was SimplexFold, an experimental architecture with explicit higher-order simplex states. Review the topology alongside the benchmark logs to confirm that each iteration still tests the original scientific idea.

    A comparison of 1-, 2-, and 3-simplex protein geometry.

    The useful lesson is not that Codex autonomously solved protein folding. The workflow shows how Goal Mode can act as a persistent scientific engineering loop: a scientist contributes the conceptual move, and Codex compresses the implementation, experimentation, debugging, and follow-up search cycle.

    Treat promising diagnostics as evidence that the implementation path works, not as proof of generalization. Review the agent’s trajectory periodically, steer it back toward scientifically meaningful architecture questions if it collapses into local hyperparameter tuning, and promote claims only after matched public-validation comparisons and appropriate replicates.

    Resources

    Related use cases