Citation Formatting

Allow models to generate reliable citations.

Reliable citations build trust and help readers verify the accuracy of responses. This guide provides practical guidance on how to prepare citable material and instruct the model to format citations effectively, using patterns that are familiar to OpenAI models.

Overview

A citation system has many parts: you decide what can be cited, represent that material clearly, instruct the model how to cite it, and validate the result before it renders to the user.

This guide covers five core elements experienced directly by the model:

  1. Citable units: Define what the model is allowed to cite.
  2. Material representation: Present the source material in a clear, structured format.
  3. Citation format: Specify the exact format the model should use for citations.
  4. Prompt instructions: Tell the model when to cite and how to do it correctly.
  5. Citation parsing: Extract the citations from the model’s response for downstream use.

Choose citable units

Before writing prompts, clearly define what the model can cite. Common options include:

| Citable unit | Best used for | Downside | Example |
| --- | --- | --- | --- |
| Document | You only need to show which document the answer came from. | Not very precise. | Cite the entire employee handbook when you only need to show which document supports the claim. |
| Block / chunk | You want a good balance between simplicity and precision. | Still not exact down to the line. | Cite the specific contract paragraph or retrieved chunk that contains the clause. |
| Line range | You need to show the exact supporting text. | More difficult for the model. | Cite lines L42-L47 when the user needs to verify the precise passage. |

A good citable unit should be:

  • Consistent: the same source should keep the same ID across runs.
  • Easy to inspect: a person should be able to read it and understand the surrounding context.
  • The right size: large enough to make sense, but small enough to stay precise.

For most systems, block-level citations are the best default. They are usually easier for the model than line-level citations and more useful to users than document-level citations.
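As a minimal sketch of block-level citable units, the snippet below splits a document on blank lines and assigns position-based IDs. The function name `split_into_blocks` and the sample text are hypothetical, and position-based IDs only stay consistent across runs while the underlying document is unchanged; a system with frequently edited sources would likely need a different ID scheme.

```python
import re


def split_into_blocks(document: str) -> dict[str, str]:
    """Split text on blank lines and assign IDs block1, block2, ... in order."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    return {f"block{i}": text for i, text in enumerate(paragraphs, start=1)}


# Hypothetical sample document with three paragraphs.
handbook = (
    "Vacation accrues monthly.\n\n"
    "Unused days expire in March.\n\n\n"
    "Remote work requires approval."
)
blocks = split_into_blocks(handbook)
```

Because the IDs depend only on paragraph order, running the function twice on the same text yields the same mapping.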

Represent citable material

The model cannot cite material that has not been presented clearly. Whether material comes from a tool or is injected directly, ensure it has:

  • Stable Source ID: Consistent identifier like file1 or block1.
  • Readable Text: Clearly formatted source material.
  • Metadata (optional): URLs, timestamps, titles, and similar context.

Source IDs vs. locators: A source ID is a stable identifier that the model emits, such as block1. A locator pinpoints the exact span your UI highlights, such as lines L8-L13 or paragraph 21. In general, the model should emit the source ID, while your system resolves or renders the locator. Mixing the two too early tends to increase formatting errors.

Define citation format

You need to define the citation format that the model will generate. Use a format that is explicit, consistent, and easy for the model to reproduce reliably.

The table below lists our recommended citation format and markers. We strongly recommend these markers because they closely match the ones our models are trained on. If you choose different marker values, keep the overall citation format as similar as possible.

| Piece | What it does | Recommended |
| --- | --- | --- |
| CITATION_START | Opens the citation marker. | \ue200 |
| Citation family | Identifies the citation type. Use cite for all supported sources. | cite |
| CITATION_DELIMITER | Separates fields inside the marker. | \ue202 |
| Source ID | Identifies the cited unit. turn# is the turn number. item# is the specific file, block, or URL. | turn0file1, turn0block1, turn0url1 |
| Locator (optional) | Narrows the citation to a precise span. | L8-L13 |
| CITATION_STOP | Closes the citation marker. | \ue201 |

For tool calls, turnN increments once per tool invocation, not once per individual result. Within a single invocation, sources are distinguished by suffixes such as file0, file1, and so on. In a single-response system, all references will be turn0… only if the model makes exactly one tool call before answering. If it makes multiple tool calls, you may instead see references like turn0fileX, turn1fileX, and so on.
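The numbering rule above can be sketched as a small registry that bumps the turn counter once per tool invocation and numbers results within it. `SourceRegistry` and `register_invocation` are hypothetical names for illustration, not part of any API, and the sample result strings are made up.

```python
class SourceRegistry:
    """Assigns turn{N}file{M} IDs: N per tool invocation, M per result within it."""

    def __init__(self) -> None:
        self._turn = 0
        self.sources: dict[str, str] = {}

    def register_invocation(self, results: list[str]) -> list[str]:
        """Record every result of one tool call and return the assigned IDs."""
        ids = []
        for index, text in enumerate(results):
            source_id = f"turn{self._turn}file{index}"
            self.sources[source_id] = text
            ids.append(source_id)
        self._turn += 1  # once per invocation, not once per result
        return ids


registry = SourceRegistry()
first = registry.register_invocation(["Support sync notes", "On-call runbook"])
second = registry.register_invocation(["Escalation policy"])
# first  == ["turn0file0", "turn0file1"]
# second == ["turn1file0"]
```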

Template

{CITATION_START}<citation_family>{CITATION_DELIMITER}<source_id>{CITATION_DELIMITER}<locator>{CITATION_STOP}

Example

{CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_DELIMITER}L8-L13{CITATION_STOP}

If your system does not use locators, omit that field:

{CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_STOP}
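A minimal sketch of the template above, using the recommended marker values. The helper names `format_citation` and `fill_markers` are hypothetical. `fill_markers` shows one way to substitute the `{CITATION_START}`-style placeholders in a prompt with the actual marker characters.

```python
from typing import Optional

# Recommended marker values from the table above (Unicode private-use characters).
CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"


def format_citation(source_id: str, locator: Optional[str] = None) -> str:
    """Build one citation marker; the locator field is omitted when not given."""
    fields = ["cite", source_id]
    if locator:
        fields.append(locator)
    return CITATION_START + CITATION_DELIMITER.join(fields) + CITATION_STOP


def fill_markers(prompt_template: str) -> str:
    """Replace the placeholder names in a prompt with the marker characters."""
    return (
        prompt_template.replace("{CITATION_START}", CITATION_START)
        .replace("{CITATION_DELIMITER}", CITATION_DELIMITER)
        .replace("{CITATION_STOP}", CITATION_STOP)
    )


with_locator = format_citation("turn0file1", "L8-L13")
without_locator = format_citation("turn0file1")
```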

Write effective citation instructions

To maintain maximum accuracy, use familiar citation patterns. Custom or unfamiliar formats increase cognitive load on the model, leading to citation errors, especially in:

  • low reasoning effort, where the model has less budget to recover from formatting mistakes.
  • high-complexity tasks, where most of the reasoning budget is spent on solving the task itself rather than cleaning up citation syntax.

Below, we recommend a citation format that is close to patterns the model is familiar with. You can use it as-is or adapt it to fit your own system.

If you want to define your own prompt, define:

  • the exact marker syntax.
  • where citations go.
  • when to cite and when not to cite.
  • how to cite multiple supports.
  • what formats are forbidden.
  • what to do when support is missing.

Parse citations

Once the model emits citations, you need to extract them from the response text so you can resolve source IDs, render links, or remove the raw markers before showing the answer to users.

The helper below is designed to be copied directly into your application. It parses single-source citations, multi-source citations, and optional line-range locators while preserving character offsets in the original text.

This example supports line locators only and should be adapted if your system uses a different locator format.

If your source IDs use a different shape, update SOURCE_ID_RE to match your system.
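The following is a minimal parsing sketch under stated assumptions, not a definitive implementation. It assumes the recommended marker values, source IDs shaped like turn0file1, turn0block1, turn0url1, or block5, and that a multi-source marker lists several source IDs separated by the delimiter, each optionally followed by its own line range. The `Citation` class and `parse_citations` function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"

# Assumed ID shapes: turn0file1, turn0block1, turn0url1, or plain block5.
SOURCE_ID_RE = re.compile(r"(?:turn\d+(?:file|block|url)|block)\d+")
LINE_RANGE_RE = re.compile(r"L(\d+)-L(\d+)")
MARKER_RE = re.compile(
    re.escape(CITATION_START) + "(.*?)" + re.escape(CITATION_STOP), re.DOTALL
)


@dataclass
class Citation:
    source_id: str
    span: tuple[int, int]  # (start, end) offsets of the marker in the original text
    start_line: Optional[int] = None
    end_line: Optional[int] = None


def parse_citations(text: str) -> list[Citation]:
    """Return one Citation per source ID found inside well-formed markers."""
    citations: list[Citation] = []
    for marker in MARKER_RE.finditer(text):
        family, *fields = marker.group(1).split(CITATION_DELIMITER)
        if family != "cite":
            continue  # unknown citation family
        current: Optional[Citation] = None
        for field in fields:
            line_range = LINE_RANGE_RE.fullmatch(field)
            if SOURCE_ID_RE.fullmatch(field):
                current = Citation(source_id=field, span=marker.span())
                citations.append(current)
            elif line_range and current is not None:
                # A line range applies to the source ID immediately before it.
                current.start_line = int(line_range.group(1))
                current.end_line = int(line_range.group(2))
            # Any other field is treated as malformed and skipped.
    return citations


sample = (
    "Handoffs are documented. \ue200cite\ue202turn0file0\ue202L8-L13\ue201 "
    "See also the runbook. \ue200cite\ue202turn0file1\ue202turn1file0\ue201"
)
parsed = parse_citations(sample)
```

Each `Citation.span` refers to offsets in the unmodified response, so the caller can slice out or replace markers without re-scanning the text.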

Examples

The examples below show two common citation patterns:

  • Retrieved tool context, where your tool returns citable material and IDs.
  • Injected context, where you provide citable blocks directly in the prompt.

Format citations for retrieved tool context

Use this pattern when the model retrieves context through a tool and cites that retrieved context in its answer.

Define citable units

Choose citable units based on the precision your use case requires. The examples below show a few recommended tool output formats. The underlying tool may vary by application, but what matters most is that the output is presented in a clear, stable structure like these examples.
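As one hedged illustration of a clear, stable structure, the sketch below renders a retrieved file as a header line carrying the source ID, followed by line-numbered text that supports line-range citations. The layout, the `L<n>:` prefix, the function name `render_tool_result`, and the sample content are all assumptions for illustration, not a prescribed format.

```python
def render_tool_result(source_id: str, title: str, text: str) -> str:
    """Render one retrieved file with an ID header and L-prefixed line numbers."""
    numbered = [
        f"L{number}: {line}"
        for number, line in enumerate(text.splitlines(), start=1)
    ]
    # The source ID comes first so it is the first ID-shaped string in the message.
    return "\n".join([f"[{source_id}] {title}", *numbered])


rendered = render_tool_result(
    "turn0file0",
    "Weekly support sync notes",
    "Agenda: on-call handoff\nHandoff happens every Monday.",
)
```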

Write prompt instructions

## Citations

Results are returned by "tool_1". Each message from `tool_1` is called a "source" and identified by its reference ID, which is the first occurrence of `turn\d+file\d+` (for example, `turn0file0` or `turn2file1`). In this example, the string `turn0file0` would be the source reference ID.

Citations are references to `tool_1` sources. Citations may be used to refer to either a single source or multiple sources.

A citation to a single source must be written as:
{CITATION_START}cite{CITATION_DELIMITER}turn\d+file\d+{CITATION_STOP}

If line-level citations are supported, a citation to a specific line range must be written as:
{CITATION_START}cite{CITATION_DELIMITER}turn\d+file\d+{CITATION_DELIMITER}L\d+-L\d+{CITATION_STOP}

Citations to multiple sources must be written by emitting multiple citation markers, one for each supporting source.

You must NOT write reference IDs like `turn0file0` verbatim in the response text without putting them between {CITATION_START}...{CITATION_STOP}.

- Place citations at the end of the supported sentence, or inline if the sentence is long and contains multiple supported clauses.
- Citations must be placed after punctuation.
- Cite only retrieved sources that directly support the cited text.
- Never invent source IDs, line ranges, or block locators that were not returned by the tool.
- If multiple retrieved sources materially support a proposition, cite all of them.
- If the retrieved sources disagree, cite the conflicting sources and describe the disagreement accurately.

Example output:

The on-call handoff process is documented in the weekly support sync notes. \ue200cite\ue202turn0file0\ue202L8-L13\ue201
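Before an answer like the one above reaches the user, the raw markers usually need to be replaced with something readable. The sketch below swaps each marker for a bracketed reference number, assuming the recommended marker values and one source ID per marker. `render_numbered` is a hypothetical helper name.

```python
import re

# One marker: START, "cite", DELIMITER, source ID, optional DELIMITER + locator, STOP.
MARKER_RE = re.compile(
    "\ue200cite\ue202([^\ue201\ue202]+)(?:\ue202([^\ue201]+))?\ue201"
)


def render_numbered(text: str) -> tuple[str, list[str]]:
    """Replace each marker with [n] and return the text plus ordered source IDs."""
    order: list[str] = []

    def replace(match: re.Match) -> str:
        source_id = match.group(1)
        if source_id not in order:
            order.append(source_id)
        return f"[{order.index(source_id) + 1}]"

    return MARKER_RE.sub(replace, text), order


answer = (
    "The on-call handoff process is documented in the weekly support sync notes. "
    "\ue200cite\ue202turn0file0\ue202L8-L13\ue201"
)
clean, sources = render_numbered(answer)
```

The returned list maps each reference number back to its source ID, so the UI can resolve links or highlights separately.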

Format citations for injected context

Use this pattern when you retrieve or prepare the context ahead of time and inject it directly into the prompt.

Define citable units

For injected context, a common pattern is to wrap source segments in explicit tags with stable reference IDs.

<BLOCK id="block1">
The service agreement states that termination for convenience requires thirty (30) days’ written notice, unless superseded by a customer-specific addendum.
In practice, renewal terms auto-extend for successive one-year periods when no written non-renewal notice is received before the deadline.
Appendix B further clarifies that pricing exceptions must be approved in writing by both Finance and the account owner.
</BLOCK>

<BLOCK id="block2">
Syllabus
</BLOCK>
...

This makes the citable unit explicit and easy for the model to reference.
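A minimal sketch of producing this tagged layout from a mapping of block IDs to text. The function name `render_blocks` and the sample blocks are hypothetical, and the sketch assumes block IDs contain no quote characters.

```python
def render_blocks(blocks: dict[str, str]) -> str:
    """Wrap each block in <BLOCK id="..."> tags, separated by blank lines."""
    parts = [
        f'<BLOCK id="{block_id}">\n{text}\n</BLOCK>'
        for block_id, text in blocks.items()
    ]
    return "\n\n".join(parts)


# Hypothetical sample blocks.
context = render_blocks(
    {
        "block1": "Termination for convenience requires thirty (30) days' written notice.",
        "block2": "Pricing exceptions must be approved in writing.",
    }
)
```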

Write prompt instructions

## Citations

Supporting context is provided directly in the prompt as citable units. Each citable unit is identified by the value of its `id` attribute in the first occurrence of a tag such as `<BLOCK id="block5"> ... </BLOCK>`. In this example, `block5` would be the source reference ID.

Because this pattern does not invoke tools, there is no tool turn counter to increment. That means you do not need to use a `turn#` prefix for the citation marker. You can keep IDs in a `turn0block5` style if that matches the rest of your system, or use plain IDs like `block5` as shown here. The key requirement is that the citation marker matches the injected context ID exactly and consistently.

Citations are references to these provided citable units. Citations may be used to refer to either a single source or multiple sources.

A citation to a single source must be written as:
{CITATION_START}cite{CITATION_DELIMITER}<block_id>{CITATION_STOP}

For example:
{CITATION_START}cite{CITATION_DELIMITER}block5{CITATION_STOP}

Citations to multiple sources must be written by emitting multiple citation markers, one for each supporting block.

You must NOT write block IDs verbatim in the response text without putting them between {CITATION_START}...{CITATION_STOP}.

- Place citations at the end of the supported sentence, or inline if the sentence is long and contains multiple supported clauses.
- Citations must be placed after punctuation.
- Cite only blocks that appear in the provided context.
- Never invent new block IDs.
- Never cite outside knowledge or outside authorities.
- If multiple blocks materially support a proposition, cite all of them.
- If the provided blocks conflict, cite the conflicting blocks and describe the conflict accurately.

Example output:

The Court held that the District Court lacked personal jurisdiction over the petitioner. \ue200cite\ue202block5\ue201
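Prompt rules such as "never invent new block IDs" are worth checking in code before rendering. The sketch below compares the IDs cited in an answer against the IDs present in the injected context, assuming the recommended marker values and the `<BLOCK id="...">` layout shown earlier. `find_unknown_citations` and the sample strings are hypothetical.

```python
import re

BLOCK_ID_RE = re.compile(r'<BLOCK id="([^"]+)">')
# Captures the source ID field that follows the "cite" family in each marker.
CITED_ID_RE = re.compile("\ue200cite\ue202([^\ue201\ue202]+)")


def find_unknown_citations(context: str, answer: str) -> list[str]:
    """Return cited IDs that do not appear in the injected context."""
    known = set(BLOCK_ID_RE.findall(context))
    return [cited for cited in CITED_ID_RE.findall(answer) if cited not in known]


injected = '<BLOCK id="block5">\nThe District Court lacked personal jurisdiction.\n</BLOCK>'
response = (
    "The Court held that jurisdiction was lacking. \ue200cite\ue202block5\ue201 "
    "The ruling was unanimous. \ue200cite\ue202block9\ue201"
)
unknown = find_unknown_citations(injected, response)
```

A non-empty result signals that the answer cites a block the model was never given, which a system might handle by stripping that marker or regenerating the answer.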

Note: OpenAI-hosted tools such as web search provide automatic inline citations. If you want to use hosted tools instead, see the tools overview, web search guide, and file search guide.