Reliable citations build trust and help readers verify the accuracy of responses. This guide provides practical guidance on how to prepare citable material and instruct the model to format citations effectively, using patterns that are familiar to OpenAI models.
Overview
A citation system has many parts: you decide what can be cited, represent that material clearly, instruct the model how to cite it, and validate the result before it renders to the user.
This guide covers five core elements experienced directly by the model:
- Citable units: Define what the model is allowed to cite.
- Material representation: Present the source material in a clear, structured format.
- Citation format: Specify the exact format the model should use for citations.
- Prompt instructions: Tell the model when to cite and how to do it correctly.
- Citation parsing: Extract the citations from the model’s response for downstream use.
Choose citable units
Before writing prompts, clearly define what the model can cite. Common options include:
| Citable unit | Best used for | Downside | Example |
|---|---|---|---|
| Document | You only need to show which document the answer came from. | Not very precise. | Cite the entire employee handbook when you only need to show which document supports the claim. |
| Block / chunk | You want a good balance between simplicity and precision. | Still not exact down to the line. | Cite the specific contract paragraph or retrieved chunk that contains the clause. |
| Line range | You need to show the exact supporting text. | More difficult for the model. | Cite lines L42-L47 when the user needs to verify the precise passage. |
A good citable unit should be:
- Consistent: the same source should keep the same ID across runs.
- Easy to inspect: a person should be able to read it and understand the surrounding context.
- The right size: large enough to make sense, but small enough to stay precise.
For most systems, block-level citations are the best default. They are usually easier for the model than line-level citations and more useful to users than document-level citations.
Represent citable material
The model cannot cite material that has not been presented clearly. Whether material comes from a tool or is injected directly, ensure it has:
- Stable Source ID: Consistent identifier like file1 or block1.
- Readable Text: Clearly formatted source material.
- Metadata (optional): URLs, timestamps, titles, and similar context.
Source IDs vs. locators: A source ID is a stable,
model-generated identifier such as block1. A locator is the
precise UI-rendered highlight, such as lines L8-L13 or
Paragraph 21. In general, the model should emit the source ID,
while your system resolves or renders the locator. Mixing the two too early
tends to increase formatting errors.
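The split between source IDs and locators can be sketched as a small resolver that runs in your system after the model has answered. This is a minimal sketch, assuming sources are held in a plain dictionary keyed by source ID and that locators use the L8-L13 line-range shape; the names `resolve_locator` and `LOCATOR_RE` are hypothetical and not part of any library.

```python
import re
from typing import Optional

# Assumed locator shape: "L<start>-L<end>", 1-indexed and inclusive.
LOCATOR_RE = re.compile(r"^L(\d+)-L(\d+)$")


def resolve_locator(
    sources: dict[str, str],
    source_id: str,
    locator: Optional[str] = None,
) -> Optional[str]:
    """Return the cited text, or None if the ID or locator cannot be resolved."""
    text = sources.get(source_id)
    if text is None:
        return None  # the model cited an ID that was never provided
    if locator is None:
        return text  # no locator: the citation covers the whole unit
    match = LOCATOR_RE.match(locator)
    if match is None:
        return None  # locator is not in the assumed line-range shape
    start, end = int(match.group(1)), int(match.group(2))
    lines = text.splitlines()
    if start < 1 or end < start or end > len(lines):
        return None  # line range falls outside the source
    return "\n".join(lines[start - 1:end])
```

Returning None for unknown IDs or out-of-range lines gives the rendering layer a single place to drop or flag a citation that does not match the provided material.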
Define citation format
You need to define the citation format that the model will generate. Use a format that is explicit, consistent, and easy for the model to reproduce reliably.
Below is our recommended citation format, along with recommended marker values. We strongly recommend these markers because they closely match the markers our models are trained on. If you choose different marker values, keep the overall citation format as similar as possible.
| Piece | What it does | Recommended |
|---|---|---|
| CITATION_START | Opens the citation marker. | \ue200 |
| Citation family | Identifies the citation type. Use cite for all supported sources. | cite |
| CITATION_DELIMITER | Separates fields inside the marker. | \ue202 |
| Source ID | Identifies the cited unit. turn# is the turn number. item# is the specific file, block, or URL. | turn0file1, turn0block1, turn0url1 |
| Locator (optional) | Narrows the citation to a precise span. | L8-L13 |
| CITATION_STOP | Closes the citation marker. | \ue201 |
For tool calls, turnN increments once per tool invocation, not
once per individual result. Within a single invocation, sources are
distinguished by suffixes such as file0, file1, and
so on. In a single-response system, all references will be
turn0… only if the model makes exactly one tool call before
answering. If it makes multiple tool calls, you may instead see references
like turn0fileX, turn1fileX, and so on.
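The numbering rule above can be sketched as follows. This is an illustration only, assuming each tool invocation is represented as a list of result texts; the function name `assign_source_ids` and the `kind` parameter are hypothetical.

```python
def assign_source_ids(tool_calls: list[list[str]], kind: str = "file") -> dict[str, str]:
    """Map source IDs to result texts.

    The turn counter advances once per tool invocation; the item counter
    restarts at 0 inside each invocation.
    """
    ids: dict[str, str] = {}
    for turn, results in enumerate(tool_calls):
        for item, text in enumerate(results):
            ids[f"turn{turn}{kind}{item}"] = text
    return ids


# Two invocations: the first returns two results, the second returns one.
ids = assign_source_ids([["notes A", "notes B"], ["notes C"]])
print(list(ids))  # ['turn0file0', 'turn0file1', 'turn1file0']
```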
Template
{CITATION_START}<citation_family>{CITATION_DELIMITER}<source_id>{CITATION_DELIMITER}<locator>{CITATION_STOP}
Example
{CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_DELIMITER}L8-L13{CITATION_STOP}
If your system does not use locators, omit that field:
{CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_STOP}
Write effective citation instructions
To maintain maximum accuracy, use familiar citation patterns. Custom or unfamiliar formats increase cognitive load on the model, leading to citation errors, especially in:
- low reasoning effort, where the model has less budget to recover from formatting mistakes.
- high-complexity tasks, where most of the reasoning budget is spent on solving the task itself rather than cleaning up citation syntax.
Below, we recommend a citation format that is close to patterns the model is familiar with. You can use it as-is or adapt it to fit your own system.
If you want to define your own prompt, define:
- the exact marker syntax.
- where citations go.
- when to cite and when not to cite.
- how to cite multiple supports.
- what formats are forbidden.
- what to do when support is missing.
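The prompt examples in this guide write the markers as placeholders such as {CITATION_START}. One way to substitute concrete marker values before sending the prompt is sketched below, assuming the recommended marker values from the table above; `MARKERS` and `render_prompt` are hypothetical names.

```python
# Recommended marker values from the citation format table.
MARKERS = {
    "{CITATION_START}": "\ue200",
    "{CITATION_DELIMITER}": "\ue202",
    "{CITATION_STOP}": "\ue201",
}


def render_prompt(template: str) -> str:
    """Replace marker placeholders in a prompt template with concrete markers."""
    for placeholder, value in MARKERS.items():
        template = template.replace(placeholder, value)
    return template


rule = "{CITATION_START}cite{CITATION_DELIMITER}<block_id>{CITATION_STOP}"
rendered = render_prompt(rule)
```

Plain string replacement is used here instead of `str.format` because prompt text often contains other braces and angle-bracket tags that should pass through unchanged.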
Parse citations
Once the model emits citations, you need to extract them from the response text so you can resolve source IDs, render links, or remove the raw markers before showing the answer to users.
The helper below is designed to be copied directly into your application. It parses single-source citations, multi-source citations, and optional line-range locators while preserving character offsets in the original text.
This example supports line locators only and should be adapted if your system uses a different locator format.
If your source IDs use a different shape, update SOURCE_ID_RE to match your
system.
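A minimal sketch of such a parser follows, assuming the recommended markers, the cite family, and source IDs shaped like turn0file1, turn0block1, turn0url1, or plain block5. The names `Citation`, `parse_citations`, and `strip_citations` are illustrative, and the `SOURCE_ID_RE` pattern is an assumption to adapt to your own ID scheme.

```python
import re
from dataclasses import dataclass
from typing import Optional

CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"

# Assumed ID shapes: turn0file1, turn0block1, turn0url1, or plain block5.
SOURCE_ID_RE = r"(?:turn\d+)?(?:file|block|url)\d+"
# Line-range locators only, for example L8-L13.
LOCATOR_RE = r"L\d+-L\d+"

CITATION_RE = re.compile(
    re.escape(CITATION_START)
    + "cite"
    + re.escape(CITATION_DELIMITER)
    + f"(?P<source_id>{SOURCE_ID_RE})"
    + f"(?:{re.escape(CITATION_DELIMITER)}(?P<locator>{LOCATOR_RE}))?"
    + re.escape(CITATION_STOP)
)


@dataclass(frozen=True)
class Citation:
    source_id: str
    locator: Optional[str]  # None when the marker has no locator field
    start: int              # offset of the opening marker in the original text
    end: int                # offset just past the closing marker


def parse_citations(text: str) -> list[Citation]:
    """Return every well-formed citation marker with its character offsets."""
    return [
        Citation(m.group("source_id"), m.group("locator"), m.start(), m.end())
        for m in CITATION_RE.finditer(text)
    ]


def strip_citations(text: str) -> str:
    """Remove raw citation markers before showing the answer to users."""
    return CITATION_RE.sub("", text)
```

Because multiple supports are emitted as separate markers, adjacent markers come back as separate `Citation` entries in document order. Malformed markers are not matched, so a stricter pipeline may also want to check for stray marker characters left in the text.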
Examples
The examples below show two common citation patterns:
- Retrieved tool context, where your tool returns citable material and IDs.
- Injected context, where you provide citable blocks directly in the prompt.
Format citations for retrieved tool context
Use this pattern when the model retrieves context through a tool and cites that retrieved context in its answer.
Define citable units
Choose the citable units based on the precision your use case requires. The underlying tool may vary by application, but what matters most is that its output is presented in a clear, stable structure.
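One possible shape for such tool output is sketched here. The layout is an assumption, not a required format: the source ID appears first so it is the first `turn\d+file\d+` match in the message, and lines are numbered so that line-range locators such as L8-L13 can be checked. The function name `format_tool_result` and the bracketed header are hypothetical.

```python
def format_tool_result(source_id: str, title: str, text: str, number_lines: bool = True) -> str:
    """Render one retrieved source as a message the model can cite."""
    lines = text.splitlines()
    if number_lines:
        # Prefix each line with its 1-indexed number for line-level citations.
        lines = [f"L{n}: {line}" for n, line in enumerate(lines, start=1)]
    header = f"[{source_id}] {title}"
    return "\n".join([header, *lines])


message = format_tool_result(
    "turn0file0",
    "Weekly support sync notes",
    "Agenda\nOn-call handoff happens every Monday.",
)
```

For block-level or document-level citations, passing `number_lines=False` keeps the text unnumbered so the model is not tempted to emit line locators.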
Write prompt instructions
## Citations
Results are returned by "tool_1". Each message from `tool_1` is called a "source" and identified by its reference ID, which is the first occurrence of `turn\d+file\d+` (for example, `turn0file0` or `turn2file1`). In this example, the string `turn0file0` would be the source reference ID.
Citations are references to `tool_1` sources. Citations may be used to refer to either a single source or multiple sources.
A citation to a single source must be written as:
{CITATION_START}cite{CITATION_DELIMITER}turn\d+file\d+{CITATION_STOP}
If line-level citations are supported, a citation to a specific line range must be written as:
{CITATION_START}cite{CITATION_DELIMITER}turn\d+file\d+{CITATION_DELIMITER}L\d+-L\d+{CITATION_STOP}
Citations to multiple sources must be written by emitting multiple citation markers, one for each supporting source.
You must NOT write reference IDs like `turn0file0` verbatim in the response text without putting them between {CITATION_START}...{CITATION_STOP}.
- Place citations at the end of the supported sentence, or inline if the sentence is long and contains multiple supported clauses.
- Citations must be placed after punctuation.
- Cite only retrieved sources that directly support the cited text.
- Never invent source IDs, line ranges, or block locators that were not returned by the tool.
- If multiple retrieved sources materially support a proposition, cite all of them.
- If the retrieved sources disagree, cite the conflicting sources and describe the disagreement accurately.
Example output:
The on-call handoff process is documented in the weekly support sync notes. \ue200cite\ue202turn0file0\ue202L8-L13\ue201
Format citations for injected context
Use this pattern when you retrieve or prepare the context ahead of time and inject it directly into the prompt.
Define citable units
For injected context, a common pattern is to wrap source segments in explicit tags with stable reference IDs.
<BLOCK id="block1">
The service agreement states that termination for convenience requires thirty (30) days’ written notice, unless superseded by a customer-specific addendum.
In practice, renewal terms auto-extend for successive one-year periods when no written non-renewal notice is received before the deadline.
Appendix B further clarifies that pricing exceptions must be approved in writing by both Finance and the account owner.
</BLOCK>
<BLOCK id="block2">
Syllabus
</BLOCK>
...
This makes the citable unit explicit and easy for the model to reference.
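The wrapping step can be sketched as a small helper. This assumes block IDs are assigned by position, which is only stable across runs if the blocks are always supplied in the same order; systems that need stronger stability may prefer IDs stored alongside the source. The name `render_blocks` is hypothetical.

```python
def render_blocks(texts: list[str], prefix: str = "block") -> str:
    """Wrap each source segment in a BLOCK tag with a 1-indexed ID."""
    parts = []
    for n, text in enumerate(texts, start=1):
        parts.append(f'<BLOCK id="{prefix}{n}">\n{text.strip()}\n</BLOCK>')
    return "\n".join(parts)


context = render_blocks([
    "Termination for convenience requires thirty (30) days' written notice.",
    "Pricing exceptions must be approved in writing.",
])
```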
Write prompt instructions
## Citations
Supporting context is provided directly in the prompt as citable units. Each citable unit is identified by the value of its `id` attribute in the first occurrence of a tag such as `<BLOCK id="block5"> ... </BLOCK>`. In this example, `block5` would be the source reference ID.
Because this pattern does not invoke tools, there is no tool turn counter to increment. That means you do not need to use a `turn#` prefix for the citation marker. You can keep IDs in a `turn0block5` style if that matches the rest of your system, or use plain IDs like `block5` as shown here. The key requirement is that the citation marker matches the injected context ID exactly and consistently.
Citations are references to these provided citable units. Citations may be used to refer to either a single source or multiple sources.
A citation to a single source must be written as:
{CITATION_START}cite{CITATION_DELIMITER}<block_id>{CITATION_STOP}
For example:
{CITATION_START}cite{CITATION_DELIMITER}block5{CITATION_STOP}
Citations to multiple sources must be written by emitting multiple citation markers, one for each supporting block.
You must NOT write block IDs verbatim in the response text without putting them between {CITATION_START}...{CITATION_STOP}.
- Place citations at the end of the supported sentence, or inline if the sentence is long and contains multiple supported clauses.
- Citations must be placed after punctuation.
- Cite only blocks that appear in the provided context.
- Never invent new block IDs.
- Never cite outside knowledge or outside authorities.
- If multiple blocks materially support a proposition, cite all of them.
- If the provided blocks conflict, cite the conflicting blocks and describe the conflict accurately.
Example output:
The Court held that the District Court lacked personal jurisdiction over the petitioner. \ue200cite\ue202block5\ue201
Note: OpenAI-hosted tools such as web search provide automatic inline citations. If you want to use hosted tools instead, see the tools overview, web search guide, and file search guide.