Moderations
Given text and/or image inputs, classifies whether those inputs are potentially harmful.
Create moderation
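A create-moderation request carries the text and/or image inputs to classify. The following minimal Python sketch builds such a request body without sending it. It assumes the publicly documented OpenAI request shape: a model string (omni-moderation-latest here, which is an assumption) and an input array of {"type": "text"} and {"type": "image_url"} items. The helper build_moderation_request is hypothetical and is not part of any SDK.

```python
import json

# Assumption: model name and multi-modal input shape follow the public
# OpenAI Moderations API; check the current API reference before relying on them.
MODEL = "omni-moderation-latest"


def build_moderation_request(text=None, image_url=None, model=MODEL):
    """Hypothetical helper: build a JSON body with text and/or image input."""
    if text is None and image_url is None:
        raise ValueError("provide text, image_url, or both")
    inputs = []
    if text is not None:
        inputs.append({"type": "text", "text": text})
    if image_url is not None:
        inputs.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "input": inputs}


# Example: one text item and one image item in the same request.
body = build_moderation_request(text="example text", image_url="https://example.com/photo.png")
print(json.dumps(body, indent=2))
```

Keeping each input as a typed item, rather than a bare string, is what lets a single request mix text and images.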
Models
Moderation = object { categories, category_applied_input_types, category_scores, flagged }
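The Moderation object has four top-level fields. The following Python sketch models them as a dataclass and parses a JSON response into it. The value types are assumptions drawn from the field descriptions: booleans for categories, lists of "text"/"image" for category_applied_input_types, floats for category_scores (this section does not describe that field), and a boolean for flagged. The sample JSON is hypothetical and truncated to two categories.

```python
import json
from dataclasses import dataclass


@dataclass
class Moderation:
    """Sketch of the Moderation object's four top-level fields."""
    categories: dict[str, bool]                         # category -> flagged or not
    category_applied_input_types: dict[str, list[str]]  # category -> ["text"], ["image"], ...
    category_scores: dict[str, float]                   # assumption: one float score per category
    flagged: bool

    @classmethod
    def from_json(cls, raw: str) -> "Moderation":
        """Build a Moderation from a JSON string, failing on a missing field."""
        data = json.loads(raw)
        return cls(
            categories=data["categories"],
            category_applied_input_types=data["category_applied_input_types"],
            category_scores=data["category_scores"],
            flagged=data["flagged"],
        )


# Hypothetical, truncated response object for illustration only.
raw = """{
  "categories": {"harassment": false, "self-harm/intent": true},
  "category_applied_input_types": {"harassment": ["text"], "self-harm/intent": ["text", "image"]},
  "category_scores": {"harassment": 0.01, "self-harm/intent": 0.92},
  "flagged": true
}"""
m = Moderation.from_json(raw)
print(m.flagged)
```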
categories: object { harassment, "harassment/threatening", hate, 10 more } A list of the categories, and whether they are flagged or not.
harassment: boolean Content that expresses, incites, or promotes harassing language towards any target.
"harassment/threatening": boolean Harassment content that also includes violence or serious harm towards any target.
hate: boolean Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.
"hate/threatening": boolean Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
illicit: boolean Content that includes instructions or advice that facilitate the planning or execution of wrongdoing, or that gives advice or instruction on how to commit illicit acts. For example, "how to shoplift" would fit this category.
"illicit/violent": boolean Content that includes instructions or advice that facilitate the planning or execution of wrongdoing that also includes violence, or that gives advice or instruction on the procurement of any weapon.
"self-harm": boolean Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
"self-harm/instructions": boolean Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
"self-harm/intent": boolean Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
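Because each entry in categories is a flag keyed by category name, a client typically only needs the names that are set. The following Python sketch extracts them. The function name flagged_categories is hypothetical, and the sample dictionary is an illustrative, truncated categories object rather than real API output.

```python
def flagged_categories(categories: dict[str, bool]) -> list[str]:
    """Return the names of categories whose flag is True, sorted by name."""
    return sorted(name for name, is_flagged in categories.items() if is_flagged)


# Hypothetical categories object, truncated to a few keys.
sample = {
    "harassment": False,
    "harassment/threatening": False,
    "hate": False,
    "self-harm": True,
    "self-harm/intent": True,
}
print(flagged_categories(sample))  # ['self-harm', 'self-harm/intent']
```

Sorting keeps a parent category such as self-harm next to its slash-separated subcategories, which makes logged results easier to scan.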
category_applied_input_types: object { harassment, "harassment/threatening", hate, 10 more } A list of the categories along with the input type(s) that the score applies to.
The applied input type(s) for the category 'harassment/threatening'.
"self-harm/instructions": array of "text" or "image" The applied input type(s) for the category 'self-harm/instructions'.
"self-harm/intent": array of "text" or "image" The applied input type(s) for the category 'self-harm/intent'.
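Since category_applied_input_types maps each category to the input type(s) its score applies to, a client can check which categories were actually evaluated against an image. The following Python sketch filters by input type. The function name categories_for_input_type is hypothetical, and the sample mapping is illustrative only: it does not claim which input types the API really applies to each category.

```python
def categories_for_input_type(applied: dict[str, list[str]], input_type: str) -> list[str]:
    """Return categories whose score applies to input_type ("text" or "image")."""
    if input_type not in ("text", "image"):
        raise ValueError(f"unknown input type: {input_type!r}")
    return sorted(name for name, types in applied.items() if input_type in types)


# Hypothetical category_applied_input_types object, truncated to three keys.
sample = {
    "harassment/threatening": ["text"],
    "self-harm/instructions": ["text", "image"],
    "self-harm/intent": ["image"],
}
print(categories_for_input_type(sample, "image"))  # ['self-harm/instructions', 'self-harm/intent']
```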