


Design for AI guardrails means building practices and principles into GenAI systems to minimise harm, misinformation, toxic behaviour and bias. It is a critical consideration to:

  • Protect users: Shield users, including children, from harmful language, fabricated facts, bias and false information.

  • Build trust and adoption: When users know the system avoids hate speech and misinformation, they feel safer and are more willing to use it regularly.

  • Ethical compliance: Regulations such as the EU AI Act demand safe AI design. Teams must meet these standards to remain legally compliant and socially responsible.

How to use this pattern
  1. Analyse and guide user inputs: If a prompt could lead to unsafe or sensitive content, guide users towards safer interactions. E.g., when the Miko robot comes across profanity, it answers, “I am not allowed to entertain such language.”

  2. Filter outputs and moderate content: Use real-time moderation to detect and filter potentially harmful AI outputs, blocking or reframing them before they’re shown to the user. E.g., show a note like: “This response was modified to follow our safety guidelines.” (A minimal sketch of steps 1–3 appears after this list.)

  3. Use proactive warnings: Subtly notify users when they approach sensitive or high-stakes information. E.g., “This is informational advice and not a substitute for medical guidance.”

  4. Create strong user feedback loops: Make it easy for users to report unsafe, biased or hallucinated outputs so the AI can be improved over time through active-learning loops. E.g., Instagram provides an in-app option for users to report harm, bias or misinformation. (See the feedback sketch below.)

  5. Cross-validate critical information: For high-stakes domains (like healthcare, law, finance), check AI-generated outputs against trusted databases to catch hallucinations. Refer to pattern 10, Provide data sources. (See the cross-validation sketch below.)
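
Steps 1–3 can be combined in a thin guardrail layer wrapped around the model call. Below is a minimal sketch in Python, assuming a hypothetical generate_reply callable and simple keyword lists as stand-ins for a real safety classifier or hosted moderation API.

```python
# Minimal guardrail wrapper (steps 1-3). The keyword lists and generate_reply()
# are hypothetical placeholders; a production system would use a trained safety
# classifier or a hosted moderation service instead.

PROFANITY = {"badword1", "badword2"}                       # placeholder block-list
SENSITIVE_TOPICS = {"diagnosis", "dosage", "investment", "lawsuit"}

SAFETY_NOTE = "This response was modified to follow our safety guidelines."
DISCLAIMER = "This is informational advice and not a substitute for professional guidance."


def moderate_input(prompt: str) -> str | None:
    """Return a gentle refusal if the prompt is unsafe, otherwise None."""
    if set(prompt.lower().split()) & PROFANITY:
        return "I am not able to engage with that language. Could you rephrase?"
    return None


def moderate_output(reply: str) -> tuple[str, bool]:
    """Filter the model's reply; return (possibly replaced reply, was_modified)."""
    if set(reply.lower().split()) & PROFANITY:
        return SAFETY_NOTE, True
    return reply, False


def guarded_chat(prompt: str, generate_reply) -> str:
    """Wrap a text-generation callable with input and output guardrails."""
    refusal = moderate_input(prompt)
    if refusal:
        return refusal                                     # step 1: guide unsafe inputs

    reply, modified = moderate_output(generate_reply(prompt))
    if modified:
        return reply                                       # step 2: filtered output + note

    # Step 3: proactive warning when the topic is sensitive or high-stakes.
    if set(prompt.lower().split()) & SENSITIVE_TOPICS:
        reply = f"{reply}\n\n{DISCLAIMER}"
    return reply
```

The keyword checks are only illustrative; in practice they would be replaced by a dedicated moderation model, but the wrapper shape (screen the input, filter the output, append a disclaimer) stays the same.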
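
Step 4 can start as a simple reporting hook that captures the flagged message and a category. A minimal sketch, assuming a local JSONL file as a placeholder for a real review queue or active-learning pipeline:

```python
# Minimal user-feedback hook (step 4). The category names and the JSONL store
# are assumptions; real reports would feed a moderation review queue or a
# retraining pipeline rather than a local file.

import json
import time
from pathlib import Path

REPORT_CATEGORIES = {"unsafe", "biased", "hallucinated", "other"}
REPORT_LOG = Path("user_reports.jsonl")                    # placeholder store


def report_output(conversation_id: str, message: str, category: str, note: str = "") -> None:
    """Record a user report so flagged outputs can be reviewed and reused for improvement."""
    if category not in REPORT_CATEGORIES:
        raise ValueError(f"Unknown category: {category}")
    record = {
        "conversation_id": conversation_id,
        "message": message,
        "category": category,
        "note": note,
        "timestamp": time.time(),
    }
    with REPORT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```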
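
For step 5, a lightweight cross-validation layer checks generated claims against a curated reference store before display. The sketch below uses an in-memory dictionary as a stand-in for a trusted database or retrieval index (see pattern 10, Provide data sources):

```python
# Minimal cross-validation sketch (step 5). TRUSTED_FACTS is a placeholder for
# a curated knowledge base or retrieval index queried by claim key.

TRUSTED_FACTS = {
    "example-claim-key": "example reference text from a curated knowledge base",
}


def cross_validate(claim_key: str, generated_answer: str) -> str:
    """Attach a trusted reference if one exists; otherwise flag the answer as unverified."""
    reference = TRUSTED_FACTS.get(claim_key)
    if reference is None:
        return f"{generated_answer}\n\n(Unverified: no trusted source found for this claim.)"
    return f"{generated_answer}\n\nVerified against trusted source: {reference}"
```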



Design for safety guardrails
