Illegal Guard

The Illegal Activity Guard is an output guard that analyzes the responses generated by your language model and flags content that promotes or describes illegal or unethical activities, helping keep your outputs within legal and ethical standards.

Note: IllegalActivityGuard is only available as an output guard.

Example

from deepeval.guardrails import IllegalActivityGuard

model_output = "Here's how you can create a counterfeit ID..."

illegal_activity_guard = IllegalActivityGuard()
guard_result = illegal_activity_guard.guard(response=model_output)

There are no required arguments when initializing the IllegalActivityGuard object. The guard() method accepts a single parameter, response, which is the output of your LLM application.
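Because guard() only needs the model's response, it can sit directly after generation and decide what the user sees. The following is a minimal sketch. It assumes only that the returned result's score is 1 when the guard is breached and 0 otherwise. guarded_reply, FALLBACK_MESSAGE, and the OutputGuard protocol are hypothetical names introduced for illustration. The protocol simply describes any object with a guard(response=...) method, such as IllegalActivityGuard.

```python
from typing import Any, Protocol


class OutputGuard(Protocol):
    """Hypothetical protocol: anything with a guard(response=...) method."""

    def guard(self, response: str) -> Any: ...


# Hypothetical fallback text; replace with whatever suits your application.
FALLBACK_MESSAGE = "Sorry, I can't help with that request."


def guarded_reply(model_output: str, guard: OutputGuard) -> str:
    """Return the model output, or a fallback message if the guard is breached.

    Assumes guard.guard(response=...) returns an object whose .score is
    1 when the guard has been breached and 0 otherwise.
    """
    result = guard.guard(response=model_output)
    if result.score == 1:
        # Breached: withhold the original output.
        return FALLBACK_MESSAGE
    # Not breached: pass the output through unchanged.
    return model_output
```

With deepeval installed, this would be called as guarded_reply(model_output, IllegalActivityGuard()). Keeping the guard behind a small protocol also makes it easy to substitute a stub in unit tests.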

Interpreting Guard Result

print(guard_result.score)
print(guard_result.score_breakdown)

guard_result.score is an integer: 1 if the guard has been breached, and 0 otherwise. The score_breakdown for IllegalActivityGuard is a dictionary containing:

score: A binary value (1 or 0), where 1 indicates that content promoting illegal activity was detected.
reason: A brief explanation of why the score was assigned.

{
  "score": 1,
  "reason": "The output provides instructions on creating a counterfeit ID, which is an illegal activity."
}
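If you log guard outcomes, you can condense the score_breakdown dictionary into a single line. The following is a minimal sketch. summarize_breakdown is a hypothetical helper, and it relies only on the "score" and "reason" keys documented above.

```python
from typing import Any, Dict


def summarize_breakdown(breakdown: Dict[str, Any]) -> str:
    """Turn an IllegalActivityGuard score_breakdown into a one-line log entry.

    Hypothetical helper: assumes the dictionary has the "score" (1 or 0)
    and "reason" keys; raises ValueError if the score is not binary.
    """
    score = breakdown.get("score")
    if score not in (0, 1):
        raise ValueError(f"expected a binary score, got {score!r}")
    reason = breakdown.get("reason", "no reason given")
    status = "BREACHED" if score == 1 else "PASSED"
    return f"[IllegalActivityGuard] {status}: {reason}"


# The breakdown from the example above.
example = {
    "score": 1,
    "reason": "The output provides instructions on creating a counterfeit ID, which is an illegal activity.",
}
print(summarize_breakdown(example))
```

Validating the score before using it means an unexpected value fails loudly rather than being silently treated as a pass.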
