Overview
The Categorize Text action enables you to classify text inputs into pre-defined categories based on descriptions you provide. It takes a text input and a set of category descriptions, and assigns the input to the most relevant category. This action is versatile and can be used for tasks like determining a product of interest based on user data, detecting the presence of specific objects in image descriptions, and more. To use it effectively, provide clear category descriptions following prompt writing best practices, include a catch-all category, and choose the appropriate model quality setting based on the complexity of the task.
Usage Examples
Language categorization - A common use case is to categorize input text by language, for example as English, Spanish, or neither (a catch-all category).
Product interest determination - The action can determine which product a customer is interested in from data or text about that customer, once you define product categories and descriptions.
Image description categorization - Text descriptions of images can be categorized, for example, to determine if an image contains a dog or not based on the description.
Inputs
Text - The text that you would like to categorize. This can be any type of text, such as a description of an image, a block of text, product information, marketing copy, etc.
Example: A Tale of Two Cities by Charles Dickens: It was the best of times, it was the worst of times...
Categories - The labels you want returned when the text is categorized, written exactly as you want them to appear in your output. These values are critical because later steps of your workflow may use them for text-matching logic.
Example: Language
Descriptions - These descriptions tell the model when to use each category. The more accurate your descriptions, the more accurate your categorization results will be.
The language the text is written in
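The three inputs above can be pictured as plain data before they are entered into the action. The following is a minimal sketch in Python, not Copy.ai's API: the `Category` class, the `build_categories` helper, and the description wording are all hypothetical and only illustrate how labels pair with descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    label: str        # returned verbatim in the output, so spell it as you want to match it later
    description: str  # tells the model when this label applies


def build_categories(pairs):
    """Build Category objects, rejecting blank fields and duplicate labels (hypothetical helper)."""
    seen = set()
    categories = []
    for label, description in pairs:
        if not label.strip() or not description.strip():
            raise ValueError("label and description must be non-empty")
        if label in seen:
            raise ValueError(f"duplicate label: {label!r}")
        seen.add(label)
        categories.append(Category(label, description))
    return categories


# Illustrative categories for the language use case, including a catch-all.
LANGUAGE_CATEGORIES = build_categories([
    ("English", "The text is written in the English language"),
    ("Spanish", "The text is written in the Spanish language"),
    ("Neither", "The text is written in a language other than English or Spanish"),
])
```

Keeping labels unique matters because later workflow steps match on the label text exactly.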
Advanced Inputs
Model quality and performance - An option to choose between a model better suited for hard/complex tasks or one optimized for simple tasks. The "better for hard tasks" model may be more expensive but provides higher quality results for nuanced/difficult inputs. The "better for simple tasks" model is cheaper and faster but potentially less accurate for complex inputs.
When using the Categorize Text action, provide clear, detailed category descriptions that avoid ambiguity. Using consistent phrasing, repeating the same nouns, and avoiding pronouns in the descriptions helps the model categorize accurately. A catch-all category ensures there is always a category for the model to choose if the input doesn't match the other options.
The advanced model setting allows you to balance cost/speed with quality based on the complexity of your inputs. Following best practices for prompt writing when creating the category descriptions will further improve categorization accuracy.
Outputs
The Categorize Text action in Copy.ai workflows outputs a single category label and the reasoning behind it, based on the provided input text and the category descriptions you specified. The output is returned as JSON, and each element can be referenced in later workflow steps.
Example Output:
{
  Category: // The category label
  Reasoning: // The justification for the decision
}
Example Reference
#CategorizeStep.Category => // Yields only the category name
#CategorizeStep.Reasoning => // Yields only the category reasoning
For example, if the categories defined are "English", "Spanish", and "Neither", then the output will be one of those three labels, selected by how well the input text matches the description provided for each category. The accuracy of the output category therefore depends on the quality and detail of the category descriptions.
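If the output is handled outside the workflow builder, the text-matching logic might look like the sketch below. It assumes the result arrives as a JSON string with the `Category` and `Reasoning` keys shown above; the sample payload, the function names, and the step names in `routes` are hypothetical.

```python
import json


def parse_categorize_output(raw):
    """Return (category, reasoning) from a JSON string shaped like the example output."""
    data = json.loads(raw)
    return data["Category"], data["Reasoning"]


def next_step_for(label):
    """Pick a follow-up step by exact label match (step names are made up for illustration)."""
    routes = {
        "English": "send_english_email",
        "Spanish": "send_spanish_email",
    }
    # Any label without a route, such as the catch-all "Neither", falls through to review.
    return routes.get(label, "flag_for_review")


# Hypothetical sample payload, not real output from the action.
sample_output = '{"Category": "Spanish", "Reasoning": "The text uses Spanish vocabulary."}'
label, reasoning = parse_categorize_output(sample_output)
step = next_step_for(label)
```

Because the match is exact, a label typed as "spanish" in the Categories input would not match the "Spanish" route here, which is why the label spelling is worth fixing up front.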
Troubleshooting / Best Practices
Writing effective category descriptions - When writing descriptions for your categories, it's crucial to follow best practices to improve the accuracy of the categorizations. Avoid using pronouns, and try to use the same nouns as in your category labels. Be as clear and detailed as possible in your descriptions: the more detail you provide, the more accurate the categorizations will be.
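The two concrete rules above (avoid pronouns, reuse the label's noun) can be checked mechanically before descriptions are pasted into the action. The following is a rough sketch only: `lint_description` and the short `PRONOUNS` list are hypothetical, and a word-list check like this will miss cases and cannot judge clarity or detail.

```python
import re

# A deliberately small, illustrative pronoun list; extend it as needed.
PRONOUNS = {"it", "its", "they", "them", "their", "he", "she", "him", "her"}


def lint_description(label, description):
    """Return a list of warnings for one category description (hypothetical helper)."""
    warnings = []
    words = re.findall(r"[a-z']+", description.lower())
    found = sorted(set(words) & PRONOUNS)
    if found:
        warnings.append("pronouns used: " + ", ".join(found))
    # Simple substring check that the description reuses the label's wording.
    if label.lower() not in description.lower():
        warnings.append(f"label {label!r} does not appear in the description")
    return warnings


clean = lint_description("Dog", "The image description mentions a dog")
flagged = lint_description("Dog", "It contains one of them")
```

Here `clean` comes back empty, while `flagged` carries one warning for the pronouns and one for the missing label noun.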
Including a catch-all category - It's recommended to include a catch-all category, such as "NA" or "none of the above," to handle inputs that don't fit any of your specific categories. This helps prevent the model from making incorrect categorizations or hallucinations when it can't find a suitable category for the input text.
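One way to make the catch-all hard to forget is to add it programmatically when preparing a category list. This is a sketch under assumptions: `ensure_catch_all`, the default label, and the generated description wording are hypothetical, and the mapping of label to description is only an illustrative representation of the action's inputs.

```python
def ensure_catch_all(categories, label="None of the above"):
    """Return a copy of the label-to-description mapping that includes a catch-all entry."""
    updated = dict(categories)
    if label not in updated:
        # Name the other labels explicitly so the description avoids pronouns.
        others = ", ".join(categories)
        updated[label] = f"The text does not match any of these categories: {others}"
    return updated


language_categories = ensure_catch_all({
    "English": "The text is written in the English language",
    "Spanish": "The text is written in the Spanish language",
})
```

Calling the helper again on its own result leaves the mapping unchanged, so it is safe to apply to lists that already include the catch-all label.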
Handling model quality and performance - In the advanced settings, you can choose between a model optimized for hard tasks or one for simple tasks. The simple tasks model is cheaper and faster but may not perform as well for large or nuanced tasks. For such cases, it's recommended to use the model optimized for hard tasks, which may be more accurate but more expensive and slower.