At sheerME, I recently developed a feature to categorize beauty and wellness services using OpenAI's API. This approach allowed us to efficiently classify services into predefined categories based on their name and description. Here’s how I implemented it in Ruby on Rails.
Step 1: Real-Time Categorization with the Chat Completions API
For new services, I needed instant categorization. I used OpenAI's Chat Completions API to send a prompt and receive a response synchronously. The model was instructed to return one of four categories: care, color, style, or other.
```ruby
require "openai"

def categorize_service!(service)
  # Instantiate the client inside the method so it is in scope here
  client = OpenAI::Client.new(access_token: ENV["OPENAI_ACCESS_TOKEN"])

  prompt = "Categorize the following service as 'care', 'color', 'style', or 'other':\n" \
           "Name: #{service.name}\nDescription: #{service.description}"

  response = client.chat(
    parameters: {
      model: "gpt-4-turbo",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 10
    }
  )

  category = response.dig("choices", 0, "message", "content").strip
  service.update!(category: category)
end
```
This method sends the prompt to OpenAI, extracts the category from the response, and updates the service record with it.
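Because the model can occasionally return extra words or punctuation around the category, it is worth normalizing and validating the response before saving it. A minimal sketch of such a guard; the helper name and the fallback to `"other"` are my own, not from the original code:

```ruby
# Categories the prompt asks the model to choose from.
ALLOWED_CATEGORIES = %w[care color style other].freeze

# Hypothetical helper: strip punctuation/whitespace from the model's reply
# and fall back to "other" when it doesn't match an expected category.
def normalize_category(raw)
  value = raw.to_s.downcase.gsub(/[^a-z]/, "")
  ALLOWED_CATEGORIES.include?(value) ? value : "other"
end
```

With this in place, `service.update!(category: normalize_category(category))` never writes an unexpected value.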
Step 2: Batch Processing for Existing Services
For categorizing the thousands of existing services in the database, I used OpenAI's Batch API, which processes large volumes of requests asynchronously. The process breaks down into the following steps:
- Prepare the batch file: I exported each request into a JSONL file.
```ruby
def create_batch_file
  File.open(Rails.root.join("tmp", BATCH_FILENAME), "w") do |file|
    services.find_each do |service|
      next unless can_run?(service)

      record = {
        custom_id: service.id.to_s,
        method: "POST",
        url: "/v1/chat/completions",
        body: {
          model: "gpt-4o-mini-2024-07-18",
          messages: [{ role: "user", content: prompt(service) }],
          max_tokens: 5
        }
      }
      file.puts(record.to_json)
    end
  end
end
```
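With those parameters, each line of the batch file is one self-contained request, keyed by the service's id. For a single service, a line would look roughly like this (the id and text are illustrative, and the prompt is assumed to follow the same format as in Step 1):

```json
{"custom_id":"123","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini-2024-07-18","messages":[{"role":"user","content":"Categorize the following service as 'care', 'color', 'style', or 'other':\nName: Balayage\nDescription: Hand-painted highlights"}],"max_tokens":5}}
```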
- Upload the file: I uploaded the file using the Files API.
```ruby
def upload_batch_file(client)
  file_path = Rails.root.join("tmp", BATCH_FILENAME)

  # Block form closes the file automatically and returns the uploaded file object
  File.open(file_path, "rb") do |file|
    client.files.upload(parameters: { file: file, purpose: "batch" })
  end
end
```
- Create a Batch: I then created a Batch using the Batch Input File ID.
```ruby
def create_batch(batch_file_id, client)
  client.batches.create(
    parameters: {
      input_file_id: batch_file_id,
      endpoint: "/v1/chat/completions",
      completion_window: "24h"
    }
  )
end
```
- Check the status of a Batch: I would check whether the Batch had finished. This returns an object whose status attribute moves through values such as 'validating', 'in_progress', and 'finalizing' before ending in 'completed', 'failed', 'expired', or 'cancelled'.
```ruby
def check_status(batch_id, client)
  client.batches.retrieve(id: batch_id)
end
```
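In practice you need to wait for the batch to reach a terminal status before fetching results. A polling sketch along those lines; `wait_for_batch` is a hypothetical helper of my own, and the fetch call is injected so the loop can be exercised without a live client:

```ruby
# Statuses after which the batch will not change anymore.
TERMINAL_STATUSES = %w[completed failed expired cancelled].freeze

# Hypothetical helper: poll until the batch reaches a terminal status.
# `fetch` is a callable, e.g. ->(id) { client.batches.retrieve(id: id) }.
def wait_for_batch(batch_id, fetch:, interval: 60, max_attempts: 60)
  max_attempts.times do
    batch = fetch.call(batch_id)
    return batch if TERMINAL_STATUSES.include?(batch["status"])

    sleep interval
  end
  raise "Batch #{batch_id} did not finish within #{interval * max_attempts}s"
end
```

For a long-running Rails job, a scheduled re-check (e.g. a job that re-enqueues itself) may be preferable to sleeping in-process.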
- Retrieve the results: After the batch processing was completed, I retrieved the JSONL file with the results and updated the services in the database.
```ruby
def retrieve_results(output_file_id, client)
  response = client.files.content(id: output_file_id)

  response.each do |line|
    service_id = line["custom_id"]
    category = line.dig("response", "body", "choices", 0, "message", "content")

    service = Service.find_by(id: service_id)
    service.update_column(:category, category) if service.present?
  end
end
```
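The per-line extraction can be pulled into a small helper, which also makes it easy to unit-test without touching the API. A sketch; `parse_result_line` is a name of my own, and the nested structure mirrors the `dig` path used above:

```ruby
require "json"

# Hypothetical helper: extract the service id and category from one line of
# the batch output JSONL. Accepts a raw JSON string or an already-parsed Hash.
def parse_result_line(line)
  data = line.is_a?(String) ? JSON.parse(line) : line
  category = data.dig("response", "body", "choices", 0, "message", "content").to_s.strip
  [data["custom_id"], category]
end
```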
Conclusion
Using OpenAI's API in these two steps provided an efficient way to categorize services both in real time and at scale. The Chat Completions API handled new inputs instantly, while the Batch API processed large amounts of existing data asynchronously, optimizing performance and cost.
This approach is a great way to leverage AI for structured categorization tasks in a Rails application.