Paulo Silva

March 13, 2025

Categorize Data using OpenAI's API

At sheerME, I recently developed a feature to categorize beauty and wellness services using OpenAI's API. This approach allowed us to efficiently classify services into predefined categories based on their name and description. Here’s how I implemented it in Ruby on Rails.

Step 1: Real-Time Categorization with the Chat Completions API

New services needed to be categorized immediately. I used OpenAI's Chat Completions API to send a prompt and receive the response synchronously. The prompt instructs the model to return one of four categories: care, color, style, or other.

require 'openai'

# Memoized client: a top-level local variable would not be visible inside a method
def openai_client
  @openai_client ||= OpenAI::Client.new(access_token: ENV["OPENAI_ACCESS_TOKEN"])
end

def categorize_service!(service)
  prompt = "Categorize the following service as 'care', 'color', 'style', or 'other':\nName: #{service.name}\nDescription: #{service.description}"

  response = openai_client.chat(parameters: {
    model: "gpt-4-turbo",
    messages: [{ role: "user", content: prompt }],
    max_tokens: 10
  })

  # to_s guards against a missing message; strip/downcase turn " Care\n" into "care"
  category = response.dig("choices", 0, "message", "content").to_s.strip.downcase
  service.update!(category: category)
end

This method sends the prompt to OpenAI, extracts the returned category, and saves it on the service.
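The model answers in free text, so a reply can come back as "Care." or as a longer phrase rather than a bare category name. A minimal sketch of guarding against that before saving, assuming a hypothetical normalize_category helper and a fallback to 'other' for anything unexpected (the fallback is my assumption, not part of the original feature):

```ruby
# Hypothetical helper: map the model's free-text reply onto the fixed category list.
CATEGORIES = %w[care color style other].freeze

def normalize_category(raw)
  # Lowercase and drop every non-letter character, e.g. "Care." -> "care"
  candidate = raw.to_s.downcase.gsub(/[^a-z]/, "")
  # Anything outside the allowed list falls back to "other" (assumed policy)
  CATEGORIES.include?(candidate) ? candidate : "other"
end

normalize_category(" Color\n")     # => "color"
normalize_category("Hair styling") # => "other"
```

Keeping the list in one constant also makes it easy to validate the column at the model level with the same values.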


Step 2: Batch Processing for Existing Services

To categorize the thousands of services already in the database, I used OpenAI's Batch API, which processes large volumes of requests asynchronously within a 24-hour completion window and at a lower price than the equivalent synchronous calls. The steps are the following:

  • Prepare the batch file: I wrote one request per line to a JSONL file, using the service ID as custom_id so each result can be matched back to its service.

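# Writes one Chat Completions request per line (JSONL) for every service to categorize.
# services, can_run? and prompt(service) are helpers not shown in this post.
def create_batch_file
  File.open(Rails.root.join("tmp", BATCH_FILENAME), "w") do |file|
    # find_each loads records in batches of 1,000 to keep memory usage low
    services.find_each do |service|
      next unless can_run?(service)

      record = {
        custom_id: service.id.to_s, # maps each result back to its service
        method: "POST",
        url: "/v1/chat/completions",
        body: {
          model: "gpt-4o-mini-2024-07-18",
          messages: [{ role: "user", content: prompt(service) }],
          max_tokens: 5
        }
      }
      file.puts(record.to_json)
    end
  end
end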
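For a large backlog, two Batch API rules are worth enforcing up front: each custom_id must be unique within a batch, and a single batch accepts a capped number of requests (documented as 50,000; worth re-checking the current limits). A minimal sketch, assuming a hypothetical write_batch_files helper that takes an array of records shaped like the ones above and splits them across several JSONL files, each of which would then be uploaded and batched separately:

```ruby
require "json"
require "tmpdir"

# Assumed per-batch request cap; check OpenAI's Batch API docs for current limits.
MAX_REQUESTS_PER_BATCH = 50_000

# Hypothetical helper: writes `records` into one or more JSONL files,
# each holding at most `limit` requests. Returns the paths written.
def write_batch_files(records, dir:, limit: MAX_REQUESTS_PER_BATCH)
  ids = records.map { |record| record[:custom_id] }
  raise ArgumentError, "custom_id values must be unique" if ids.uniq.size != ids.size

  records.each_slice(limit).with_index.map do |chunk, index|
    path = File.join(dir, "batch_#{index}.jsonl")
    File.open(path, "w") do |file|
      chunk.each { |record| file.puts(record.to_json) }
    end
    path
  end
end
```

Unlike create_batch_file, this takes an in-memory array, which is fine for thousands of records; for millions, streaming from find_each and rolling over to a new file every limit lines would avoid holding everything at once.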

  • Upload the file: I uploaded the file using the Files API.

def upload_batch_file(client)
  file_path = Rails.root.join('tmp', BATCH_FILENAME)

  # The block form closes the file even if the upload raises
  File.open(file_path, 'rb') do |file|
    client.files.upload(parameters: {
      file: file,
      purpose: 'batch'
    })
  end
end

  • Create a Batch: I then created a Batch from the ID of the uploaded input file (the "id" field of the upload response).

def create_batch(batch_file_id, client)
  client.batches.create(parameters: {
    input_file_id: batch_file_id,
    endpoint: "/v1/chat/completions",
    completion_window: "24h"
  })
end

  • Check the status of a Batch: I checked whether the Batch had finished. The returned object has a status attribute that moves through values such as 'validating', 'in_progress' and 'finalizing' before ending as 'completed', 'failed', 'expired' or 'cancelled'.

def check_status(batch_id, client)
  client.batches.retrieve(id: batch_id)
end
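Since a batch can take up to the full 24-hour window, check_status is typically called repeatedly. A minimal polling sketch, assuming a hypothetical wait_for_batch helper, a 60-second interval and a cap of 1,440 checks (roughly 24 hours); client is anything whose batches.retrieve(id:) returns a Hash, as the ruby-openai client does:

```ruby
# Statuses after which a batch will not change any more.
TERMINAL_STATUSES = %w[completed failed expired cancelled].freeze

# Hypothetical helper: polls until the batch reaches a terminal status.
def wait_for_batch(batch_id, client, interval: 60, max_checks: 1_440)
  max_checks.times do
    batch = client.batches.retrieve(id: batch_id)
    return batch if TERMINAL_STATUSES.include?(batch["status"])

    sleep(interval) # wait before asking again
  end
  raise "Batch #{batch_id} still not finished after #{max_checks} checks"
end
```

When the returned batch is 'completed', its "output_file_id" is what retrieve_results expects. Blocking a process for hours is rarely ideal in a Rails app, so a scheduled job that calls check_status once per run and moves on is often the better fit; the loop above just shows the terminal-status logic.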

  • Retrieve the results: Once the batch was completed, I downloaded the output JSONL file (its ID is in the batch's output_file_id field) and updated the services in the database.

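def retrieve_results(output_file_id, client)
  # ruby-openai parses the JSONL output into an array of hashes, one per request
  response = client.files.content(id: output_file_id)

  response.each do |line|
    service_id = line["custom_id"]
    category = line.dig("response", "body", "choices", 0, "message", "content").to_s.strip.downcase
    next if category.empty? # skip requests that returned no content

    service = Service.find_by(id: service_id)
    # update_column skips validations, callbacks and updated_at
    service.update_column(:category, category) if service.present?
  end
end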
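Output lines are not guaranteed to follow the input order, which is why custom_id matters, and individual requests can fail even when the batch completes. A minimal sketch of separating parsing from the database writes, assuming a hypothetical categories_from_results helper that takes the parsed lines and keeps only successful (status 200, no error) responses:

```ruby
# Hypothetical helper: turns parsed output lines (Hashes) into { custom_id => category }.
# Lines with an error, a non-200 response or no content are skipped.
def categories_from_results(lines)
  lines.each_with_object({}) do |line, result|
    next if line["error"]
    next unless line.dig("response", "status_code") == 200

    content = line.dig("response", "body", "choices", 0, "message", "content")
    next if content.nil?

    result[line["custom_id"]] = content.strip.downcase
  end
end
```

Returning a plain hash keeps this part testable without a database. In the Rails app, the hash could then be grouped by category and written with one Service.where(id: ...).update_all(category: ...) per category, replacing one query per service with one per category.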


Conclusion

Using OpenAI's API in these two steps gave us an efficient way to categorize services both in real time and at scale. The Chat Completions API categorized new services instantly, while the Batch API processed the existing backlog asynchronously at a lower price than synchronous requests.

This approach is a great way to leverage AI for structured categorization tasks in a Rails application.

About Paulo Silva

Software Engineer specialized in product development with Ruby on Rails. I help companies turn bright ideas into amazing digital products — I've worked on InvoiceXpress, ClanHR, Today and currently sheerME.