Abdullah Esmail

August 15, 2024

SQL UNION with Rails

I recently needed to combine two different queries into a single result that can be queryable using Postgres with Rails 7.1. The more I looked online, the more I was convinced that I had to use raw SQL for this. I did not mind the idea of using raw SQL. However, I needed to also modify the result of the UNION further with more business logic. I needed it to be an ActiveRecord Relation instead of just firing a predefined query and getting records back.

If you want to skip the details, check the TLDR at the end.


Set the ground for the problem

Let's first explain the current structure so we understand the problem better. For the sake of this article, I'll create imaginary models that are simpler but similar to the actual ones. We have items and each item could have many parts depending on its type.

class Item
  has_many :parts
  validates :type, inclusion: { in: %w[single combined] }
  validates :status, inclusion: { in: %w[draft available] }
  validates :level, inclusion: { in: %w[one two three] }
end

class Part
  belongs_to :item
  validates :quantity, numericality: { only_integer: true, greater_than_or_equal_to: 0}
  # arrived_date (date)
  # available_until_date (date)
end

The requirement is, we need to show all items that are available. What does available mean? Available items fall into two categories:

  1. The item's type is single and its status is available.
  2. The item's type is combined, its status is available, and has at least one part with a positive quantity and an available_until_date in the future.

How would we query the available items based on these conditions? Well, the first requirement is easy. It's a very basic ActiveRecord query:

@items = Item.where(type: "single", status: "available")

How do we continue from here? Maybe it would be easier if we add scopes to our Item models for code clarity:

class Item
  scope :single, -> { where(type: "single") }
  scope :combined, -> { where(type: "combined") }
  scope :available, -> { where(status: "available") }
  scope :has_parts, -> { joins(:parts).where(parts: {quantity: 1.., available_until_date: Date.tomorrow..}) }
end

So now we can do:

single_items = Item.single.available

This would get us all the single items that are available which satisfies requirement #1. But, what about requirement #2?

combined_items = Item.combined.available.has_parts

Alright, now we have two sets of items. The union of both sets satisfies the requirements of "available items". The issue later became that users needed to sort and filter those items based on some conditions. It would be extremely great if we could something like this:

# this is not supported
all_items = single_items.union(combined_items).order(created_at: :desc)

However, in the real world, you have to fight to get what you want. Or maybe not. We'll see.

After researching a few websites, reading a few articles, and experimenting, I found a really cool gem called active_record_extended. The gem seems to do what I want and way more, but I really didn't want to add more gems if I didn't have to. So we'll keep this as a last resort.


Going back to basics

To understand why I'm having a hard time doing this in ActiveRecord, I decided to fully write the raw SQL, and then slowly work my way backwards to ActiveRecord (if it's possible).

Writing the SQL is (almost) easy especially that ActiveRecord gives us the to_sql method to return the raw SQL that it sends to the database. So let's start with that:

single_items_sql = Item.single.available.to_sql

This produces:

SELECT "items".* FROM "items" WHERE "items"."type" = 'single' AND "items"."status" = 'available';

We'll do the same for the combined items:

combined_items_sql = Item.combined.available.has_parts.to_sql

Result:

SELECT "items".* FROM "items" INNER JOIN "parts" ON "parts"."item_id" = "items"."id" WHERE "items"."status" = 'available' AND "items"."type" = 'combined' AND "parts"."quantity" >= 1 AND "parts"."available_until_date" >= '2024-08-15';

Cool. So now we can combine the results, right? How do we do that? Back to the knowledge base of the internet. By the way, experimenting with raw SQL is extremely fun using a GUI app. I'm using Postico which is amazing if you're running on macOS.

After a few experiments, I reached the following SQL which gets me exactly what I want:

SELECT *
FROM
(
    (SELECT "items".* FROM "items" WHERE "items"."type" = 'single' AND "items"."status" = 'available')

    UNION

    (SELECT "items".* FROM "items" INNER JOIN "parts" ON "parts"."item_id" = "items"."id" WHERE "items"."status" = 'available' AND "items"."type" = 'combined' AND AND "parts"."quantity" >= 1 AND "parts"."available_until_date" >= '2024-08-15')
) AS items
ORDER BY items.created_at DESC
LIMIT 10;

This single query satisfies both requirements and gets us the list of the 10 most recent available items.

Now, how do we move this into ActiveRecord? Is it possible in the first place?


(Re)Discovering ActiveRecord

One solution I found online is to send this query directly to the database from ActiveRecord, like so:

items = ActiveRecord::Base.connection.execute("SELECT * FROM ...").to_a

This does get us what we want. The advantage of this method is that you can send arbitrary SQL statements and get whatever result it returns without being restricted to the structure of the model. However, this has a few issues:
  • It returns an array of Hashes, not ActiveRecord objects.
  • The SQL statement is "hardcoded" and not easily modifiable or chain-able.

Okay. Is there another way? Yup, again after a few minutes of looking on the internet I found the find_by_sql method from the Rails API docs:

items = Item.find_by_sql("SELECT * FROM ...")

Nice! This actually gives us back ActiveRecord objects! Getting closer...
We eliminated one issue from the previous solution. Now how do we get a relation that we can use with ActiveRecord and a one that ActiveRecord doesn't complain about?

Looking back at the raw SQL query above, it takes the form of:

SELECT [fields] FROM [table] ORDER BY [fields] LIMIT [number];

Looks like a typical SELECT structure.

So can we use that to our advantage somehow? Can we convince ActiveRecord that our table is a custom SQL query? Searching the Rails API documentation again for keywords like select and find and from, I found a method called from. The documentation says that the method:

Specifies the table from which the records will be fetched.

Well. Let's give it a try:

items = Item.from("SELECT * FROM (SELECT * ...) UNION (SELECT * ...)")

This blew up pretty fast. Looking at the error logs in the terminal:

... syntax error at or near \"SELECT\"\nLINE 1...

Apparently, the from method adds "SELECT * FROM" to its argument. So let's remove that part:

items = Item.from("((SELECT * ...) UNION (SELECT * ...)) as items")

No errors. Could it be?

items.class
#=> Item::ActiveRecord_Relation

No way! Testing further:

items.limit(10).count
#=> 10

item = items.where(level: "one").order(created_at: :desc).first
item.level #=> one

page_two = items.order(created_at: :desc).page(2).per(10) #=> works!

I couldn't believe it! It worked!


The final solution (the TLDR)

We can easily chain more ActiveRecord methods to filter/sort based on user input.

Now, we want to write the least raw SQL possible. So I created a class called AvailableItems:

class AvailableItems
  def fetch
    single_items = Item.single.available.to_sql
    combined_items = Item.combined.available.has_parts.to_sql
    Item.from("(#{single_items} UNION #{combined_items}) as items")
  end
end

# in our controller
class ItemsController < ApplicationController
  def index
    @items = AvailableItems.new.fetch
    @items = @items.where(level: params[:level]) if params[:level]
    @items = @items.page(params[:page]).per(20)
    @items = @items.order(created_at: :desc)
  end
end

All in all, I'm very satisfied with the solution.


EDITS

2024-08-16

After writing the article, I got some suggestions to try other approaches. One suggestion that stood out was using Arel directly. Arel is (still) considered a private API by the Rails core team though. You can still use it and developers do use it a lot. Just keep in mind of the fact that it could change/break without notice.

single_items = Item.single.available
combined_items = Item.combined.available.has_parts
union_items = Arel::Nodes::Union.new(single_items.arel, combined_items.arel)
items_table = Arel::Nodes::TableAlias.new(union_items, Item.table_name)
items = Item.from(items_table)

This actually feels cleaner. The only issue I feel with Arel is, being a private API, documentation is outdated or non-existent sometimes.

About Abdullah Esmail

I'm a rails developer and loving it. I love simplifying things. The author of Kaffy. When I'm not coding, I'm drinking coffee and checking what other people are coding. I also enjoy working with elixir and flutter. Still have high hopes for RubyMotion.