I was working towards leveraging PostgreSQL ranges in my Rails application. In that process, I've discovered what I think is a PostgreSQL bug.
PostgreSQL ranges
Postgres has a neat ability to represent ranges, which can come in handy in multiple situations. Under the ranges umbrella term PG contains a few different range data types
- int4range: Range of integer
- int8range: Range of bigint
- numrange: Range of numeric
- tsrange: Range of timestamp without time zone
- tstzrange: Range of timestamp with time zone
- daterange: Range of date
These ranges are especially powerful when querying on them or even adding indexes and constraints.
We can easily test this types by firing up psql
psql SELECT int4range(5,10); int4range ----------- [1,5) (1 row)
We can check weather ranges overlap using overlap operator(&&)
psql SELECT int4range(5,10) && int4range(8,15); # returns true SELECT int4range(5,10) && int4range(11,15); # returns false
To check if the value is included in a range we can use inclusion operator(@>)
psql SELECT int4range(5,10) @> 7; # returns true SELECT int4range(5,10) @> 5; # returns true SELECT int4range(5,10) @> 10; # returns false
As you can see by default beginning of the range is included in the range bounds, while the end of the range by default is excluded. This can be changed by passing a 3 argument to a range constructor
psql // Default SELECT int4range(5,10, '[)') @> 5; # returns true SELECT int4range(5,10, '[)') @> 10; # returns false // Inclusive on both ends SELECT int4range(5,10, '[]') @> 5; # returns true SELECT int4range(5,10, '[]') @> 10; # returns true // Exclusive on both ends SELECT int4range(5,10, '()') @> 5; # returns false SELECT int4range(5,10, '()') @> 10; # returns false
Alright, this is a short intro, there are a lot of super handy possibilities with PG ranges but there are a lot of great posts on this topics already out there.
Ruby ranges & ActiveRecord
Ruby also has a Range type
Range.new(5,10).cover? 10 # returns true Range.new(5,10).cover? 10 # returns false # or with syntactic sugar (5..10).cover? 10 # returns true
By default end of the range is inclusive in Ruby, while in PG it is exclusive. We can control only upper level inclusivity with ruby ranges by passing a 3rd parameter to a Range constructor or using ...(three dots).
Range.new(5,10, exclude_end: true).cover? 10 # returns false (5...10).cover? 10 # returns true
I'm not sure why there isn't ability in Ruby to control inclusivity on the lower bound maybe something like this
Range.new(5,10, exclude_beginning: true)
There is an open discussion on this topic here.
ActiveRecord, works with PG ranges beautifully. Since Ruby has ranges it was not too difficult to support this in ActiveRecord. Main gist is to convert value from PG range column to a ruby range, and vice-versa when storing the value.
You can find ActiveRecord implementation here. Main methods to look at are;
cast_value(converting from PG rangr to Ruby range)
def cast_value(value) return if value == "empty" return value unless value.is_a?(::String) extracted = extract_bounds(value) from = type_cast_single extracted[:from] to = type_cast_single extracted[:to] if !infinity?(from) && extracted[:exclude_start] raise ArgumentError, "The Ruby Range object does not support excluding the beginning of a Range. (unsupported value: '#{value}')" end ::Range.new(*sanitize_bounds(from, to), extracted[:exclude_end]) end
serialize(converting from Ruby range to a PG range)
def serialize(value) if value.is_a?(::Range) from = type_cast_single_for_database(value.begin) to = type_cast_single_for_database(value.end) ::Range.new(from, to, value.exclude_end?) else super end end
So what is the problem then?
The main issue here and what I think is a Postgres bug is how Postgres represents and I guess stores bounds for a ranges.
For int4range type PG will always store and represent range with exclusive upper value.
INSERT INTO public.test VALUES(1, '[5-10)'); INSERT INTO public.test VALUES(2, '[5-10]'); SELECT * FROM test; id | my_range ------------ 1 | '[5-11)' 2 | '[5-11)'
While mathematically [5-10] is the same set as [5-11), ranges may not represent the same thing. We don't have the ability to know what value was exactly inserted into the DB.
This in my opinion is a problem for clients and applications using this data type. Think about inserting this through active_record(or to be even more precise through pg_ruby gem )
Test.create(my_range: (5..10)) Test.create(my_range: (5...10))
They will both return same values after inserting, even same values are not inserted.
Test.first.my_range.to_s # "5...11" Test.second.my_range.to_s # "5...11"
In Ruby those things are not the same:
Range.new(5,10) == Range.new(5,10) # true Range.new(5,11, exclusive_end: true) == Range.new(5,10, exclusive_end: false) # false
A discrete range is one whose element type has a well-defined “step”, such as integer or date. In these types two elements can be said to be adjacent, when there are no valid values between them. This contrasts with continuous ranges, where it's always (or almost always) possible to identify other element values between two given values. For example, a range over the numeric type is continuous, as is a range over timestamp. (Even though timestamp has limited precision, and so could theoretically be treated as discrete, it's better to consider it continuous since the step size is normally not of interest.)
Another way to think about a discrete range type is that there is a clear idea of a “next” or “previous” value for each element value. Knowing that, it is possible to convert between inclusive and exclusive representations of a range's bounds, by choosing the next or previous element value instead of the one originally given. For example, in an integer range type [4,8] and (3,9) denote the same set of values; but this would not be so for a range over numeric.
A discrete range type should have a canonicalization function that is aware of the desired step size for the element type. The canonicalization function is charged with converting equivalent values of the range type to have identical representations, in particular consistently inclusive or exclusive bounds. If a canonicalization function is not specified, then ranges with different formatting will always be treated as unequal, even though they might represent the same set of values in reality.
The built-in range types int4range, int8range, and daterange all use a canonical form that includes the lower bound and excludes the upper bound; that is, [). User-defined range types can use other conventions, however.
✌️