Riccardo Merolla

May 10, 2022

ZIO - Pure functional programming

TLDR: "How to generate 1K entity records in 1 second and 10K in less than 4 seconds..."

Attempting to fix some performance issues ...




...I ran into the classic mistakes of the past, present, and future. The list is long: why build a framework in-house, Hibernate and its N + 1000 mysterious queries, classes of 3K lines of code, spaghetti injection, SRP and SOLID (those great unknowns), cyclomatic complexity beyond any measurement scale (when you get to the eighth nested if you want to smear wasabi on your eyes), and so on.
But this beautiful world is called legacy code, and we are all a bit part of it: a Ctrl + C and Ctrl + V here and there, just add a parameter and twenty-seven ifs, and there is the new requirement ... who among us has never done it?!? In the short term the problem is "solved", but it is like planting a time bomb in the source code ... "I will fix it afterward, maybe when I have some time I will write a test to validate this behavior" ...
I started from this point, the test, not because I feel a desperate need for extreme TDD (even if it would certainly bring some benefits), but because when it comes to performance you need numbers, measurements, and benchmarks that show whether our changes actually help.
One of the difficulties of legacy code, which is by definition code without tests, is that it is hard to test a part of it quickly, easily, and in isolation. So, focusing on this concept and looking at the initial problem to solve, the performance, I tried to implement a solution using a pure functional library, specifically ZIO.

What is ZIO? ZIO is a library written in Scala, based on pure functional programming, with a particular focus on asynchronous, concurrent, and parallel programming. It helps to solve complicated problems with type-safe, testable, and composable code (author's note: I left out "simple"). For those who know some Haskell, ZIO is a data-type inspired by the IO monad. I will simplify some concepts, but...

The data-type ZIO is defined like this:
ZIO[R, E, A]

Here R is the type of the required Environment, E is the failure type, and A is the success type. In a not entirely precise way, one could imagine a function that, given an environment, returns an Either[E, A]:
R => Either[E, A] 

The ZIO data-type is not a function: it describes a complex effect and "encloses" a behavior, which can then be composed and transformed with the classic methods of functional programming (map, flatMap, for-comprehensions, zipping, ...). Putting these data-types together is like describing a program, which only runs once an Environment that meets all of its requirements is provided. The concept may seem trivial, but the resulting code is very concise, and it is possible to have classes that contain only Business Logic, all easily composable and testable in isolation. As for parallelism and performance, ZIO uses fibers, a concept similar to threads but much lighter.
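To make the R => Either[E, A] picture concrete, here is a minimal toy sketch in plain Scala. It is not how ZIO is implemented (the real library is asynchronous and fiber-based); the names Effect, Config, readMultiplier, and safeDivide are made up for illustration only.

```scala
// Toy model of the "R => Either[E, A]" intuition. NOT the real ZIO implementation.
object EffectModel {

  // A description of a computation: nothing happens until `run` is given an R.
  final case class Effect[-R, +E, +A](run: R => Either[E, A]) {
    // Transform the success value, leaving failures untouched.
    def map[B](f: A => B): Effect[R, E, B] =
      Effect[R, E, B](r => run(r).map(f))

    // Sequence two effects: the second one can depend on the first result.
    def flatMap[R1 <: R, E1 >: E, B](f: A => Effect[R1, E1, B]): Effect[R1, E1, B] =
      Effect[R1, E1, B](r => run(r).flatMap(a => f(a).run(r)))
  }

  // Hypothetical environment for the example.
  final case class Config(multiplier: Int)

  // Needs a Config, cannot fail.
  val readMultiplier: Effect[Config, Nothing, Int] =
    Effect[Config, Nothing, Int](cfg => Right(cfg.multiplier))

  // Needs nothing, may fail with a String.
  def safeDivide(a: Int, b: Int): Effect[Any, String, Int] =
    Effect[Any, String, Int](_ => if (b == 0) Left("division by zero") else Right(a / b))

  // Composition only builds a bigger description; the types accumulate
  // the requirement (Config) and the possible failure (String).
  val program: Effect[Config, String, Int] =
    for {
      m <- readMultiplier
      q <- safeDivide(100, m)
    } yield q * 2

  def main(args: Array[String]): Unit = {
    println(program.run(Config(5))) // the environment is supplied only here
    println(program.run(Config(0)))
  }
}
```

The point of the sketch is that `program` is only a value until an environment is supplied, which is what makes each piece easy to test on its own.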

What pushed me to experiment with this was the poor performance of a legacy function in one of the projects that I am working on. Now, I have to admit that much of the responsibility is shared legacy: some of it comes from a home-made framework, Hibernate adds its own share (even if maybe it is not entirely its fault), and JavaEE doesn't make your life easier. The fact is that even after some optimizations this legacy function was still slow to produce results: we are talking about tens of minutes for fewer than 1K entity records, and sometimes it failed after some timeouts. So the complaints about slowness persisted, and there was no real measure of how long the task took for a given number of entities, or of what the upper limit would be.

Thanks to a colleague, I extracted the log of the SQL queries executed by the current version of the legacy code ... the number of queries that Hibernate produces is crazy. I don't even remember how many there were, but after trimming them down we concluded that about four or five queries were needed to generate a valid entity record for the use case.

I started from a project template with some purely functional libraries (ZIO, tAPIr, Doobie, Http4s), but before creating endpoints, logic, and more, I wanted to create an isolated test for the DB access part, which the previous analysis had shown to be the critical one.

Let's start with some ZIO data-types and some interfaces or traits in Scala:
trait SerialKitRepository:
  def createIncomplete(bomId: Long, expirationDays: Int, divcod: Int, azcod: Int = 1): Task[Long]


object SerialKitRepository extends zio.Accessible[SerialKitRepository]

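That's all: zio.Accessible gives the companion object an accessor for the service methods, so that later we can write SerialKitRepository(_.createIncomplete(...)), while the data-type Task is just a type alias: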
Task[Long] = ZIO[Any, Throwable, Long]
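As a hedged illustration of the two ideas above, the sketch below spells out the Task alias and writes an accessor by hand. It assumes ZIO 2.x, where ZIO.serviceWithZIO is available; the snippets in this article appear to target a pre-release of ZIO 2, so names may differ slightly. I am assuming the hand-written accessor is roughly what zio.Accessible provided; TaskAliasSketch is a made-up name.

```scala
import zio._

trait SerialKitRepository {
  def createIncomplete(bomId: Long, expirationDays: Int, divcod: Int, azcod: Int = 1): Task[Long]
}

object SerialKitRepository {
  // Hand-written accessor (an assumption about what the helper does for us):
  // look the service up in the environment, then call the method on it.
  def createIncomplete(
      bomId: Long,
      expirationDays: Int,
      divcod: Int,
      azcod: Int = 1
  ): RIO[SerialKitRepository, Long] =
    ZIO.serviceWithZIO[SerialKitRepository](
      _.createIncomplete(bomId, expirationDays, divcod, azcod)
    )
}

object TaskAliasSketch {
  // Task[A] is only a type alias, so both declarations have the same type.
  val viaAlias: Task[Long] = ZIO.succeed(1L)
  val spelledOut: ZIO[Any, Throwable, Long] = viaAlias

  // RIO[R, A] is likewise an alias for ZIO[R, Throwable, A]: the effect
  // below cannot run until a SerialKitRepository is provided.
  val needsRepository: ZIO[SerialKitRepository, Throwable, Long] =
    SerialKitRepository.createIncomplete(16441870L, 60, 9)
}
```

Note how the requirement shows up in the type: the trait method returns a Task (no requirements), while the accessor returns an effect that still needs a SerialKitRepository.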

Starting from the trait, it is then possible to implement what ZIO calls Services, that is, the case classes that implement the methods; by convention, the one whose name ends in "Live" is the production one:
case class SerialKitRepositoryLive(trx: Transactor[Task]) extends SerialKitRepository:
  override def createIncomplete(bomId: Long, expirationDays: Int, divcod: Int, azcod: Int = 1): Task[Long] =
    sql"""....

object SerialKitRepositoryLive:
  val layer: URLayer[DBTransactor, SerialKitRepository] = ???

case object SerialKitRepositoryMock extends SerialKitRepository:
  override def createIncomplete(bomId: Long, expirationDays: Int, divcod: Int, azcod: Int = 1): Task[Long] =
    Task.succeed(1)

The companion object of our SerialKitRepositoryLive service only provides the layer property, which is basically the declaration of its dependencies: in this case the DBTransactor, with the configuration needed to access our database.
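The layer body is left as ??? above, and I am not going to guess the real one. Purely as a sketch of how such wiring can look in ZIO 2.x, the example below uses a hypothetical stand-in for DBTransactor (a case class holding a made-up JDBC URL instead of a doobie Transactor) and fake method bodies, then runs the same program against the live and the mock layer.

```scala
import zio._

// Hypothetical stand-in: the article's DBTransactor is not shown.
final case class DBTransactor(jdbcUrl: String)

trait SerialKitRepository {
  def createIncomplete(bomId: Long, expirationDays: Int, divcod: Int, azcod: Int = 1): Task[Long]
}

final case class SerialKitRepositoryLive(db: DBTransactor) extends SerialKitRepository {
  // Placeholder body: the real implementation runs SQL through the transactor.
  def createIncomplete(bomId: Long, expirationDays: Int, divcod: Int, azcod: Int): Task[Long] =
    ZIO.succeed(42L) // fake id, for illustration only
}

object SerialKitRepositoryLive {
  // "Give me a DBTransactor and I will build a SerialKitRepository."
  val layer: URLayer[DBTransactor, SerialKitRepository] =
    ZLayer.fromZIO[DBTransactor, Nothing, SerialKitRepository](
      ZIO.service[DBTransactor].map(db => SerialKitRepositoryLive(db))
    )
}

case object SerialKitRepositoryMock extends SerialKitRepository {
  def createIncomplete(bomId: Long, expirationDays: Int, divcod: Int, azcod: Int): Task[Long] =
    ZIO.succeed(1L)

  // The mock has no dependencies at all.
  val layer: ULayer[SerialKitRepository] =
    ZLayer.succeed[SerialKitRepository](this)
}

object LayerSketch extends ZIOAppDefault {
  // Business logic only states what it needs, not how it is built.
  val program: RIO[SerialKitRepository, Long] =
    ZIO.serviceWithZIO[SerialKitRepository](_.createIncomplete(16441870L, 60, 9))

  val dbLayer: ULayer[DBTransactor] =
    ZLayer.succeed(DBTransactor("jdbc:postgresql://localhost/example")) // made-up URL

  // Feed the output of dbLayer into the input of the live layer.
  val liveLayer: ULayer[SerialKitRepository] =
    dbLayer >>> SerialKitRepositoryLive.layer

  def run =
    for {
      liveId <- program.provideLayer(liveLayer)
      mockId <- program.provideLayer(SerialKitRepositoryMock.layer)
      _      <- Console.printLine(s"live: $liveId, mock: $mockId")
    } yield ()
}
```

The design choice worth noticing is that `program` never changes: swapping production for a mock is only a matter of which layer is provided at the edge.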
Of course, I had created my empty tests first, as TDD demands ... not the most exhaustive of test suites, but always better than nothing:
object SerialKitSpec extends DefaultRunnableSpec {

  def spec = suite("SerialKitSpec")(
    test("should create a new serial kit in INCOMPLETE status") {
      ???
    },
    test("should create 1K of new serial kits in few seconds") {
      ???
    },
    test("should create INCOMPLETE serial kits from prod_order_det_id") {
      ???
    }
  )
}

I have my logic and I have described the effects, so let's see how to create the test program, starting with the first test:
private val testLayer = ZLayer.make[SerialKitRepository](SerialKitRepositoryLive.layer, DBTransactorLive.layer, ConfigLive.layer)

test("should create a new serial kit in INCOMPLETE status") {  
  val testcase = for {
    id <- SerialKitRepository(_.createIncomplete(16441870, 60, 9, 1))
  } yield assert(id)(isGreaterThan(0L))
  
  testcase.provideLayer(testLayer)
}

In this case, composing my test case is very simple: everything is typed, so the compiler helps me compose the correct layer needed to run my test. First green test; now we can work on performance ...
test("should create 1K of new serial kits in few seconds") {
  val program = SerialKitRepository(_.createIncomplete(16441870, 60, 9, 1))
  val qty = 1000
  val testcase = for {
    l  <- ZIO.foreach(1 to qty)(_ => program)
  } yield l

  testcase.map(assert(_)(anything)).provideCustomLayer(testLayer)
}

So far it is fast: a few seconds to generate 1K Serial Kits, but the gain comes mainly from the essential, simplified queries and from having fewer layers than the legacy code ... here are the first and the thousandth log lines:
16:23:37.052 [zio-default-blocking-2] DEBUG com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection@483211b1
...
16:23:40.780 [zio-default-blocking-1] DEBUG com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection@483211b1

But earlier you talked about concurrency, parallelism, and fibers?!? ... True, so let's modify our test. I know this change will have a huge impact on these few lines of code, but try to follow me:
test("should create 1K of new serial kits in few seconds") {
  val program = SerialKitRepository(_.createIncomplete(16441870, 60, 9, 1))
  val qty = 1000
  val testcase = for {
    l  <- ZIO.foreachPar(1 to qty)(_ => program)
  } yield l

  testcase.map(assert(_)(anything)).provideCustomLayer(testLayer)
}

Spot the difference ... there is an extra "Par", and here are the first and the thousandth log lines:
16:25:07.704 [zio-default-blocking-1] DEBUG com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection@6b374b79
...
16:25:09.074 [zio-default-blocking-1] DEBUG com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection@2b80e7cf
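To see the foreach versus foreachPar effect without a database, here is a small self-contained sketch, assuming ZIO 2.x. The fakeInsert function is a hypothetical stand-in that just sleeps for 10 ms, so the timings it prints say nothing about the real queries. The sketch also caps the parallelism with withParallelism, on the assumption that in a real setup you would not want more fibers hitting the database at once than the connection pool can serve; the value 10 is arbitrary.

```scala
import zio._

object ForeachParSketch extends ZIOAppDefault {
  // Hypothetical stand-in for one insert: waits 10 ms and returns a fake id.
  def fakeInsert(i: Int): UIO[Long] =
    ZIO.sleep(10.millis).as(i.toLong)

  val qty = 100

  // One after the other: total time is roughly the sum of the single waits.
  val sequential =
    ZIO.foreach(1 to qty)(fakeInsert).timed

  // One fiber per element, with at most 10 running at the same time.
  val parallel =
    ZIO.foreachPar(1 to qty)(fakeInsert).withParallelism(10).timed

  def run =
    for {
      seqResult <- sequential
      parResult <- parallel
      _ <- Console.printLine(s"sequential: ${seqResult._1.toMillis} ms for ${seqResult._2.size} items")
      _ <- Console.printLine(s"parallel:   ${parResult._1.toMillis} ms for ${parResult._2.size} items")
    } yield ()
}
```

Both versions return the results in the same order as the input, so switching between them does not change what the caller sees, only how long it takes.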

Compared with the sequential run, the 1K test went from almost 4 seconds to just over a second ... not bad. And if I set qty to 10K:
16:31:53.402 [zio-default-blocking-11] DEBUG com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection@3aaf75ec
...
16:31:56.908 [zio-default-blocking-1] DEBUG com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection@34688d11

10K entity records generated in 3.5 seconds ... this is "PERFORMANCE"!!!
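For the record, the durations quoted above can be recomputed from the first and last timestamps of each log excerpt. The sketch below does that arithmetic with java.time; the timestamps are copied from the logs in this article, while the labels and the ElapsedFromLogs name are mine. Keep in mind that these are distances between two log lines, not a proper benchmark.

```scala
import java.time.{Duration, LocalTime}

object ElapsedFromLogs {
  // Elapsed milliseconds between two HH:mm:ss.SSS timestamps of the same day.
  def elapsedMillis(first: String, last: String): Long =
    Duration.between(LocalTime.parse(first), LocalTime.parse(last)).toMillis

  // (label, number of records, first timestamp, last timestamp)
  val runs = List(
    ("foreach, 1K", 1000, "16:23:37.052", "16:23:40.780"),     // 3728 ms
    ("foreachPar, 1K", 1000, "16:25:07.704", "16:25:09.074"),  // 1370 ms
    ("foreachPar, 10K", 10000, "16:31:53.402", "16:31:56.908") // 3506 ms
  )

  def main(args: Array[String]): Unit =
    runs.foreach { case (label, qty, first, last) =>
      val ms = elapsedMillis(first, last)
      // Throughput in records per second, derived from the elapsed time.
      val perSecond = qty * 1000.0 / ms
      println(s"$label: $ms ms, about ${perSecond.round} records/s")
    }
}
```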