Garrett Catlin

September 29, 2022

Spatial Clustering: Love Letters (Part 4: Discrete Event Simulation)

Project Recap:

Over the last 3 posts on the topic, we've been taking a look at different spatial clustering methods to determine the placement of vans. What kind of vans? Vans called to a higher purpose: delivering handwritten greeting cards (see post 1 if you'd like to know why that's a total misnomer). Our goal is righteous: to minimize the time it takes our cards to be delivered by printing out the cards as soon as they're written and having that van deliver the greeting card, love letter, apology note, etc.  in question now!

Our perfectly amusing testing city is Casper, WY. This has nothing to do with the fact I had to access this data for another very similar project and everything to do with how windy it is there. Delivery drones apparently scare the CEO of Handwritten Greeting Cards (Now!) and they have little chance of lasting more than a couple hours in Casper. Wow, I have had some coffee today!

In post 2, we covered some existing weighted clustering using the PAMonce algorithm weighted by the predicted number of letters delivered to each census block. We did this with 2 types of distance matrices: one based on Euclidean distances and one based on road network travel times. We found that using PAMonce on the travel time distance matrix generally yielded more optimized clustering in terms of average travel time from a van to a given census block within each cluster. However, one issue present with both options was that the clusters typically were not balanced in terms of workload. I.E., one driver/van would work vastly harder and more often than the others. We wanted to see if that could be fixed.

In post 3, we wrote Equalify: a sufficiently dumb algorithm that, generally speaking, is very inefficient but does, however, balance the workload of the vans with minimal hits to travel time within each cluster. After running it, we now have 3 clustering solutions for 2-10 vans. How might we compare them?

One interesting option before comparison is injecting some reality into this problem. Sure, we have placements for the vans, but what happens if one van has 20 letters to deliver and another has none? Wouldn't it make sense to have the one being underutilized to help out? What about working hours? It surely doesn't make sense for our writers to be writing letters 24 hours a day. And you certainly wouldn't want someone knocking on your door at 2am to deliver a thoughtful, though inconveniently timed confession of undying affection... These are the types of questions we aim to answer today.

Setup & Data Generation

As with all the scripts so far, you can follow along with the 6 - simulation.R script in the love_letters Github Repository if you so please. The packages we'll need are the usual suspects with 2 new additions.

lubridate is pretty well known at this point but it'll help with working with dates (this is really important for our simulation). simmer is, as far as I can tell, the premier discrete event simulation package in R. I have some issues with it - as you'll no doubt see, some formatting things are unintuitive and the syntax I've deemed 'simmer speak' can be unforgiving. Nonetheless, I've been able to sufficiently bend its framework to my will when I've used it!

Again, planning the multisession to utilize parallelization (this script definitely needs it) and setting the pushoverr options if you decide to use it. Pushover allows you to send notifications to your device of choice, which I find handy. Post 3 covers how to set it up.

library(data.table)
library(stringr)
library(future.apply)
library(pushoverr)
library(lubridate)
library(simmer)

plan(multisession)
set_pushover_user(user = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
set_pushover_app(token = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")

From here, I'm importing what we'll need and setting some debugging options for our simulation function. We essentially need to run the same code 27 different times so that's why I'm generalizing here.

# imports
time_mat = readRDS("R_Objects/time_mat.rds")
cluster_dat = readRDS("R_Objects/cluster_dat.rds")
monthly = readRDS("R_Objects/monthly.rds")

# debugging options
monthly_data = monthly
n_vans = 3
method = "Equal"
cluster_solutions = cluster_dat

In case you've forgotten (because I sure did), "monthly" is the data we created with generated counts of letters predicted per census block per month all the way back in the first post. We accounted for holidays this way and I knew we'd use it here too.

> head(monthly)
   ID Population      Lat       Lon MonthNo Response
1:  1        253 42.86459 -106.3259       1        5
2:  2         30 42.86166 -106.3269       1        0
3:  3         22 42.86171 -106.3105       1        2
4:  4          6 42.86147 -106.3147       1        0
5:  5          2 42.86297 -106.3211       1        0
6:  6         18 42.86190 -106.3164       1        2
> tail(monthly)
     ID Population      Lat       Lon MonthNo Response
1: 1992         34 42.84613 -106.3933      12        1
2: 1993         50 42.84633 -106.3921      12        0
3: 1994         15 42.84656 -106.3908      12        1
4: 1995         23 42.84740 -106.3897      12        0
5: 1996         29 42.85365 -106.3942      12        2
6: 1997         10 42.84387 -106.4186      12        1

Our letter_sim function starts with the items we'll need as inputs. From there, we'll get the van placements for our chosen method, van clustering solution:

# function
letter_sim = function(monthly_data, cluster_solutions, n_vans, 
                      method, time_mat) {
  
  # get placements
  clust_assign = cluster_solutions[N_Clust == n_vans & Method == method]
  placements = clust_assign[Medoid == 1]
  clust_assign = clust_assign[, .(ID, Clust)]

> head(clust_assign)
   ID Clust
1:  1     1
2:  2     1
3:  3     1
4:  4     1
5:  5     1
6:  6     1

> placements[, .(ID, Clust, Medoid)]
     ID Clust Medoid
1:  188     1      1
2:  588     2      1
3: 1356     3      1

We'll subset our travel time matrix to the travel times to every census block for our chosen van placements:

  # subset travel times for placements
  placement_ids = c(sort(placements$ID), "ID")
  time_mat = time_mat[, ..placement_ids]
  setnames(time_mat, c(paste0("van_",1:n_vans,"_time"), "ID"))
  setkey(time_mat, ID)

> head(time_mat)
   van_1_time van_2_time van_3_time ID
1:        3.7        8.9       11.2  1
2:        3.1        8.4       10.7  2
3:        3.7        7.3       11.4  3
4:        3.8        7.4       11.3  4
5:        3.7        8.9       11.2  5
6:        3.5        7.8       11.1  6

Next, we'll make a date key. This is really important because simmer requires every column of the data frame / data.table to be of class type numeric or integer for some reason. This means we can't just have one column in a ymd_hms format like "2011-12-31 12:59:59". We instead have to backwards engineer this with one numeric value. Here's what our date key looks like:

> head(datekey)
         Date Elapsed_Day
1: 2023-01-01           1
2: 2023-01-02           2
3: 2023-01-03           3
4: 2023-01-04           4
5: 2023-01-05           5
6: 2023-01-06           6

So, if a letter request comes in on the 6th of January 2023 at 1pm, this will have to correspond to:

> 6 + 13/24
[1] 6.541667

These are the kind of calculations we'll have to constantly be doing when interfacing with simmer. I'm really not a fan of this, and it's entirely possible I'm missing something, but I've gotten this method to work so far so it'll do for now.

We'll use this date key in tandem with our monthly input to generate a data.table that simmer will be happy with. Notice how in our monthly data, census block 1 has 5 predicted letters for the month of January.

> monthly_data[1]
   ID Population      Lat       Lon MonthNo Response
1:  1        253 42.86459 -106.3259       1        5

We need to take this, expand it to be 5 unique arrival times (in simmer speak) in January like:

    simmer_time
 1:    15.10069
 2:    26.95417
 3:    28.46944
 4:    29.58681
 5:    31.89583

As you can see these correspond to the 15th, 26th, 28th, 29th, and 31st of January all with corresponding times of day. Again, not a fan of this, but it's within reach. 

First, I'm making a function that will tackle our monthly data by row. Some logic at the beginning to make sure the month corresponds to one lubridate will understand. In practice, this simply take a month formatted as "1" and turns it to "01".

  # generate simmer data function
  simmer_gen = function(dt) {
    if (dt$MonthNo %in% 1:9) {
      month = paste0(dt$MonthNo)
    } else {month = as.character(dt$MonthNo)}

If there were predicted letters for that census block in that month, we'll sample from the days in that month for every predicted letter in our single number format:

    if (dt$Response != 0) {
    month_full = datekey[month(Date) == month]
    day_request = sample(month_full$Elapsed_Day, dt$Response, replace = T)

> head(month_full)
         Date Elapsed_Day
1: 2023-01-01           1
2: 2023-01-02           2
3: 2023-01-03           3
4: 2023-01-04           4
5: 2023-01-05           5
6: 2023-01-06           6

> day_request
[1] 13  4 20 25 30

Similarly, for hour and minute, then adding together and creating a table, with ID and arrival time:

    hour_request = sample(1:23, dt$Response, replace = T)/24
    minute_request = sample(1:59, dt$Response, replace = T)/1440
    simmer_time = day_request + hour_request + minute_request
    table = data.table(ID = dt$ID, simmer_time)

> table
   ID simmer_time
1:  1   13.545833
2:  1    4.978472
3:  1   20.988889
4:  1   25.384722
5:  1   30.331944

If that row in our monthly data had no predicted calls, the table will be NULL. Finally, we'll return the table.

    } else (table = NULL)
    
    return(table)
  }

This function implies that letters can and will be requested at any time of day for any date in the 2023 calendar year. This may not be super realistic, but if Amazon will take orders at 2am then so should we.

Now all that's left is to apply this function to every row in our monthly data. Then we can add back all the information we'll need for the simulation by merging tables. We'll get van assignment as well as travel times to a given census block from every van. Finally, we have to order by arrival time or else simmer won't take our data.

  # generate simmer data
  simmer_data = data.table()
  for (row in 1:nrow(monthly_data)) {
    table = simmer_gen(monthly_data[row])
    simmer_data = rbind(simmer_data, table)
  }
  setkey(simmer_data, ID)
  setkey(clust_assign, ID)
  simmer_data = clust_assign[simmer_data]
  simmer_data[, Clust := as.numeric(Clust)]
  setkey(simmer_data, ID)
  simmer_data = time_mat[simmer_data]
  setkey(simmer_data, simmer_time) # orders by time for simmer

At the end, we have a table with all numeric columns, ordered by arrival time. This is the only format simmer will accept:

> head(simmer_data)
   van_1_time van_2_time van_3_time   ID Clust simmer_time
1:        6.6        3.8       14.6  540     2    1.050694
2:        1.4        7.2        9.8  242     1    1.061111
3:       12.9       17.9        8.4 1669     3    1.061111
4:        4.7        4.5       12.6  490     2    1.070139
5:        2.8        6.5       10.5   49     1    1.073611
6:        8.9        2.6       11.6 1549     2    1.085417

Last thing in this phase, we'll extent our date key in case our simulation runs over the end of 2023.

  # date key overwrite (so sim can extend past generated year)
  datekey = data.table(Date = seq(as.Date("2023-01-01"), 
                                  as.Date("2025-12-31"), 
                                  by = "1 day"))
  datekey[,Elapsed_Day := 1:.N]
  setkey(datekey, Date)

All done with the setup!

Simmer: A (Relatively) Quick Overview

As you've probably gathered, I don't much care for the all numeric data input when using simmer for DES. Some of the ways of doing things can be unintuitive (at least if you're me) so this is my best attempt at a crash course for simmer syntax and setup in context of our example. 

Simmer uses the pipe operator (ctrl+shift+m) and so the chains are like those used in the tidyverse. If you use simmer in any context in which you're inputting data the setup will be something like the following:

 # sim setup
 sim = simmer() %>% 
   add_resource(paste0("van_", 1:n_vans), 1) %>% 
   add_dataframe("letter ", 
                  letter, 
                  simmer_data, 
                  col_time = "simmer_time", 
                  time = "absolute", 
                  mon = 2)

add_resource is pretty intuitive. Just adding the number of vans we have for our given clustering solution with names and marking their capacity as 1. The capacity argument can be used to greater extent in different situations than ours: where one amusement park ride can hold more than 1 person, for example. 

Our data is added with add_dataframe. 
  • "letter " corresponds to the name which will be followed by a number when it is run - letter 1, letter 2, etc. 
  • letter here corresponds to the trajectory each letter will follow. The trajectory is the meat of simmer and you'll see how crazy that can get shortly. 
  • simmer_data is our data we created, again it must be in all numeric and ordered by arrival time. 
  • You must then specify which column contains your time, what kind of time measurement it is, and how much information to store while the simulation is running with the mon = argument.

To run a simulation, is pretty straightforward. To get stored information you have use get_mon_attributes:

sim %>% simmer::run()
results = sim %>% get_mon_attributes() 

Data will be in long format and if you want it in wide format, you'll need to use dcast if using data.table. Now for the fun part: the trajectory. The trajectory tells the simulation what to do with every letter that arrives. There are a few main commands I end up using:

1) timeout is the command to add time. This can be a simple one liner but every time I've used simmer I use something like the following. Here, we're adding 1-2 days before processing a letter.

# define trajectory
letter = trajectory() %>%

  # delay by 1-2 days before processing
  timeout(function() {
    sample(1:2, 1)
  }) %>%

2) set_attribute is used to store information about the run. Otherwise this data is not collected. You can set multiple attributes at once. Here, we're storing how long we delayed our letter by and the absolute time the delay ended. I.E., if our letter was requested at 12a on January 3rd and our delay was 1 day, "delay_end" would be 3 + 1 = 4 and "delay_time" would be 4 - 3 = 1. 

  # record delay duration and end
  set_attribute(c("delay_end", "delay_time"), function() {
    c(simmer::now(sim), 
      simmer::now(sim) - get_attribute(sim, "simmer_time"))
  }) %>% 

This illustrates a couple odd things in "simmer speak", as I've decided to call it. 

  1. You can't record information with set_attribute within a timeout call so you have to end up doing these types of maths to record information after the timeout occured. 
  2. Simmer's now() function yields the current time. However, it's out-scoped by packages like lubridate which also has a now() function that gas an entirely different use case. For safety, I always use simmer::now()
  3. get_attribute grabs information from your data so you can use it. Unlike set_attribute; however, you MUST tell it which simulation to get the information from. Recall that we defined our simulation with the following code. This is where you have to point back to each time you need get_attribute (and simmer::now(sim)).

  # sim setup
  sim = simmer() %>% # etc.

3) select and seize_selected are used to assign a letter to a van and to "occupy" it. The argument 1 tells us how much capacity one letter takes. Since our vans were setup to have a capacity of 1, it fills up the van, so to speak. Here, we're saying every letter should be delivered by the first van and for each individual letter to "occupy" the van (mark it busy) until the letter has been delivered.

  # always select van 1
  select(function() {
    "van_1"
  }) %>%
  seize_selected(1)

4) release_selected marks a van as back and ready to work. Here we're saying every letter takes exactly 1 hour to deliver at which point the van will be ready for another letter.

  # time for delivery
  timeout(function() {
    1/24
  }) %>% 
  release_selected(1)

And there we have it! A very basic DES with simmer.

Letter Trajectory - Staff Hours:

Our trajectory will unfortunately be much more complicated than the above. From the get-go, let's record exactly what time our letter was requested. At this point, we're immediately met with huge and important questions: what happens if a letter is requested when our staff aren't working and, furthermore, when are they working? Let's say they work from 9a-5p Monday through Friday. Here's how I set this up and some of the simmer speak I had to use to achieve it:

  # simulation trajectory
  letter = trajectory() %>% 
    
    # initialize - records when the letter was requested
    set_attribute("letter_request", 
                  values = function() {simmer::now(sim)}) %>%
    
    # are our staff working? work 9a-5p Mon-Fri
    timeout(function() {
      time = simmer::now(sim)
      base_day = as.numeric(gsub("\\..*","",simmer::now(sim)))
      time_request = time - base_day

time records the absolute time at the time of the letter request. base_day is, well, the base day and is used to look up the corresponding date with our date key we created. time_request then, is simply the time of day.

Now, if time_request is before 9a in the morning, the letter will need to wait until one of our writers can look at the request. Some simple maths here for that waiting time. What happens now? We need to update our overall time to reflect the new time (9a) and recalculate the base_day as well as the new time. In example, if a letter was requested at 2a on January 3rd, time would now be 3.375 (9a on 1/3/23), base day would still be 3, and time_request will have changed from .083 (2a) to .375 (9a).

      # if before 9a - kick to 9a
      if (time_request < 9/24) {
        timeout = 9/24 - time_request
      } else {
          timeout = 0
      }
      
      # update
      time = time + timeout
      base_day = as.numeric(gsub("\\..*","",time))
      time_request = time - base_day

Next natural question: what happens after 5p when no one is on duty? Same logic as before so I won't go into this as much detail. 

# if after 5p - kick to 9a next day
      if (time_request > 17/24) {
        timeout = (1 - time_request) + 9/24
      } else {
          timeout = 0
      }
      
      # update
      time = time + timeout
      base_day = as.numeric(gsub("\\..*","",time))
      time_request = time - base_day

Notice how I keep the same update logic afterwards. This is because we need to keep a rolling list of delays because there's many more considerations. Do our writers have enough time to write a letter? For example, what happens if a letter request comes in 5 minutes before the workday? Letters in our company typically take between 30 minutes and 2 hours to write so there's simply not enough time that day. Just like with our 5p logic, we'll kick it to the next day:

      # do our writers have enough time (i.e., request at 4:59p)?
      # letters take between 30 minutes and 2 hours to write,
      # skip to next day if less than an hour remains
      if (time_request + 1/24 >= 17/24) {
        timeout = (1 - time_request) + 9/24
      } else{
        timeout = 0
      }
      
      # update
      time = time + timeout
      base_day = as.numeric(gsub("\\..*","",time))
      time_request = time - base_day

What next? Well our staff only work M-F so what happens if a letter is requested on a weekend or is kicked to one with some of our previous logic? Similar process here but we have to look up the day of the week:

      # is it a weekend? 
      # if so, skip to Monday
      what_day = weekdays(datekey[Elapsed_Day == base_day]$Date)
      if (what_day == "Saturday") {
        timeout = (1 - time_request) + 1 + 9/24
      } else if (what_day == "Sunday") {
        timeout = (1 - time_request) + 9/24
      } else {
        timeout = 0
      }
      
      # update
      time = time + timeout
      base_day = as.numeric(gsub("\\..*","",time))
      time_request = time - base_day

Our last consideration for now will be US federal holidays, I'm including them by hand here but there are probably better ways to do this. My main concern with automating this process was things like the "in lieu" of holidays and days of week, etc.

      # is it a holiday? 
      # if so, skip to next available day
      what_day = datekey[Elapsed_Day == base_day]$Date
      if (what_day == as.Date("2023-01-01")) { # new years 
        timeout = (1 - time_request) + 1 + 9/24
      } else if (what_day == as.Date("2023-01-02")) { # new years (in lieu)
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-01-16")) { # MLK day
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-02-20")) { # Washington birthday
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-05-29")) { # Memorial day
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-06-19")) { # Juneteenth
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-07-04")) { # Independence day
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-09-04")) { # Labor day
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-10-09")) { # Columbus day
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-11-10")) { # Veterans' day
        timeout = (1 - time_request) + 2 + 9/24
      } else if (what_day == as.Date("2023-11-23")) { # Thanksgiving
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2023-12-25")) { # Christmas
        timeout = (1 - time_request) + 9/24
      } else if (what_day == as.Date("2024-01-01")) { # New Years
        timeout = (1 - time_request) + 9/24
      } else {
        timeout = 0
      }
      
      # update
      time = time + timeout
      base_day = as.numeric(gsub("\\..*","",time))
      time_request = time - base_day

Finally, we'll process all of these delays by taking our new overall time and subtracting the simulation's current time from it. Notice how, with the way we've been keeping track of overall time, it's entirely possible to have our time be equal to simmer::now(sim) and thus this entire process could result in a delay of 0. And do note all of this is happening in a single timeout() call.

We'll record how long this initial delay was and the absolute time when it ended, just like how we recorded the absolute time the letter was requested. Notice how I'm getting the duration using get_attribute.

      # process all possible delays
      time - simmer::now(sim)
    }) %>% 
    set_attribute(c("initial_delay_end", "initial_delay_time"), function() {
      c(simmer::now(sim), 
        simmer::now(sim) - get_attribute(sim, "letter_request"))
    }) %>% 

Letter Trajectory - Time to Write & Choosing a Van:

Next off, and far simpler to start, we need to add the time it takes for a letter to be written. Fairly simple here, dividing minutes by 60 then 24 to get back to our day timescale. Again recording attributes. Brief note: I'm assuming we have infinite writers for simplicity (haha - a bad joke) but we could set up writers with a queue, etc. like we have with vans.

    # time for writing (anywhere from 30 minutes to 2 hours)
    # assuming infinite writers here but could set up writer "resources"
    # like vans below
    timeout(function() {
      sample(30:120, 1)/60/24
    }) %>% 
    set_attribute(c("write_end", "write_time"), function() {
      c(simmer::now(sim), 
        simmer::now(sim) - get_attribute(sim, "initial_delay_end"))
    }) %>% 

Next we need to figure out how a letter gets assigned to a van in a manner that allows for flexing. I.E., the van a given letter is assigned to is currently busy but another van is free. How do we re-assign a letter?

Here, we're getting which van the letter was originally assigned to. We're then getting the travel times from each van to the block the letter needs to be delivered to so that we can measure which van is closer if our original assigned van is busy. 

    # select which van to choose
    select(function() {
      
      # get assigned van
      assigned = get_attribute(sim, "Clust")
      
      # get van travel times
      vans = paste0("van_",1:n_vans)
      time_cols = paste0(vans,"_time")
      times = get_attribute(sim, time_cols)
      select_table = data.table(vans, times)
      select_table[, number := gsub("van_","",vans)]

Next, we get the queue count for each van which tells us the number of letters that are in each van's queue. If our assigned van isn't busy, we'll send the letter that way. If it has a queue, we'll send the letter to the van that minimizes both queue count and travel time. Finally, some extra logic in case more than 1 van minimize queue and travel time. 

      # get queue
      select_table[, queue := get_queue_count(sim, vans)]
      
      # if assigned has lowest queue, select it
      # otherwise, select the closest van with lowest queue
      if (select_table[number == assigned]$queue == min(select_table$queue)) {
        select_table[number == assigned]$vans
      } else {
        # select which minimizes Q & TT
        sample(select_table[queue == min(queue), 
                            .SD[times == min(times)]]$vans, 1)
      }
    }) %>% 

Finally, we seize the van we selected. This means the letter is now formally assigned to the van, either the van will start processing it or it will go in the van's queue. I'm recording which van was selected here too.

    seize_selected(1) %>% 
    set_attribute("van_selected", values = function() {
      van = get_selected(sim)
      as.numeric(gsub("\\D","",van))
    }) %>% 

Letter Trajectory - Driving, Delivery, & Reset:

Now that a van has been "seized" in simmer speak, we need to have it print and deliver the letter. We'll just assume the time it takes to print is nominal here (assuming it could be done while en route). What considerations go here?

First, we need to obtain the approximate travel time. Once we know that, we can see if our van can drive to the destination census block, deliver the letter, and drive back to its base position by 5p. This last bit is a little unrealistic, why not have them hop around? This technically can be done in the simmer framework and I have done it; however, it's very complex as if this isn't complex enough so I'm skipping on that one here.

    # time to drive (assuming printing time is nominal here)
    # take travel time and manipulate with uncertainty, adding time for
    # winter travel conditions
    timeout(function() {
      van = get_selected(sim)
      travel_time = get_attribute(sim, paste0(van, "_time"))
      
      # can our driver deliver and reset by 5p? If not skip to 9a next day
      time = simmer::now(sim)
      base_day = as.numeric(gsub("\\..*","",simmer::now(sim)))
      time_request = time - base_day
      if (time_request + 2*travel_time/60/24 >= 19/24) {
        timeout = 1-time_request + 9/24
      } else {
        timeout = 0
      }
      
      # update
      time = time + timeout
      base_day = as.numeric(gsub("\\..*","",time))
      time_request = time - base_day

Notice the similar structure to our writer delays before. In fact, we'll recalculate weekend and holiday delays  just in case bumping to the next day changes that. I won't repeat the code here for brevity's sake (haha - another one). After processing all those delays, we finally have the drive time calculation.

# if in winter travel_time will always be expected, up to twice as long
      # if not, want rough bound of +/- 10%
      what_month = month(datekey[Elapsed_Day == base_day]$Date)
      if (what_month %in% c(12,1:3)) {
        timeout = travel_time * runif(1, 1, 2)
      } else {
        timeout = travel_time * rnorm(1, 1, 0.05)
      }
      time = time + timeout/60/24
      
      # process delays
      time - simmer::now(sim)
    }) %>% 
    set_attribute(c("travel_end", "travel_time"), function() {
      c(simmer::now(sim), 
        simmer::now(sim) - get_attribute(sim, "write_end"))
    }) %>% 

This basically just adds some uncertainty here when it comes to the OSRM drive time and adds potential delays during winter months. As before, processing all the delays in one place and storing the results.

Delivery time here is simple, like with writing time before. We also need to calculate the time it takes for a van to drive back to its base position. We don't have to do the 5p checks, etc. since theoretically we've ensured our van was able to deliver and return by 5p. Notice that we do leave room for it going over there. Finally, after our van has returned, we can "release" it from its service to that letter.

# delivery time, 0.5 - 5 minutes
    timeout(function() {
      delivery_time = runif(1,.5,5)
      delivery_time/60/24
    }) %>% 
    set_attribute(c("delivery_end", "delivery_time"), function() {
      c(simmer::now(sim), 
        simmer::now(sim) - get_attribute(sim, "travel_end"))
    }) %>% 
    
    # reset time, drive back to base placement
    # not realistic but this code is already long for a toy example
    timeout(function() {
      van = get_selected(sim)
      travel_time = get_attribute(sim, paste0(van, "_time"))
      
      base_day = as.numeric(gsub("\\..*","",simmer::now(sim)))
      what_month = month(datekey[Elapsed_Day == base_day]$Date)
      
      # if in winter travel_time will always be expected, up to twice as long
      # if not, want rough bound of +/- 10%
      if (what_month %in% c(12,1:3)) {
        travel_time = travel_time * runif(1, 1, 2)
      } else {
        travel_time = travel_time * rnorm(1, 1, 0.05)
      }
      
      travel_time/60/24
    }) %>% 
    set_attribute(c("travelback_end", "travelback_time"), function() {
      c(simmer::now(sim), 
        simmer::now(sim) - get_attribute(sim, "delivery_end"))
    }) %>% 
    release_selected(1)

And that sums up our trajectory! As you can no doubt see, I cut several corners here and it still ends up being a breezy 283 lines of code. Granted, most of it is really similar in structure so it doesn't take forever to write. Even after the gripes I've discussed with simmer, I've determined that everything you need from DES (at least for a novice like myself) can be done... it's just a matter of whether or not it's worth the effort.

Simulation Run & Summary:

Now that we have the trajectory written, we can finally run the thing:

  # sim setup and run
  sim = simmer() %>% 
    add_resource(paste0("van_", 1:n_vans), 1) %>% 
    add_dataframe("letter ", 
                  letter, 
                  simmer_data, 
                  col_time = "simmer_time", 
                  time = "absolute", 
                  mon = 2)
  sim %>% simmer::run()

We're simulating an entire year here and this takes around 5 minutes to actually run. Jeepers! After it's done, we can get data from our simulation. Here, as I said before, I need to use dcast to get the results in wide format. I'm then setting the column order for better story-telling and adding some columns for our function (remember this is all within one big function). 

  # sim summary
  simulation = sim %>% get_mon_attributes() 
  simulation = as.data.table(simulation)
  simulation = dcast(simulation, name~key, value.var = "value")
  
  setcolorder(simulation, 
              c("name", "ID", "letter_request", "initial_delay_end",
                "write_end", "travel_end", "delivery_end",
                "travelback_end", "initial_delay_time", "travel_time",
                "Clust", "van_selected"))
  
  simulation = simulation[order(letter_request)]
  simulation[, N_Clust := n_vans]
  simulation[, Method := method]

Finally, a quick sanity check to make sure it did what it was supposed to do. I have it commented out here but this was originally in place to check and make sure that the driver reset times (the end of the trajectory) were within reason:

  # quick sanity check
  # simulation[, Time_finished := travelback_end - floor(travelback_end)]
  # View(simulation[Time_finished <= 9/24 | Time_finished >= 19/24])
  
  message(n_vans)
  return(simulation)

Then, in context of our function, I'll just message which cluster solution was completed and return the simulation results.

Parallelization:

All that's left now is to actually run our function on the 27 different solutions, combining into one object and saving. Finally, a pushover notification to let us know that we're done. This takes about 15 minutes to do all 27 simulations on my workstation.

# equalify simulation
equal_sim = future_Map(letter_sim,
                       n_vans = 2:10,
                       MoreArgs = list(
                         monthly_data = monthly,
                         method = "Equal",
                         cluster_solutions = cluster_dat,
                         time_mat = time_mat),
                       future.seed = T)
equal_sim = rbindlist(equal_sim, fill = T)

# pam with time simulation
time_sim = future_Map(letter_sim,
                       n_vans = 2:10,
                       MoreArgs = list(
                         monthly_data = monthly,
                         method = "Time",
                         cluster_solutions = cluster_dat,
                         time_mat = time_mat),
                       future.seed = T)
time_sim = rbindlist(time_sim, fill = T)

# pam with euclidean simulation
euc_sim = future_Map(letter_sim,
                       n_vans = 2:10,
                       MoreArgs = list(
                         monthly_data = monthly,
                         method = "Euc",
                         cluster_solutions = cluster_dat,
                         time_mat = time_mat),
                       future.seed = T)
euc_sim = rbindlist(euc_sim, fill = T)

# save simulations
simulation_dat = rbind(equal_sim, time_sim, euc_sim)
saveRDS(simulation_dat, "R_Objects/simulation_dat.rds")
pushover("All simulations completed!")

Are We There Yet?

As you may have guessed, there's no real easy way to dig in to what we've just done. To go over 27 simulations by hand is... daunting to say the least. So, to do so, I'll be employing R Shiny so I, as well as the decision-makers in our company, can really interact with and get a feel for the results. That'll be the last chapter in our love letters saga. As always, thanks for reading!

Simmering down now,

Garrett C. The Matrix
catlin@hey.com

About Garrett Catlin

Your friendly neighborhood data scientist.