Presenter Pattern

Hey guys, today I wanted to talk about something I encountered during my job: the presenter pattern.

What is a presenter?

The presenter is a pattern that lets you format / organize your data. This is super helpful when you have plenty of variables that should be displayed only under certain conditions. Here is a basic implementation of it:

controller.rb

  def usage
    data = model.retrieve_data(params)
    render status: :ok, json: Presenters::AnalyticsPresenter.new(data).present
  end

and here is the code of the presenter itself:

analytics_presenter.rb

module Presenters
  class AnalyticsPresenter
    include AnalyticsHelpers

    attr_reader :data
    attr_reader :json

    def initialize(data)
      @data = data
      @json = {}
    end

    def present
      json['usage'] = @data['usage']
      json['analytics_v2'] = @data['analytics_v2'] if analytics_v2?
      json['click_analytics'] = @data['click_analytics'] if click_analytics?
      json
    end
  end
end

I believe the code is pretty explicit. Notice that I included AnalyticsHelpers, which gives me access to analytics_v2? and click_analytics?. I believe this pattern should be used in every Rails application, rather than putting the conditional display logic in the model.
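
For context, here is a minimal sketch of what such a helper module could look like. The predicate names come from the presenter above, but the logic inside is purely hypothetical (a real app would check a feature flag, a plan, or a user setting):

module AnalyticsHelpers
  # Hypothetical predicates used by the presenter above
  def analytics_v2?
    ENV['ANALYTICS_V2_ENABLED'] == 'true'
  end

  def click_analytics?
    ENV['CLICK_ANALYTICS_ENABLED'] == 'true'
  end
end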

Web - Tabnabbing

One day at work, I discovered a weird HTML attribute I had never seen before: rel="noopener noreferrer". After doing some research on the topic, I realized it was a protection against phishing attacks made possible by the target="_blank" attribute. Let me explain.

Tabnabbing

The protection above is against a security issue called tabnabbing. To describe how it works, let's take the following scenario:

  • You are on Twitter and someone posted an exciting tweet about GraphQL, the new trendy tech.
  • You click the link and it opens in a new tab (thanks to the target="_blank" attribute).
  • You read the article, you are happy to have learned why this is an amazing tech, you go back to Twitter, but... weird, it asks you for your password. "OK, I might have been logged out because of a cache or session issue; whatever, here is my password."

You have just been phished. It was a fake website and you gave it your Twitter password.

How does tabnabbing work technically?

The page behind the link you opened in a new tab contained the following piece of code:

window.opener.location="http://phishingtwitter.com"

According to the W3C:

The opener property returns a reference to the window that created the window

What this means is that when you click a link that opens in a new tab, the newly opened tab (the child) keeps a reference to the parent tab (Twitter) and can thus transparently redirect it to a phishing website through the location property.

This currently works in up-to-date major browsers like Chrome.

How to prevent this security issue?

To prevent the child tab from controlling the parent one, just add this to your link:

rel="noopener noreffer"

This attribute prevents the new page from accessing the window.opener object, and also asks the browser not to send a Referer header (the noreferrer part, see here).
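
In a Rails view, this just means adding the rel attribute wherever target="_blank" is used. A minimal example with link_to (the URL is a placeholder):

<%= link_to "Read the article", "https://example.com/article", target: "_blank", rel: "noopener noreferrer" %>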

If you want to experiment with the attack, I recommend going to this website.

A performance impact

When I first encountered this and understood the security issue, I was against the idea, because the links containing target="_blank" were not user-submitted links but links pointing to our official website, so I was pretty confident no script would run to change the location property.

That said, adding this attribute is always a good idea, because it is also an optimisation for certain browsers.

From Jake Archibald's website:

A Javascript running on one domain name runs on a different thread to a window/tab running another

With target="_blank" alone, the newly created window may run in the same process and thread as its opener; adding rel="noopener" prevents the use of window.opener, so the browser doesn't have to keep the new tab on the same thread.

After experimenting with this performance impact here, I only saw it affecting Safari and Firefox. Chrome seems to have solved this issue, as has Chromium.

Ruby - Active Record - Query optimization

All vs find_each

All

Imagine you have a database with a users table containing one hundred thousand records. If you use Active Record's all method, what will basically happen is that the ORM runs the following query:

SELECT "users".* FROM "users"

Just so you know, it took 3.5 ms to execute this query for 2000 users. The second thing to know is that the memory holding these users took 96 bytes, which is okay for 2000 users, but what about 500 000 posts belonging to these users? (500 000 * 96) / 2000 = 24 000 bytes, i.e. roughly 0.02 MB. You can reduce the memory needed for this query by processing the records in batches with find_each. The garbage collector can then clean up the memory while you process the other records.
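
To make the difference concrete, here is a small sketch of both approaches (the User model and the process helper are just placeholders):

# Loads every row into memory at once
User.all.each { |user| process(user) }

# Loads rows in batches of 1000 (the default), so batches that have already
# been processed can be garbage collected while the next one is fetched
User.find_each(batch_size: 1_000) { |user| process(user) }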

find_each

Let's take a look at the Rails source code in order to analyze the behavior of this function. Basically, find_each has two behaviors, depending on whether we pass a block or not.

Here is the source code:

def find_each(options = {})
  if block_given?
    find_in_batches(options) do |records|
      records.each { |record| yield record }
    end
  else
    enum_for :find_each, options do
      options[:start] ? where(table[primary_key].gteq(options[:start])).size : size
    end
  end
end

If we don't pass a block, the code returns an enumerator which will iterate by calling find_each again when you call each on it. The block given to enum_for is used to compute the size of the enumerator lazily.

Otherwise, if a block is given, we call find_in_batches. Calling find_in_batches yields an array of records (a batch). find_each uses this function to iterate over each record of a batch and yields them one by one. In other words, when you call find_each the block receives one record of the batch at a time, while with find_in_batches the block receives an array of records, i.e. an entire batch.
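
Concretely, the difference between the two yields looks like this (User is just a placeholder model):

User.find_each do |user|
  puts user.id    # one record per yield
end

User.find_in_batches do |batch|
  puts batch.size # an Array of up to 1000 records (the default batch size) per yield
end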

Let's quickly look at the implementation of find_in_batches:

def find_in_batches(options = {})
  options.assert_valid_keys(:start, :batch_size)

  relation = self
  start = options[:start]
  batch_size = options[:batch_size] || 1000

  unless block_given?
    return to_enum(:find_in_batches, options) do
      total = start ? where(table[primary_key].gteq(start)).size : size
      (total - 1).div(batch_size) + 1
    end
  end

  if logger && (arel.orders.present? || arel.taken.present?)
    logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
  end

  relation = relation.reorder(batch_order).limit(batch_size)
  records = start ? relation.where(table[primary_key].gteq(start)).to_a : relation.to_a

  while records.any?
    records_size = records.size
    primary_key_offset = records.last.id
    raise "Primary key not included in the custom select clause" unless primary_key_offset

    yield records

    break if records_size < batch_size

    records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
  end
end

We find the same logic again when we don't pass a block to the function: return an enumerator and compute its size lazily with a SQL query. Here the size is the number of batches: (total - 1).div(batch_size) + 1.

According to this post and my experience digging through the code, the first thing that happens is that we set the LIMIT and ORDER parameters of the SQL queries that will be made. This is done through this code:

relation = relation.reorder(batch_order).limit(batch_size)

batch_size is set from the options you gave, or defaults to 1000. batch_order builds the ORDER BY clause from the quoted table name and primary key:

def batch_order
  "#{quoted_table_name}.#{quoted_primary_key} ASC"
end

All of this results in a query like this one:

Users Load (1.8ms)  SELECT  "users".* FROM "users" WHERE ("users"."id" > 2000) ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 2000]]

We see the ORDER BY "users"."id" ASC coming from the previous method, and the LIMIT being set. By doing that, we ensure that user ids are fetched in ascending order, so that we can fetch all the records batch by batch.

Finally, for the loop condition, Active Record fetches the records batch by batch: the first time using the start variable, and each following time using the id of the last record fetched in the previous batch. It then yields the records.

If the number of records fetched is lower than the batch size, it means we have fetched all the records and we are done: we break out of the loop.

Funnily enough, if the two are equal we can still be done. If I have 2000 users and ask for batches of 1000, find_each will run 3 queries: the first from 0 to 1000, then 1000 to 2000, then 2000 to 3000. The last one returns no records, so records.any? is false and the loop ends.

Conclusion

  • When we don't pass a block to find_each or find_in_batches, they return an enumerator that already knows its size (nice, because we don't need to iterate over it to compute it again).
  • find_in_batches yields an array of records, i.e. the batch. find_each yields once per record of each batch that find_in_batches yields. Nice implementation.
  • find_in_batches (and thus find_each too) uses ORDER BY and LIMIT to fetch the records in ascending primary key order and to select them batch by batch (using the LIMIT keyword). To do so, it takes the last id of the previously fetched batch and fetches the next batch of records whose ids are greater than it.

From what I computed, using all will not take as much memory as I thought. However, we can easily end up in a situation where the call takes 3-4 seconds (SELECT * FROM myTable); in this case it might be better to use find_each (SELECT * FROM myTable ORDER BY myTable.id ASC LIMIT myBatch), because splitting the records into batches means each query fetches less and is therefore faster.

It's funny to see that the variables passed to find_each are the ones used in the SQL query.
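
For instance, based on the source above, a call like this one (the model and values are only an illustration) maps its arguments straight into the WHERE and LIMIT clauses:

# start seeds the first WHERE clause ("users"."id" >= 5000),
# batch_size becomes the LIMIT of every batch query
User.find_each(start: 5000, batch_size: 500) do |user|
  user.touch
end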

RSpec - Stub vs Mock

RSpec is a DSL written in Ruby. We will focus on two major concepts of RSpec: stubs and mocks.

To avoid being too Ruby-specific, let's first look at the difference between the two in general. You can use a stub to override the behavior of a function so that it returns a specific value, and spy on whether the function was called or not.

This Stack Overflow post has a good explanation of the flow that a stub and a mock should each follow.

Stub

With a stub, you basically initialize -> exercise -> verify. According to Martin Fowler's website, stubs provide canned answers to calls made during the test.

Now, let's briefly talk about spies. According to Martin Fowler's website:

Spies are stubs that also record some information based on how they were called.

So basically a spy is a stub, at least as far as the flow goes; you use a stub to override the behavior of a function, while a spy also counts the number of times a method has been called. I don't think it's worth a dedicated section for spies: the only real difference is that you assert with have_received instead of simply expecting a return value with eq.

Let's look at some Ruby code now:

class Human
  def run(n)
    n.times { p "Running..." }
  end
end

RSpec.describe do

  let(:human) { Human.new } # Initialize
  before do
    allow(human).to receive(:run)
    human.run(1) # Exercise
  end

  it { expect(human).to have_received(:run).with(1).once } # Verify
end

Here we initialize the stub, we act by calling human.run and finally we assert through the expect keyword. When I say the stub is more permissive, I mean that if we stub the call to run but remove both the human.run call and the expect(human).to have_received(:run)... assertion, the test will still pass: the stub by itself does not care whether the method is ever called.

Mock

Here is the flow for a mock: initialize -> set expectations -> exercise -> verify

RSpec.describe do
  let(:human) { Human.new } # Initialize
  before { expect(human).to receive(:run).with(1).once } # Set expectation; verified after the exercise

  it { human.run(1) } # Exercise
end

With a mock you specify the expectation before you exercise / act.

We set the expectation and verify on the same line, meaning that if your function is not called, the mock will make RSpec raise an error. Wait, how is that possible? I found this link on Stack Overflow, which explains that receive checks what will happen in the future, whereas have_received (used with the stub) analyzes what has already happened in the past.

Using mocks doesn't feel natural at first, but it truly uses the power of RSpec.

You see how few lines it takes? And those two tests, using a stub and a mock, check the exact same thing. The difference with the stub is that, with a mock, the test will fail if we don't call human.run. When using expect ... to receive, the expectation has to be satisfied, i.e. run must be received, whereas a stub only overrides the function and returns a specified value (if we specify one) without caring whether the method was actually called (except if we then expect it to have_received the message).

When to use one or the other?

Generally, a stub is used when you want to override a method that calls something you can't always execute in a test environment. For example, let's say you use an external library that makes REST calls to an API but needs to be authenticated to make them; instead of authenticating yourself, you just stub the method and return a static JSON payload. It's much simpler to do that and then make assertions on the object that takes this JSON as input.
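
As a sketch of that scenario (the WeatherClient class and its forecast method are made up for the example; in real life they would come from the external library):

class WeatherClient
  def forecast(city)
    # In production this would make an authenticated HTTP call
  end
end

RSpec.describe 'forecast display' do
  let(:weather) { WeatherClient.new }

  before do
    # Stub: no real HTTP call, just a canned JSON-like answer
    allow(weather).to receive(:forecast).and_return({ 'temp' => 21 })
  end

  it { expect(weather.forecast('Paris')['temp']).to eq(21) }
end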

A mock can be used to check that a method was called, and it will make the test fail if that is not the case (method not called).

Conclusion

I spent a few days reading about mocks, stubs and spies. I think it's important to know the concepts, but in the end a stub and a spy are pretty much the same thing, and let's be honest, you won't remember the difference throughout your whole software engineering career, because they follow the same flow, so it's not that important. What is important is the difference of flow between a stub and a mock: with a stub you initialize, exercise and then expect some result, while with a mock you initialize, set your expectation, and then exercise and verify.

Ruby - reduce / inject function

Reduce function

The reduce function reduces an array into a single value. It iterates through the array while keeping an accumulator variable (the memo). For each element of the array, this variable is set to the return value of the block. The block receives the accumulator value and the element itself as parameters.

[1] pry(main)> a = [1, 2, 3, 4, 5]
=> [1, 2, 3, 4, 5]
[2] pry(main)> a.reduce {|accumulator, elem| accumulator + elem}
=> 15

Here we simply add all the elements of the array into the accumulator variable. So on the first iteration we have:

  • accumulator = 1 (it takes the first value of the array if we don't specify one)
  • elem = 2
  • 1 + 2 = 3

Then, on the second iteration:

  • accumulator = 3
  • elem = 3
  • 3 + 3 = 6

And so on.

The core idea of reduce is that the accumulator variable (memo) is reassigned on each iteration to the result of the operation we specify in the block.

[3] pry(main)> a = [2, 3, 4]
=> [2, 3, 4]
[4] pry(main)> a.reduce {|accumulator, elem| accumulator * elem}
=> 24

On the first iteration:

  • accumulator = 2
  • elem = 3
  • 2 * 3 = 6
  • accumulator = 6

Then, on the second iteration:

  • accumulator = 6
  • elem = 4
  • 6 * 4 = 24
  • accumulator = 24

Finally, it returns the accumulator (memo) variable.

One interesting thing is that the reduce function can take an initial value for the accumulator as an argument. If it's not specified, the function takes the first value of the array as the accumulator and starts executing the block directly from the second value of the array.

We can pass an initial accumulator value, here 1 so the product stays correct, like this:

a = [2, 3, 4]
a.reduce(1) {|product, elem| product * elem} # => 24

The reduce function with a symbol

The reduce function can also take a symbol. It will iterate over the array and apply the corresponding operation.

[5] pry(main)> a = [2, 3, 4]
=> [2, 3, 4]
[6] pry(main)> a.reduce(:+)
=> 9

It's exactly the same as writing a.reduce {|sum, elem| sum + elem}, but I guess it's more readable to use a symbol.

Recoding reduce 🧙‍

If you read my previous article about map, there's not really any magic here: we call a proc on each element and pass it the accumulator variable and the element itself as arguments.

A basic implementation of reduce looks like this:

# Simplified version: assumes a numeric accumulator starting at 0
def reduce(array, &block)
  accumulator = 0
  array.each do |elem|
    # The return value of the block becomes the new accumulator
    accumulator = block.call(accumulator, elem)
  end
  accumulator
end

Again, the only thing that changes compared to each is that we pass a second variable to the proc (the accumulator), which is then reassigned to the return value of the proc.

Let's now look at a more robust version of reduce, which can also handle a symbol, using the open class technique on the Array class.

class Array

  def reduce(accumulator = nil, operation = nil, &block)
    # A symbol and a block can't both be given
    if operation && block
      raise ArgumentError
    end

    # Called with a single argument and no block (e.g. reduce(:+)):
    # the argument is the operation, not the initial accumulator
    if operation.nil? && block.nil?
      operation = accumulator
      accumulator = nil
    end

    # Build the block from the symbol if one was given
    block = begin
      case operation
      when Symbol
        lambda { |acc, value| acc.send(operation, value) }
      when nil
        block
      end
    end

    # No initial value: start from the first element and skip it in the loop
    if accumulator.nil?
      accumulator = first
      skip_first = true
    end

    index = 0

    self.each do |elem|
      unless skip_first && index == 0
        accumulator = block.call(accumulator, elem)
      end
      index += 1
    end
    accumulator
  end
end

The operation variable contains a symbol and is optional. The accumulator is the memo value which stores the return value of the proc on each iteration of the loop. This is the value the function will return.

Two important things. First, the function does a case on the operation variable to either use the given block or build one from the symbol.

acc.send(operation, value) is where the magic happens when a symbol is passed. This snippet is called on each iteration of the loop to update the accumulator value.

For example, with [1, 2, 3, 4, 5].reduce(:+), this is what goes on during the first iteration of the loop:

  • accumulator = 1
  • operation = :+
  • value = 2
  • 1.send(:+, 2) will return 3

Then on the second iteration:

  • accumulator = 3
  • operation = :+
  • value = 3
  • 3.send(:+, 3) will return 6

And so on.

The second important thing is the skip_first and index variables. Basically, they capture the fact that if you don't specify an initial accumulator value, the accumulator takes the first value of the array (accumulator = first) and the loop starts with the second value of the array: we skip the first iteration of the loop.
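
To recap, here are the three call forms the implementation above supports, together with the values they return:

[1, 2, 3, 4, 5].reduce(:+)                       # => 15 (symbol only)
[1, 2, 3, 4, 5].reduce { |acc, e| acc + e }      # => 15 (block only)
[1, 2, 3, 4, 5].reduce(10) { |acc, e| acc + e }  # => 25 (initial value + block)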

We have reached the end of this article about the reduce function.

As a conclusion, the reduce and inject functions are aliases according to the Ruby docs:

The inject and reduce methods are aliases. There is no performance benefit to either.
