
Snowplow Analytics Consulting

One tool among many to transform your business

Our clients Bark&Co and Valvoline both use Snowplow Analytics as an essential part of their data analytics workflow, in conjunction with Amazon Redshift for data storage and Looker or Domo for business intelligence. When considering a Snowplow implementation for your organization, you have to make sure all the essential pieces are in place. Snowplow is a great tool for getting your data from point A to point B, point B in this case being your database, but that data is useless if you don't have the tools to process it properly.

Valvoline’s Integration

For our client Valvoline, we created a website with embedded Looker charts where the client can view metrics across a whole host of areas, such as content metrics, commerce metrics, user metrics, and KPIs. To achieve this we wrote ten Drupal modules to track site data for Valvoline: three on the Drupal Commerce Pro Valvoline site and seven on the Team Valvoline site. We also have a Laravel application that runs a custom artisan command daily to ingest CSV data we receive from their other partners, mostly relating to rebates earned via in-store purchases. This external data is loaded directly into Redshift and bypasses Snowplow altogether, which illustrates how much control we have over the data: the tools bend to our needs; we don't change our needs to fit the tools.

Snowplow tracking on the website is just a tool for getting this data into Redshift in a sane way, and Redshift at the end of the day is just a database, so we can write to that database ourselves.
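
For example, a daily rebate ingest like the one described above ultimately boils down to a Redshift COPY from S3. The sketch below is illustrative only: the table, columns, S3 bucket, and IAM role are hypothetical placeholders, not Valvoline's actual schema.

-- Hypothetical daily load of a partner CSV straight into Redshift,
-- bypassing Snowplow entirely; all names here are placeholders.
COPY partner_data.rebates (user_id, store_id, rebate_amount, purchase_date)
FROM 's3://example-bucket/partner-feeds/rebates.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-load'
CSV
IGNOREHEADER 1
DATEFORMAT 'auto';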

Bark & Co’s Integration

For Bark&Co we implemented our Snowplow tracking via a WordPress plugin. BarkPost is the site we wrote the tracking for, but we also have data from the other sites in the Bark&Co network, such as BarkBox and BarkShop. With Snowplow, we can track events from multiple domains in the same database; the events are differentiated by an app_id set by the code tracking them. This makes it easy to follow a user's journey from site to site, for example a person who just read a story about a dog toy on BarkPost and then went to BarkShop and purchased that toy; a sketch of such a query follows.
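
As a minimal sketch of that cross-site journey (hedged: the 'barkshop-web' app_id and the use of network_userid, the collector-set cookie that persists across the network's domains, are assumptions for illustration):

-- Find users who read a BarkPost story and later completed a purchase
-- on BarkShop. network_userid ties the visits together across domains;
-- the 'barkshop-web' app_id is an illustrative assumption.
SELECT
  reads.network_userid,
  reads.page_urlpath    AS story_read,
  buys.collector_tstamp AS purchase_time
FROM atomic.events AS reads
  JOIN atomic.events AS buys
    ON reads.network_userid = buys.network_userid
      AND buys.collector_tstamp > reads.collector_tstamp
WHERE reads.app_id = 'barkpost-web'
  AND reads.event = 'page_view'
  AND buys.app_id = 'barkshop-web'
  AND buys.event = 'transaction';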

Almost the entirety of the BarkPost reader base comes from social networks, so one big project we worked on, in conjunction with the growth hacker at Bark&Co, was looking at user retention week over week, broken out by social network. I wrote the query below to get this data.

-- Cohort definition: each user's first visit from each external referrer
-- channel, keeping the medium/source pair that acquired them.
WITH cohort_dfn_by_refr_channel_acquired_by_week AS (
    SELECT
      domain_userid,
      refr_acquired_medium,
      refr_acquired_source
    FROM (
           SELECT
             domain_userid,
             refr_medium                  AS refr_acquired_medium,
             refr_source                  AS refr_acquired_source,
             rank()
             OVER (PARTITION BY domain_userid, refr_medium, refr_source
               ORDER BY collector_tstamp) AS visit_number
           FROM atomic.events
           WHERE refr_medium != 'internal'
                 AND refr_source != ''
                 AND event = 'page_view'
                 AND useragent != 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
                 AND app_id = 'barkpost-web'
         ) t
    WHERE visit_number = 1
),
    -- For each user, rank the distinct weeks in which they viewed a page
    -- (1 = the week of acquisition).
    retention_by_week_by_user AS (
      SELECT
        domain_userid,
        rank()
        OVER (PARTITION BY t.domain_userid
          ORDER BY t.weeks_active ASC) AS weeks_active
      FROM (
             SELECT
               domain_userid,
               DATE_TRUNC('week', collector_tstamp) AS weeks_active
             FROM atomic.events
             WHERE app_id = 'barkpost-web'
                   AND event = 'page_view'
                   AND useragent != 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
                   AND refr_medium != 'internal'
             GROUP BY 1, 2
           ) t
  ),
    -- Count the unique users per acquisition channel for each week rank.
    retention_totals AS (
      SELECT
        first.refr_acquired_medium          AS refr_acquired_medium,
        first.refr_acquired_source          AS refr_acquired_source,
        total.weeks_active                  AS week_rank,
        count(DISTINCT total.domain_userid) AS uniques
      FROM retention_by_week_by_user AS total
        JOIN cohort_dfn_by_refr_channel_acquired_by_week AS first
          ON first.domain_userid = total.domain_userid
      GROUP BY 1, 2, 3
      ORDER BY 1, 2, 3
  ),
    -- Size of each acquisition channel's starting cohort.
    retention_starting_totals AS (
      SELECT
        refr_acquired_medium,
        refr_acquired_source,
        sum(count(DISTINCT domain_userid))
          OVER (PARTITION BY refr_acquired_medium, refr_acquired_source) AS starting_total
      FROM cohort_dfn_by_refr_channel_acquired_by_week
    GROUP BY 1,2
  ),
    -- Express each week's retained uniques as a percentage of the
    -- starting cohort for that channel.
    retention_normalize AS (
      SELECT
        total.refr_acquired_medium,
        total.refr_acquired_source,
        total.week_rank,
        sum(uniques) as uniques,
        start.starting_total,
        SUM(uniques) / sum(start.starting_total) :: REAL * 100 AS percentage
      FROM retention_totals AS total
        INNER JOIN retention_starting_totals AS start
          ON total.refr_acquired_source = start.refr_acquired_source
            AND total.refr_acquired_medium = start.refr_acquired_medium
      GROUP BY total.refr_acquired_medium, total.refr_acquired_source,
        total.week_rank, start.starting_total
  )

SELECT *
FROM retention_normalize
ORDER BY 2, 3;

This query is then fed into Looker in the form of LookML files, which we use to build our visualizations. An example of a LookML file we have written is below.

# Copyright (c) 2016 Usterix LLC. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.
#
# Version: 1-0-0
#
# Authors: William Wilkerson
# Copyright: Copyright (c) 2016 Usterix LLC
# License: Apache License Version 2.0

- view: weekly_retention_overall
  derived_table:
    sql: |
      -- Cohort: each user's first-touch week on the site.
      WITH cohort_dfn_by_week_first_touch_website AS (
        SELECT
          domain_userid,
          DATE_TRUNC('week', MIN(collector_tstamp)) AS cohort
        FROM atomic.events
        WHERE event = 'page_view'
        AND useragent != 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
        GROUP BY 1),
        -- Distinct weeks in which each user viewed a page.
        retention_by_user_by_week AS (
          SELECT
            domain_userid,
            DATE_TRUNC('week', collector_tstamp) AS weeks_active
          FROM atomic.events
          WHERE event = 'page_view'
          GROUP BY 1, 2),
        -- Unique users per cohort per week, with the fraction retained
        -- relative to the cohort's first week.
        retention_totals AS (
          SELECT
            cohort,
            weeks_active                        AS week_actual,
            rank()
            OVER (PARTITION BY cohort
              ORDER BY weeks_active ASC)        AS week_rank,
            COUNT(DISTINCT (m.domain_userid))   AS uniques,
            -- first_value needs an ORDER BY to reliably pick the cohort's first week
            COUNT(DISTINCT (m.domain_userid)) / (first_value(COUNT(DISTINCT (m.domain_userid)))
            OVER (PARTITION BY cohort
              ORDER BY weeks_active ROWS UNBOUNDED PRECEDING)) :: REAL AS fraction_retained
          FROM cohort_dfn_by_week_first_touch_website c
            JOIN retention_by_user_by_week m
              ON c.domain_userid = m.domain_userid
          GROUP BY 1, 2
          ORDER BY 1, 2
      ),
        -- Collapse cohorts: total retained uniques per week rank, with the
        -- week-one total as the baseline.
        retention_normalize AS (
          SELECT
            week_rank,
            sum(uniques)          AS uniques,
            (SELECT sum(uniques)
             FROM retention_totals
             WHERE week_rank = 1) AS starting_total
          FROM retention_totals
          GROUP BY week_rank
      )
      SELECT
        retention_normalize.uniques        AS uniques,
        retention_normalize.week_rank      AS week_rank,
        retention_normalize.starting_total AS starting_total,
        SUM(uniques) / first_value(starting_total)  OVER(PARTITION BY week_rank) :: REAL * 100 as percentage
      FROM retention_normalize
      GROUP BY 1, 2, 3
      ORDER BY 2 DESC


  fields:

   - dimension: week_rank
     type: number
     sql: ${TABLE}.week_rank

   - dimension: starting_total
     type: number
     sql: ${TABLE}.starting_total

   - dimension: returning
     type: number
     sql: ${TABLE}.uniques

   - dimension: percentage_users_retained
     type: number
     value_format: '0.00\%'
     sql: ${TABLE}.percentage

We were then able to go even further with this data, looking at the pages viewed in those sessions and putting together a ranked list of which pieces of content pulled in the most users from social networks and which kept readers coming back. We also wrote another query to pull a similar ranked list broken out by the vertical/category of the content, which showed, for example, that articles written under the humor vertical had almost quadruple the traffic of the travel vertical.
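
A minimal sketch of the first of those ranked lists could look like the query below; the refr_medium = 'social' filter is a simplification of the acquisition logic used in the retention query above.

-- Rank BarkPost content by unique readers arriving from social networks.
SELECT
  page_urlpath,
  COUNT(DISTINCT domain_userid) AS unique_social_readers
FROM atomic.events
WHERE app_id = 'barkpost-web'
  AND event = 'page_view'
  AND refr_medium = 'social'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 50;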


In Summary

There's a lot of information above, perhaps too much. Our hope is that we've given you a clear picture of how business analytics solutions are implemented using tools such as Snowplow.
