Building Employer Measures
Module 2: Supplemental Notebook
Introduction
This supplemental notebook provides a demonstration of how we can build employer-level characteristics, at the yearly grain, from the Unemployment Insurance (UI) wage records dataset. Our final output from this notebook is a permanent table with employer-level information aggregated to the calendar year for each employer with at least 5 employees in Wisconsin that appears in its UI wage records.
We will start by loading necessary packages not readily available in the base R setup.
As a reminder, every time you create a new R file, you should copy and run the following code snippet.
options(scipen = 999) # avoid scientific notation
library(RJDBC)
library(tidyverse)
library(scales)
library(zoo)
library(lubridate)
And then set up a connection to the Redshift database:
=Sys.getenv("DBUSER")
dbusr=Sys.getenv("DBPASSWD")
dbpswd
<- "jdbc:redshift:iam://adrf-redshift11.cdy8ch2udktk.us-gov-west-1.redshift.amazonaws.com:5439/projects;
url loginToRp=urn:amazon:webservices:govcloud;ssl=true;
AutoCreate=true;idp_host=adfs.adrf.net;idp_port=443;
ssl_insecure=true;
plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider"
<- JDBC(
driver "com.amazon.redshift.jdbc42.Driver",
classPath = "C:\\drivers\\redshift_withsdk\\redshift-jdbc42-2.1.0.12\\redshift-jdbc42-2.1.0.12.jar",
identifier.quote="`"
)
<- dbConnect(driver, url, dbusr, dbpswd) con
For this code to work, you need to have an .Renviron
file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
Quarterly Measures
We will define each employer as a unique ui_account
value in the UI wage records,, developing the following measures for each ui_account
:
Firm characteristics
- Total payroll
- Number of employees experiencing full quarter employment
- Total employment
Stability
- Number of new hires who become full quarter employees (hired in t-1 and we see in t+1)
- Ratio of full quarter employees to all employees
- Separation rate
Opportunity
- Number of new hires
- Employment growth rate
- Hiring growth rate
Job Quality
- Average earnings per employee
- Average earnings per full quarter employee
- 25th percentile quarterly earnings
- 75th percentile quarterly earnings
Let’s remind ourselves of the range of the UI wage records available for this class:
Note: Some employers change identification numbers over time - due to the limitations of the data available in the ADRF, we are unable to track these employers over time and thus, are treated separately.
# see years and quarters in wage data
<- "
qry SELECT DISTINCT year, quarter
FROM ds_wi_dwd.ui_wage uw
ORDER BY year, quarter;
"
dbGetQuery(con,qry)
Although our final analytic table will be aggregated to the calendar year, we need to first develop these measures for each quarter. To do so, we will first create permanent tables of all the UI wage record information from ui_wage
for all the quarters from 2008-2022. Ideally, we should include information from the two quarters prior to our desired start (2007Q3, 2007Q4) and one after our end (2023Q1) since we need employment, separation, and hiring counts for 2007Q3 and 2007Q4 to calculate growth rates for these measures in 2008Q1, as well as counts for 2023Q1 to calculate these measures in 2022Q4. Since we do not have data for 2023Q1, the 2022Q4 table will be incomplete. Nevertheless, this incomplete table, with the proper framing, can still be useful in our analysis.
We can create a table with all necessary pre- and post-quarter of interest information, which will make it easier to loop through all of the quarters.
Note: We included the table 2022Q4 despite incomplete information for pedagogical purposes in showing uses for an aggregated employer-based table in the Characterizing Demand notebook.
Range of Quarters Function
<- function(start_yq, end_yq) {
create_quarters
# converting start and end quarters to year-quarter format that R can understand
# yq() from lubridate
<- yq(start_yq)
d1 <- yq(end_yq)
d2
# Getting the range between d1 and d2
<- format(seq(d1, d2, by="quarter"), "%Y-%m")
dat
# converting resulting range to a year-quarter format
<- as.yearqtr(dat, "%Y-%m") #from zoo
q_yr_input <- data.frame(q_yr_input)
df names(df) <- c("yr_quarter")
$qyr_req <- paste0(tolower(substring(df$yr_quarter, 6, 7)), "_", substring(df$yr_quarter, 1, 4))
df$title = paste0(df$qyr_req, "_agg")
df<- df %>% mutate(quart = str_sub(title, 2,2))
df <- df %>% mutate(year = str_sub(yr_quarter, 1, 4))
df
return(df)
}
<- create_quarters('2007 Q3', '2022 Q4')
quarter_year
<- quarter_year$yr_quarter
quarters <- quarter_year$qyr_req
quarters_sql_save <- quarter_year$title
quarter_agg_save <- quarter_year$quart
quart <- quarter_year$year
yr
head(quarter_year)
Now that we have our data frame populated with all the quarters needed, we will start creating our measures. Here we are creating measures for every year and quarter.
Creating tables
We use a for()
loop to iterate through the UI Wage records by our quarters of interest. While doing so, we drop duplicated ssn
/ui_account
/quarter
/year
entries, prioritizing the highest wage value in these instances.
Since we are creating permanent tables, we will comment out the code to execute these commands - they will return errors because the table names already exist in the schema.
Note that this is one way to approach duplicate data. There are other ways in handling duplicates, such as summing up the wages.
for(i in 1:length(quarters_sql_save)){
= 'DROP TABLE IF EXISTS tr_wi_2023.%s;
qry
with init_wage as (
select *
from ds_wi_dwd.ui_wage
where quarter = %s and year = %s and ui_account is not null and wage > 0
),
dup as (
select *,
row_number() over (partition by ssn, ui_account, quarter, year order by wage desc) as rownumber_wg
from init_wage
)
select ui_account, ssn, wage, quarter, year
into tr_wi_2023."%s"
from dup
where rownumber_wg = 1
'
# sprintf() allows us to systematically update the character string (SQL query)
= sprintf(qry, quarters_sql_save[i], quart[i], yr[i], quarters_sql_save[i])
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
}
Now that we have created all the tables that contain our UI Wage records by each quarter of interest, we need to now permission those tables to the appropriate groups. The code below does just that.
for(i in 1:length(quarters_sql_save)){
<- '
qry GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE tr_wi_2023.%s
TO group ci_read_group;
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE tr_wi_2023.%s
TO group db_t00111_rw;
'
= sprintf(qry, quarters_sql_save[i], quarters_sql_save[i])
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
}
Let’s take a peek at a single table to make sure we created it properly:
# see example
= "
qry SELECT *
FROM tr_wi_2023.q3_2015
LIMIT 5
"
dbGetQuery(con, qry)
Our code worked as intended - we can access the table.
Identifying pre- and post-quarter employment
In the process of eventually building out our desired measures, we will create new pre_emp
and post_emp
columns to track if each ui_account
/ssn
combination within a given quarter exists in the wage record table the quarter before and/or the quarter after. This will be important in tracking full-quarter employment, as well as hiring and separation numbers.
# initialize pre and post employment columns
<- c('pre_emp', 'post_emp')
new_cols
for(col in new_cols){
for(i in 1:length(quarters_sql_save)){
='
qry ALTER TABLE tr_wi_2023."%s" ADD "%s" int
'
= sprintf(qry, quarters_sql_save[i], col)
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
} }
Let’s make sure our new columns are added to the tables:
# see example
= "
qry SELECT *
FROM tr_wi_2023.q3_2015
LIMIT 5
"
dbGetQuery(con, qry)
Now that the pre_emp
and post_emp
columns are initialized in each of these temporary tables, we can set these as indicator variables if the ssn
/ui_account
combination that appeared in the UI wage records for the given year/quarter combination also existed in the previous and future quarter.
for(i in 2:length(quarters_sql_save)){
# update this quarter employment flag
='
qry UPDATE tr_wi_2023."%s" SET pre_emp =
CASE WHEN b.wage is null THEN 0 ELSE 1 END
FROM tr_wi_2023."%s" b
where tr_wi_2023."%s".ssn = b.ssn and
tr_wi_2023."%s".ui_account = b.ui_account
'
= sprintf(qry, quarters_sql_save[i], quarters_sql_save[i-1], quarters_sql_save[i], quarters_sql_save[i])
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
}
for(i in 2:length(quarters_sql_save)-1){
# update this quarter employment flag
='
qry UPDATE tr_wi_2023."%s" SET post_emp =
CASE WHEN b.wage is null THEN 0 ELSE 1 END
FROM tr_wi_2023."%s" b
where tr_wi_2023."%s".ssn = b.ssn and
tr_wi_2023."%s".ui_account = b.ui_account
'
= sprintf(qry, quarters_sql_save[i], quarters_sql_save[i+1], quarters_sql_save[i], quarters_sql_save[i])
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
}
To confirm the proper creation of our new columns, we can look at the observation counts by values of pre_emp
and post_emp
.
# see values of post_emp
= "
qry SELECT post_emp, COUNT(*)
FROM tr_wi_2023.q4_2015
GROUP BY post_emp
"
dbGetQuery(con, qry)
# see values of pre_emp
= "
qry SELECT pre_emp, COUNT(*)
FROM tr_wi_2023.q4_2017
GROUP BY pre_emp
"
dbGetQuery(con, qry)
Separation and Hire
Now that we have pre and post-quarter employment indicators for each ssn
/ui_account
combination, we can add hiring and separation indicators into these tables.
<- c('sep', 'hire')
new_cols
for(col in new_cols){
for(i in 1:length(quarters_sql_save)){
='
qry ALTER TABLE tr_wi_2023."%s" ADD "%s" int
'
= sprintf(qry, quarters_sql_save[i], col)
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
} }
Let’s make sure it worked:
<- "
qry SELECT *
FROM tr_wi_2023.q4_2019
LIMIT 5
"
dbGetQuery(con, qry)
We will build our sep
and hire
variables as indicators - if an individual is not employed by the same employer in the previous quarter, they will have a hire
value of 1
. We use similar logic for the sep
column, instead evaluating the next quarter.
for(i in 1:length(quarters_sql_save)){
='
qry UPDATE tr_wi_2023."%s"
SET
sep = CASE WHEN post_emp is null THEN 1 ELSE 0 END,
hire = CASE WHEN pre_emp is null THEN 1 ELSE 0 END
'
= sprintf(qry, quarters_sql_save[i])
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
}
Checking if the table populated correctly:
# look at different values of sep
= '
qry select count(*), sep
from tr_wi_2023.q1_2018
group by sep
order by sep
'
dbGetQuery(con, qry)
# look at different values of hire
= '
qry select count(*), hire
from tr_wi_2023.q1_2018
group by hire
order by hire
'
dbGetQuery(con, qry)
Aggregate by Employer
At this point, we have all the information we need to aggregate on the ui_account
values. We will do these aggregations in separate steps, as they will require separate WHERE
clauses. In the first, we will find values for all measures outside of the full-quarter employee-related ones.
# example of non-full quarter measures query
<- "
qry select distinct ui_account, year, quarter,
count(ssn) over(partition by ui_account, year, quarter) as num_employed,
sum(Wage) over(partition by ui_account, year, quarter) as total_earnings,
sum(hire) over(partition by ui_account, year, quarter) as num_hire,
sum(sep) over(partition by ui_account, year, quarter) as num_sep,
percentile_cont(0.25) within group (order by Wage) over (partition by ui_account, year, quarter) as bottom_25_pctile,
percentile_cont(0.75) within group (order by Wage) over (partition by ui_account, year, quarter) as top_75_pctile
from tr_wi_2023.q1_2015
limit 5
"
dbGetQuery(con, qry)
In a separate table, we can find all of the columns related to full quarter employment.
# example of full quarter measures query
<- "
qry select distinct ui_account, year, quarter,
count(ssn) over(partition by ui_account, year, quarter) as full_num_employed,
sum(Wage) over(partition by ui_account, year, quarter) as full_total_earnings
from tr_wi_2023.q1_2015
where post_emp = 1 and pre_emp = 1
limit 5
"
dbGetQuery(con, qry)
Finally, we need information on these employer’s hiring, employment, and separation numbers for the prior quarter to calculate their growth rates.
# example query finding hiring, employment, and separation metrics
<- "
qry select ui_account, year, quarter,
count(ssn) as num_employed_pre,
sum(hire) as num_hire_pre,
sum(sep) as num_sep_pre
from tr_wi_2023.q1_2015
group by ui_account, year, quarter
"
dbGetQuery(con, qry)
Now that we have all the information we need in three tables, we can join them together based on the ui_account
values.
# example join
<- "
qry with full_q as (
select distinct ui_account, year, quarter,
count(ssn) over(partition by ui_account, year, quarter) as full_num_employed_init,
sum(Wage) over(partition by ui_account, year, quarter) as full_total_earnings_init
from tr_wi_2023.q1_2015
where post_emp = 1 and pre_emp = 1
),
emp as (
select distinct ui_account, year, quarter,
count(ssn) over(partition by ui_account, year, quarter) as num_employed,
sum(Wage) over(partition by ui_account, year, quarter) as total_earnings,
sum(hire) over(partition by ui_account, year, quarter) as num_hire,
sum(sep) over(partition by ui_account, year, quarter) as num_sep,
percentile_cont(0.25) within group (order by Wage) over (partition by ui_account, year, quarter) as top_25_pctile,
percentile_cont(0.75) within group (order by Wage) over (partition by ui_account, year, quarter) as top_75_pctile
from tr_wi_2023.q1_2015
),
tabs as (
select emp.*, full_q.full_num_employed_init,
full_q.full_total_earnings_init
from emp
left join full_q
on emp.ui_account = full_q.ui_account and emp.year = full_q.year and emp.quarter = full_q.quarter
where emp.num_employed >= 5
)
select ui_account, year, quarter, num_employed, total_earnings, num_hire, num_sep, top_25_pctile, top_75_pctile,
case
when full_num_employed_init is null then 0
else full_num_employed_init end as full_num_employed,
case
when full_total_earnings_init is null then 0
else full_total_earnings_init end as full_total_earnings
from tabs
limit 5
"
dbGetQuery(con, qry)
Calculating growth rates
To calculate the hiring, separation, and employment growth rates, we will use the following function from Davis and Haltiwanger (1992) to calculate employment growth rate (emp_rate
), separation growth rate (sep_rate
), and hire growth rate (hire_rate
):
\[g_{et}=\frac{2(x_{et}-x_{e,t-1})}{(x_{et}+x_{e,t-1})}\]
In this function, \(g_{et}\) represents employment/separation/hire growth rate of employer \(e\) at time \(t\). \(x_{et}\) and \(x_{e,t-1}\) are employer \(e\)’s employment/separation/hire at time \(t\) and \(t-1\), respectively. According to Davis and Haltiwanger (1992):
“This growth rate measure is symmetric about zero, and it lies in the closed interval [-2,2] with deaths (births) corresponding to the left (right) endpoint. A virtue of this measure is that it facilitates an integrated treatment of births, deaths, and continuing establishments in the empirical analysis.”
In other words, a firm with a \(g_{et} = 2\) is a new firm, while a firm with a \(g_{et} = -2\) is a a firm that exited the economy.
Why do the two endpoints represent firms’ deaths and births? Calculate the value of \(g_{et}\) when \(x_{et}=0\) and when \(x_{e,t-1}=0\) and see what you find.
In practice, we will apply this formula for every ui_account
unless it experienced no hires or separations in the current and previous quarters, where instead of getting a divide by zero error, we will assign it to 0.
Let’s translate this to a query:
for(i in 2:(length(quarters_sql_save))){
= 'DROP TABLE IF EXISTS tr_wi_2023."%s";
qry
with full_q as (
select distinct a.ui_account, a.year, a.quarter,
count(ssn) over(partition by ui_account, a.year, a.quarter) as full_num_employed_init,
sum(wage) over(partition by ui_account, a.year, a.quarter) as full_total_earnings_init
from tr_wi_2023."%s" a
where post_emp = 1 and pre_emp = 1
),
emp as (
select distinct ui_account, b.year, b.quarter,
count(ssn) over(partition by ui_account, b.year, b.quarter) as num_employed,
sum(wage) over(partition by ui_account, b.year, b.quarter) as total_earnings,
sum(hire) over(partition by ui_account, b.year, b.quarter) as num_hire,
sum(sep) over(partition by ui_account, b.year, b.quarter) as num_sep,
percentile_cont(0.25) within group (order by wage) over (partition by ui_account, year, b.quarter) as bottom_25_pctile,
percentile_cont(0.75) within group (order by wage) over (partition by ui_account, year, b.quarter) as top_75_pctile
from tr_wi_2023."%s" b
),
tabs as (
select emp.*, full_q.full_num_employed_init,
full_q.full_total_earnings_init
from emp
left join full_q
on emp.ui_account = full_q.ui_account and emp.year = full_q.year and emp.quarter = full_q.quarter
where emp.num_employed >= 5
),
joined as (
select ui_account, tabs.year, tabs.quarter, num_employed, total_earnings, num_hire, num_sep, bottom_25_pctile, top_75_pctile,
case
when full_num_employed_init is null then 0
else full_num_employed_init end as full_num_employed,
case
when full_total_earnings_init is null then 0
else full_total_earnings_init end as full_total_earnings
from tabs
),
old_tabs as (
select ui_account, c.year, c.quarter,
count(ssn) as num_employed_pre,
sum(hire) as num_hire_pre,
sum(sep) as num_sep_pre
from tr_wi_2023."%s" c
group by ui_account, year, c.quarter
),
hired as (
select tmone.ui_account,
count(tmone.ssn) as new_hires_fullq
from tr_wi_2023.%s tmone
join tr_wi_2023.%s t on tmone.ssn = t.ssn and tmone.ui_account = t.ui_account
where tmone.hire = 1 and t.post_emp = 1
group by tmone.ui_account
)
select joined.ui_account, joined.year, joined.quarter, joined.num_employed, joined.total_earnings,
joined.bottom_25_pctile, joined.top_75_pctile, joined.full_num_employed, joined.full_total_earnings,
CAST(joined.full_num_employed AS FLOAT)/CAST(joined.num_employed AS FLOAT) as ratio_fullq_total, hired.new_hires_fullq,
case
when (old_tabs.num_employed_pre is null or old_tabs.num_employed_pre = 0) and joined.num_employed = 0 then 0
when old_tabs.num_employed_pre is null and joined.num_employed != 0 then 2
else (2.0 * (joined.num_employed - old_tabs.num_employed_pre))/(joined.num_employed + old_tabs.num_employed_pre) end as emp_rate,
case
when (old_tabs.num_hire_pre is null or old_tabs.num_hire_pre = 0) and joined.num_hire = 0 then 0
when old_tabs.num_hire_pre is null and joined.num_hire != 0 then 2
else (2.0 * (joined.num_hire - old_tabs.num_hire_pre))/(joined.num_hire + old_tabs.num_hire_pre) end as hire_rate,
case
when (old_tabs.num_sep_pre is null or old_tabs.num_sep_pre = 0) and joined.num_sep = 0 then 0
when old_tabs.num_sep_pre is null and joined.num_sep != 0 then 2
else (2.0 * (joined.num_sep - old_tabs.num_sep_pre))/(joined.num_sep + old_tabs.num_sep_pre) end as sep_rate
into tr_wi_2023.%s
from joined
left join old_tabs on joined.ui_account = old_tabs.ui_account
left join hired on joined.ui_account = hired.ui_account
'
= sprintf(
full_qry
qry, quarter_agg_save[i], quarters_sql_save[i], quarters_sql_save[i],-1], quarters_sql_save[i-1], quarters_sql_save[i],
quarters_sql_save[i
quarter_agg_save[i]
)
# write(full_qry, "my_file1.txt", append=TRUE)
# writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
}
We need to give access to the appropriate groups:
for(i in 2:length(quarter_agg_save)){
<- '
qry GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE tr_wi_2023.%s
TO group ci_read_group;
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE tr_wi_2023.%s
TO group db_t00111_rw;
'
= sprintf(qry, quarter_agg_save[i], quarter_agg_save[i])
full_qry # writeLines(full_qry)
# DBI::dbExecute(con, full_qry)
}
Aggregation to the Calendar Year
Now that we have our aggregations and growth rates calculated, we will combine these into a single yearly aggregation table. We need to create our start and end strings of the query and then we will paste these strings together.
= "
string select *, %s as years, total_earnings/num_employed as avg_earnings,
case
when full_num_employed = 0 then 0
else full_total_earnings/full_num_employed
end as full_avg_earnings
from tr_wi_2023.%s_agg"
= "select ui_account, years,
end_qry count(*) as num_quarters, avg(num_employed) as avg_num_employed,
avg(cast(total_earnings as bigint)) as avg_total_earnings,
avg(bottom_25_pctile) as avg_bottom_25_pctile,
avg(top_75_pctile) as avg_top_75_pctile,
avg(full_num_employed) as avg_full_num_employed,
avg(cast(full_total_earnings as bigint)) as avg_full_total_earnings,
avg(emp_rate) as avg_emp_rate,
avg(hire_rate) as avg_hire_rate,
avg(sep_rate) as avg_sep_rate,
avg(avg_earnings) as avg_avg_earnings,
avg(full_avg_earnings) as avg_full_avg_earnings,
avg(ratio_fullq_total) as avg_ratio_fullq_total,
avg(new_hires_fullq) as avg_new_hires_fullq
into tr_wi_2023.employer_yearly_agg
from wi_comb
group by ui_account, years"
The only part of the query string that needs updating in the loop is the middle. So, here we are creating the middle part of the query then pasting the start and end string of the query to it.
Note: We do not want to include the quarters from 2007 because we do not have four quarters for a full year. So we’ll start with 2008Q1 and proceed accordingly.
= quarter_year$title
quarters = quarter_year$year
yr
= "DROP TABLE IF EXISTS tr_wi_2023.employer_yearly_agg;
start_string with wi_comb as ( "
for(i in 3:(length(quarter_agg_save))){
="select *, %s as years, total_earnings/num_employed as avg_earnings,
query case when full_num_employed = 0 then 0 else full_total_earnings/full_num_employed end as full_avg_earnings
from tr_wi_2023.%s"
= sprintf(query, yr[i], quarters[i])
full_qry if (i == 3) {
= paste0(start_string, full_qry)
start_string
}else if (i == length(quarter_agg_save)) {
= paste0(start_string, '\n UNION ALL \n', full_qry, '\n )', '\n', end_qry)
start_string
}else {
= paste0(start_string, ' \n UNION ALL \n', full_qry)
start_string
}
}
# writeLines(start_string)
# DBI::dbExecute(con, start_string)
And finally, we need to permission the table to all appropriate groups:
<- '
qry GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE tr_wi_2023.employer_yearly_agg
TO group ci_read_group;
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE tr_wi_2023.employer_yearly_agg
TO group db_t00111_rw;
'
# DBI::dbExecute(con, qry)
Citations
Feder, Benjamin, Garner, Maryah, Nunez, Allison, & Mian, Rukhshan. (2022, December 19). Creating Supplemental Employment Measures using Indiana’s Unemployment Insurance Wage Records. Zenodo. https://doi.org/10.5281/zenodo.7459730