Queues and writing time at the Federal Court (FC Data Part 4)

This is Part 4 of our series analyzing Federal Court decision turnaround times. Readers should consult Part 1, Part 2, and Part 3 before this one.

In this part, I revisit an idea that was mentioned briefly in earlier editions: how to measure judges’ workloads and relate them to decision-writing time.

We will be looking at cases under reserve (judges’ queue of pending cases) at the Court to see how they may affect wait times experienced by litigants.

All data sources and caveats remain the same as before unless otherwise noted.

Querying the queue

When we previously discussed judicial workloads (Part 1), we looked at the number of cases decided by the Court and by each judge in any given year, which served as a proxy for how busy each judge was. This was a reasonable first pass at analyzing how workload might affect writing time, but it came with some limitations.

In particular, the number of decisions written by a judge is an output of the decision-writing process itself, rather than an input. It is therefore endogenous, and cannot reliably indicate the direction of causality. I also addressed this briefly in Part 3: given the data available, we do not have many truly independent, predetermined variables to use as predictors, making it difficult to separate causation from correlation.

After doing some further processing of the data, we can now assess another interesting datapoint, which is the number of decisions under reserve at any given time.

We can measure this as follows: since we have hearing dates and judgment dates, we can determine the number of decisions under reserve for a given judge at a given date by counting the number of decisions that have been heard but not yet issued. For each decision, we can then look backward and see how many other decisions a particular judge had under reserve (or their “queue size”).

In other words, we can calculate how many other decisions need to be dealt with concurrently by the judge at the hearing of any particular matter. The queue size is calculated per-decision-per-judge, so there is a valid “queue size” value for each decision.
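As a rough illustration of the counting rule described above, here is a minimal sketch in Python. The record layout (judge, hearing date, judgment date) and the sample dates are hypothetical, and treating same-day hearings as not yet in the queue is an assumption rather than something stated in this series.

```python
from datetime import date

# Hypothetical records: (judge, hearing date, judgment date).
decisions = [
    ("A", date(2020, 1, 6), date(2020, 3, 2)),
    ("A", date(2020, 1, 20), date(2020, 2, 3)),
    ("A", date(2020, 2, 10), date(2020, 2, 24)),
    ("A", date(2020, 3, 9), date(2020, 3, 30)),
    ("B", date(2020, 1, 13), date(2020, 1, 27)),
]

def queue_sizes(records):
    """For each decision, count the same judge's other decisions that were
    heard before this hearing and not yet issued on the hearing date."""
    sizes = []
    for i, (judge, heard, _issued) in enumerate(records):
        pending = sum(
            1
            for j, (other_judge, other_heard, other_issued) in enumerate(records)
            if j != i
            and other_judge == judge
            and other_heard < heard      # assumption: same-day hearings excluded
            and other_issued > heard     # still under reserve at this hearing
        )
        sizes.append(pending)
    return sizes

sizes = queue_sizes(decisions)
for (judge, heard, issued), q in zip(decisions, sizes):
    print(judge, heard, "queue size:", q, "turnaround (days):", (issued - heard).days)
```

This pairwise scan is quadratic in the number of decisions, which keeps the logic easy to read; for a data set of more than 10,000 decisions, grouping by judge and sorting by date first would be a more practical design.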

Intuitively, queue size is helpful in understanding both a judge’s workload (how many other decisions need to be made at the same time) and their concurrent cognitive burden.

Queue size is a useful statistical measure because it is independent of case type and output for a given case, and it is a fixed quantity before any given decision is written. Therefore, we can be more confident of the direction of causality. Further, since this metric is quantifiable for each decision, we have a large sample size to work with (n > 10,000).

(Note that this does not yield a “live” calculation of current cases under reserve, since I am using a fixed collection of past decisions from 2016 to 2026 for which we have extractable hearing dates and judgment dates. To avoid artifacts from out-of-range hearing and judgment dates, I have cut off a year from the end of the data.)

Queue size and trends

The queue size of each judge has not changed much over time. On average, each judge has about 4-6 other cases under reserve when they hear a new case, and this has stayed the same for the past 6 or so years. (Figure 1)

Queue size and turnaround time

A visualization of the “queue size” of the hearing judge versus median turnaround time for each decision shows an association between the two variables (Figure 2).

In general, the more cases a judge has under reserve at the time of a hearing, the longer it takes to issue that judgment. This association is highly statistically significant (p < 0.001) across all non-PMNOC cases and has a moderate positive correlation.

PMNOC cases (not shown) are artificially constrained by the need to issue decisions within the 24-month statutory stay, so it is not surprising that no statistically significant association with queue size was found for these cases (p = 0.832). This is consistent with a parenthetical point made earlier: judges do appear to favour issuance of PMNOC decisions over other cases in their queue, when possible.

The strongest association was found in immigration cases (Spearman ρ = +0.49, a moderately strong positive correlation). An immigration hearing before a judge with 1 other case under reserve at the time of the hearing yields a decision in 8 days, while a hearing before a judge with 11+ cases under reserve takes 99 days on average.

Although this is probably intuitive, we should not necessarily take it as a given.
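A comparison such as "1 other case under reserve" versus "11+ cases under reserve" amounts to bucketing decisions by queue size and summarizing the turnaround time in each bucket. The sketch below shows one way to do that with medians. The observations are invented for illustration, and the cap of 11 simply mirrors the "11+" grouping mentioned above.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (queue size at hearing, turnaround in days) pairs.
observations = [(0, 5), (1, 8), (1, 12), (3, 20), (4, 31), (11, 90), (14, 110), (12, 99)]

def bucket(queue_size, cap=11):
    """Label a queue size, pooling everything at or above `cap` as '<cap>+'."""
    return f"{cap}+" if queue_size >= cap else str(queue_size)

def median_by_bucket(pairs, cap=11):
    """Median turnaround time (days) for each queue-size bucket."""
    groups = defaultdict(list)
    for queue_size, days in pairs:
        groups[bucket(queue_size, cap)].append(days)
    return {label: median(values) for label, values in groups.items()}

medians = median_by_bucket(observations)
for label, value in medians.items():
    print(f"queue {label}: median {value} days")
```

Pooling the largest queues into a single bucket keeps sparsely populated queue sizes from producing noisy medians at the right-hand end of a chart.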

Inter-judge vs. within-judge queue sizes

One major issue in using queue size across all decisions is that individual judges’ inherent writing style and time is a confounder. If a judge takes longer to write decisions, all else being equal, we should expect that their number of reserved decisions will accumulate, leading to a larger average queue size than for a faster writing judge.

Put another way, one could object that a smaller inter-judge queue size is the result of faster writing, so the observed effect on turnaround time is attributable to the specific judge assigned to the case rather than to how many decisions that judge has under reserve.

However, we can look at the within-judge data to see whether a larger queue for any given judge is associated with longer writing time for that same judge, which removes inter-judge differences in baseline writing speed from the comparison. Table 1 below shows the association between queue size and turnaround time for each judge (anonymized and ranked from the highest degree of association to the lowest). The statistically significant values have been bolded.

Judge	N (decisions)	ρ (Spearman rank)	p-value	Statistical Sig.	Median Time (d)	Mean Time (d)
1	162	+0.597	0.0000	p < 0.05	48	54
2	243	+0.428	0.0000	p < 0.05	161	141
3	165	+0.394	0.0000	p < 0.05	118	118
4	279	+0.393	0.0000	p < 0.05	167	146
5	131	+0.383	0.0000	p < 0.05	16	20
6	59	+0.368	0.0041	p < 0.05	21	25
7	204	+0.351	0.0000	p < 0.05	39	74
8	68	+0.294	0.0148	p < 0.05	12	23
9	418	+0.288	0.0000	p < 0.05	13	23
10	167	+0.280	0.0002	p < 0.05	85	106
11	420	+0.267	0.0000	p < 0.05	55	59
12	325	+0.247	0.0000	p < 0.05	48	55
13	212	+0.237	0.0005	p < 0.05	41	61
14	195	+0.197	0.0059	p < 0.05	51	57
15	88	+0.197	0.0660	NS	6	10
16	318	+0.194	0.0005	p < 0.05	7	11
17	111	+0.191	0.0451	p < 0.05	28	42
18	202	+0.189	0.0071	p < 0.05	168	126
19	196	+0.174	0.0149	p < 0.05	6	11
20	219	+0.163	0.0160	p < 0.05	110	118
21	261	+0.162	0.0087	p < 0.05	25	35
22	136	+0.158	0.0670	NS	50	62
23	319	+0.158	0.0047	p < 0.05	50	77
24	213	+0.157	0.0221	p < 0.05	15	57
25	134	+0.147	0.0904	NS	4	19
26	273	+0.146	0.0158	p < 0.05	211	212
27	342	+0.130	0.0165	p < 0.05	70	89
28	149	+0.127	0.1231	NS	43	51
29	256	+0.120	0.0549	NS	22	33
30	132	+0.119	0.1737	NS	9	20
31	82	+0.107	0.3409	NS	9	14
32	155	+0.102	0.2051	NS	33	59
33	84	+0.100	0.3646	NS	19	36
34	360	+0.090	0.0893	NS	14	28
35	197	+0.090	0.2094	NS	43	71
36	35	+0.090	0.6071	NS	4	9
37	220	+0.085	0.2092	NS	44	89
38	220	+0.084	0.2128	NS	16	28
39	255	+0.073	0.2422	NS	21	62
40	94	+0.068	0.5173	NS	42	70
41	280	+0.040	0.5086	NS	9	22
42	161	+0.029	0.7104	NS	118	118
43	58	+0.029	0.8306	NS	141	130
44	34	+0.026	0.8829	NS	22	36
45	290	+0.007	0.9087	NS	13	21
46	45	+0.004	0.9818	NS	6	13
47	213	−0.001	0.9887	NS	21	58
48	108	−0.029	0.7632	NS	17	50
49	135	−0.030	0.7312	NS	30	53
50	133	−0.058	0.5076	NS	27	62
51	266	−0.069	0.2635	NS	6	18
52	103	−0.073	0.4663	NS	20	65
53	193	−0.089	0.2173	NS	21	55
54	35	−0.100	0.5693	NS	19	38
55	154	−0.110	0.1727	NS	50	57
56	299	−0.113	0.0511	NS	183	167
57	74	−0.126	0.2856	NS	20	28
58	196	−0.156	0.0286	p < 0.05	21	42
59	146	−0.167	0.0438	p < 0.05	15	63
60	72	−0.214	0.0705	NS	35	53
61	39	−0.314	0.0514	NS	9	29
62	57	−0.317	0.0161	p < 0.05	7	23

This can also be visualized in a “volcano plot” as follows, where judges above the dashed line are those meeting statistical significance, and the X-axis (left/right) shows the magnitude of the correlation. (Figure 3)
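For readers who want to reproduce this kind of within-judge analysis, the following is a minimal sketch under stated assumptions. It computes Spearman's ρ separately for each judge, estimates a two-sided p-value with a permutation test (a stand-in, since the test actually used for Table 1 is not described here), and derives volcano-plot coordinates assuming the common convention of plotting −log10(p) on the vertical axis. The two judges and their data are invented.

```python
import math
import random
from collections import defaultdict

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |rho| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Hypothetical rows: (judge, queue size at hearing, turnaround in days).
rows = [("A", q, d) for q, d in zip(range(10), [4, 6, 9, 12, 20, 25, 33, 41, 50, 62])]
rows += [("B", q, d) for q, d in zip(range(10), [20, 5, 33, 12, 41, 9, 25, 6, 30, 14])]

by_judge = defaultdict(lambda: ([], []))
for judge, queue, days in rows:
    by_judge[judge][0].append(queue)
    by_judge[judge][1].append(days)

threshold_y = -math.log10(0.05)  # height of the dashed significance line
results = {}
for judge, (queue, days) in by_judge.items():
    rho = spearman_rho(queue, days)
    p = permutation_p_value(queue, days)
    results[judge] = {
        "rho": rho,                   # volcano x-coordinate
        "p": p,
        "volcano_y": -math.log10(p),  # volcano y-coordinate (assumed convention)
        "significant": p < 0.05,
    }

for judge, r in results.items():
    print(judge, round(r["rho"], 3), round(r["p"], 4), r["significant"])
```

Because each judge is only compared with their own past decisions, a consistently slow or fast writer cannot by itself produce a correlation here, which is the point of the within-judge design. In practice a statistics library such as SciPy (`scipy.stats.spearmanr`) would normally replace the hand-written rank correlation.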

For 24 of the 62 judges in Table 1 (just under 40%), queue size is a statistically significant positive predictor of turnaround time. For these judges, when their queue is larger on the date of a hearing, they take longer to issue a decision. For most of the remaining judges, writing time does not seem to vary significantly with queue size, and three judges show a statistically significant negative association.

These data could support competing interpretations.

On the one hand, the data support the idea that keeping a “lean queue” is associated with faster decisions for many judges. The positive correlations among a significant number of judges suggest that there are efficiency gains to be made through active queue management.

On the other hand, the fact that there is no significant association for much of the Court would suggest that inherent writing speed dominates turnaround time for these judges, regardless of queue size.

Indeed, there is a wide variability of “average queue size at hearing” across the bench, as shown in the below box plots. (Figure 4)

The overall unevenness of the judicial queue is a little surprising. There are judges who maintain queues of up to 15-20 cases while others barely have any. It should be noted that some of the judges at the short end were newly appointed near the end of the data set and thus could not have had much time to accumulate a large queue, and several others at the short end were supernumerary judges with a lower case load for a material portion of the period.

While we do not have much insight into the Court’s internal metrics or docket management, it is likely judges have a lot of independent discretion on how to manage their own dockets and workload.

The variability gap identified here may represent an opportunity. I believe the existence of a correlation between queue size and writing time for some—but not all—judges suggests that there are practices employed by certain judges that could be adopted more widely across the Court. Perhaps additional inter-judge sharing of best practices can reduce variability so that litigants’ wait times are less dependent on the judge assigned to their case.

If this is correct, there may be efficiency gains through more active judicial management and leveling of queue sizes. Available tools may include implementing project management practices, providing more real-time insight to the judicial administrator on queue sizes and judicial workload at the time of case assignment, and better load distribution across judges. (It is possible that the Court is already implementing some of these practices, though we do not have much visibility into the Court’s internal operations.)

Summary of factors affecting turnaround time

We have now looked at a number of factors that may affect decision-writing time using publicly available Federal Court data. The dataset allowed us to examine several interesting dimensions, including judicial output, queue size, subject matter, and complexity.

Some of these factors appear to matter more than others. In particular, case complexity and queue size appear to be meaningful (but conditional) predictors of writing time, and we also saw how turnaround time varies by subject matter.

I began this series out of an interest in testing current-generation AI tools to visualize legal data, but the project took on a bit of a life of its own.

To address feedback from various sources: a more rigorous multivariable regression analysis would need to be done before drawing formal conclusions. That level of analysis goes somewhat beyond the scope of this series, but it would be a useful extension of the work.

I had not originally intended to undertake this analysis. Key assumptions and limitations in the data and methodology were set out earlier and, where possible, I included statistical significance values (or, just as importantly, noted the absence of statistical significance) so the results could be interpreted with the appropriate caution.

It bears repeating that the explanatory power of this dataset is necessarily limited by likely confounders and the absence of strong causal controls. As a result, the factors examined here cannot fully explain the observed inter-judge variation in turnaround time.

Generative AI disclaimer: The below section was created with the aid of generative artificial intelligence. Specifically, Claude (Sonnet 4.6) was tasked with both identifying the statistical methodology and validating the data for analysis.

Nonetheless, for completeness, I also used generative AI to assist with a regression model estimating the relative contribution of each factor. In short, after controlling for the variables available in the dataset, the available predictors account for about 31% of the observed variation in decision turnaround time. The 69% remaining unexplained variation likely reflects a combination of unquantified factors and individual judicial practices or style.

A forest plot of the regression coefficients for each factor is shown below, together with share of variance illustrated, grouped according to type of variable (Figure 5).

In the statistical model used for our data analysis, complexity accounts for about 11% of variability, queue accounts for 19%, other factors account for 1%, and the remaining 69% is unexplained.
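To make the idea of "share of variance" concrete, here is a minimal sketch of one common convention: fit nested ordinary least squares models and attribute to each group of predictors the increase in R² when it is added. This is an assumption about method, not a description of the model behind Figure 5, and the data are synthetic. The variable names and the simulated coefficients are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic data: turnaround depends on queue size, complexity, and noise.
rng = random.Random(0)
n = 200
queue = [rng.randint(0, 12) for _ in range(n)]
complexity = [rng.uniform(0, 10) for _ in range(n)]
days = [10 + 6 * q + 3 * c + rng.gauss(0, 25) for q, c in zip(queue, complexity)]

r2_queue = r_squared([queue], days)
r2_full = r_squared([queue, complexity], days)
shares = {
    "queue": r2_queue,                 # explained by queue size alone
    "complexity": r2_full - r2_queue,  # added once complexity is included
    "unexplained": 1 - r2_full,
}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With this sequential convention the shares always sum to 100%, but the split between predictors depends on the order in which they are added whenever the predictors are correlated with each other. Other decompositions exist and can give different figures.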

For the sake of completeness (and to anticipate notes from the statistically inclined), I add that it is possible to control for the observed variables (complexity, queue depth, category, and year) to obtain a per-judge residual. This residual is the difference between the turnaround time expected from our modelled predictors and the actual turnaround time, and it would represent the inherent inter-judge variability in writing time. However, given that such data is not actionable and because there may be other unanalyzed variables at play, I have not included it here.

Upcoming

Through this now four-part series, we have been able to visualize and analyze some interesting data on decision turnaround times at the Federal Court.

I think we have managed to exhaust most of the available information at this moment (and perhaps more importantly I have probably exhausted the readers). Though there are always more ways to slice the data, further advancement of tools and data availability may enable deeper analytics in the near future.

This series is about to be wrapped up for now. I have only one more part planned, which is a brief look at macro trends rather than a specific analysis of turnaround times, so it will be more in the nature of “bonus content” than a continuation of the articles so far.

As always, if you have any further ideas or feedback, please share them with me and I will endeavour to respond.

Ordinary Skill