A recent paper, “Quantitative Translation of Dog-to-Human Aging by Conserved Remodeling of the DNA Methylome” in the journal Cell Systems, has been getting attention in the media for finding a new dog-age-to-human-age formula to replace the popular 7x formula. The paper estimates epigenetic age by measuring changes in DNA samples of dogs and humans of different ages. The new formula is not so easy to remember or compute: 31 + 16 × ln(dog years). It’s hardly news that the 7x formula is inaccurate, and the new equation seems plausible since it does incorporate the common understanding that dogs mature much faster than humans through adolescence and only somewhat faster during adulthood. However, looking closer at the study raises some questions about the final formula, and I’ll try to explore that in this post.

This WebMD dog-age–calculator article summarizes the previous expert knowledge of dog aging rates. I’m too lazy to find the original source; WebMD cites Purina, Humane Society and National Pet Wellness Month as sources for its table. The table has different data for small, medium and large dogs. Since the paper focused mainly on Labrador retrievers, I’ll use the WebMD data for medium-sized dogs and graph it against the other two models: the 7x rule and the epigenetic logarithmic model from the paper.

The 7x rule does a good job as a simple approximation of the relationship in the WebMD table, especially for the mid-life range. However, the epigenetic age model is quite different in that region.

As a long-time dog owner, I have other doubts about the epigenetic model. It puts a 1 year old dog on par with a 31 year old human. While 1yo dogs are about full height, they still have some bulking up to do and their brains still seem adolescent. At the other end, there is a big physiological difference between a 10yo dog and a 15yo dog, but not much difference according to the epigenetic model. The most obvious explanation is that epigenetic age is not quite the same as observed physiological or mental maturity; however, the paper claims correspondence. Time for a deeper dive into the data.
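To put numbers on those two complaints, here is a minimal sketch that evaluates the paper’s formula at a few dog ages next to the 7x rule. The function names are my own labels, not anything from the paper.

```python
import math

def epigenetic_human_age(dog_years: float) -> float:
    """The paper's formula: 31 + 16 * ln(dog years). Only defined for dog_years > 0."""
    return 31 + 16 * math.log(dog_years)

def seven_x_human_age(dog_years: float) -> float:
    """The popular 7x rule."""
    return 7 * dog_years

for dog_years in (1, 2, 5, 10, 15):
    print(
        f"{dog_years:>2} dog years: "
        f"epigenetic {epigenetic_human_age(dog_years):5.1f}, "
        f"7x {seven_x_human_age(dog_years):5.1f}"
    )

# Human-year gap between a 15yo dog and a 10yo dog under the log model.
gap = epigenetic_human_age(15) - epigenetic_human_age(10)
print(f"10yo -> 15yo gap: {gap:.1f} human years")
```

Since ln(1) = 0, a 1yo dog lands exactly on 31. The gap between a 10yo and a 15yo dog is 16 × ln(1.5), or about 6.5 human years, compared with 35 under the 7x rule.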

### Getting the data

What data? ~~As far as I can tell the raw data is unavailable, even at the hosting Ideker Lab at UCSD.~~

*I’ve since learned from the author that the data is available at https://zenodo.org/record/3864683. It’s 6GB, so will take me a while to explore but wanted to go ahead and add that correction to this post.*

However, a resourceful person can glean a lot from the images in a paper. For the study, part of the analysis involved matching up similarly-aged dogs and humans. I’m imagining the epigenetic age as a high-dimensional vector, so the matching is not trivial. They used the average of several nearest neighbors instead of a one-to-one match, which seems reasonable. The supplemental materials include a few graphs of different numbers of nearest neighbors. Here’s their plot after matching each dog with its three nearest humans.
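As I understand the idea, the matching works roughly like the sketch below. This is only an illustration under my own assumptions: I’m guessing at Euclidean distance, the profiles are tiny made-up vectors, and `knn_matched_age` is a hypothetical helper, not the paper’s code.

```python
import numpy as np

def knn_matched_age(dog_profiles, human_profiles, human_ages, k=3):
    """For each dog, average the ages of the k humans with the closest profiles."""
    # Pairwise Euclidean distances, shape (n_dogs, n_humans). The metric is an assumption.
    diffs = dog_profiles[:, None, :] - human_profiles[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Column indices of the k nearest humans for each dog.
    nearest = np.argsort(dists, axis=1)[:, :k]
    return human_ages[nearest].mean(axis=1)

# Made-up toy data: six humans in two obvious clusters, and two dogs.
human_profiles = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
)
human_ages = np.array([2.0, 4.0, 6.0, 60.0, 70.0, 80.0])
dog_profiles = np.array([[0.05, 0.05], [1.05, 1.05]])

matched = knn_matched_age(dog_profiles, human_profiles, human_ages, k=3)
print(matched)
```

With this toy data the first dog is matched to the three young humans and the second to the three old ones. Notice that the matched age can only ever be an average of ages that exist in the human sample.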

Other charts show more neighbors averaged, but they aren’t much different. Presumably using fewer neighbors will produce more variation, which will make it easier for me to distinguish the points, and that variation will be handled by any modeling. To read the position of the dots, I used an online tool called WebPlotDigitizer, which lets me click on each point and get a table of numbers out. It also has some auto-detection methods, but I haven’t had much luck with those. Here’s the result with my clicks represented as red dots.

There is still a lot of overstriking and it’s hard to be sure I got them all. I tried a few diagnostics, even modeling the shade of green that transparent overlaying would produce for a given number of overlaid dots. I found a couple dots I missed but still couldn’t figure out why I only found 92 dots while the paper mentioned 104 dogs. Finally, I saw in the paper that 9 dogs were excluded for incomplete data, and I noticed from the list of dog ages that I likely undercounted the number of dogs in that lower left blob around the origin. So I added 2 there, which brings my total to 94 dog-age-human-age pairings.

Here’s my data (gray dots) plus a smoother (gray line), bootstrapped confidence interval (gray region) and the paper’s epigenetic model (blue line).
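For anyone who wants to reproduce that kind of band, here is a rough sketch of one way to do it. It is not necessarily how the graph was made: the Gaussian kernel smoother, the bandwidth, and the synthetic stand-in data are all assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson smoother: Gaussian-weighted average of y near each grid point."""
    weights = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (weights * y).sum(axis=1) / weights.sum(axis=1)

def bootstrap_band(x, y, grid, bandwidth, n_boot=500, level=0.95, seed=0):
    """Percentile band from refitting the smoother on resampled (x, y) pairs."""
    rng = np.random.default_rng(seed)
    n = len(x)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs with replacement
        curves[b] = kernel_smooth(x[idx], y[idx], grid, bandwidth)
    alpha = (1 - level) / 2
    lower, upper = np.quantile(curves, [alpha, 1 - alpha], axis=0)
    return lower, upper

# Synthetic stand-in data (NOT the digitized points): log curve plus noise.
rng = np.random.default_rng(1)
dog_age = rng.uniform(0.5, 16, size=94)
human_age = 31 + 16 * np.log(dog_age) + rng.normal(0, 10, size=94)

grid = np.linspace(1, 15, 29)
fit = kernel_smooth(dog_age, human_age, grid, bandwidth=1.5)
lower, upper = bootstrap_band(dog_age, human_age, grid, bandwidth=1.5)
print(np.column_stack([grid, lower, fit, upper])[:3])
```

Resampling whole (dog age, human age) pairs keeps each dog’s matched age attached to it, so the band reflects how much the curve would move with a different sample of dogs.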

A few oddities come to light from the graph:

- The variation is much higher for young-aged dogs: some very young dogs get matched with very old humans. And some 2-3yo dogs get matched with human infants.
- The smoother makes it look like the underlying model is linear up to about 6 years and then levels off, which is surprising.
- The smoother is by definition constrained to be smooth, and the log curve is even more constrained. Maybe it’s too constrained here since it’s not following the middle ages or the smoother confidence interval.

### Human ages

I think a big part of these oddities is the distribution of human ages, which again we have to infer from the graphs. The paper only says they had 320 humans from 1yo to 103yo. Looking at their plot of dog age by human age, there is one dot per human. There is too much overstriking to digitize the values this time, but it’s clear that the ages are strongly bimodal with relatively fewer subjects in the 10yo to 50yo range.

Digging a little deeper, the paper cites two sources for the human data. One source has data for humans age 17yo and younger, and the other is mostly older humans, as shown by this histogram from that paper:

So with few mid-life humans to match with, it’s not surprising that mid-life dogs would match to young or old humans (but mostly old humans since there appear to be more of them). As a result, **I suspect the paper over-estimates the epigenetic age of mid-life dogs**, which is in agreement with the deviation from the WebMD table of dog-human ages.

### Fitting a log curve

Fitting human age as a logarithmic function of dog age is equivalent to fitting a straight line function against the log of dog age. However, if we plot human age versus dog age on a log axis, it doesn’t exactly call out for a straight line fit.
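The equivalence is easy to check mechanically. In this minimal sketch the points are generated noise-free from the paper’s formula (they are not the study data), so an ordinary straight-line fit against ln(dog age) should hand back the paper’s coefficients.

```python
import numpy as np

# Noise-free points from the paper's formula, used only to check the mechanics.
dog_age = np.array([0.5, 1, 2, 4, 8, 16])
human_age = 31 + 16 * np.log(dog_age)

# A degree-1 polynomial fit against ln(dog age) is the same as fitting
# human_age = intercept + slope * ln(dog_age).
slope, intercept = np.polyfit(np.log(dog_age), human_age, deg=1)
print(f"human age = {intercept:.1f} + {slope:.1f} * ln(dog age)")
```

With real, noisy pairings the same call gives the least-squares log curve, which is why the shape of the scatter on a log axis matters.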

Though the dog ages have a fairly uniform distribution, taking the log skews the distribution to match the human age distribution.

I was going to try a few other models, but since I’m now having serious doubts about the quality of the dog-human matching, there’s no use modeling it. *This is a good time to point out that the most likely explanation for my doubts is that I’m an idiot and haven’t spent months with this study like the authors have.*

### Physiological Age

From the paper, regarding the logarithmic function:

“We found that this function showed strong agreement between the approximate times at which dogs and humans experience common physiological milestones during both development and lifetime aging, i.e., infant, juvenile, adolescent, mature, and senior.”

Wang et al., Quantitative Translation of Dog-to-Human Aging by Conserved Remodeling of the DNA Methylome, Cell Systems (2020), https://doi.org/10.1016/j.cels.2020.06.006

The following chart is used to support the “strong agreement”:

However, it’s not very convincing to me. The model almost completely misses the adolescent and mature physiological regions, and given that you can’t possibly miss the two corner regions with any reasonable model, **I’d say the model scores zero out of two for predicting physiological age**.

Interesting wording note: the article says that the correspondence in the middle stages is “more approximate”, which I guess might be a positive spin on “wrong”.

### Simple Model

I was initially excited to see this paper touting a dog-age formula based on DNA data, but now I’m doubting the results due mainly to the skewed human age population. So for now, I’ll stick with the expert wisdom summarized in the WebMD article. If you’re looking for a simple formula, you can get close to that table with the formula:

human age = 5 × dog age + 10
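As a quick sketch, here is that formula next to the other two for a range of dog ages. The function names are just labels I made up for this comparison.

```python
import math

def simple_human_age(dog_years: float) -> float:
    """The simple formula from this post: 5 * dog age + 10."""
    return 5 * dog_years + 10

def seven_x_human_age(dog_years: float) -> float:
    """The popular 7x rule."""
    return 7 * dog_years

def epigenetic_human_age(dog_years: float) -> float:
    """The paper's formula: 31 + 16 * ln(dog years)."""
    return 31 + 16 * math.log(dog_years)

print("dog  simple   7x  epigenetic")
for dog_years in range(1, 16):
    print(
        f"{dog_years:>3}  {simple_human_age(dog_years):>6.0f}  "
        f"{seven_x_human_age(dog_years):>3.0f}  "
        f"{epigenetic_human_age(dog_years):>10.1f}"
    )
```

The simple formula and the 7x rule agree at 5 dog years (both give 35). Below that the simple formula runs higher, which fits faster early maturing, and above it runs lower.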