Society of Actuaries Papers on Risk-Scoring Models

Risk-scoring, the attempt to identify the expected medical costs associated with a particular patient, has become widespread: it is used by CMS for Medicare Advantage, by many Medicaid programs, and for insurance exchange business. It may determine not only the premiums paid to plans but also the capitation or other at-risk payments from health plans to provider groups and systems. The Society of Actuaries has issued a primer on risk-scoring and an evaluation of claims-based risk-scoring models. (SOA Primer) (SOA Evaluation) The ultimate rationale for risk assessment, scoring and adjustment is that certain outcomes, whether quality or economic, may be influenced, sometimes heavily, by patient characteristics, both demographic and health-related. Risk assessment and scoring tend to center on the diseases or conditions a particular person has or has had, with some weight given to demographics like age and sex. Data sources and coding systems are very important in the process. And since the only data actually known about a person are from the past, but the risk assessment is often used going forward, for example to set next year’s payment amounts, a predictive assumption always underlies this work, one case being that the past utilization and cost of a person’s health care accurately reflect what they will be in the future.

So for payment purposes most of these risk-scoring systems take all the past data about the patient’s diseases, aggregate data on utilization and spending for those diseases (often in the form of episode-grouping), gather some demographic data, and estimate what future spending might be. Alternatively, they make a retroactive adjustment to payments for the period from which the data was gathered. A prospective risk assessment and scoring system may take non-chronic conditions out and may rely more heavily on demographic factors. The risk score is typically expressed as a number relative to the average cost of a patient in the population of interest. For example, if the average cost of all Medicare beneficiaries is taken as 1, a very healthy 65-year-old beneficiary might be a 0.8, and a very sick 95-year-old might be a 1.5. A larger population might be broken down geographically or by other factors. The primer contains an excellent explanation of the risk assessment and scoring process and the factors that go into constructing a formula.
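The relative scoring described above can be sketched in a few lines of Python. This is only an illustration: the function name, the beneficiary labels, and the dollar amounts are hypothetical, chosen so that the population average works out to a score of 1, and no real model is this simple.

```python
from statistics import mean


def relative_risk_scores(expected_costs):
    """Divide each person's expected cost by the population average.

    The average person in the population therefore scores 1.0.
    """
    average_cost = mean(expected_costs.values())
    return {person: cost / average_cost for person, cost in expected_costs.items()}


# Hypothetical expected annual costs in dollars (made up for illustration).
expected_costs = {
    "very_healthy_65_year_old": 8_000.0,
    "typical_beneficiary": 10_000.0,
    "very_sick_95_year_old": 15_000.0,
    "another_beneficiary": 7_000.0,
}

scores = relative_risk_scores(expected_costs)

for person, score in scores.items():
    print(f"{person}: {score:.2f}")
```

With these made-up inputs the average cost is $10,000, so the healthy 65-year-old scores 0.8 and the very sick 95-year-old scores 1.5, matching the example in the text.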

The second paper analyzes the accuracy of common claims-based risk-scoring systems. These systems are available from commercial vendors such as Optum, Truven, Milliman, SCIO, Johns Hopkins, 3M, and Verisk. CMS has also developed its own model. Although these models are very sophisticated and are constantly being updated and refined, they don’t have an extremely high level of accuracy. Concurrent models, which use data from the same period that risk is being scored for, are obviously more accurate than prospective ones, which try to use past data to project future costs. The various models were tested against a refined commercial claims database. As drugs have become such a common treatment modality, particularly for chronic conditions, some of the models use drug treatments as the basis for prediction. These models tend to be less accurate than those which also use medical data. Even most of the concurrent systems have an accuracy of less than 50% for individual patients, while the prospective ones struggle to reach 20%. While their accuracy improves if you take out very high-cost patients, that kind of defeats the purpose, since these patients may be the ones the health plan or provider would be most interested in managing, if they knew who they were ahead of time. Performance improves greatly when the models are applied to a group of patients, which is helpful at a health plan level or for a capitated provider with a large panel of patients, but does nothing to help focus care management efforts. And the models all grossly overpredict costs for individuals at the low end of the spending spectrum and somewhat underpredict them at the high end.
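Why group-level performance is so much better than individual-level performance can be seen in a small simulation. The sketch below is not the SOA’s method or data: the cost distributions, the error measure (absolute gap between actual and predicted totals, divided by the predicted total), and all names are assumptions made for illustration. It even gives the model an unrealistic advantage by letting it know each person’s true expected cost, so the only error left is random year-to-year variation.

```python
import random
from statistics import mean


def mean_relative_error(predicted, actual, group_size):
    """Average |actual total - predicted total| / predicted total.

    People are split into consecutive groups of `group_size`; a group size
    of 1 gives the individual-level error.
    """
    errors = []
    for start in range(0, len(predicted) - group_size + 1, group_size):
        predicted_total = sum(predicted[start:start + group_size])
        actual_total = sum(actual[start:start + group_size])
        errors.append(abs(actual_total - predicted_total) / predicted_total)
    return mean(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_people = 20_000

# Assumed distribution: each person's true expected cost is skewed (lognormal).
predicted = [rng.lognormvariate(8.0, 1.0) for _ in range(n_people)]

# Assumed noise: actual cost is the expected cost times a random factor
# with mean 1, standing in for unpredictable events.
actual = [cost * rng.expovariate(1.0) for cost in predicted]

errors_by_group_size = {
    size: mean_relative_error(predicted, actual, size) for size in (1, 100, 1000)
}

for size, error in errors_by_group_size.items():
    print(f"group size {size:>4}: mean relative error {error:.1%}")
```

Under these assumptions the error for single patients is large while the error for panels of 100 or 1,000 is much smaller, because random highs and lows cancel out within a group. That cancellation helps a plan or a capitated provider budget for a panel, but it says nothing about which individual patients will be the expensive ones.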

So why are the models not more accurate? Partly because they may not pick up provider practice differences very well, and these can have a significant influence on per-person spending. They may also not adequately adjust for unit price differences or changes. And a lot of health spending is simply unpredictable: who is going to suddenly develop a serious cancer or have a heart attack? Who is going to get hit by a bus or fall off a ladder? We can expect models to get better as research continues, but predicting the random is not likely in our lifetimes. I joke about the nerdiness of this kind of detailed health care research and analytics, but it is core to much of health care reimbursement today, and understanding it well is critical for many health care organizations.