How well do tools predict what happens with adults with newly-diagnosed chronic lymphocytic leukaemia (CLL) over time?

What was the aim of this review?

There are many types of blood cancers called leukaemia. Chronic lymphocytic leukaemia (CLL) is the most common type. Twenty-five per cent of people who have leukaemia have CLL. It is natural for people with newly-diagnosed CLL and their families to want to know what will happen with their health in the future. They may be wondering if or when they will need treatment, if or when their disease will get worse or how long people live with CLL.

Researchers identified several characteristics that are associated with these outcomes. From these characteristics, they have tried to design tools that help predict what may happen to groups of people with newly-diagnosed CLL.

The aim of this Cochrane Review is to evaluate and summarise those tools, and the studies that test them on data from other patients.

What are the key messages from this review?

Reviewers found that there is no reliable way to predict what might happen over time to people who have (untreated) CLL. One reason is that the prediction tools have not been tested often enough, or on enough different people, to know how well they really work.

Another reason is that researchers continue to develop more effective CLL treatments, and the prediction tools have not kept up with these advances.

What are the main results of the review?

We identified 52 tools that were designed to predict what may happen to people newly-diagnosed with CLL. To find the best tools, we had to select the studies carefully. For a tool to be used in clinical practice:

- a tool has to be tested by different researchers, in different geographic locations and on different groups of people with CLL (e.g. differing in age, gender or disease stage). In other words, we did not include a tool if it had only been tested on the people whose data were used to create it;

- the results of the tool should be consistent to prove that it works;

- the tests of the tool have to provide enough information to show how well the tool works. For example, the tests have to include large groups of people and enough information about the type of CLL they have.

We found three tools that met these requirements: the CLL International Prognostic Index (CLL-IPI), the Barcelona-Brno score, and the MDACC 2007 index score.

The CLL-IPI did the best job of distinguishing people who would survive longer with CLL from those who would not survive as long. However, we rated the quality of the CLL-IPI studies as low because they did not provide all the information necessary to know how accurate the tool was. The Barcelona-Brno score and the MDACC 2007 index score, which were tested on a smaller overall number of patients, were less able to distinguish between people with a better and a worse prognosis, and their studies were of similarly low quality.

Conclusion

More and better research is needed to develop and test the tools that help predict how CLL will behave for different groups of people over time. The tools must also be updated to take new treatments into account so that their predictions remain accurate.

Authors' conclusions: 

Despite the large number of published studies of prognostic models for OS, PFS or TFS for newly-diagnosed, untreated adults with CLL, only a minority of these (N = 12) have been externally validated for their respective primary outcome. Three models have undergone sufficient external validation to enable meta-analysis of the model's ability to predict survival outcomes. Lack of reporting prevented us from summarising calibration as recommended. Of the three models, the CLL-IPI shows the best discrimination, despite overestimation. However, performance of the models may change for individuals with CLL who receive improved treatment options, as the models included in this review were tested mostly on retrospective cohorts receiving a traditional treatment regimen. In conclusion, this review shows a clear need to improve the conduct and reporting of both prognostic model development and external validation studies. For prognostic models to be used as tools in clinical practice, model development and subsequent validation studies should be adapted in a timely manner to include the latest therapy options, so that the models continue to predict outcomes accurately.

Background: 

Chronic lymphocytic leukaemia (CLL) is the most common cancer of the lymphatic system in Western countries. Several clinical and biological factors for CLL have been identified. However, it remains unclear which of the available prognostic models combining those factors can be used in clinical practice to predict long-term outcome in people newly-diagnosed with CLL.

Objectives: 

To identify, describe and appraise all prognostic models developed to predict overall survival (OS), progression-free survival (PFS) or treatment-free survival (TFS) in newly-diagnosed (previously untreated) adults with CLL, and meta-analyse their predictive performances.

Search strategy: 

We searched MEDLINE (from January 1950 to June 2019 via Ovid), Embase (from 1974 to June 2019) and registries of ongoing trials (to 5 March 2020) for development and validation studies of prognostic models for untreated adults with CLL. In addition, we screened the reference lists and citation indices of included studies.

Selection criteria: 

We included all prognostic models developed for CLL which predict OS, PFS, or TFS, provided they combined prognostic factors known before treatment initiation, and any studies that tested the performance of these models in individuals other than the ones included in model development (i.e. 'external model validation studies'). We included studies of adults with confirmed B-cell CLL who had not received treatment prior to the start of the study. We did not restrict the search based on study design.

Data collection and analysis: 

We developed a data extraction form to collect information based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS). Pairs of review authors independently screened references, extracted data and assessed risk of bias according to the Prediction model Risk Of Bias ASsessment Tool (PROBAST). For models that were externally validated at least three times, we aimed to perform a quantitative meta-analysis of their predictive performance, notably their calibration (proportion of people predicted to experience the outcome who do so) and discrimination (ability to differentiate between people with and without the event), using a random-effects model. When a model categorised individuals into risk categories, we pooled outcome frequencies per risk group (low, intermediate, high and very high). We did not apply GRADE as guidance is not yet available for reviews of prognostic models.
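To illustrate what discrimination measures for survival outcomes, the following is a minimal sketch of a concordance (c-) statistic for right-censored data, in the style commonly attributed to Harrell. It is not taken from the review, which does not state how the individual studies computed their c-statistics; the function name `harrell_c` and the example data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Concordance statistic for right-censored survival data.

    times       : follow-up time for each person
    events      : 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores : model output, where a higher score means higher predicted risk

    A pair is comparable when the person with the shorter follow-up time
    had an observed event. The pair is concordant when that person also
    has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: the second person is censored at time 4.
example_c = harrell_c(times=[2, 4, 6], events=[1, 0, 1], risk_scores=[3, 2, 1])
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking of who experiences the event earlier.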

Main results: 

From 52 eligible studies, we identified 12 externally validated models: six were developed for OS, one for PFS and five for TFS. In general, reporting of the studies was poor, especially for the predictive performance measures of calibration and discrimination; basic information, such as eligibility criteria and the recruitment period of participants, was also often missing. We rated almost all studies at high or unclear risk of bias according to PROBAST. Overall, the applicability of the models and their validation studies was low or unclear; the most common reasons were inappropriate handling of missing data and serious reporting deficiencies concerning eligibility criteria, recruitment period, observation time and prediction performance measures.

We report the results for three models predicting OS, which had available data from more than three external validation studies:

CLL International Prognostic Index (CLL-IPI)

This score includes five prognostic factors: age, clinical stage, IgHV mutational status, B2-microglobulin and TP53 status. Calibration: for the low-, intermediate- and high-risk groups, the pooled five-year survival per risk group from validation studies corresponded to the frequencies observed in the model development study. In the very high-risk group, predicted survival from CLL-IPI was lower than observed from external validation studies. Discrimination: the pooled c-statistic of seven external validation studies (3307 participants, 917 events) was 0.72 (95% confidence interval (CI) 0.67 to 0.77). The 95% prediction interval (PI) of this model for the c-statistic, which describes the expected interval for the model's discriminative ability in a new external validation study, ranged from 0.59 to 0.83.
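The relationship between a pooled c-statistic, its confidence interval and the wider prediction interval can be sketched as follows. This is an illustrative example only: it assumes a DerSimonian-Laird random-effects model on the logit scale and a Student-t based prediction interval with k - 2 degrees of freedom, which the abstract does not confirm as the review's exact method. The function name and inputs are hypothetical, and no study-level data from the review are used.

```python
import math

# 97.5% Student-t critical values for small degrees of freedom;
# 1.96 is used as a rough normal approximation beyond this table.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(c_values, standard_errors):
    """Random-effects pooling of c-statistics (assumed DerSimonian-Laird)."""
    k = len(c_values)
    if k < 3:
        raise ValueError("a prediction interval needs at least three studies")
    y = [logit(c) for c in c_values]
    # Delta method: SE on the logit scale = SE / (c * (1 - c)).
    v = [(se / (c * (1.0 - c))) ** 2 for c, se in zip(c_values, standard_errors)]
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / scale)  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    t_crit = T_975.get(k - 2, 1.96)
    pi_half = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return {
        "pooled": inv_logit(mu),
        "ci": (inv_logit(mu - 1.96 * se_mu), inv_logit(mu + 1.96 * se_mu)),
        "pi": (inv_logit(mu - pi_half), inv_logit(mu + pi_half)),
        "tau2": tau2,
    }


# Hypothetical inputs, not data from the review.
result = pool_c_statistics([0.60, 0.72, 0.80], [0.02, 0.02, 0.02])
```

Because the prediction interval adds the between-study variance to the uncertainty of the pooled estimate, it is at least as wide as the confidence interval; this is why a model can have a fairly narrow confidence interval and still perform noticeably differently in a new validation study.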

Barcelona-Brno score

Aimed at simplifying the CLL-IPI, this score includes three prognostic factors: IgHV mutational status, del(17p) and del(11q). Calibration: for the low- and intermediate-risk groups, the pooled survival per risk group corresponded to the frequencies observed in the model development study, although the score seems to overestimate survival for the high-risk group. Discrimination: the pooled c-statistic of four external validation studies (1755 participants, 416 events) was 0.64 (95% CI 0.60 to 0.67); 95% PI 0.59 to 0.68.

MDACC 2007 index score

The authors presented two versions of this model including six prognostic factors to predict OS: age, B2-microglobulin, absolute lymphocyte count, gender, clinical stage and number of nodal groups. Only one validation study was available for the more comprehensive version of the model, a formula with a nomogram, while seven studies (5127 participants, 994 events) validated the simplified version of the model, the index score. Calibration: for the low- and intermediate-risk groups, the pooled survival per risk group corresponded to the frequencies observed in the model development study, although the score seems to overestimate survival for the high-risk group. Discrimination: the pooled c-statistic of the seven external validation studies for the index score was 0.65 (95% CI 0.60 to 0.70); 95% PI 0.51 to 0.77.