The FDA may require comparative clinical efficacy trials for biosimilar approval, but on top of all the other required and available evidence, these trials add little value, states Sarfaraz K. Niazi, PhD.
A true biosimilar product has “no clinically meaningful difference” from its reference product. Given the possible structural variance between a biosimilar candidate and its reference product, regulatory agencies recommend stepwise testing comprising analytical assessment, animal pharmacology, and clinical pharmacology profiling.
Clinical safety and efficacy testing should not be allowed if residual uncertainty remains after the stepwise assessment. Comparative efficacy testing of biosimilars is redundant at best, for the following reasons:
If the molecular structure of biological products were fixed, no clinical efficacy study would be required; the emphasis should therefore be on evaluating structural differences, using both in vitro and in vivo testing, rather than clinical efficacy testing, which is less sensitive in establishing structural similarity. If a structural element cannot be investigated, the uncertainty applies equally to the biosimilar candidate and the reference product.
Regulatory agencies suggest conducting “additional clinical testing” if residual uncertainty remains in establishing structural similarity; however, “additional clinical testing” does not mean clinical efficacy testing; it could well be an additional clinical pharmacology study. Historically, neither the FDA nor the European Medicines Agency (EMA) has identified residual uncertainty in any biosimilar product that would justify conducting comparative efficacy testing.
Comparative efficacy and safety studies require an acceptable-difference margin that is set by clinical judgment alone and varies among agencies. Yet meeting such a margin does not ensure that the biosimilar candidate has comparable safety and efficacy; the parameters chosen are essentially arbitrary, and the verdict can hinge on the margin as much as on the data, as the sketch below illustrates.
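To see how much rests on the chosen margin rather than on the data, consider a hypothetical trial, a minimal sketch with assumed figures (120 of 200 responders on the candidate, 132 of 200 on the reference, and two plausible but arbitrary margins): the same confidence interval fails a ±12-percentage-point margin and passes a ±15-point one.

```python
# Hypothetical trial, for illustration only: 120/200 responders on the
# biosimilar candidate vs 132/200 on the reference product. The 90% CI on
# the risk difference (one common two-one-sided-tests convention) is fixed
# by the data; whether the result counts as "equivalent" depends entirely
# on the margin chosen.
from math import sqrt
from statistics import NormalDist

x_bio, n_bio, x_ref, n_ref = 120, 200, 132, 200
p_bio, p_ref = x_bio / n_bio, x_ref / n_ref
diff = p_bio - p_ref                                    # -0.06
se = sqrt(p_bio * (1 - p_bio) / n_bio + p_ref * (1 - p_ref) / n_ref)
z90 = NormalDist().inv_cdf(0.95)                        # ~1.645 for a 90% CI
lo, hi = diff - z90 * se, diff + z90 * se               # ~(-0.139, +0.019)

for margin in (0.12, 0.15):                             # two plausible margins
    verdict = "equivalent" if -margin < lo and hi < margin else "not shown"
    print(f"margin of +/-{margin:.2f}: {verdict}")
```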
Biosimilar products receive extrapolation of indications; if a comparative efficacy study were required to establish biosimilarity, it would have to be conducted in all indications, not just in a group of indications sharing the same mode of action. Such all-inclusive studies are not practical to conduct.
A new biological product is tested for efficacy against a placebo, and the statistical model for demonstrating a difference is relatively straightforward. A biosimilar comparative efficacy trial, by contrast, must demonstrate similarity or noninferiority, which, if done correctly, requires many patients to reach statistical significance, possibly more than the testing of a new biological drug, as the sketch below suggests.
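A back-of-the-envelope calculation makes the asymmetry concrete. The response rates and margin below are assumptions chosen for illustration (50% vs 30% response for a placebo-controlled trial; a 50% response rate with a ±10-percentage-point equivalence margin for the biosimilar trial), not values from any guideline.

```python
# Rough per-arm sample sizes, using assumed response rates: 50% vs 30% for a
# placebo-controlled superiority trial of a new biologic, and a 50% response
# rate with a +/-10-percentage-point margin for a biosimilar equivalence
# (two-one-sided-tests) trial; alpha = 0.05 and 80% power in both cases.
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard-normal quantile function

def n_superiority(p1, p2, alpha=0.05, power=0.80):
    """Per-arm n to detect a difference between proportions p1 and p2."""
    za, zb = z(1 - alpha / 2), z(power)
    return ceil((za + zb) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

def n_equivalence(p, margin, alpha=0.05, power=0.80):
    """Per-arm n for equivalence (TOST), assuming the true difference is zero."""
    za, zb = z(1 - alpha), z(1 - (1 - power) / 2)
    return ceil((za + zb) ** 2 * 2 * p * (1 - p) / margin ** 2)

print("new biologic vs placebo:", n_superiority(0.50, 0.30))   # ~91 per arm
print("biosimilar equivalence: ", n_equivalence(0.50, 0.10))   # ~429 per arm
```

Under these assumed figures, the equivalence trial needs roughly four to five times as many patients per arm as the placebo-controlled trial, and tightening the margin pushes the numbers up quadratically.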
It is impractical to test a biosimilar product in a treatment-naïve population, and the resulting increase in patient response variability makes such studies complex and difficult to conclude.
Regulatory agencies suggest the use of clinical biomarkers, but the available markers do not correlate with one another; results obtained with one marker may not be replicable with another.
Biological products have a broad dose-response relationship, with clinical doses often sitting on the plateau of the curve, so a product that fails the analytical assessment may still pass comparative efficacy testing, as the sketch below illustrates. Comparative efficacy testing therefore risks the approval of unqualified biosimilars.
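A simple Emax dose-response model shows how this can happen. The potency values and clinical dose here are assumptions for illustration: the candidate is taken to be 30% less potent than the reference, a difference that analytical and pharmacology testing would readily flag.

```python
# Illustrative Emax sketch: assume the reference product has an EC50 of 10
# (arbitrary units) and the candidate is 30% less potent (EC50 of 13).
def emax_response(dose, ec50, emax=100.0):
    """Percent of maximal effect at a given dose under a simple Emax model."""
    return emax * dose / (ec50 + dose)

clinical_dose = 100  # dosed well up on the plateau, as many biologics are
ref = emax_response(clinical_dose, ec50=10)   # ~90.9% of Emax
bio = emax_response(clinical_dose, ec50=13)   # ~88.5% of Emax
print(f"reference {ref:.1f}%, candidate {bio:.1f}%, gap {ref - bio:.1f} points")
```

The resulting gap of roughly 2 to 3 points of clinical response would sit comfortably inside any plausible equivalence margin, even though the underlying potency difference is substantial.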
Patient populations are limited in some indications; in oncology, for example, this can stretch out the time needed to complete a trial protocol, and many patients are lost to the disease. Given the highly variable patient response, clinical efficacy testing of biosimilar products becomes more demanding than the development of new biologics.
The FDA and EMA have approved about 200 biosimilar products, and a summary analysis shows that no product was rejected on the basis of comparative clinical efficacy testing; even when minor differences were found, the products were accepted. Logically, if a test never fails, either the products were truly similar or the test is not sensitive enough to tell the difference; in both cases, the testing is redundant and unnecessary.