The Regional Earthquake Likelihood Models (RELM) experiment, conducted within the Collaboratory for the Study of Earthquake Predictability (CSEP), showed that the smoothed seismicity (HKJ) model by Helmstetter et al. was the most informative time-independent earthquake model in California during the 2006–2010 evaluation period. The diversity of competing forecast hypotheses and geophysical data sets used in RELM made it well suited to combining multiple models into forecasts that could be more informative than HKJ. Thus, Rhoades et al. created multiplicative hybrid models that use the HKJ model as a baseline together with one or more conjugate models. In retrospective evaluations, some hybrid models showed significant information gains over the HKJ forecast. Here, we prospectively assess the predictive skills of 16 hybrids and 6 original RELM forecasts at a 0.05 significance level, using a suite of traditional and new CSEP tests that rely on a Poisson and a binary likelihood function. In addition, we include consistency test results at a Bonferroni-adjusted significance level of 0.025 to address the problem of multiple tests. Furthermore, we compare the performance of each forecast to that of HKJ. The evaluation data set contains 40 target events recorded within the CSEP California testing region from 2011 January 1 to 2020 December 31, including the 2016 Hawthorne earthquake swarm in southwestern Nevada and the 2019 Ridgecrest sequence. Consistency test results show that most forecasting models overestimate the number of earthquakes and struggle to explain the spatial distribution of epicenters, especially in the case of seismicity clusters. The binary likelihood function significantly reduces the sensitivity of spatial log-likelihood scores to clustering; however, most models still fail to adequately describe spatial earthquake patterns.
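To illustrate why a binary likelihood is less sensitive to clustering than a Poisson likelihood, the following is a minimal sketch, not the CSEP or pyCSEP implementation. It assumes a gridded forecast that gives an expected number of events per cell, treats cells as independent, and uses hypothetical function names. Under the Poisson likelihood every additional event in a cell changes the score, whereas the binary likelihood only records whether a cell contains at least one event, with probability 1 − exp(−λ).

```python
import math


def poisson_log_likelihood(rates, counts):
    """Joint Poisson log-likelihood of observed counts given expected rates per cell.

    Hypothetical helper for illustration; cells are assumed independent.
    """
    total = 0.0
    for lam, n in zip(rates, counts):
        # log P(N = n) = -lam + n * log(lam) - log(n!)
        total += -lam + n * math.log(lam) - math.lgamma(n + 1)
    return total


def binary_log_likelihood(rates, counts):
    """Joint Bernoulli log-likelihood: each cell is either active (>= 1 event) or not.

    Hypothetical helper for illustration; cells are assumed independent.
    """
    total = 0.0
    for lam, n in zip(rates, counts):
        if n > 0:
            # P(at least one event) = 1 - exp(-lam), computed stably via expm1
            total += math.log(-math.expm1(-lam))
        else:
            # P(no event) = exp(-lam)
            total += -lam
    return total


# Toy three-cell forecast (assumed values, not from the study).
rates = [0.5, 0.1, 0.01]
isolated = [1, 0, 0]    # a single event in the first cell
clustered = [10, 0, 0]  # a cluster of ten events in the same cell

print(poisson_log_likelihood(rates, isolated), poisson_log_likelihood(rates, clustered))
print(binary_log_likelihood(rates, isolated), binary_log_likelihood(rates, clustered))
```

In this toy case, the Poisson score drops by about 21 log units when one event is replaced by a ten-event cluster in the same cell, while the binary score is unchanged, because both catalogs activate the same set of cells.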
Contrary to retrospective analyses, our prospective test results show that none of the models is significantly more informative than the HKJ benchmark forecast, which we attribute to temporal instabilities in the fit used to construct the hybrids. These results suggest that smoothing high-resolution, small earthquake data remains a robust method for forecasting moderate-to-large earthquakes over a period of 5–15 yr in California.