Guest Contribution: “Are Google data really useful for macroeconomic nowcasting?”

Today, we’re pleased to present a guest contribution by Laurent Ferrara (Professor of Economics at Skema Business School, Paris and Director of the International Institute of Forecasters).
The recent sequence of economic, financial and pandemic crises across the world has significantly shortened the horizon of predictions for macroeconomic forecasters. At the heart of the Covid-19 crisis, the horizon of interest was the end of the week rather than two years ahead. This led practitioners to focus on new types of high-frequency and alternative datasets, thus raising new challenges for econometricians (unstructured data, very large datasets, mixed frequencies, high volatility, short samples …).
Various sources of alternative data have been used in the recent literature, such as web-scraped data, scanner data or satellite data. Usually, these datasets are extremely large and can be considered big data. One of the main sources of alternative data is Google search data, and the seminal papers on the use of such data for forecasting are those by Hal Varian and co-authors (see for example here). In the area of nowcasting/forecasting, the literature tends to show evidence of some forecasting power for Google data, at least for some specific macroeconomic variables such as the unemployment rate (D’Amuri and Marcucci, 2017), employment (Borup and Montes Schütte, 2020), building permits (Coble and Pincheira, 2017) or car sales (Nymand and Pantelidis, 2018). However, when correctly compared with other sources of information, the jury is still out on the gain that economists can get from using Google data for forecasting and nowcasting. A side question, highly debated on Econbrowser, is the replicability of those data by practitioners (see here for a discussion between Hal Varian and Simon van Norden).
In a recent paper, published with Anna Simoni in the Journal of Business and Economic Statistics (see here for a mimeo), we ask whether Google data are still useful in nowcasting quarterly GDP growth when controlling for official variables, such as opinion surveys or industrial production, usually used by forecasters. And if so, when exactly do these alternative data add a gain in nowcasting accuracy? Nowcasting GDP growth is extremely useful for policy-makers to assess macroeconomic conditions in real time. The concept of macroeconomic nowcasting was popularized by Giannone et al. [2008] and differs from standard forecasting approaches in the sense that it aims at evaluating current macroeconomic conditions on a high-frequency basis. The idea is to provide policy-makers with a real-time evaluation of the state of the economy ahead of the release of official Quarterly National Accounts, which always come out with a delay. See for example here for the U.S. economy and here for a recent post on Econbrowser.
Because Google search data are of high dimension, in the sense that the number of variables is large compared to the time series dimension, there is a price to pay for using them: first, we need to reduce their dimensionality from ultra-high to high by using a screening procedure and, second, we need to use a regularized estimator to deal with the pre-selected variables. Regularization methods are a way to account for many, potentially correlated, variables in a linear regression (see for example Ridge estimation). In this respect, we put forward a new approach combining variable pre-selection and Ridge regularization, which makes it possible to account for a large database. In the paper, we provide some theoretical results regarding the good asymptotic properties of this estimation method, which we refer to as Ridge after Model Selection.
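To make the two-step idea concrete, here is a minimal Python sketch of screening followed by Ridge. It is an illustration under stated assumptions, not the procedure of the paper: the screening rule used here (keep the variables with the largest absolute marginal correlation with the target) is an assumption, as are the function names, the number of retained variables `n_keep`, the penalty `alpha` and the simulated data.

```python
import numpy as np

def screen_by_correlation(X, y, n_keep):
    """Indices of the n_keep columns of X with the largest |correlation| with y.

    Assumption: marginal-correlation screening stands in for the screening
    step; the paper's exact rule is not reproduced here.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = np.inf  # constant columns get zero correlation
    corr = np.abs(Xc.T @ yc) / (norms * np.linalg.norm(yc))
    return np.sort(np.argsort(corr)[::-1][:n_keep])

def ridge_fit(X, y, alpha):
    """Closed-form Ridge on centered data: beta = (X'X + alpha*I)^(-1) X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def ridge_after_selection(X, y, n_keep, alpha):
    """Step 1: screen the columns of X. Step 2: Ridge on the retained columns."""
    keep = screen_by_correlation(X, y, n_keep)
    intercept, beta = ridge_fit(X[:, keep], y, alpha)
    return keep, intercept, beta

# Simulated example: 40 quarters, 500 candidate search series (hypothetical data).
rng = np.random.default_rng(0)
T, p = 40, 500
X = rng.standard_normal((T, p))
y = 0.8 * X[:, 3] - 0.5 * X[:, 17] + 0.1 * rng.standard_normal(T)

keep, intercept, beta = ridge_after_selection(X, y, n_keep=20, alpha=1.0)
nowcast = intercept + X[-1, keep] @ beta  # fitted value for the last quarter
```

In practice `n_keep` and `alpha` would have to be chosen in a data-driven way (for example by cross-validation); the values above are placeholders.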
In addition to these theoretical results, we get a number of empirical results that may be of interest to people using high-dimensional alternative data for macroeconomic nowcasting. Our goal is to nowcast GDP growth every week of the quarter, for the U.S., the euro area and Germany over three types of economic periods: (i) a calm period (2014-16), (ii) a period with a sudden downward shift in GDP growth (2017-18, related to the trade war between the U.S. and China/Europe) and (iii) a recession period with large negative growth rates (2008-09, driven by the Global Financial Crisis). In this respect we use classical macro data (surveys and production), as well as alternative data stemming from Google (Google Search Data, already grouped into categories and sub-categories). We compare various approaches based on their nowcasting ability, as measured by the Root Mean Squared Forecasting Error (RMSFE). Four salient facts emerge from our empirical analysis.
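For readers less familiar with the accuracy measure, the RMSFE for a given week of the quarter is the square root of the average squared gap between the nowcasts made in that week and the realized GDP growth rates. A small Python sketch follows; the function name and the numbers are made up for illustration and are not results from the paper.

```python
import numpy as np

def rmsfe_by_week(actual, nowcasts):
    """RMSFE for each week of the quarter.

    actual:   shape (n_quarters,), realized quarterly GDP growth
    nowcasts: shape (n_quarters, n_weeks), nowcast made in each week
    """
    actual = np.asarray(actual, dtype=float)
    nowcasts = np.asarray(nowcasts, dtype=float)
    errors = nowcasts - actual[:, None]  # nowcast error per quarter and week
    return np.sqrt(np.mean(errors ** 2, axis=0))

# Hypothetical numbers: 3 quarters, nowcasts made in 2 different weeks.
actual = [0.4, 0.3, 0.5]
nowcasts = [[0.7, 0.5],
            [0.0, 0.2],
            [0.8, 0.6]]
weekly_rmsfe = rmsfe_by_week(actual, nowcasts)
```

In this toy example the errors shrink from 0.3 to 0.1 percentage points in absolute value between the two weeks, so the RMSFE falls accordingly.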
First, we compare a standard regression (with Ridge regularization) with a regression after pre-selection (our Ridge after Model Selection approach). Figure 1 shows the results for the euro area during a calm period (2014-16). We clearly see the gain in nowcasting accuracy from pre-selecting data before entering them into the model. The idea is that having too many variables adds too much noise. This is especially the case with Google Search Data, as some of them are not directly related to economic activity. This result confirms previous results obtained in the context of dynamic factor models (see Bai and Ng, 2008 or Barhoumi et al., 2009).
Figure 1: RMSFEs for the euro area during a calm period (2014-16) stemming from a standard regression with Ridge regularization (blue bars) and from the Ridge after Model Selection approach (orange bars). Evolution of RMSFEs within the 13 weeks of the current quarter. Source: Ferrara and Simoni (2023)
Second, we point out the usefulness of Google search data in nowcasting the GDP growth rate for the first four weeks of the quarter, that is, when there is no official information about the state of the current quarter. In Figure 1, we see that at the beginning of the quarter (from week 1 to week 4), Google data indeed provide an accurate picture of the GDP growth rate in the sense that RMSFEs are reasonably low (between 0.2% and 0.3%), slightly higher than those at the end of the quarter when all the information is available (about 0.2%).
Figure 2: RMSFEs for the euro area during a calm period (2014-16) stemming from a standard regression with Ridge regularization (blue bars), from the Ridge after Model Selection approach (orange bars), from the Ridge after Model Selection approach using only Google data (green bars) and from a basic regression model without any Google data (yellow bars). Evolution of RMSFEs within the 13 weeks of the current quarter. Source: Ferrara and Simoni (2023)
Third, as soon as official data become available, that is, starting from week 5 with the release of the first opinion survey of the quarter (in the euro area case), the relative nowcasting power of Google data rapidly vanishes. We see in Figure 2 that, for week 5, the RMSFE with all data (orange bar) is equivalent to the one without any Google data (yellow bar), that is, with only the macro information contained in the first survey of the quarter. We also observe that RMSFEs stemming from the Ridge after Model Selection approach using only Google data (green bars) do not show any decline over time, suggesting that the gain seen in the orange bars starting from week 5 comes from the integration of macro variables.
Fourth, recession periods present a specific pattern, as the model without any pre-selection and with only Google data as information set provides the lowest RMSFEs (green bars in Figure 3). This pattern is also generally seen for German and U.S. data. This result needs to be further understood through more research, but it might be related to the well-known higher uncertainty that we observe during recessions, meaning that more data must be used to account for it. In any case, this can be seen as a justification for the use of alternative data during crises.
Figure 3: RMSFEs for the euro area during a recession period (2008-09) stemming from a standard regression with Ridge regularization (blue bars), from the Ridge after Model Selection approach (orange bars), from the Ridge after Model Selection approach using only Google data (green bars) and from a basic regression model without any Google data (yellow bars). Evolution of RMSFEs within the 13 weeks of the current quarter. Source: Ferrara and Simoni (2023)
Various robustness checks confirm that these empirical results still hold for all the countries/areas in our analysis and are still valid when we increase the macroeconomic information set by considering 22 usual variables (sales, exports, employment, …). Last, a true real-time analysis for the euro area with vintages of data confirms the ranking of the various approaches. Overall, all these results point out that Google data can be very useful for GDP growth nowcasting during expansion phases when information is lacking, after a pre-selection step. However, as soon as official macroeconomic information arrives, the marginal gain from Google data tends to rapidly vanish. During recession phases, it seems that forecasters need the largest available information set to assess what is happening in economic activity.
This post written by Laurent Ferrara.