Building a unified sustainable development goal database: Why does sustainable development goal data selection matter?

The 2020s are an essential decade for achieving the 2030 Agenda and its Sustainable Development Goals (SDGs). For this, SDG research needs to provide evidence that can be translated into concrete actions. However, studies use different SDG data, resulting in incomparable findings. Researchers primarily use SDG databases provided by the United Nations (UN), the World Bank Group (WBG), and the Bertelsmann Stiftung & Sustainable Development Solutions Network (BE-SDSN). We compile these databases into one unified SDG database and examine the effects of the data selection on our understanding of SDG interactions. Among the databases, we observed more different than similar SDG interactions. Differences in synergies and trade-offs mainly occur for SDGs that are environmentally oriented. Due to the increased data availability, the unified SDG database offers a more nuanced and reliable view of SDG interactions. Thus, the SDG data selection may lead to diverse findings, fostering actions that might neglect or exacerbate trade-offs.


| INTRODUCTION
The 2020s are a critical decade for achieving the 2030 Agenda and thus for transforming our world towards a more sustainable one. To measure the agenda's progress, the United Nations (UN) adopted 17 Sustainable Development Goals (SDGs) and 169 targets in 2015.
Five years into its implementation, however, no country is on track to meet the SDGs by 2030 (Sachs et al., 2020). In addition, the COVID-19 pandemic has negatively impacted most SDGs. The global SDG research community continues to address this challenge by seeking evidence-based guidance for policymakers and stakeholders to develop and implement the strategies needed to achieve the SDGs. To foster exchanges between science, policy, and society, SDG research needs to provide sound evidence to ensure that scientific outcomes are translated into concrete and practical actions.
Currently, SDG-related publications have reached a staggering 4.1 million articles, presenting both opportunities and challenges for global SDG research (Elsevier, 2020). Science-based approaches in SDG research range from qualitative to static- and dynamic-quantitative methods applied at different scopes, ranging from sector- or goal-specific to integrated ones and from local to global scales (Allen et al., 2021). However, since SDGs are a system of interacting components rather than just a collection of goals, targets, and indicators (Pradhan, 2019), SDG research needs to go beyond one approach and scale. Within the SDG system, synergies and trade-offs need to be disentangled to support decision-making and the prioritization of actions. Providing this support requires, for example, the identification of positive or negative multiplication effects within the system (Pham-Truffert et al., 2020), of nonlinear SDG interactions that can trigger rapid progress with minimal investments, or of entry points to support the SDG system, such as climate initiatives (Coenen et al., 2021). Consequently, SDG decision-making should not just be normative but also evidence-based to ensure efficient resource utilization. In this sense, data holds the potential to support and inform actions toward realizing the SDGs, but also great potential to marginalize or misinform about SDGs' progress.
Quantitative SDG research primarily uses databases provided by the United Nations, the World Bank Group (WBG), and the Bertelsmann Stiftung & Sustainable Development Solutions Network (BE-SDSN) at a global scale. For example, Pradhan et al. (2017) and Anderson et al. (2021) used the UN data for a holistic quantification of SDG interactions. Kroll et al. (2021) and Asadikia et al. (2021) statistically investigated SDG interactions using data from the BE-SDSN. Lusseau and Mancini (2019) and Laumann et al. (2020) estimated the system of SDG interactions using the WBG data. Further, some SDG research does not apply any of those databases but instead simplifies the complexity of the SDGs by using at least one indicator per SDG based on different sources or own model results (Jägermeyr et al., 2017; Mainali et al., 2018; Obersteiner et al., 2016; Zhang et al., 2022). Comparing the findings of those studies shows similar but also different results. For example, Pradhan et al. (2017) detect SDG 8 and 12 as the goals most commonly associated with trade-offs. In contrast, the results of Laumann et al. (2020) indicate that economic growth, and those of Asadikia et al. (2021) that responsible consumption, do not play such a central role for sustainable development compared to other SDGs. Miola and Schiltz (2019) compared existing SDG performance tools using the same set of indicators for EU member states. They detected differences among the tools, where the selection of indicators and the methods applied led to substantially different relative evaluations of the SDGs.
We go beyond evaluating single SDG performances and apply the same statistical method to measure SDG interactions holistically using different databases. The question, therefore, arises as to whether the data selection is the cause of the differences in results, which would make it difficult to compare SDG-related articles and their policy implications. This aspect already emphasizes the importance of a unified SDG framework and database to obtain comparable results (Miola & Schiltz, 2019).
To assess how the selection of SDG data matters, we apply a temporal correlation analysis to investigate synergies and trade-offs within and across SDGs at the global, income, and regional scales, and evaluate the corresponding results. Further, this study provides the first framework to unify SDG databases. Based on these analyses, we showed the strengths and limitations of each of the databases and similarities and differences among them. Additionally, we investigated variations among the databases by developing simple SDG networks to identify the most connected goals and targets based on the results from the correlation analyses. Both approaches enable the identification of SDG goals and targets that are synergistic or impeding in the achievement of the 2030 Agenda.

| SDG monitoring frameworks and databases
Indicators are used to monitor progress, develop implementation strategies, and manage resource allocations for achieving the 2030 Agenda. The UN, the WBG, and the BE-SDSN currently provide global SDG frameworks and corresponding databases widely used by scientists, policymakers, and practitioners. Those SDG frameworks try to cover the multidimensional aspects of sustainable development and are translated into SDG indicators. Here, we provide a brief description of these databases, which is elaborated in Text S1.
Together with national statistical offices, the Inter-agency and Expert Group on SDG Indicators (IAEG-SDGs) develops methodologies and compiles data for SDG indicators, which they submit to the UN Statistics to generate the Global SDG Indicators database (United Nations Statistics Division, 2020). We refer to this database as the UN database.
Currently, the UN database provides data on 192 SDG indicators for a total of 258 countries and areas between 1967 and 2020 (download August 20, 2020) (United Nations Statistics Division, 2020). For almost every indicator, the UN database provides sub-indicators, which further are disaggregated. This data disaggregation considers demographic factors (e.g., gender, age group, or rural-urban) but also nondemographic factors (e.g., cities, sectors, or products) (Data S1). We refer to those disaggregated UN SDG sub-indicators as UN indicator data.

| Establishment of unified SDG database
We conducted three steps to merge the UN, WBG, and BE-SDSN databases into one unified SDG database (abbreviated unified database). First, we considered the lowest common denominator of years, 2000 to 2019, as a comparison period. The Millennium Development Goals (MDGs), which served as the data foundation for several SDGs, were adopted in 2000. Therefore, we chose 2000 as the first year for the database.
In a second step, we assigned the UN, WBG, and BE-SDSN indicator data to the officially adopted global SDG indicator framework, that is, the 17 SDGs and 169 targets. Since the UN database is fully built around that framework, no adjustments were needed. The WBG database consists of indicator data, which are already assigned to the target level. We took over those target assignments and allocated WBG indicator data to either similar UN indicator data or added them as additional indicators to cover the target. The allocation of WBG to UN indicator data occurs if both indicator data had the same or showed close resemblances in the description. If the description was similar, but the unit differs, we assigned both indicator data to cover the target. We particularly focused on disaggregated data while merging the UN and WBG frameworks. Despite the UN disaggregating data more frequently, disaggregated WBG data still supplement the unified SDG framework. To merge the BE-SDSN indicator data to the unified SDG framework, we first individually assigned all 85 indicators to the SDG target level. Subsequently, we decided per target whether the BE-SDSN indicator data is similar or resembles already allocated UN or WBG indicator data. If all three showed similar descriptions with the same meaning and the same unit, we allocated them as one indicator data to cover the target within the unified SDG framework.
However, most BE-SDSN indicators do not overlap with the indicators of the other databases but present additional measurements to cover SDG targets and supplement the unified SDG framework.
In the third step, we decided on a unique list of indicator data for the unified database (Data S1). We reviewed each of the 169 targets and the assigned indicator data from the UN, WBG, and BE-SDSN to maximize the number of usable indicators. If indicator data overlapped (Table S1), meaning that more than one of the UN, WBG, and BE-SDSN databases provides data for the indicator, we chose the database with the highest data availability over time and space. Our criterion for data availability is the number of data points, that is, the number of available values for the respective indicator across all countries and years (Data S1 lists the exact number of data points per indicator data). If the number of data points was the same, we preferred the UN data.
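The selection rule in this third step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(country, year)` dictionary layout, and the ordering of WBG before BE-SDSN in the tie-break are assumptions made for illustration; only the "most data points wins, UN preferred on ties" rule and the 2000 to 2019 window come from the text.

```python
from typing import Dict, Optional, Tuple

Series = Dict[Tuple[str, int], Optional[float]]  # (country, year) -> value

# The text only states that UN data are preferred on ties; placing WBG
# before BE-SDSN is an assumption of this sketch.
PREFERENCE = ["UN", "WBG", "BE-SDSN"]


def count_data_points(series: Series, first_year: int = 2000, last_year: int = 2019) -> int:
    """Count available (non-missing) values within the comparison period."""
    return sum(
        1
        for (_country, year), value in series.items()
        if value is not None and first_year <= year <= last_year
    )


def select_source(candidates: Dict[str, Series]) -> str:
    """Return the database with the most data points; ties follow PREFERENCE."""
    return max(
        candidates,
        key=lambda name: (count_data_points(candidates[name]), -PREFERENCE.index(name)),
    )


# Hypothetical overlapping indicator with made-up values.
overlap = {
    "UN": {("KEN", 2005): 1.2, ("KEN", 2006): None, ("PER", 2005): 3.4},
    "WBG": {("KEN", 2005): 1.1, ("KEN", 2006): 1.3, ("PER", 2005): 3.5},
}
print(select_source(overlap))  # WBG has three available values, UN only two
```

Counting only non-missing values inside the comparison period keeps the criterion tied to usable observations rather than to the nominal length of a series.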

| Statistical analysis of SDG interactions at different scales
For the comparison of the SDG interactions, our methodological approach is twofold. First, we use the statistical method introduced by Pradhan et al. (2017) to explore synergies and trade-offs among SDGs based on (anti)correlations. Accordingly, we apply a temporal analysis, measuring correlations between a pair of indicator data for each country. We measure Spearman's rank correlation coefficient (ρ) between at least eight paired observations (representing indicator data values for one country for at least 8 years) (Interstate Technology & Regulatory Council, 2013). These indicator data pairs can belong to the same SDG or two distinct SDGs. Some indicator data are measured by so-called dummy variables (also known as Boolean indicators or binary variables). Those indicators consist of numerical values of either 0 or 1 to indicate the absence or presence of categorical effects. We exclude these indicator data from our analysis because they would bias the statistical analysis.
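As a rough illustration of the temporal correlation step, the following Python sketch computes Spearman's ρ for one country and one pair of indicator series using only the standard library. The function names and the year-keyed dictionaries are hypothetical; the minimum of eight paired observations and the exclusion of binary indicators follow the text, while returning `None` for constant series is an assumption of this sketch.

```python
from typing import Dict, List, Optional

MIN_PAIRS = 8  # minimum number of paired yearly observations stated in the text


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def is_binary(values: List[float]) -> bool:
    """True if the series only takes the values 0 and/or 1 (dummy variable)."""
    return set(values) <= {0, 1}


def spearman_rho(x_by_year: Dict[int, float], y_by_year: Dict[int, float]) -> Optional[float]:
    """Spearman's rho over the years both series share, or None if not computable."""
    years = sorted(set(x_by_year) & set(y_by_year))
    x = [x_by_year[yr] for yr in years]
    y = [y_by_year[yr] for yr in years]
    if len(years) < MIN_PAIRS or is_binary(x) or is_binary(y):
        return None
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None  # constant series: correlation undefined (assumption of this sketch)
    return cov / (var_x * var_y) ** 0.5
```

Because the coefficient is the Pearson correlation of the ranks, it captures monotonic rather than strictly linear co-movement of two indicators over time.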
The coefficient value ρ is multiplied by the relation's direction: a positive sign refers to indicators that are desirable to increase and a negative sign to those indicators that need to decline to meet the 2030 Agenda. Based on the resulting coefficient's sign and value, we define synergies and trade-offs. A plus sign indicates a positive relation (synergy), and a minus sign indicates a negative one (trade-off). To avoid over-interpretation of correlation (Hauke & Kossowski, 2011), we implement thresholds while defining synergies and trade-offs. We define SDG interactions with $\rho \in (0.5, 1]$ as "synergies," with $\rho \in [-0.5, 0.5]$ as "not-classifieds," and with $\rho \in [-1, -0.5)$ as "trade-offs" (Smarandache, 2016). For comparing the UN, WBG, BE-SDSN, and unified databases, we analyze SDG interactions at the global, income, and regional scales. The World Bank Atlas categorizes countries based on their income into low-income countries (LIC), lower-middle-income countries (LMIC), upper-middle-income countries (UMIC) and high-income countries (HIC) (Figure S1) (World Bank Group, 2020b). We use this grouping to investigate how a country's macroeconomic context influences SDG interactions based on different databases. Since SDG interactions could also vary with countries' differing social and environmental factors, we investigate SDG interactions based on world regions. We group countries into four world regions based on the United Nations Regional Groups (United Nations, 2020): Western World, Latin America, Asia-Pacific, and Africa (Figure S2).
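The sign adjustment and the thresholds can be sketched in a few lines of Python. The function name and its boolean parameters are hypothetical, and multiplying ρ by the direction signs of both indicators in a pair is this sketch's reading of "the relation's direction", not a detail confirmed by the text; the 0.5 cut-offs follow the definition above.

```python
def classify_interaction(
    rho: float,
    increase_desirable_a: bool,
    increase_desirable_b: bool,
    threshold: float = 0.5,
) -> str:
    """Classify one indicator pair as synergy, trade-off, or not-classified."""
    # +1 if the indicator should increase, -1 if it needs to decline
    # (assumption: the signs of both indicators are multiplied into rho).
    sign_a = 1 if increase_desirable_a else -1
    sign_b = 1 if increase_desirable_b else -1
    adjusted = rho * sign_a * sign_b
    if adjusted > threshold:
        return "synergy"
    if adjusted < -threshold:
        return "trade-off"
    return "not-classified"


# Hypothetical example: an indicator that should decline (e.g., a poverty rate)
# falls while an indicator that should rise (e.g., school enrolment) grows.
print(classify_interaction(-0.8, increase_desirable_a=False, increase_desirable_b=True))
```

In this example the raw correlation is negative, yet the pair counts as a synergy because progress on one indicator coincides with progress on the other.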
By capturing SDG synergies and trade-offs at global, income, and regional scales, we further distinguish similarities and differences among the SDG databases. At the global scale, we distinguish the top 10 SDG pairs with similarities and differences in SDG interactions among SDG databases. Considering income groups and regions, the four databases do not provide consistent data for the 153 SDG pairs (binomial coefficient of 17 SDGs, Table S2). For a reasonable comparison of similarities and differences across the income-based and regional SDG interactions, we only choose SDG pairs having indicator data across all four databases. The number of available SDG pairs for the income and regional groups varies significantly across the databases. Due to the reduced data availability and, therefore, fewer comparable SDG pairs at income and regional scales, we only distinguish the top five SDG pairs with similarities and differences in SDG interactions among the SDG databases. In terms of similarities, we define an SDG pair as a top pair if it exhibits the highest shares of synergies or trade-offs for all four SDG databases. For differences, we consider an SDG pair as a top pair when it has the highest range of synergies ($R_S$) or trade-offs ($R_T$) across the databases, that is, $R_S > 50\%$ or $R_T > 30\%$.
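The pair enumeration and the range criterion can be sketched as follows. The count of 153 corresponds to unordered pairs of the 17 goals when same-goal pairs (such as SDG 1 and 1) are included, that is, 17 · 18 / 2. The function names and the share values are hypothetical; taking the range as the maximum minus the minimum share across databases is this sketch's interpretation of $R_S$ and $R_T$.

```python
from itertools import combinations_with_replacement
from typing import Dict, Tuple

# Unordered SDG pairs including within-goal pairs such as (1, 1).
SDG_PAIRS = list(combinations_with_replacement(range(1, 18), 2))
print(len(SDG_PAIRS))  # 153


def share_range(share_by_database: Dict[str, float]) -> float:
    """Range of a share (in percent) across databases: maximum minus minimum."""
    return max(share_by_database.values()) - min(share_by_database.values())


def flags_difference(
    synergy_shares: Dict[str, float],
    tradeoff_shares: Dict[str, float],
    rs_min: float = 50.0,
    rt_min: float = 30.0,
) -> Tuple[bool, bool]:
    """Return (R_S > 50%, R_T > 30%) for one SDG pair."""
    return share_range(synergy_shares) > rs_min, share_range(tradeoff_shares) > rt_min


# Made-up shares (percent) for a single SDG pair, for illustration only.
synergies = {"UN": 45.0, "WBG": 50.0, "BE-SDSN": 97.0, "unified": 47.0}
tradeoffs = {"UN": 10.0, "WBG": 12.0, "BE-SDSN": 1.0, "unified": 11.0}
print(flags_difference(synergies, tradeoffs))  # (True, False)
```

A pair can therefore appear among both the top similarities and the top differences: all four shares may be high while one database still lies more than 50 percentage points above another.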
Second, we apply network analysis based on the correlation analysis results, namely the share of synergistic and impeding SDG interactions instead of the absolute values. The method adopted in this paper is an extension of the approach proposed by Putra et al. (2020) and Lusseau and Mancini (2019). We generate the network of interactions at two scales: among the 169 SDG targets and the 17 SDGs. By creating networks, we identify the most positively and negatively connected goals and targets in the SDG system for each of the four databases. Based on these networks, we can visualize the interactions and assess the role of components within the system. In general, a network structure consists of a set of nodes, either goals or targets and a set of edges, which are the connection between the nodes based on the correlation.
To simplify the network structure and focus more on the significance of nodes in the networks, we assigned three conditions for defining goal and target networks. First, we excluded target connections with a low number of indicator data pairs (<20) to cover more connected targets in the networks and to represent global phenomena. Since the goal network aggregates the target network, we considered all goal connections. Second, we apply thresholds for the edges representing synergistic or impeding interactions between the nodes. We did not include edges with shares ($s$) in synergies ($s_S$) or trade-offs ($s_T$) with $s \le 30\%$. This enables us to distinguish structurally connected nodes within the goal and target networks. Third, we calculate the eigenvector centrality of nodes to assess their significance in the network based on the transitive influence of nodes. A high eigenvector value indicates that a node is connected to many nodes, which themselves have high scores.
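The three network conditions can be illustrated with a small, dependency-free Python sketch. It is an illustration under stated assumptions rather than the authors' code: the record layout, the function names, the use of shares (as fractions) as edge weights, and the power iteration on the adjacency matrix plus identity (a standard shift that avoids oscillation on bipartite graphs) are all choices of this sketch. Only the two filters (at least 20 indicator data pairs, share above 30%) and the use of eigenvector centrality come from the text.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, float, int]  # (node_a, node_b, share in percent, number of data pairs)


def build_edges(records: List[Record], min_pairs: int = 20, min_share: float = 30.0) -> Dict[Tuple[str, str], float]:
    """Keep connections with enough data pairs and a share above the threshold."""
    return {
        (a, b): share / 100.0  # weight edges by share as a fraction (assumption)
        for a, b, share, n_pairs in records
        if n_pairs >= min_pairs and share > min_share
    }


def eigenvector_centrality(edges: Dict[Tuple[str, str], float], iterations: int = 1000, tol: float = 1e-12) -> Dict[str, float]:
    """Power iteration on (A + I) for an undirected weighted graph."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    neighbors: Dict[str, List[Tuple[str, float]]] = {n: [] for n in nodes}
    for (a, b), weight in edges.items():
        neighbors[a].append((b, weight))
        neighbors[b].append((a, weight))
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        updated = {n: score[n] + sum(w * score[m] for m, w in neighbors[n]) for n in nodes}
        norm = sum(v * v for v in updated.values()) ** 0.5
        updated = {n: v / norm for n, v in updated.items()}
        converged = max(abs(updated[n] - score[n]) for n in nodes) < tol
        score = updated
        if converged:
            break
    return score


# Made-up target connections, for illustration only.
records = [
    ("3.2", "1.1", 50.0, 25),
    ("3.2", "4.1", 50.0, 30),
    ("3.2", "6.1", 50.0, 40),
    ("3.2", "9.9", 20.0, 50),  # dropped: share not above 30%
    ("1.1", "7.7", 80.0, 10),  # dropped: fewer than 20 data pairs
]
centrality = eigenvector_centrality(build_edges(records))
print(max(centrality, key=centrality.get))  # 3.2
```

In this toy network the hub target receives the largest score because every retained edge touches it, which mirrors the definition of a node being important when its neighbours are themselves well connected.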

| Comparison of SDG databases
The three SDG databases have their strengths and limitations in covering the 2030 Agenda (Table 1). Although there are similarities between the databases, the differences outweigh them, and thus the databases complement each other. However, complete indicator data time series are not available for all time steps and countries. The UN database provides data for the highest number of countries and areas over time. However, the data availability varies significantly across regions and time. The same holds for the WBG database. To minimize biases from missing data, the BE-SDSN only includes countries that have data for at least 80% of the indicators across time (Sachs et al., 2020). For that reason, the BE-SDSN database provides the lowest number of countries but offers the most consistent time series.
The amount of indicator data per target and goal varies significantly across the three databases (Table S3, Data S1). BE-SDSN uses a mix of official and nonofficial data sources, including model-based estimates, to fill data gaps and reduce time lags in official statistics.
Therefore, all BE-SDSN indicator data have an established methodology and good data coverage over time and space. The data coverage tends to be better for socioeconomic goals such as SDG 2, 3, 4, and 9, whereas data availability to monitor SDG 10 and 13 is rel- disaggregation and provides the most distinct view on demographic and nondemographic factors. The WBG database offers nine diverse levels of disaggregation with some overlaps with the UN database but also consists of unique disaggregation into specific types of species (Target 15.5), industries (Target 6.4, 8.2), and target values (Target 11.6).
Our unified database offers more indicator data, covering more targets across more countries than the other three SDG databases (Tables 1, S2-S4, and Data S1). The unified database consists of a unique list of 2584 indicator data for 255 countries and areas between 2000 and 2019. Even though the increase in indicator data may not seem large, the number of available data pairs increased drastically, especially for income and regional groups (Table S1). Our database provides a comparable number of indicator data per target with a more consistent data disaggregation. Still, this database provides no data for 26 targets, mainly associated with SDG 11 to 17 (Data S1). Both the unified SDG framework (Data S1) and the unified SDG database (Data S2) are provided in the Supporting Information.

| Global SDG interactions
At the global scale, we observe similarities and differences in SDG  We further depict the top 10 SDG pairs with differences in synergies and trade-offs across all four databases globally (Figure 1B). The share of synergies ranges from 43% to 53% among the databases in the top 10 SDG pairs with differences in synergies. We observe the highest difference in synergistic shares between the UN and the WBG database. Results obtained from the WBG database show high synergistic shares for SDG pairs 5 and 11; 6 and 11; and 7 and 11, whereas for the UN database, those are not-classified. Ensuring access to safe and affordable housing (Target 11.1) can increase security and safety, especially for women (Target 5.1-5.5), improve access to adequate sanitation and clean drinking water (Target 6.1, 6.2), as well as access to electricity and clean fuels and technologies for cooking (Target 7.1). The WBG database not only contains indicators on the population living in slums (like the UN) but also on the urban population. We observe high shares of synergies for these indicators with the above-mentioned targets. Additionally, the WBG, in comparison to the UN, provides detailed data on the proportion of population exposed to air pollution (Target 11.6). A primary risk factor for deaths is the use of solid fuels for cooking (Target 7.1), which causes indoor air pollution (Target 11.6). The interaction between SDG 1 and 1 is in the top ten pairs for both similarities (approx. $s_S > 45\%$ across all databases) and differences in synergies. This is because the high share of synergies based on the BE-SDSN database ($s_S > 97\%$) simultaneously pushes the range in synergies ($R_S$) above our threshold for detecting differences in synergies ($R_S > 50\%$). The same logic applies to SDG pair 6 and 6, which shows similarities in synergies but also differences in trade-offs. Among the top 10 SDG pairs with differences in impeding interactions, the shares of trade-offs range between 32% and 50%.
We observe these large differences for goals related to lives below water (SDG 14) and on land (SDG 15) ( Figure 1B, right). One reason behind this observation is the diverse indicator data for SDG 14 and 15, which also holds for SDG 10 and 11 (Data S1  group within and across SDGs (Table S5) the tensions within the income groups to rapidly expand access to essential basic services, the need for efficient energy systems, and to ensure economic growth, causing impeding interaction with other SDGs.

| SDG interactions across income groups
Despite those similarities, income groups also have differences in shares of synergistic and impeding interactions among the four SDG databases (Figure 2B,D for LICs and HICs, Figure S7B,D for LMICs and UMICs). For example, LICs show the lowest share of synergies based on the UN database (average sum of 30.4%), followed by the unified and WBG database (average sum of 30.5% and 38.1%, respectively), and the highest shares based on the BE-SDSN database (average sum of 59.4%) (Table S3). This order of databases does not necessarily apply to the other income groups. For LICs, SDG 9 recurs in three of the top five SDG pairs with differences in synergies. According to our target assignment of indicator data, the BE-SDSN database only provides data for targets 9.5 and 9.c (Data S1).
Enhancing scientific research (Target 9.5), particularly in developing countries, is founded on supporting quality in early childhood education (SDG 4). Significantly increasing access to information communications technology (ICT) (Target 9.c) addresses  (Table S1), consequently the smaller number of indicator data representing an SDG pair (especially on the part of the WBG and BE-SDSN), and the lower data availability, missing data or data inconsistency over time or space. The unified database shows fewer extreme deviations in shares of synergies or trade-offs due to the higher data volume, creating a more nuanced and robust picture of SDG interactions within income groups.
If the level of income is a descriptor of SDG interactions, the share of synergies and trade-offs should either be more significant within income groups than globally or show trends with increasing income levels. Results based on the BE-SDSN database exhibit higher shares of synergies and trade-offs within the income groups than globally. Comparing the other three databases to the global level, however, the results are diverse. These varying shares of synergies and trade-offs occur not only due to income inequalities but also because of data selection. However, the substantial inequalities between income groups in particular need to be tackled to achieve the 2030 Agenda.

| SDG interactions across regions
Assessing SDG interactions for regions in relation to the data selection reveals some resemblances to the income groups' results. Each region has on average more synergistic than impeding SDG interactions regardless of the data selection (Table S5) SDG 11 has different indicators across the SDG databases (Data S1), with a large variation in the number of provided data (Table S3). The UN database offers 264 indicator data for Target 11.1, 11.5, 11.6, and 11.b. The WBG database provides nine indicator data (Target 11.1, 11.6) and the BE-SDSN database, four indicator data (Target 11.1,11.2,11.6). This finding highlights that the data selection significantly defines city-related SDG interactions.
The top five SDG pairs of Africa, Latin America, and Asia-Pacific with differences in trade-offs repeatedly contain SDG 14 and 15.
Selecting the UN, WBG, or BE-SDSN database to assess how human activities affect biodiversity results in large shares of trade-offs for different SDG interactions within all three regional groups. Only the unified database provides a distribution of shares that is scaled between the extremes based on the other three databases. Similar to the given explanation at the global scale, the SDG 14 and 15 indicator data is quite diverse across the databases. Additionally, the data availability and consistency throughout the biodiversity-related SDG pairs vary within the regions depending on the data selection.

| SDG networks
At goal and target levels, the structure of the SDG networks changes with the data selection (Figures S9-S12 for goals, Figures 4, 5, S13, and S14 for targets). These networks enable identifying the most positively and negatively connected SDG goals and targets in all databases. In general, SDG networks at the target level scale up to the goal level because we observe more positive than negative connections within the networks based on all databases. Nevertheless, these networks also represent several instances of negative connections at the goal and target levels.
In the synergistic goal networks, the most connected SDG varies according to the selected database. We observe the highest eigenvector centrality for SDG 14 in the UN-based, SDG 7 in the WBG-based, and SDG 3, 9, and 15 in the BE-SDSN-based goal network (Figures S9-S11A, Table S6). For the unified database, SDG 3 and 15 have the largest eigenvector centrality (Figure S12A, Table S6).
Although the synergistic networks are more complex, the trade-offs are not negligible as they could negatively impact the achievement of the 2030 Agenda. In particular, the fulfillment of the environment-related goals (SDG 12-15) could be hindered by progress in other SDGs (Figures S9B-S12B), as those have large eigenvector centralities, especially in the UN-, WBG-, and unified-based goal networks (Table S6). Although the most negatively connected goals depend on the data selection, the networks show that achieving environment-related SDGs will be challenging if current trends continue.
Since the goal networks aggregate the results of the target networks, both show similar but also different results. However, the most positively or negatively connected target does not necessarily belong to the most positively or negatively connected goal. Most SDG goal interactions exhibit synergies or trade-offs with shares of 30% < s < 45%, whereas at the target level, we observe stronger connections (30% < s ≤ 100%) (Data S3 and S4). The number of connections per node in the goal networks does not necessarily reflect that in the target networks. This aspect especially applies to the impeding networks, where, for example, the goal network based on the unified database shows the fewest connections per node. In contrast, the target network based on the unified database has the most connections per node. For these reasons, it is essential to gain more nuanced insights into interactions at the target level, as the shares of synergies and trade-offs can be distinguished more clearly.
At the target level, the networks are more complex to the extent that they display more nodes, more positive than negative connections, and more diverse shares of synergies and trade-offs. The target networks based on the unified database (Figure 4) display more nodes with connections showing synergies (97 targets) and trade-offs (87 targets). In comparison, the WBG has approximately 50% fewer nodes and connections (49 targets with synergies and 43 targets with trade-offs, Figure 5).
Independent of the data selection, the target networks reveal that reducing child mortality (Target 3.2) is a clear structural component in the SDG system (Figures 4, 5, S13, and S14). Target 3.2 has the largest eigenvector centrality in all networks (Table S7) Feely, 2018). In other cases, the provider of data could use newer methods to measure indicators like remote sensing (Estoque, 2020), citizen participatory approaches (Thinyane & Kirschke, 2019), or big data via machine learning (United Nations, 2021; Vinuesa et al., 2020; Ziesche, 2017).
However, our unified database offers an increased data availability over time and space covering more targets and indicator data being disaggregated. Due to this increased amount of indicator data per target, we could consider more SDG indicator data pairs at all scales. The increased data availability and consistency provide an enhanced representation of the officially adopted global SDG indicator framework. Second, we detect similarities in SDG interactions among databases globally, at income and regional scales and across networks.
Although synergies outweigh trade-offs, our analysis shows that most SDG interactions are not-classified, independent of the data selection.
This finding resembles the one by  which applied a cross-sectional correlation method using the UN database.
However, we still observe more synergies and trade-offs over time than this cross-sectional analysis. Nonetheless, these findings contrast with studies that found more synergies and trade-offs than not-classified SDG interactions (Kroll et al., 2021; Pradhan et al., 2017; Weitz et al., 2018). For example, we adopted the correlation method introduced by Pradhan et al. (2017), but did not detect as many synergies and trade-offs. One reason is the choice of a higher threshold for measuring Spearman's rank correlation coefficient (ρ Ronzon & Sanjuán, 2020).
Fourth, our network analysis offers new insights into how the most connected goals and targets vary according to the SDG data selection.
The networks based on the unified database provide the most holistic insights into interactions between all goals and 98 target interactions.
Our networks based on the WBG database display 12 goals and 50 targets for synergistic and impeding interactions. In comparison, the global SDG target network by Lusseau and Mancini (2019), based on the WBG database, has 71 targets. It should be noted, however, that we only considered targets with shares of synergies or trade-offs that are larger than 30%. This use of thresholds reduces the total number of targets but focuses on the more connected ones in the network. Independent of the data selection, child mortality is a core target whose mechanisms we need to better understand to progress towards the SDGs.
Similar to the findings of Lusseau and Mancini (2019), reducing child mortality (Target 3.2) synergizes with most other SDG targets in the network. However, we observe Target 15.1 and 14.4 to be most negatively connected to the achievement of other targets, which does not become apparent in the results by Lusseau and Mancini (2019). Also, for the goal level, we see similarities in the results for synergies but differences for trade-offs, especially in the case of environmental goals.
This finding reinforces the importance of interpreting any analytical results of SDG interactions at the same scale as the data used.
The generalization of our findings is constrained by the data availability and our methodological approaches. The assignment of indicator data from the UN, WBG, and BE-SDSN to the unified framework was in some cases a subjective approach. Since UN and WBG already assigned their indicators to the target level and BE-SDSN at least to the goal level, this subjectivity is somewhat limited. Since we cannot use indicators that are not well quantified with data (i.e., Boolean indicators), some SDG interactions stay unnoticed. Further, our applied correlation analysis does not imply any causality. Although specific SDG interactions occur regardless of the data selection, their causality is still an open field of research. As we only consider bivariate SDG interactions, we can conclude neither the directions nor the multivariate aspects, particularly in the network analysis. A step forward in the network analyses, as done for example by Anderson et al. (2021), are SDG system models reflecting feedback loops and causal relations underlying synergies and trade-offs, where the progressing interaction of some targets might positively or negatively influence other targets and goals, respectively. We do, however, provide insight into strongly synergistic target connections that we must take advantage of and that equate to nonlinear dynamics. This means that some SDGs will be disproportionately affected by actions to meet other SDGs. We attempted to address these drawbacks (causality, direction, and multi-variation) by qualitatively providing literature evidence for the SDG interactions.

| CONCLUSION
Building a unified SDG framework and database, we provided an enhanced understanding of data-centric SDG interactions. Our study has outlined some of the major challenges in SDG data application, as well as possible consequences and strategic issues for research emerging for the 2030 Agenda. From a global holistic perspective of evaluating progress towards the achievement of the SDGs, it is essential to understand how selecting data might change SDG interactions and consequently the underlying messages. In particular, our findings on income- or region-based SDG interactions show that SDGs should be adapted to the specific constraints of countries, since the data selection changes SDG interactions even more drastically at these scales. The varying data availability, inconsistent data formats, and the tension between national and global perspectives make it almost impossible for the data-driven SDG research community to create comparable results for each goal or target. Inferences from data may consciously but also unconsciously be conditioned by their providers, as institutions pursue their own goals. Therefore, the data and conclusions should be placed in context to minimize the risk of misinterpretation. The data selection can cause aspects of sustainable development to be omitted, oversimplified or overcomplicated, leading to misguided conclusions and investments. Further, the different levels of aggregation, that is, SDG goal versus target level, can tell different stories. Therefore, it is necessary to vary the levels of aggregation to confirm tendencies and understand at which point the results diverge or reverse. The above-highlighted obstacles of SDG data should be addressed to prioritize implementation strategies. Mainly, there is a need to develop a unified SDG monitoring framework to implement the 2030 Agenda successfully.
At the same time, efforts are also needed to develop more science-based goals, targets, and indicators for the next generation of global goals for sustainability, the post-2030 Agenda. These goals should ensure societal well-being, prosperity, and public and planetary health and be based on scientific evidence, for example, planetary boundaries, accounting for sustainable governance of global and local commons.

CONFLICT OF INTEREST
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.