Publications

Orderliness of Campus Lifestyle Predicts Academic Performance: A Case Study in Chinese University

Unlike in Western education systems, teachers and parents in China strongly encourage students to keep a regular lifestyle. However, owing to the lack of large-scale behavioral data, the relation between living patterns and academic performance remains poorly understood. In this chapter, we analyze large-scale behavioral records of 18,960 students on a Chinese university campus. In particular, we introduce orderliness, a novel entropy-based metric, to measure the regularity of campus lifestyle. Empirical analyses demonstrate that orderliness is significantly and positively correlated with academic performance, and that it improves the accuracy of predicting academic performance even in the presence of diligence, another behavioral metric that estimates how hard students study. This work supports the Eastern pedagogy that emphasizes the value of a regular lifestyle.

Bilateral Relatedness: Knowledge Diffusion and the Evolution of Bilateral Trade

During the last two decades, two important contributions have reshaped our understanding of international trade. First, countries trade more with those with whom they share history, language, and culture, suggesting that trade is limited by information frictions. Second, countries are more likely to start exporting products that are related to their current exports, suggesting that shared capabilities and knowledge diffusion constrain export diversification. Here, we join both of these streams of literature by developing three measures of bilateral relatedness and using them to ask whether the destinations to which a country will increase its exports of a product are predicted by these forms of relatedness. The first form is product relatedness, and asks whether a country already exports many similar products to a destination. The second is importer relatedness, and asks whether the country exports the same product to the neighbors of the target destination. The third is exporter relatedness, and asks whether a country’s neighbors are already exporting the same product to the destination. We use bilateral trade data from 2000 to 2015, and a variety of controls in multiple gravity specifications, to show that countries are more likely to increase their exports of a product to a destination when they have more product relatedness, importer relatedness, and exporter relatedness. Then, we use several sample splits to explore whether the effects of these forms of relatedness are stronger for products of higher complexity, technological sophistication, and differentiation. We find that, in the case of product relatedness, the effects are stronger for differentiated, complex, and technologically sophisticated products. Also, we find the effects of common language and shared colonial past to increase with differentiation, complexity, and technological sophistication, while the effects of shared borders decrease with these three variables. 
These results suggest that product relatedness and common language capture dimensions of knowledge relatedness that are more important for the exchange of more sophisticated and differentiated products. These findings extend the ideas of relatedness to bilateral trade and show that the evolution of bilateral trade networks is shaped by relatedness among products, exporters, and importers.

Computational Socioeconomics

Uncovering the structure of socioeconomic systems and estimating socioeconomic status in a timely manner are significant for economic development. The understanding of socioeconomic processes provides foundations to quantify global economic development, to map regional industrial structure, and to infer individual socioeconomic status. In this review, we present a brief manifesto for a new interdisciplinary research field named Computational Socioeconomics, followed by a detailed introduction to data resources, computational tools, data-driven methods, theoretical models and novel applications at multiple resolutions, including the quantification of global economic inequality and complexity, the mapping of regional industrial structure and urban perception, the estimation of individual socioeconomic status and demographics, and the real-time monitoring of emergent events. This review, together with the pioneering works we have highlighted, will draw increasing interdisciplinary attention and induce a methodological shift in future socioeconomic studies.

Research on the Spatial Structure and Dynamics of Socio-Economic Systems

Socio-economic systems are an important branch of complex systems, which involve the complex interactions between people's economic activities and the social environment in which they live. With the constant change of cognition and behavior, people's subjective decision-making process greatly affects the operation of socio-economic systems. Perceiving the socioeconomic situation accurately and in a timely manner, and revealing and understanding the laws of socioeconomic development, are of great theoretical and practical value. Revealing the status of socioeconomic development in many aspects and predicting development trends with desirable accuracy can greatly help to guide socioeconomic decision-making. Uncovering the socioeconomic behavioral patterns of individuals can contribute to gradually realizing predictive management. Quantifying the macro socioeconomic structure can help to explore the path of economic development. How to effectively analyze the structure and evolution of socio-economic systems is an important scientific issue in interdisciplinary research, and it has recently received great attention from many related disciplines including computer science, network science, complexity science, statistical physics and socioeconomics.

Regional Economic Status Inference from Information Flow and Talent Mobility

Novel data have been leveraged to estimate socioeconomic status in a timely manner; however, direct comparisons between the use of social relations and the use of talent movements remain rare. In this letter, we estimate the regional economic status based on the structural features of two networks. One is the online information flow network built on the following relations on social media, and the other is the offline talent mobility network built on the anonymized résumé data of job seekers with higher education. We find that while the structural features of both networks are relevant to the economic status, the talent mobility network, despite its relatively smaller size, exhibits a stronger predictive power for the gross domestic product (GDP). In particular, a composite index of structural features can explain up to about 84% of the variance in GDP. The result suggests that future socioeconomic studies should pay more attention to the cost-effective talent mobility data.

Online Data Reveal Key Factors on Salary Expectation

The enrichment of data resources and the innovation of analytic methods are gradually facilitating the transformation of socioeconomics into a data-driven and quantitative discipline. As a part of quantitative human resources, the investigation of salary plays a significant role in social and economic development. However, previous studies are mainly based on census data of limited size and rarely consider different economic and cultural backgrounds. Based on large-scale résumé data crawled from the websites of Chinese human resource service providers, this paper analyzes key factors in job seekers’ salary expectation. Results suggest that height, working experience, and educational degree affect salary expectation, and that there are significant gender differences. In particular, females have a lower salary expectation on average, lagging behind males by the equivalent of five years of working experience or one educational degree. Finally, the robustness of the analytical results is checked using multivariate regression.

Application of Carrier Data on Precise Poverty Alleviation and Emergency Management

Accurate perception of socioeconomic status and timely identification of emergencies are critical to smart social governance; however, traditional public sector data and statistical analysis methods cannot meet the requirements for accuracy and real-time response. Recently, large-scale data accumulated by the private sector, with advantages including low acquisition cost, real-time updates and high spatio-temporal resolution, have provided new directions for tackling the problem. This paper gives an overview of the application of carrier data, in combination with deep mining analysis algorithms, to precise poverty alleviation and emergency management, and further discusses some prospects of applying carrier data to quantitatively evaluate the effect of poverty alleviation and disaster relief and to improve decision-making efficiency and governance capability.

Orderliness Predicts Academic Performance: Behavioral Analysis on Campus Lifestyle

Quantitative understanding of the relationships between students’ behavioural patterns and academic performance is a significant step towards personalized education. In contrast to previous studies that were mainly based on questionnaire surveys, recent literature suggests that unobtrusive digital data bring us unprecedented opportunities to study students’ lifestyles on campus. In this paper, we collect behavioural records from undergraduate students’ (N = 18 960) smart cards and propose two high-level behavioural characters, orderliness and diligence. The former is a novel entropy-based metric that measures the regularity of campus daily life, which is estimated here based on temporal records of taking showers and having meals. Empirical analyses on such large-scale unobtrusive behavioural data demonstrate that academic performance (GPA) is significantly correlated with orderliness. Furthermore, we show that orderliness is an important feature to predict academic performance, which improves the prediction accuracy even in the presence of students’ diligence. Based on these analyses, education administrators could quantitatively understand the major factors leading to excellent or poor performance, detect undesirable abnormal behaviours in time and thus implement effective interventions to better guide students’ campus lives at an early stage when necessary.
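
As a rough illustration of what an entropy-based regularity measure can look like, the sketch below bins the times of a recurring behaviour (for example, showers) into half-hour slots and computes one minus the normalized Shannon entropy of the resulting distribution. This is a simplified stand-in and not the estimator used in the paper; the function name `orderliness_proxy`, the bin width and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter

def orderliness_proxy(minutes_of_day, bin_minutes=30):
    """Return 1 - normalized Shannon entropy of time-of-day bins.

    minutes_of_day: iterable of ints in [0, 1440), one per recorded event.
    A value near 1 means events concentrate in few time slots (regular);
    a value near 0 means events are spread evenly over all slots.
    """
    bins = Counter(m // bin_minutes for m in minutes_of_day)
    total = sum(bins.values())
    if total == 0:
        raise ValueError("need at least one event")
    entropy = -sum((c / total) * math.log(c / total) for c in bins.values())
    max_entropy = math.log(1440 // bin_minutes)  # uniform over all slots
    return 1.0 - entropy / max_entropy

# Hypothetical students: one showers around 21:00 every day, one at scattered times.
regular = [21 * 60 + d for d in (0, 5, 10, 3, 8, 12, 1)]
irregular = [7 * 60, 13 * 60 + 20, 16 * 60, 19 * 60 + 45,
             21 * 60, 22 * 60 + 30, 23 * 60 + 10]
print(orderliness_proxy(regular) > orderliness_proxy(irregular))  # True
```

A plain Shannon entropy ignores the order in which events occur, so a sequence-aware entropy estimator would be needed to capture day-to-day temporal patterns.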

A Trust-Based Recommendation Method using Network Diffusion Processes

A variety of rating-based recommendation methods have been extensively studied, including the well-known collaborative filtering approaches and some network diffusion-based methods; however, social trust relations are not sufficiently considered when making recommendations. In this paper, we contribute to the literature by proposing a trust-based recommendation method, named CosRA+T, which integrates the information of trust relations into the resource-redistribution process. Specifically, a tunable parameter is used to scale the resources received by trusted users before the redistribution back to the objects. Interestingly, we find an optimal scaling parameter at which the proposed CosRA+T method achieves its best recommendation accuracy, and the optimal value seems to be universal under several evaluation metrics across different datasets. Moreover, results of extensive experiments on two real-world rating datasets with trust relations, Epinions and FriendFeed, suggest that CosRA+T achieves a remarkable improvement in overall accuracy, diversity and novelty. Our work takes a step towards designing better recommendation algorithms by employing multiple sources of social network information.

Height Conditions Salary Expectations: Evidence from Large-Scale Data in China

The height premium has been documented by an extensive literature; however, evidence from China based on large-scale data is still lacking. In this paper, we study how height conditions salary expectations by exploring a dataset covering over 140,000 Chinese job seekers. Using graphical and regression models, we find evidence in support of a height premium: tall people expect a significantly higher salary in their career development. In particular, regression results suggest a stronger height premium for females than for males; however, the gender differences decrease as the education level increases and become insignificant after holding all control variables fixed. Further, results from graphical models suggest three promising ways to help short people: (i) accumulating more working experience, since one year of seniority can make up for about 3 cm and 7 cm of shortness for females and males, respectively; (ii) increasing the level of education, since one higher academic degree may eliminate all the disadvantages brought by shortness; and (iii) targeting jobs in regions with a higher level of development. Our work provides cross-cultural supporting evidence for the height premium and contributes two novel features to the literature: the compensation story in helping short people, and the focus on salary expectations in isolation from discrimination channels.

Quantifying China's Regional Economic Complexity

China has experienced an outstanding economic expansion during the past decades; however, literature on non-monetary metrics that reveal the status of China’s regional economic development is still lacking. In this paper, we fill this gap by quantifying the economic complexity of China’s provinces through analyzing 25 years’ firm data. First, we estimate the regional economic complexity index (ECI), and show that the overall time evolution of provinces’ ECI is relatively stable and slow. Then, after linking ECI to economic development and income inequality, we find that the explanatory power of ECI is positive for the former but negative for the latter. Next, we compare different measures of economic diversity and explore their relationships with monetary macroeconomic indicators. Results show that the ECI and the Fitness index, which is based on a non-linear iteration, are comparable, and that both have stronger explanatory power than other benchmark measures. Further multivariate regressions suggest that our results are robust after controlling for other socioeconomic factors. Our work takes a step towards better understanding China’s regional economic development and non-monetary macroeconomic indicators.

Maximizing the Collective Learning Effects in Regional Economic Development

Collective learning in economic development has been revealed by recent empirical studies; however, investigations on how to benefit most from its effects are still lacking. In this paper, we explore the maximization of the collective learning effects using a simple propagation model to study the diversification of industries on real networks built on Brazilian labor data. For inter-industry learning, we find an optimal strategy that strikes a balance between core and periphery industries in the initial activation, considering the core-periphery structure of the industry space, a network representation of the relatedness between industries. For inter-regional learning, we find an optimal strategy that strikes a balance between nearby and distant regions in establishing new spatial connections, considering the spatial structure of the integrated adjacent network that connects all regions. Our findings suggest that near-random strategies are likely to make the best use of the collective learning effects in advancing regional economic development practices.

Link Prediction in Weighted Networks via Structural Perturbations

Link prediction aims at revealing missing and unknown information from observed network data, or predicting possible evolution in the near future. In recent years, extensive studies of link prediction algorithms have been performed on unweighted networks. However, most empirical systems need to be described as weighted networks rather than by their topology alone. In this paper, we extend the structural perturbation method to weighted networks. We find that including weight information significantly improves the prediction accuracy on networks with homogeneous weight distributions, whereas the improvement is smaller for networks with heterogeneous weights. We also compare the weighted structural perturbation method with several benchmark algorithms, both weighted and unweighted, and find that it generally achieves better accuracy.

Stamp Out Fake Peer Review

In the wake of large-scale retraction scandals, we urge scientific publishers to be more proactive in stamping out fake peer-reviewing practices. They should work with editors, authors and research institutes to implement an effective system of precautions and penalties.

Evaluating User Reputation in Online Rating Systems via An Iterative Group-Based Ranking Method

Reputation is a valuable asset in online social lives and it has drawn increased attention. Due to the existence of noisy ratings and spamming attacks, how to evaluate user reputation in online rating systems is especially significant. However, most of the previous ranking-based methods either follow a debatable assumption or have unsatisfactory robustness. In this paper, we propose an iterative group-based ranking method by introducing an iterative reputation–allocation process into the original group-based ranking method. More specifically, the reputation of users is calculated based on the weighted sizes of the user rating groups after grouping all users by their rating similarities, and the ratings of high-reputation users carry larger weights in determining the corresponding user rating groups. The reputation of users and the user rating group sizes are iteratively updated until they become stable. Results on two real data sets with artificial spammers suggest that the proposed method performs better than the state-of-the-art methods and that its robustness is considerably improved compared with the original group-based ranking method. Our work highlights the positive role of considering users’ grouping behaviors towards a better online user reputation evaluation.

Collective Learning in China's Regional Economic Development

Industrial development is the process by which economies learn how to produce new products and services. But how do economies learn? And who do they learn from? The literature on economic geography and economic development has emphasized two learning channels: inter-industry learning, which involves learning from related industries; and inter-regional learning, which involves learning from neighboring regions. Here we use 25 years of data describing the evolution of China's economy between 1990 and 2015 (a period when China multiplied its GDP per capita by a factor of ten) to explore how Chinese provinces diversified their economies. First, we show that the probability that a province will develop a new industry increases with the number of related industries that are already present in that province, a fact that is suggestive of inter-industry learning. Also, we show that the probability that a province will develop an industry increases with the number of neighboring provinces that are developed in that industry, a fact suggestive of inter-regional learning. Moreover, we find that the combination of these two channels exhibits diminishing returns, meaning that the contribution of either of these learning channels is redundant when the other one is present. Finally, we address endogeneity concerns by using the introduction of high-speed rail as an instrument to isolate the effects of inter-regional learning. Our differences-in-differences (DID) analysis reveals that the introduction of high-speed rail increased the industrial similarity of pairs of provinces connected by high-speed rail. Also, industries in provinces that were connected by rail increased their productivity when they were connected by rail to other provinces where that industry was already present. These findings suggest that inter-regional and inter-industry learning played a role in China's great economic expansion.

A Vertex Similarity Index for Better Personalized Recommendation

Recommender systems help us tackle the problem of information overload by predicting our potential choices among diverse niche objects. So far, a variety of personalized recommendation algorithms have been proposed, and most of them, such as collaborative filtering and mass diffusion, are based on similarities. Here, we propose a novel vertex similarity index named CosRA, which combines advantages of both the cosine index and the resource-allocation (RA) index. By applying the CosRA index to real recommender systems including MovieLens, Netflix and RYM, we show that the CosRA-based method has better performance in accuracy, diversity and novelty than some benchmark methods. Moreover, the CosRA index is free of parameters, which is a significant advantage in real applications. Further experiments show that the introduction of two tunable parameters cannot remarkably improve the overall performance of the CosRA index.
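
One way to read "combining the cosine index and the RA index" is to take an RA-style sum over the users who collected both objects, each weighted by the inverse of that user's degree, and normalize it by the geometric mean of the two object degrees as the cosine index does. The sketch below computes that quantity on a toy user-object network. The exact formula, the function name `cosra_similarity` and the toy data are assumptions for illustration and should be checked against the paper.

```python
import math
from collections import defaultdict

def cosra_similarity(user_items):
    """user_items: dict mapping user -> set of collected objects.

    Returns dict {(a, b): s} with
    s = (1 / sqrt(k_a * k_b)) * sum over common users i of 1 / k_i,
    where k_a, k_b are object degrees and k_i is the user degree.
    """
    item_degree = defaultdict(int)
    for items in user_items.values():
        for a in items:
            item_degree[a] += 1

    sim = defaultdict(float)
    for items in user_items.values():
        k_user = len(items)
        for a in items:
            for b in items:
                if a != b:
                    sim[(a, b)] += 1.0 / k_user  # RA-style contribution
    for (a, b) in sim:
        # cosine-style normalization by the two object degrees
        sim[(a, b)] /= math.sqrt(item_degree[a] * item_degree[b])
    return dict(sim)

# Hypothetical toy data: three users and three objects.
toy = {"u1": {"A", "B"}, "u2": {"A", "B", "C"}, "u3": {"C"}}
scores = cosra_similarity(toy)
print(round(scores[("A", "B")], 4), round(scores[("A", "C")], 4))
```

Note that this form has no free parameters, which matches the parameter-free property stated in the abstract.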

Big Data Reveal the Status of Economic Development

With the advent of the era of big data, both the quantity and quality of data related to economic activity have been enormously enriched and improved. By analyzing these large-scale data from socio-economic systems, we have the opportunity to quantify the status of economic development instantaneously and accurately at almost no cost. In this paper, focusing on how big data reveal the status of economic development, we briefly summarize the applications of different types of big data to quantifying macro-economic structures and micro-social status. Further, we discuss some promising ways to apply big data to improve regional economic development strategies and upgrade macro industrial structures.

Critical Size of Ego Communication Networks

With the help of information and communication technologies, studies on overall social networks have been extensively reported recently. However, investigations on directed Ego Communication Networks (ECNs) remain insufficient, where an ECN stands for a sub-network composed of a central individual and his/her direct contacts. In this paper, the directed ECNs are built on Call Detail Records (CDRs), which cover more than 7 million people of a provincial capital city in China for half a year. Results show that there is a critical size for ECNs at about 150, above which the average emotional closeness between ego and alters drops, the balanced relationship between ego and network collapses, and the proportion of strong ties decreases. This paper not only demonstrates the significance of ECN size in affecting its properties, but also shows accordance with “Dunbar's Number”. These results can be viewed as cross-cultural supporting evidence for the well-known Social Brain Hypothesis (SBH).

Promotion and Resignation in Employee Networks

Enterprises are putting more and more emphasis on data analysis to obtain effective management advice. Managers and researchers are trying to identify the major factors that lead to employees’ promotion and resignation. Most previous analyses are based on questionnaire surveys, which usually cover only a small fraction of samples and contain biases caused by psychological defense. In this paper, we collect a data set consisting of all the employees’ work-related interactions (action network, AN for short) and online social connections (social network, SN for short) of a company, which allows us to reveal the correlations between structural features and employees’ career development, namely promotion and resignation. Through statistical analysis, we show that the structural features of both AN and SN are correlated with and predictive of employees’ promotion and resignation, and that the AN has higher correlation and predictability. More specifically, the in-degree in AN is the most relevant indicator for promotion, while the k-shell index in AN and the in-degree in SN are both very predictive of resignation. Our results provide a novel and actionable understanding of enterprise management and suggest that enhancing the interplay among employees, whether work-related or social, can help to reduce staff turnover risk.

Bootstrap Percolation on Spatial Networks

Bootstrap percolation is a general representation of a class of networked activation processes, which has found applications in explaining many important social phenomena, such as the propagation of information. Inspired by some recent findings on the spatial structure of online social networks, here we study bootstrap percolation on undirected spatial networks, with the probability density function of long-range links’ lengths being a power law with tunable exponent. Setting the size of the giant active component as the order parameter, we find a parameter-dependent critical value for the power-law exponent, above which there is a double phase transition, a mixture of a second-order phase transition and a hybrid phase transition with two varying critical points; otherwise there is only a second-order phase transition. We further find a parameter-independent critical value around −1, about which the two critical points for the double phase transition are almost constant. To our surprise, this critical value −1 is equal to or very close to the values of many real online social networks, including LiveJournal, HP Labs email network, Belgian mobile phone network, etc. This work helps us better understand the self-organization of the spatial structure of online social networks, in terms of the effective function for information spreading.
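
For readers unfamiliar with the activation rule, the following minimal sketch runs standard bootstrap percolation on a small graph: an inactive node becomes active once at least k of its neighbours are active, and the process repeats until nothing changes. The toy graph, the seed set and the threshold k = 2 are assumptions for illustration; the spatial network construction with power-law link lengths studied in the paper is not reproduced here.

```python
def bootstrap_percolation(adjacency, seeds, k=2):
    """Activate nodes until no inactive node has at least k active neighbors.

    adjacency: dict node -> set of neighbors (undirected graph).
    seeds: iterable of initially active nodes.
    Returns the final set of active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbors in adjacency.items():
            if node not in active and len(neighbors & active) >= k:
                active.add(node)
                changed = True
    return active

# Hypothetical toy graph: node 4 has a single neighbor, so it can never
# reach the threshold k = 2.
graph = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4},
    4: {3},
}
print(sorted(bootstrap_percolation(graph, seeds={0, 1}, k=2)))  # [0, 1, 2, 3]
```

Because activation is irreversible, updating nodes one at a time or all at once leads to the same final active set.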

Group-Based Ranking Method for Online Rating Systems with Spamming Attacks

The ranking problem has attracted much attention in real systems. How to design a robust ranking method is especially significant for online rating systems under the threat of spamming attacks. By building reputation systems for users, many well-performing ranking methods have been applied to address this issue. In this letter, we propose a group-based ranking method that evaluates users’ reputations based on their grouping behaviors. More specifically, users are assigned high reputation scores if they always fall into large rating groups. Results on three real data sets indicate that the present method is more accurate and robust than the correlation-based method in the presence of spamming attacks.
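
The grouping idea can be sketched as follows: for each object, users who gave the same score form a rating group, and a user scores well when their ratings tend to fall into large groups. In this simplified version a user's reputation is the average share of raters who agree with them, taken over the objects they rated. The scoring rule, the function name `group_based_reputation` and the toy ratings are assumptions for illustration; the scoring used in the paper may differ.

```python
from collections import Counter, defaultdict

def group_based_reputation(ratings):
    """ratings: list of (user, obj, score) triples.

    For each object, users giving the same score form a rating group.
    A user's reputation here is the average, over the objects they rated,
    of the fraction of that object's raters who share their score.
    """
    group_size = defaultdict(Counter)   # obj -> score -> number of users
    for _, obj, score in ratings:
        group_size[obj][score] += 1

    shares = defaultdict(list)          # user -> list of group shares
    for user, obj, score in ratings:
        n_raters = sum(group_size[obj].values())
        shares[user].append(group_size[obj][score] / n_raters)
    return {user: sum(v) / len(v) for user, v in shares.items()}

# Hypothetical toy ratings: "spam" always disagrees with the majority.
toy_ratings = [
    ("u1", "m1", 5), ("u2", "m1", 5), ("u3", "m1", 5), ("spam", "m1", 1),
    ("u1", "m2", 2), ("u2", "m2", 2), ("u3", "m2", 3), ("spam", "m2", 5),
]
reputation = group_based_reputation(toy_ratings)
print(sorted(reputation, key=reputation.get))  # lowest reputation first
```

In this toy example the user who never joins a majority group ends up with the lowest score, which is the behaviour the abstract describes for spammers.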

Long-Term Effects of Recommendation on the Evolution of Online Systems

We employ a bipartite network to describe an online commercial system. Instead of investigating the accuracy and diversity of each recommendation, we focus on the influence of recommendation on the evolution of the online bipartite network. The analysis is based on two benchmark datasets and several well-known recommendation algorithms. The structural properties investigated include item degree heterogeneity, clustering coefficient and degree correlation. This work highlights the importance of studying the effects and performance of recommendation in long-term evolution.

Finding Conspirators in the Network via Machine Learning

A conspiracy network is embedded in a network of employees of a company, with each edge representing a message sent from one employee (node) to another and categorized by topics. Given a few known criminals, a few known non-criminals, and suspicious topics, we seek to estimate the probability of criminal involvement for other individuals and to determine the leader of the conspirators.

Comprehensive Scholarship Evaluation Model in Colleges and Universities

This paper proposes a simple and feasible model for evaluating the comprehensive quality of college students. First, course grades are quantified, weight factors for the courses are introduced, and a model for calculating comprehensive achievement is set up. Then, the significance of each student quality indicator is analyzed, the analytic hierarchy process is used to assign indicator weights, and comprehensive quality scores are calculated by the model. Finally, according to students' comprehensive achievement and quality scores, an evaluation mechanism for selecting students is introduced, scholarships are allocated, and scientifically sound results that fit the cultivation objectives of colleges and universities are obtained.
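
To illustrate how the analytic hierarchy process turns pairwise judgements into indicator weights, the sketch below uses the common row geometric-mean approximation of the priority vector. The abstract does not say which variant the paper uses, so the method choice, the function name `ahp_weights` and the three-indicator comparison matrix are assumptions for illustration.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric-mean method.

    pairwise[i][j] states how much more important indicator i is than j,
    with pairwise[j][i] == 1 / pairwise[i][j] and ones on the diagonal.
    Returns a list of weights that sum to 1.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgements for three quality indicators: the first is twice
# as important as the second and six times as important as the third.
matrix = [
    [1.0,     2.0,     6.0],
    [1 / 2.0, 1.0,     3.0],
    [1 / 6.0, 1 / 3.0, 1.0],
]
print([round(w, 3) for w in ahp_weights(matrix)])  # [0.6, 0.3, 0.1]
```

For a perfectly consistent matrix like this one, the geometric-mean weights coincide with the principal-eigenvector weights; for inconsistent judgements the two can differ slightly, and a consistency check is normally part of the procedure.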