Interconnectedness in the Interbank Market 

You can read the working paper on SSRN.

We study the behavior of the interbank market before, during, and after the 2008 financial crisis. Leveraging recent advances in network analysis, we study two network structures: a correlation network based on the returns of publicly traded banks, and a physical network based on interbank lending transactions. The two networks behave similarly pre-crisis, but during the crisis the correlation network shows an increase in interconnectedness while the physical network shows a marked decrease. Moreover, these networks respond differently to monetary and macroeconomic shocks. Physical networks forecast liquidity problems, while correlation networks forecast financial crises.
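To give a flavor of how a correlation network is built, here is a minimal Python sketch using pandas and networkx. The tickers, the synthetic returns, and the correlation threshold are all illustrative placeholders, not choices from the paper.

```python
import numpy as np
import pandas as pd
import networkx as nx

rng = np.random.default_rng(0)
tickers = ["BANK_A", "BANK_B", "BANK_C", "BANK_D"]  # hypothetical tickers
returns = pd.DataFrame(rng.normal(size=(250, len(tickers))), columns=tickers)

corr = returns.corr()        # pairwise return correlations
threshold = 0.3              # illustrative cutoff, not from the paper

G = nx.Graph()
G.add_nodes_from(tickers)
for i, a in enumerate(tickers):
    for b in tickers[i + 1:]:
        if abs(corr.loc[a, b]) >= threshold:
            G.add_edge(a, b, weight=corr.loc[a, b])

# Density is one simple summary of interconnectedness over time.
print(f"edges: {G.number_of_edges()}, density: {nx.density(G):.2f}")
```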

"Physical" Networks of banks in the e-MID: 

Interbank Lending Networks

Correlation-based networks: Granger-causality networks over time
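The Granger-causality networks pictured above can be sketched in the same spirit: test every ordered pair of return series and draw a directed edge when the test rejects. The tickers, synthetic data, lag, and significance cutoff below are illustrative stand-ins, not the paper's specification.

```python
import numpy as np
import pandas as pd
import networkx as nx
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
tickers = ["BANK_A", "BANK_B", "BANK_C"]   # hypothetical tickers
returns = pd.DataFrame(rng.normal(size=(300, len(tickers))), columns=tickers)

G = nx.DiGraph()
G.add_nodes_from(tickers)
for cause in tickers:
    for effect in tickers:
        if cause == effect:
            continue
        # Column order matters: test whether `cause` Granger-causes `effect`.
        res = grangercausalitytests(returns[[effect, cause]], maxlag=1)
        p_value = res[1][0]["ssr_ftest"][1]
        if p_value < 0.05:                 # illustrative significance cutoff
            G.add_edge(cause, effect, p=p_value)

print(f"directed Granger edges: {G.number_of_edges()}")
```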

More than Just Words: On Discovering Themes in Online Reviews to Explain Restaurant Closures 

Pay attention to the text! Models that include text information predict significantly better than models that use only standard numerical information.

Image showing ROC curves for GLMER models

Joint work with Jorge Mejia and Anand Gopal.

Abstract: Online reviews and their effect on business outcomes have long been of interest to information systems scholars. In this study, we complement the existing research on online reviews by proposing a novel use of modern text analysis methods to uncover the semantic structure of online reviews and assess their impact on the survival of merchants in the marketplace. We analyze online reviews from 2005 to 2013 for restaurants in a major metropolitan area in the United States and find that variables capturing the semantic structure of review text are important predictors of restaurant survival, a relationship that has not been explored in the extant literature. We refer to these semantic dimensions as service themes. We thus combine machine learning approaches and econometric modeling to derive predictive models that are significantly better than models that include only standard numerical information from reviews, such as review valence, volume, word counts, and readability. Our results suggest that our text mining methodology, if and when applied to production-level environments on large datasets, can extract valuable information pertaining to these themes from the online reviews generated by consumers. The products of such techniques can help business managers (e.g., restaurateurs) and platform owners better utilize their review information to monitor business performance and inform consumer choice.
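To illustrate the modeling idea, here is a simplified Python sketch that extracts latent themes from review text with LDA and combines them with numeric features in a classifier. The paper uses GLMER models on real review data; the reviews, numeric features, and closure labels below are synthetic stand-ins.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
reviews = ["slow service but great pasta", "rude staff and a long wait",
           "lovely ambiance, friendly waiters", "food arrived cold"] * 25
closed = rng.integers(0, 2, size=len(reviews))   # synthetic closure labels

# Latent "service themes": per-review topic proportions from LDA.
counts = CountVectorizer().fit_transform(reviews)
X_text = LatentDirichletAllocation(n_components=3,
                                   random_state=0).fit_transform(counts)

# Standard numeric features: a synthetic star rating and the word count.
X_num = np.column_stack([rng.uniform(1, 5, len(reviews)),
                         [len(r.split()) for r in reviews]])

X = np.hstack([X_text, X_num])
model = LogisticRegression(max_iter=1000).fit(X, closed)
auc = roc_auc_score(closed, model.predict_proba(X)[:, 1])
print(f"in-sample AUC: {auc:.2f}")
```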

Do U.S. Financial Regulators Listen to the Public? Testing the Regulatory Process with the RegRank Algorithm

Here is a video summary of the project and some media coverage from the Wall Street Journal and Reuters. You can find a working paper on SSRN.

Abstract: We examine the notice-and-comment process and its influence on regulatory decisions by analyzing the text of public rule-making documents of the Commodity Futures Trading Commission (CFTC) and associated comments. For this task, we develop a data mining framework and an algorithm called RegRank, which learns the thematic structure of regulatory rules and public comments and then assigns tone weights to each theme to come up with an aggregate score for each document. Based on these scores we test the hypothesis that the CFTC adjusts the final rule issued in the direction of the tone expressed in public comments. Our findings strongly support this hypothesis and further suggest that this mostly occurs in response to comments from the regulated financial industry. We posit that the RegRank algorithm and related text mining methods have the potential to empower the public to test whether it has been given the "due process" and hence keep government agencies in check.
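The core idea, discovering themes and weighting each by tone to produce an aggregate document score, can be sketched schematically as follows. This is not the RegRank implementation: the documents are synthetic, and the tone lexicon is a tiny hypothetical stand-in for a real domain-specific one.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["we support the proposed margin requirement",
        "this rule imposes undue burden and cost on market participants",
        "central clearing improves transparency and reduces systemic risk"] * 10

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theme_weights = lda.fit_transform(counts)   # per-document theme proportions

# Tiny hypothetical tone lexicon (+1 positive, -1 negative words).
tone = {"support": 1, "improves": 1, "transparency": 1,
        "undue": -1, "burden": -1, "cost": -1, "risk": -1}

def doc_tone(text):
    words = text.lower().split()
    return sum(tone.get(w, 0) for w in words) / len(words)

# Aggregate per-theme scores: theme weight times document tone, so the
# score reflects both what a document discusses and how it leans.
scores = theme_weights * np.array([doc_tone(d) for d in docs])[:, None]
print("per-theme scores for the first document:", scores[0])
```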

Discovering Structure in Complex and Dynamic Environments

Much of my work has focused on an increasingly common form of data, where a number of variables are measured for each sample (banks, firms, etc.) across different time points or conditions. Such data can be organized as three-way data (e.g., three-dimensional arrays), with the first two dimensions corresponding, as usual, to samples and variables, and the third dimension corresponding to time or experimental conditions. A sequence of networks (graphs with nodes and edges) can likewise be viewed as a sequence of adjacency matrices, i.e., a three-dimensional array; a minimal sketch of this representation follows. The main goal is to create and apply computationally efficient methods and tools that help provide unique insights for improved decision making.
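Here the network snapshots are stacked into a (time x nodes x nodes) NumPy array; the random graphs and sizes stand in for real data such as monthly citation networks.

```python
import numpy as np
import networkx as nx

n_nodes, n_months = 50, 12   # stand-in sizes, e.g., monthly snapshots
snapshots = [nx.gnp_random_graph(n_nodes, 0.05, seed=t)
             for t in range(n_months)]

# Fix one node ordering so slices are comparable across time points.
nodes = sorted(snapshots[0].nodes())
T = np.stack([nx.to_numpy_array(g, nodelist=nodes) for g in snapshots])
print(T.shape)   # (12, 50, 50): time x nodes x nodes
```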

As an example, a matrix factorization technique from my work has been used to create visualizations that capture the different life cycles of research articles. Consider the sequence of monthly citation graphs from the e-print service arXiv for the "high energy physics theory" section, covering papers from October 1993 to December 2002.

Visualization of arXiv Citation Networks

The top panel shows abstractions of the network for two communities. The paper (node) trajectories are effectively smoothed, and the important ones are highlighted. Each component corresponds to a separate group in the data. With the exception of one highly popular outlier, the first component contains papers whose popularity mostly peaks by 1998: citations to these papers slowed dramatically around then, as research focus shifted to other topics and articles. The outlier continued to be cited throughout the sample period. The second component captures papers from 1998 onwards; here too, a small number of articles achieve outsized impact.
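Component trajectories like these can be obtained by factorizing the three-way array. Below is a hedged sketch using a generic CP (PARAFAC) decomposition from the tensorly library; it illustrates the idea only and is not the specific factorization developed in the paper. The tensor here is random, with 111 months matching the October 1993 to December 2002 window.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
X = rng.random((50, 50, 111))   # stand-in: papers x papers x 111 months

# Rank-2 CP decomposition: one factor matrix per mode of the array.
weights, factors = parafac(tl.tensor(X), rank=2)
time_factors = factors[2]       # third-mode factors, one column per component

# Each column traces one component's activity over the 111 months;
# its peak marks when that group of papers was most heavily cited.
print(time_factors.shape)       # (111, 2)
```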

Examples of my work in this area can be found at the following links: JCGS, SIGMOD Workshop, Algorithmic Finance, IEEE Proceedings.