Motivation
Papers to explore, ordered by my assessment of how interesting they might be and how relevant to my research:
[ ] Morals in multi-unit markets
Are these multi/single-unit markets a kind of game, or just markets with one/many items for sale?
Anyway, it seems like multi-unit markets have pretty bad emergent patterns?
We find that multi-unit markets result in partial norm erosion; moreover, in contrast to single-unit markets, they lead to a full erosion of morals and norm compliance. The replacement logic is the main mechanism driving this finding.
Replacement logic: if I don’t do something (bad), someone else will, so I might as well do it and benefit from it.
[x] Prediction Policy Problems
Argues that many important policy problems are fundamentally prediction problems.
Very interesting little paper. The math is simple. They use simple machine learning techniques to make progress on predictive problems. At the beginning they have quite a nice illustration of the difference between causal and predictive problems. I’m not sure which ML algorithm they’re using; they mostly criticize ordinary least squares (OLS), and I believe they’re building on top of it and improving it.
[x] A Modern Bayesian Look at the Multi-Armed Bandit
This is the paper I talk about below, in which they claim bandit algorithms can answer the kinds of questions RCTs answer orders of magnitude faster, while still producing good results.
I’m not very good at bandits yet, and my earlier mental model (something like a random forest) was off. A bandit is a sequential experiment: each arriving unit gets assigned to one “arm” (treatment), and the algorithm shifts traffic toward arms that look better while still exploring, balancing exploration against exploitation. The two failure modes are over-exploring arms that turn out to be nil results and getting stuck exploiting an arm that is actually bad; the first is tolerable, the second is not acceptable. The Bayesian part here is randomized probability matching (Thompson sampling): keep a posterior over each arm’s reward and assign each new unit to an arm with probability equal to the chance that arm is the best one. Bandits are also a special case of RL, one with a single state, which is probably the connection I was missing. I want to come back to this paper.
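A minimal sketch of the Bayesian bandit idea (randomized probability matching, a.k.a. Thompson sampling, which is what Scott’s paper is about). The arms, priors, and numbers here are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) success probabilities of two "treatment" arms.
true_p = [0.45, 0.55]

# Beta(1, 1) priors on each arm, tracked as observed successes/failures.
successes = np.ones(2)
failures = np.ones(2)

for _ in range(5000):
    # Thompson sampling: draw one sample from each arm's posterior
    # and assign the arrival to the arm whose sample is largest.
    draws = rng.beta(successes, failures)
    arm = int(np.argmax(draws))
    reward = rng.random() < true_p[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

# Most traffic should have shifted to the better arm (arm 1).
pulls = successes + failures - 2
print(pulls)
```

The point is the exploration/exploitation balance: a bad arm keeps getting some traffic early on, but as its posterior sharpens, the probability it looks best (and hence gets pulled) shrinks toward zero.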
[x] Machine Learning: An Applied Econometric Approach
Recommended by Athey, I think, somewhere; either her or Agrawal. Either way, it’s a higher-level look at the whole field.
Contains a paragraph talking about how different ML models were used to estimate economic indicators in Africa:
These new sources of data are particularly relevant where reliable data on economic outcomes are missing, such as in tracking and targeting poverty in developing countries (Blumenstock 2016). Jean et al. (2016) train a neural net to predict local economic outcomes from satellite data in five African countries. Machine learning also yields economic predictions from large-scale network data; for example, Blumenstock, Cadamuro, and On (2015) use cell-phone data to measure wealth, allowing them to quantify poverty in Rwanda at the individual level. Image recognition can of course be used beyond satellite data, and localized prediction of economic outcomes is relevant beyond the developing world: as one example, Glaeser, Kominers, Luca, and Naik (2016) use images from Google Street View to measure block-level income in New York City and Boston.
Also two sections that might be useful, “Prediction in the Service of Estimation” and “Prediction in Policy”. They give a bunch of examples of things I’ve already talked about in this document, but no special points on RL.
[x] Phantom - A RL-driven multi-agent framework to model complex systems
Promising to bridge the gap between complex systems and MARL. Specifically ABM, I guess, which is a subset of complex systems, so it’s actually kinda lame that they promise something so ballsy. However, it’s all JP Morgan people and I have high expectations for these folks; it’s probably a good paper.
This is very similar to what I want to do. Funnily enough, I had to read that someone else had done it to feel capable of trying it myself. These people took a similar approach to what I want to do, at a higher level: they’ve built an ABM system using RL. They rely on networks to mediate interactions between agents, though, instead of keeping the information as part of each agent’s memory in a vector like we do. Information opacity is controlled by the network: agents can only see what they’re connected to. I think the approach Wolf and I have been taking is more natural than that.
[x] Reinforcement Learning in Economics and Finance
A not particularly well-done overview of the (at the time) state-of-the-art algorithms for making predictions on markets using RL. It’s also by the Freakonomics guy, and that’s cringe. The paper doesn’t bring anything new to the table, and I haven’t found the papers linked there particularly useful.
[x] The economics of artificial intelligence: an agenda (Book)
This is a collection of papers put together by a guy called Agrawal.
Athey has an interesting piece here talking about predictions with ML and it’s linked above as something I should read.
She, talking about the difference between prediction and causal problems:
the goal is to get a good estimate of occupancy rates, where posted prices and other factors (such as events in the local area, weather, and so on) are used to predict occupancy. For such a model, you would expect to find that higher posted prices are predictive of higher occupancy rates, since hotels tend to raise their prices as they fill up (using yield management software). In contrast, imagine that a hotel chain wishes to estimate how occupancy would change if the hotel raised prices across the board (that is, if it reprogrammed the yield management software to shift prices up by 5 percent in every state of the world). This is a question of causal inference. Clearly, even though prices and occupancy are positively correlated in a typical data set, we would not conclude that raising prices would increase occupancy. It is well known in the causal inference literature that the question about price increases cannot be answered simply by examining historical data without additional assumptions or structure. For example, if the hotel previously ran randomized experiments on pricing, the data from these experiments can be used to answer the question. More commonly, an analyst will exploit natural experiments or instrumental variables where the latter are variables that are unrelated to factors that affect consumer demand, but that shift firm costs and thus their prices. Most of the classic supervised ML literature has little to say about how to answer this question.
Here she also offers a list of problems within this literature (each of these seems like a world of its own and I don’t know anything about them):
In another part of this paper she praises bandits as being orders of magnitude faster than RCTs (e.g., see Scott 2010, which I link above):
growing literature based primarily in ML studies the problem of “bandits,” which are algorithms that actively learn about which treatment is best. Online experimentation work yields large benefits when the setting is such that it is possible to quickly measure outcomes, and when there are many possible treatments. In the basic bandit problem when all units have identical covariates, the problem of “online experimentation,” or “multiarmed bandits,” asks the question of how experiments can be designed to assign individuals to treatments as they arrive, using data from earlier individuals to determine the probabilities of assigning new individuals to each treatment, balancing the need for exploration against the desire for exploitation. That is, bandits balance the need to learn against the desire to avoid giving individuals suboptimal treatments. This type of online experimentation has been shown to yield reliable answers orders of magnitude faster than traditional randomized controlled trials in cases where there are many possible treatments
[x] Deep Reinforcement Learning: Emerging Trends in Macroeconomics and Future Prospects
Broadly about how DRL and RL can be used in the context of macroeconomics. Funded by the International Monetary Fund, which seems like a big deal, but I have no idea who they are; silly economist me.
This paper is mostly about DRL, but it has some useful, small, clear definitions of TD and RL algorithms like SARSA and Q-Learning. It’s useful in the sense that I can get a feel for how to write my own paper.
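The TD updates behind Q-Learning and SARSA fit in a few lines. This is my own toy sketch (the chain environment and the hyperparameters are placeholders, not from the paper):

```python
import numpy as np

# Tiny deterministic chain: states 0..3, actions 0 (left) / 1 (right),
# reward 1 only on transitions that land in state 3. Toy environment.
N_STATES, N_ACTIONS = 4, 2

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1)

alpha, gamma, eps = 0.5, 0.9, 0.3
rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

for _ in range(2000):          # episodes
    s = 0
    for _ in range(10):        # steps per episode
        # epsilon-greedy behavior policy
        a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
        s2, r = step(s, a)
        # Q-Learning (off-policy): bootstrap from the *greedy* next action.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        # SARSA (on-policy) would instead use the action a2 actually taken next:
        #   Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
        s = s2

print(Q.argmax(axis=1))  # learned greedy policy per state
```

The only difference between the two algorithms is that one line: Q-Learning bootstraps off the max over next actions, SARSA off the action the behavior policy actually picks, which is why SARSA's estimates reflect the exploration noise and Q-Learning's don't.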
[x] The State of Applied Econometrics: Causality and Policy Evaluation
Susan Athey argues:
The gold standard for drawing inferences about the effect of a policy is a randomized controlled experiment. However, in many cases, experiments remain difficult or impossible to implement, for financial, political, or ethical reasons, or because the population of interest is too small. For example, it would be unethical to prevent potential students from attending college in order to study the causal effect of college attendance on labor market experiences, and politically infeasible to study the effect of the minimum wage by randomly assigning minimum wage policies to states. Thus, a large share of the empirical work in economics about policy questions relies on observational data—that is, data where policies were determined in a way other than through random assignment.
This basically says: it’s not possible to do RCTs in some scenarios, therefore we need to build models from observational data. The model they build in order to find causal relations is tree-based. As I understand it, the tree splits the sample on covariates, and within each leaf you compare outcomes for units that did and didn’t receive the policy to get a local estimate of its effect, though I don’t fully understand how the leaves get evaluated. Her paper on “Generalized Random Forests” seems interesting in case I want to learn this. There’s also some people who did similar work but with Bayesian trees: “BART: Bayesian Additive Regression Trees”.
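A toy sketch of the leaf-level idea as I understand it. Everything here is my own synthetic data with a fixed, known split; a real causal tree learns the splits, and Athey’s “honest” variant learns them on one half of the data and estimates effects on the other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (my own toy setup): one covariate x, a randomized binary
# treatment w, and an outcome whose treatment effect depends on x:
# effect = 2 when x > 0, else 0.
n = 4000
x = rng.normal(size=n)
w = rng.integers(0, 2, size=n)
tau = np.where(x > 0, 2.0, 0.0)
y = 1.0 + 0.5 * x + tau * w + rng.normal(scale=0.5, size=n)

# "Honesty" in miniature: reserve the second half for effect estimation
# (the first half would be used to learn the splits in a real causal tree).
est = np.arange(n) >= n // 2

def leaf_effect(leaf_mask):
    # Difference in mean outcomes between treated and control units inside
    # one leaf -- a valid effect estimate here because w was randomized.
    m = leaf_mask & est
    return y[m & (w == 1)].mean() - y[m & (w == 0)].mean()

# The x > 0 leaf should show an effect near 2, the other near 0.
print(leaf_effect(x > 0), leaf_effect(x <= 0))
```

So the leaves aren’t predicting the outcome itself; each leaf reports a treated-minus-control contrast, which is why the trees can recover heterogeneous effects rather than just correlations.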
Look at this cool lil image I found explaining how this tree prediction thing works. It’s quite grounded. Cool. (Though this is not about random trees.) It comes from Mullainathan and Spiess’s work, Machine Learning: An Applied Econometric Approach.
[x] Public policy in AI economy by Austan Goolsbee
Makes the point that the main problems in policy aren’t about prediction but about people, and that though AI may allow us to make more precise predictions, he doubts it will ever be able to tell us whether we should tax the rich or change a country’s tax rates. (Also within the Agrawal book.) Otherwise kind of useless: mostly talking about the job market. Misleading title.
[x] The impact of machine learning on economics
I think this will be a relevant paper to read. Most of it focuses on stuff I already know about, using ML models to improve prediction, but a little bit of it somewhere talks about RL and IRL.
Exactly what I said above.
[ ] Calibration of Shared Equilibria in General Sum Partially Observable Markov Games
Our contributions are (1) we introduce the concept of Shared equilibrium that answers the question on the nature of equilibria reached by agents of possibly different types using a shared policy, and prove convergence to such equilibria using self-play, under certain conditions on the nature of the game. (2) we introduce CALSHEQ, a novel dual-RL-based algorithm aimed at the calibration of shared equilibria to externally specified targets, that innovates by introducing a RL-based calibrator learning jointly with learning RL agents and optimally picking parameters governing distributions of agent types, and show through experiments that CALSHEQ outperforms a Bayesian optimization baseline.
[ ] A Game Theoretic Framework for Model Based Reinforcement Learning