I am an A.B.D. PhD candidate in Electrical and Computer Engineering, advised by Professor Bruno Sinopoli and Professor Soummya Kar. My research interests focus on inference and control of multi-agent systems, distributed optimization, and reinforcement learning. My thesis studies when external influence can, and cannot, achieve control objectives over the evolving group behavior of interconnected learning processes. The framework is developed with attention to the real-world constraints of scalability, computation, and estimation.

In 2019, I received an NSF Graduate Research Fellowship.

I graduated from UC San Diego in 2017 with a B.S. in Electrical Engineering.

Systems of intelligent agents regularly interact within environments subject to their own policies. Multi-agent reinforcement learning (MARL) and game theory are two paradigms that describe how agents may learn (optimal) policies. The selfish choices of agents, however, may be contrary to the social good of the community. To change the equilibrium behavior of the group, an intelligent controller must understand the learning mechanisms at the individual level, characterize how global intervention disrupts these learned action processes, and choose control policies that induce the desired change within the group's behavior.

In this project, I studied how games and multi-agent Markov decision processes (MDPs) can be controlled by defining a "third party" MDP from a central planner's (CP's) perspective. While basic control objectives, i.e., game play at desirable states, may be achieved via simple policies, better performance is achieved when agents are partitioned into clusters and each cluster receives personalized controls from the CP. To combat the rising complexity of this problem, I investigated approximate policy computation techniques that eliminate the exponential dependence of complexity on the number of clusters, and identified cases in which these techniques recover optimal policies. I also investigated the submodularity properties of the multi-agent MDP's value function, and used those results to propose a method for assigning agents to clusters with provable improvement in value and reduced computational complexity.

A control policy may be optimal, yet retain its value only under the precarious assumption that the system is time-homogeneous. MDPs with time-varying transition kernels can often be handled by refreshing learned models, but changes to the composition of the agents mean that the structure of the MDP itself may change. When an agent (semi-)permanently leaves the system, for example when a user deactivates their account or physical infrastructure is damaged, we say that node dropout occurs. Even if the optimal pre-dropout policy can still be applied after the agent leaves, there are no guarantees on how well it will perform.

In this project, I study model-free policy design (i.e., when the transition matrix is unknown) that is robust to agent dropout, and how to pre-calculate optimal policies in the event that dropout occurs. Using techniques from game theory, I first measure the importance of each agent to the CP based on their ability to minimize the CP's value function. Then, robustness criteria can be defined based either on agent importance or on the probability of each agent leaving the system. The robustness criteria are embedded into the definition of the MDP, which can be solved for the resulting robust policy.

In solving the MDP with model-free techniques, we wish to maintain a similar theme of applying only policies that we are highly confident will produce a good value. To calculate the confidence bounds on the post-dropout MDP, I developed a policy importance sampling technique that can evaluate policies for the post-dropout MDP given samples from the pre-dropout MDP. Current work uses these bounds in a safe policy search routine to find the desired robust policies.
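To illustrate the general idea of evaluating one policy from samples gathered under another, here is a minimal sketch of the textbook ordinary importance-sampling estimator for a tabular setting. It is not the dropout-specific estimator described above, and it does not compute confidence bounds. The function name, the `(state, action, reward)` trajectory format, and the policy arrays are assumptions made for this example.

```python
import numpy as np

def importance_sampling_value(trajectories, behavior_policy, target_policy, gamma=0.9):
    """Ordinary importance-sampling estimate of a target policy's discounted return.

    trajectories: list of trajectories, each a list of (state, action, reward)
        tuples collected while following behavior_policy (hypothetical format).
    behavior_policy, target_policy: arrays of shape (num_states, num_actions)
        holding action probabilities; behavior_policy must be positive wherever
        target_policy is positive.
    """
    estimates = []
    for trajectory in trajectories:
        weight = 1.0        # product of per-step likelihood ratios
        discounted = 0.0    # discounted return observed along the trajectory
        for t, (state, action, reward) in enumerate(trajectory):
            weight *= target_policy[state, action] / behavior_policy[state, action]
            discounted += (gamma ** t) * reward
        estimates.append(weight * discounted)
    return float(np.mean(estimates))

# Toy data: one two-step trajectory sampled under a uniform behavior policy.
behavior = np.array([[0.5, 0.5], [0.5, 0.5]])
target = np.array([[1.0, 0.0], [0.0, 1.0]])
samples = [[(0, 0, 1.0), (1, 1, 2.0)]]
estimate = importance_sampling_value(samples, behavior, target, gamma=0.5)
```

The estimator is unbiased when the behavior policy covers the target policy's actions, but its variance grows with trajectory length, which is one reason confidence bounds matter before a policy is deployed.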

In a machine learning application, we think of using equations to model a physical system, optimizing the model so it accurately describes the system (e.g. via parameters), and then using the optimized model to infer useful information about the physical system. In contrast, properties of certain physical systems are known to evolve over time according to laws of physics. For example, electrical circuits follow Kirchhoff's current and voltage laws, and capacitors' currents are proportional to the derivative of their voltages. Circuits are a promising example in particular, as they can be designed to produce almost any desired output signal. We can thus flip the usual machine learning paradigm; instead of using equations to model a physical system, we can model a desired equation with a physical system.

Given a desired objective function, we can construct a scaled gradient flow ODE to trace the continuous-time trajectory of the optimization variable. Next, an equivalent circuit (EC) is constructed such that the voltage at a key node is equivalent to the optimization variable, and a related adjoint EC models its time derivative. From these ECs, control schemes are proposed that minimize the charge stored in the EC capacitors. To solve the controlled gradient flow, we adapt discretization techniques standard in circuit simulation to the optimization field. Ongoing and future work extends these ideas to RL and distributed/federated learning applications.
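As a rough illustration of discretizing a gradient flow with an implicit integrator of the kind used in circuit simulation, the sketch below applies backward Euler to a quadratic objective. It assumes a constant scaling matrix `Z` and the flow `Z dx/dt = -grad f(x)`; it does not implement the EC construction or the charge-based control described above. The function name and the example problem are hypothetical.

```python
import numpy as np

def backward_euler_gradient_flow(A, b, x0, Z=None, step=0.5, num_steps=200):
    """Integrate Z dx/dt = -(A x - b) with backward Euler.

    The flow is the (scaled) gradient flow of f(x) = 0.5 x^T A x - b^T x.
    Z is an assumed constant, invertible scaling matrix (identity by default).
    """
    n = len(b)
    Z = np.eye(n) if Z is None else np.asarray(Z, dtype=float)
    x = np.asarray(x0, dtype=float)
    # Backward Euler: Z (x_next - x) / step = -(A x_next - b)
    # which rearranges to (Z + step * A) x_next = Z x + step * b.
    lhs = Z + step * A
    for _ in range(num_steps):
        x = np.linalg.solve(lhs, Z @ x + step * b)
    return x

# Example: a small symmetric positive definite problem.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_final = backward_euler_gradient_flow(A, b, x0=np.zeros(2))
x_star = np.linalg.solve(A, b)  # minimizer of f, where the gradient vanishes
```

For a positive definite `A` and the identity scaling, each backward Euler step contracts the error toward `x_star` for any positive step size, which is the stability property that makes implicit methods attractive for stiff circuit-like dynamics.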

- C. Fiscko, S. Kar, and B. Sinopoli. "Clustered Control of Transition-Independent MDPs," 2022. Preprint available on arXiv.
- C. Fiscko*, A. Agarwal*, S. Kar, L. Pileggi, and B. Sinopoli. *Journal Expansion of ECCO: Equivalent Circuit Controlled Optimization.* Preprint to be posted soon.
- C. Fiscko*, A. Agarwal*, S. Kar, L. Pileggi, and B. Sinopoli. "ECCO: Equivalent Circuit Controlled Optimization." Preprint available on arXiv. *Co-first authors.
- C. Fiscko, S. Kar, and B. Sinopoli, "On Confident Policy Evaluation for Factored Markov Decision Processes with Node Dropouts," in IEEE Conference on Decision and Control, Dec. 2022, Cancun, Mexico.
- C. Fiscko, S. Kar, and B. Sinopoli. "Identifying Impactful Agents Via Faux Adversarial Games." 2022 58th Annual Allerton Conference on Communication, Control, and Computing. Monticello, IL. September 2022.
- C. Fiscko, S. Kar, and B. Sinopoli, "Efficient solutions for targeted control of multi-agent MDPs," in American Control Conference, May 2021, New Orleans.
- C. Fiscko, S. Kar, and B. Sinopoli, "Learning transition statistics in networks of interacting agents," in Allerton Conference on Communication, Control, and Computing, Sept. 2019.
- C. Fiscko, B. Swenson, S. Kar, and B. Sinopoli, "Control of parametric games," in European Control Conference, Jun. 25 - 28, 2019, Naples, Italy.