Cognition and Brain Plasticity Group [Bellvitge Biomedical Research Institute - IDIBELL], L'Hospitalet de Llobregat, Barcelona, Spain; Department of Cognition, Development and Educational Psychology, Institute of Neurosciences, University of Barcelona, Barcelona, Spain.
Most studies of the brain mechanisms underlying learning have focused on the ability to learn simple stimulus-response associations. However, in everyday life, outcomes are often obtained through complex behavioral patterns involving a series of actions. Parallel learning systems might be important to reduce the complexity of the learning problem in such scenarios, as proposed in the framework of hierarchical reinforcement learning (HRL). The key feature of HRL is the decomposition of complex sets of actions into subgoals. These subgoals are associated with the computation of pseudo-reward prediction errors (PRPEs), which allow the reinforcement of actions that led to a subgoal before the final goal itself is achieved. Here, we tested two hypotheses. First, despite not carrying any rewarding value per se, pseudo-rewards might bias choice behavior even when they confer no advantage. Second, this bias might be related to the strength of PRPE representations in the striatum. To test these ideas, we developed a novel decision-making paradigm to assess reward prediction errors (RPEs) and PRPEs in two studies (fMRI study: n = 20; behavioral study: n = 19). Our results show that participants developed a preference for the most pseudo-rewarding option over the course of the task, even though it did not lead to more monetary reward. fMRI analyses revealed that this preference was predicted by individual differences in the relative striatal sensitivity to PRPEs vs. RPEs. Together, our results indicate that pseudo-rewards generate learning signals in the striatum and subsequently bias choice behavior despite their lack of association with actual reward.
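To make the distinction between RPEs and PRPEs concrete, the following is a minimal sketch of how the two signals could be computed with a simple delta rule. It is not the model used in the studies: the function names, the learning rate, the starting expectations, and the `pseudo_weight` parameter are all illustrative assumptions.

```python
def delta_update(expected, outcome, learning_rate):
    """One delta-rule step: return (new_expected, prediction_error)."""
    error = outcome - expected
    return expected + learning_rate * error, error


def combined_value(reward_value, pseudo_value, pseudo_weight):
    """Hypothetical choice value: reward expectation plus a weighted
    pseudo-reward expectation (pseudo_weight = 0 ignores subgoals)."""
    return reward_value + pseudo_weight * pseudo_value


# Hypothetical trial: the subgoal is reached (pseudo-reward = 1)
# but no money is obtained (reward = 0). All numbers are assumptions.
reward_value, rpe = delta_update(expected=0.5, outcome=0.0, learning_rate=0.1)
pseudo_value, prpe = delta_update(expected=0.5, outcome=1.0, learning_rate=0.1)

# With a nonzero pseudo_weight, the option's choice value depends on
# subgoal attainment even though the subgoal carries no monetary value.
biased = combined_value(reward_value, pseudo_value, pseudo_weight=0.5)
unbiased = combined_value(reward_value, pseudo_value, pseudo_weight=0.0)
```

In this sketch, a learner with a larger `pseudo_weight` would come to prefer options that reach subgoals more often, even when the options are matched for monetary outcome, which is the kind of bias the abstract describes.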