Populations of simulated agents controlled by dynamical neural networks are trained by artificial evolution to access linguistic instructions and to execute them by indicating, touching, or moving specific target objects. During training, the agents experience only a subset of all object/action pairs. During postevaluation, some of the successful agents prove able to access and execute linguistic instructions that they did not experience during training. This is due to the development of a semantic space that is grounded in the sensorimotor capabilities of the agent and organized systematically so as to facilitate linguistic compositionality and behavioral generalization. Compositionality seems to be underpinned by the agents' capability to access and execute the instructions by temporally decomposing their linguistic and behavioral aspects into their constituent parts (i.e., finding the target object and executing the required action). A comparison between two experimental conditions, in one of which the agents are required to ignore rather than to indicate objects, shows that the composition of the behavioral set significantly influences the development of compositional semantic structures.