Large-scale computer-based scientific simulations require high performance computing, involve big data manipulation, and are commonly modeled as data-centric scientific workflows managed by a Scientific Workflow Management System (SWfMS). In a parallel execution, a SWfMS schedules many tasks onto the computing resources, and Many Task Computing (MTC) is the paradigm that addresses this scenario. To manage the execution data needed for parallel execution management and task scheduling in MTC, an execution engine requires a scalable data structure to accommodate those many tasks. In addition to managing execution data, storing provenance and domain data at runtime has been shown to enable powerful advantages, such as execution monitoring, discovery of anticipated results, and user steering. Although all these data may be managed with different approaches (e.g., flat log files, a DBMS, or a hybrid approach), a centralized DBMS has been shown to deliver enhanced analytical capabilities at runtime, which is very valuable for end-users. However, while a centralized DBMS enables important advantages, it also introduces a single point of failure and of contention, which jeopardizes performance at large scale. To cope with this, in this work we propose a novel SWfMS architecture that removes the task-scheduling responsibility from a central node, with which all other nodes must communicate and which thus becomes a point of contention, and transfers that responsibility to a distributed DBMS. With this design, we show that our solution frequently attains an efficiency of over 80% and gains of more than 90% relative to an architecture that relies on a centralized DBMS, on a 1,000-core cluster. More importantly, we achieve these results without giving up the advantages of using a DBMS to manage execution, provenance, and domain data jointly at runtime.