Caetano Sauer

Contact: caetanosauer@gmail.com


Research Interests

I am currently working on Brackit, a versatile query engine which aims to provide a common foundation for query processing at an abstract level, independent of storage modules and data models.

In the context of this project, I have extended the query compiler to generate Hadoop jobs, thus providing an XQuery-based, higher-level interface to MapReduce, similar to projects like Hive, Pig, and JAQL. The basic mapping mechanism and the implementation of an initial prototype are described in my Master's thesis. Currently, I am working on optimizing the generated Hadoop jobs and on integrating them transparently into the abstract framework of the Brackit engine. Our goal is to abstract most of the logic for distributed execution in a way that is also suitable for frameworks other than Hadoop, such as parallel databases and dataflow systems like Dryad and Nephele. Based on the expressive power of XQuery, we also envision the incorporation of other query languages, such as SQL, Pig Latin, and JAQL. This would allow different language syntaxes and data models to profit from the basic compilation and optimization logic of the Brackit engine.
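As a rough illustration of the general idea of mapping a grouping query onto MapReduce phases, the sketch below simulates the map, shuffle, and reduce steps in plain Python. It is not Brackit's compiler output or the mapping from the thesis; the helper `run_map_reduce`, the sample `orders` data, and the field names are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def run_map_reduce(
    records: Iterable[Record],
    map_fn: Callable[[Record], Iterable[Tuple[Any, Any]]],
    reduce_fn: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Simulate map, shuffle, and reduce on a single machine (hypothetical helper)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    # Map phase: each record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: aggregate the values of each key independently.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Conceptual query: group orders by customer and sum the amounts.
orders = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 7},
    {"customer": "alice", "amount": 25},
]


def map_order(order: Record) -> Iterable[Tuple[str, int]]:
    # The grouping key becomes the map output key.
    yield order["customer"], order["amount"]


def reduce_total(customer: str, amounts: List[int]) -> int:
    # The aggregate in the query's return clause becomes the reducer.
    return sum(amounts)


totals = run_map_reduce(orders, map_order, reduce_total)
```

In this sketch, the grouping key of the query plays the role of the map output key and the aggregate plays the role of the reducer; a real compiler would additionally have to handle joins, nested results, and multi-job pipelines.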

From the storage and data management perspective, Brackit can easily be adapted to multiple database storage modules. It currently supports the management of native XML data with transaction support, using the BrackitDB module. We are working on the incorporation of NoSQL databases (HBase, MongoDB, and Riak) as well as embedded key-value stores (BerkeleyDB, LevelDB). Such storage modules work like plug-ins that implement a common interface of the Brackit engine. Our goal is not only to import and export data from such modules, but also to devise general techniques for specifying indexes, pushing down operations to the storage layer (e.g., projection and selection), and implementing transactional protocols.
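The plug-in idea with operation push-down can be sketched as follows. This is a minimal illustration under stated assumptions, not Brackit's actual storage interface: the names `StorageModule`, `InMemoryStore`, and `scan`, as well as the parameters `fields` and `predicate`, are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List, Optional, Sequence

Record = Dict[str, Any]


class StorageModule(ABC):
    """Hypothetical common interface that every storage plug-in implements."""

    @abstractmethod
    def scan(
        self,
        collection: str,
        fields: Optional[Sequence[str]] = None,
        predicate: Optional[Callable[[Record], bool]] = None,
    ) -> Iterator[Record]:
        """Return records, applying projection and selection inside the store."""


class InMemoryStore(StorageModule):
    """Toy plug-in backed by a dictionary of collections (illustration only)."""

    def __init__(self, data: Dict[str, List[Record]]) -> None:
        self._data = data

    def scan(self, collection, fields=None, predicate=None):
        for record in self._data.get(collection, []):
            # Selection push-down: filter before anything leaves the store.
            if predicate is not None and not predicate(record):
                continue
            # Projection push-down: return only the requested fields.
            if fields is None:
                yield dict(record)
            else:
                yield {name: record[name] for name in fields if name in record}


store = InMemoryStore(
    {"users": [{"name": "Ana", "age": 34}, {"name": "Ben", "age": 22}]}
)
rows = list(store.scan("users", fields=["name"], predicate=lambda r: r["age"] >= 30))
```

Because the engine talks only to the abstract interface, a store that can evaluate filters natively could implement `scan` with its own query facilities, while a simple key-value store could fall back to filtering in the plug-in, as the toy class does here.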

Given these characteristics, we envision a database system that can transparently handle multiple scenarios, from single-node, ACID-compliant data management to scalable, Hadoop-based data processing. Such a generic infrastructure could support a variety of use cases, embracing the diversity and complexity of Big Data applications.

Publications

Curriculum Vitae

My CV is available for download as a PDF here. Please note that this is a public version, from which my personal information has been omitted.

Links