Friday, July 24, 2009
Yale Researchers Create Database-Hadoop Hybrid
Yale University professor Daniel J. Abadi has led the development of HadoopDB, an open source parallel database management system (DBMS) that combines the data-processing capabilities of a relational database with the scalability of newer technologies such as Hadoop and MapReduce. HadoopDB was built using components from PostgreSQL, the Apache Hadoop data-processing framework, and Hive, the internal Hadoop project launched by Facebook. Queries can be submitted to HadoopDB either as MapReduce jobs or in SQL. Abadi says data processing is done partly in Hadoop and partly in "different PostgreSQL instances" spread out over several nodes in a shared-nothing cluster of machines. He says that, unlike previously developed DBMS projects, HadoopDB is a hybrid not only at the language/interface level but also at the systems implementation level. Abadi says HadoopDB combines the best of both approaches: it achieves the fault tolerance of massively parallel data infrastructures such as MapReduce, in which a single server failure has little effect on the overall grid, while performing complex analyses almost as quickly as existing commercial parallel databases. He says that as databases continue to grow, systems such as HadoopDB will "scale much better than parallel databases."
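
The split described above, with SQL executed inside per-node database instances and a MapReduce-style layer combining the results, can be illustrated with a toy sketch. This is not HadoopDB code and uses none of its APIs. It assumes in-memory SQLite databases as stand-ins for the PostgreSQL instances, and the `sales` table, the function names, and the round-robin partitioning are all hypothetical choices made for illustration.

```python
import sqlite3
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def make_nodes(rows, num_nodes=NUM_NODES):
    """Create independent in-memory databases (stand-ins for per-node
    PostgreSQL instances) and spread the rows across them round-robin."""
    nodes = [sqlite3.connect(":memory:") for _ in range(num_nodes)]
    for conn in nodes:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    for i, (region, amount) in enumerate(rows):
        nodes[i % num_nodes].execute(
            "INSERT INTO sales VALUES (?, ?)", (region, amount)
        )
    return nodes


def map_phase(nodes):
    """Each node runs the same SQL locally and returns partial aggregates.
    Nodes share no data with one another (shared-nothing)."""
    sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return [conn.execute(sql).fetchall() for conn in nodes]


def reduce_phase(partials):
    """Merge the per-node partial sums into global totals."""
    totals = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


rows = [
    ("east", 10), ("west", 5), ("east", 7),
    ("north", 3), ("west", 8), ("east", 1),
]
nodes = make_nodes(rows)
totals = reduce_phase(map_phase(nodes))
print(totals)
```

In this sketch the database engine on each node does the grouping and summing for its own partition, and only small partial results travel to the merge step. A real system would also need to schedule the work across machines and retry it when a node fails, which the sketch does not attempt.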