Summary
- The Hadoop Distributed File System (HDFS) is an open-source distributed file system modeled on the Google File System (GFS).
- HDFS is designed to run on a cluster of nodes and supports the MapReduce programming model by providing a distributed file system (DFS) for the model's I/O data.
- HDFS has a common, cluster-wide namespace; is able to store large files; is optimized for write-once, read-many access; and is designed to provide high availability in the presence of node failures.
- HDFS follows a primary-secondary topology; the NameNode handles the metadata, and the data is stored on the DataNodes.
- Files in HDFS are split into blocks (also called chunks), with a default size of 128 MB.
- Each block is replicated across the cluster, three times by default (the replication factor).
- HDFS assumes a tree-style cluster topology, optimizes file access to improve performance, and attempts to place block replicas across racks.
- The original HDFS design follows immutable semantics and does not allow existing files to be opened for writes. Newer versions of HDFS support file appends.
- HDFS is strongly consistent because a file write is marked complete only after all the replicas have been written.
- The NameNode tracks DataNode health using a heartbeat mechanism; DataNodes that stop responding are marked dead, and the blocks they held are re-replicated on other nodes to restore the desired replication factor.
- The NameNode is a single point of failure (SPOF) in the original HDFS design. A secondary NameNode can be designated to periodically copy metadata from the primary NameNode but does not provide full failover redundancy.
- HDFS provides high bandwidth for MapReduce, high reliability, low costs per byte, and good scalability.
- HDFS is inefficient with small files (owing to the large default block size), is non-POSIX-compliant, and does not allow existing file data to be overwritten, although newer versions support appends.
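As a concrete example of the block size and replication factor described above, both are configured in `hdfs-site.xml` via the standard `dfs.blocksize` and `dfs.replication` properties (the values shown here are the defaults):

```xml
<!-- hdfs-site.xml: cluster-wide defaults for block size and replication -->
<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- 128 MB, the default block size -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value> <!-- default replication factor -->
  </property>
</configuration>
```

Both settings can also be overridden per file at write time, so a cluster can mix, say, cold archival data at a lower replication factor with hot data at the default.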
- Ceph is a storage system designed for cloud applications. Ceph is based on a distributed object store with services layered on top of it.
- At the core of Ceph is RADOS, a self-managing, fault-tolerant, and scalable cluster of object storage daemons (OSDs) and monitor nodes.
- An object in RADOS is hashed to a placement group, which is then mapped to a set of OSDs using the CRUSH algorithm.
- RADOS can be accessed through librados, a native client library with bindings for several programming languages.
- Applications can also access RADOS data as objects through the RADOS Gateway, which supports the S3 and Swift protocols over a REST interface.
- RADOS can also export block devices through the RADOS Block Device (RBD) service; these can be used as disk images for virtual machines.
- CephFS is a filesystem layered over RADOS. It is implemented with dedicated metadata servers that track filesystem metadata and dynamically partition the filesystem tree among themselves; metadata updates are also journaled to RADOS for fault tolerance.
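The two-step placement described above (object → placement group → OSDs) can be sketched in a few lines of Python. This is a deliberately simplified stand-in: the real CRUSH algorithm also accounts for cluster topology, device weights, and failure domains, and the pool size, OSD names, and hash choice here are assumptions for illustration only.

```python
import hashlib

NUM_PGS = 64                            # assumed number of placement groups in the pool
OSDS = [f"osd.{i}" for i in range(8)]   # hypothetical OSD daemons
REPLICAS = 3                            # replicas per placement group

def object_to_pg(obj_name: str) -> int:
    """Step 1: hash the object name to a placement group."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % NUM_PGS

def pg_to_osds(pg: int) -> list[str]:
    """Step 2: map a placement group to an ordered set of distinct OSDs.

    A toy stand-in for CRUSH; the real algorithm is a deterministic
    pseudo-random walk over the cluster map, not simple modular striping.
    """
    start = pg % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]

pg = object_to_pg("my-object")
print(f"object 'my-object' -> PG {pg} -> OSDs {pg_to_osds(pg)}")
```

The key property this sketch shares with CRUSH is that placement is computed, not looked up: any client with the same cluster map can locate an object's OSDs without consulting a central directory.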