Why metadata is important in big data systems
HDFS has the NameNode and Hive has the Metastore; both are metadata services built into their systems.
In a big data system, a metadata service that indexes the physical data is important because it improves query performance.
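To make "indexing the physical data" concrete, here is a minimal Python sketch of a catalog that maps table partitions to the files that store them. The `Catalog` class, the `sales` table, and the `/warehouse/...` paths are hypothetical illustrations, not the Hive Metastore API. The point is only that a filter on the partition key can be answered from the index, so the other files never have to be read.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Catalog:
    """Hypothetical catalog: table name -> partition value -> data file paths."""
    tables: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def register(self, table: str, partition: str, path: str) -> None:
        # Record where a data file for this partition physically lives.
        self.tables.setdefault(table, {}).setdefault(partition, []).append(path)

    def files_for(self, table: str, partition: Optional[str] = None) -> list[str]:
        parts = self.tables.get(table, {})
        if partition is None:
            # No usable filter: every file of the table has to be read.
            return [path for paths in parts.values() for path in paths]
        # Filter on the partition key: the index returns only the matching files.
        return list(parts.get(partition, []))


catalog = Catalog()
for day in ("2024-01-01", "2024-01-02", "2024-01-03"):
    for i in range(2):
        catalog.register("sales", f"dt={day}", f"/warehouse/sales/dt={day}/part-{i}.parquet")

full_scan = catalog.files_for("sales")
pruned = catalog.files_for("sales", "dt=2024-01-02")
print(len(full_scan), len(pruned))  # 6 2
```

Without the catalog, an engine would have to list and open all six files to find the rows for one day; with it, the lookup narrows the read to two.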
The system has no metadata:
The system has metadata:
As the comparison shows, integrating a metadata service (catalog) can greatly improve query efficiency: the query engine can look up where the needed data lives, so only that data is copied across the cluster to the local nodes for group/sort/join operations.
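One concrete way location metadata cuts data movement is locality-aware scheduling. The sketch below is a toy model that assumes a hypothetical block table (block id, size in MB, replica nodes), similar in spirit to what the HDFS NameNode tracks. `BLOCKS`, `NODES`, and the scheduling functions are illustrative names, not a real scheduler API. The model counts how many megabytes must be copied over the network when scan tasks are placed blindly versus next to a replica.

```python
from __future__ import annotations

# Hypothetical block-location metadata (illustrative, not a real API):
# block id -> (size in MB, nodes holding a replica).
BLOCKS: dict[str, tuple[int, list[str]]] = {
    "blk_1": (128, ["node-b", "node-c"]),
    "blk_2": (128, ["node-c", "node-a"]),
    "blk_3": (64, ["node-a", "node-b"]),
}
NODES = ["node-a", "node-b", "node-c"]


def schedule_blind(blocks: dict[str, tuple[int, list[str]]]) -> dict[str, str]:
    """No location metadata: assign one scan task per block round-robin."""
    return {blk: NODES[i % len(NODES)] for i, blk in enumerate(sorted(blocks))}


def schedule_with_metadata(blocks: dict[str, tuple[int, list[str]]]) -> dict[str, str]:
    """Location metadata available: run each task on a node holding a replica."""
    return {blk: replicas[0] for blk, (_, replicas) in blocks.items()}


def mb_copied(blocks: dict[str, tuple[int, list[str]]], plan: dict[str, str]) -> int:
    """MB that must cross the network: blocks whose task node holds no replica."""
    return sum(size for blk, (size, replicas) in blocks.items() if plan[blk] not in replicas)


blind = mb_copied(BLOCKS, schedule_blind(BLOCKS))
aware = mb_copied(BLOCKS, schedule_with_metadata(BLOCKS))
print(blind, aware)  # 320 0
```

In this toy layout the blind plan copies 320 MB while the metadata-aware plan copies none. A real scheduler also has to balance locality against node load, so the gap is usually smaller, but the mechanism is the same.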