HBase Java API from Site

  • All operations that mutate data are guaranteed to be atomic on a per-row basis.
    • It does not matter if another client or thread is reading from or writing to the same row: they either read a consistent last mutation, or may have to wait before being able to apply their change.
    • During normal operations and load, a reading client will not be affected by another updating a particular row since their contention is nearly negligible.
    • It also does not matter how many columns are written for the particular row; the mutation of that row is still atomic.

HBaseAdmin

// Load the HBase configuration (hbase-site.xml on the classpath).
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
// Describe the table and its three column families.
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("people"));
tableDescriptor.addFamily(new HColumnDescriptor("personal"));
tableDescriptor.addFamily(new HColumnDescriptor("contactinfo"));
tableDescriptor.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(tableDescriptor);
// Release the admin's resources once the table has been created.
admin.close();
  • It is recommended that you create HTable instances only once (and one per thread) and reuse that instance for the rest of the lifetime of your client application.
    • Each instantiation involves scanning the .META. table to check whether the table actually exists.
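
The "create once, one per thread, then reuse" advice can be sketched with a ThreadLocal. This is a minimal illustration in plain Java, not HBase code: TableHandle is a hypothetical stand-in for an expensive, non-thread-safe handle such as HTable, and the counter exists only to show how many handles get built.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadHandleSketch {
    // Counts how many handles were constructed, for illustration only.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for an expensive, non-thread-safe handle such as HTable.
    static final class TableHandle {
        final String tableName;

        TableHandle(String tableName) {
            this.tableName = tableName;
            CREATED.incrementAndGet();
        }
    }

    // Each thread lazily builds its own handle on first use and then reuses it.
    private static final ThreadLocal<TableHandle> PEOPLE =
            ThreadLocal.withInitial(() -> new TableHandle("people"));

    static TableHandle peopleTable() {
        return PEOPLE.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Every worker asks for the handle many times but constructs it only once.
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                peopleTable();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("handles created: " + CREATED.get());
    }
}
```

With two worker threads the program reports two handles, however many times each thread asks for one. A real client would also need to close each handle when its thread finishes, which this sketch leaves out.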

Put

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
// Instantiate a Put, providing the unique row key to the constructor.
Put put = new Put(Bytes.toBytes("doe-john-m-12345"));
// Add values, which must include the column family, column qualifier, and the value all as byte arrays.
put.add(Bytes.toBytes("personal"), Bytes.toBytes("givenName"), Bytes.toBytes("John"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("mi"), Bytes.toBytes("M"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("surname"), Bytes.toBytes("Doe"));
put.add(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"), Bytes.toBytes("[email protected]"));
// Put the data into the table.
table.put(put);
// flush the commits to ensure locally buffered changes take effect, and finally close the table.
table.flushCommits();
table.close();
  • Unlike relational databases, in which an update must rewrite the entire row even if only one column changed, a Put needs to specify only the columns you want to change, and HBase updates only those columns.

Get

Get get = new Get(Bytes.toBytes("doe-john-m-12345"));
// Restricting the Get to one column family cuts down the amount of work HBase must do when reading information from disk.
get.addFamily(Bytes.toBytes("personal"));
get.setMaxVersions(3);
Result result = table.get(get);

Scan

  • The only way to retrieve multiple rows of data in HBase is to scan by sorted row keys, so how you design the row key values is very important.
    Scan scan = new Scan(Bytes.toBytes("smith-"));
    // restrict the columns returned (thus reducing the amount of disk transfer HBase must perform)
    scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("givenName"));
    scan.addColumn(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"));
    scan.setFilter(new PageFilter(25));
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
      // ...
    }
    // Close the scanner when done to release its server-side resources.
    scanner.close();
    
  • HBase supports the notion of partial keys: a scan can start from a key prefix, so you do not need to know the exact key. This gives you more flexibility in creating appropriate scans.
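
To make the partial-key idea concrete, here is a sketch of how a prefix scan falls out of sorted row keys. It is plain Java and does not call HBase: a TreeMap ordered by unsigned byte comparison stands in for the table, and the names PrefixScanSketch, stopRowForPrefix, scanPrefix and the sample row keys are all hypothetical. The stop row is taken to be the smallest key that sorts after every key sharing the prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class PrefixScanSketch {

    // Lexicographic comparison of row keys as unsigned bytes.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Smallest key that sorts after every key starting with the prefix:
    // drop trailing 0xFF bytes, then increment the last remaining byte.
    // An empty result means "no upper bound".
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    // Returns the row keys in [prefix, stopRow), in sorted order.
    static List<String> scanPrefix(NavigableMap<byte[], String> table, String prefix) {
        byte[] start = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        NavigableMap<byte[], String> range = stop.length == 0
                ? table.tailMap(start, true)
                : table.subMap(start, true, stop, false);
        List<String> rowKeys = new ArrayList<>();
        for (byte[] key : range.keySet()) {
            rowKeys.add(new String(key, StandardCharsets.UTF_8));
        }
        return rowKeys;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a table whose rows are kept sorted by key.
        NavigableMap<byte[], String> table = new TreeMap<>(PrefixScanSketch::compareRows);
        String[] keys = {
            "doe-john-m-12345", "smith-anna-k-00017", "smith-bob-j-00042", "smithson-al-r-00003"
        };
        for (String key : keys) {
            table.put(key.getBytes(StandardCharsets.UTF_8), "row data");
        }
        System.out.println(scanPrefix(table, "smith-"));
    }
}
```

Running main prints the two "smith-" rows and skips "smithson-al-r-00003", because the computed stop row "smith." sorts before it. The same reasoning suggests why a start row alone, as in the Scan example above, keeps reading past the prefix until a stop row or a filter ends the scan.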

Connection Handling

HBase provides the HConnection class, which offers functionality similar to connection pool classes for sharing connections.

  • For example, you use its getTable() method to get a reference to an HTable instance. The HConnectionManager class is how you obtain instances of HConnection.
  • Similar to avoiding network round trips in web applications, effectively managing the number of RPCs and amount of data returned when using HBase is important, and something to consider when writing HBase applications.
