HBase Filters

Published 2020-05-04

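The main purpose of a filter is to add further constraints that reduce the amount of data a query returns. These constraints can target the column family, the column, the timestamp and the version number.

For example, the following code retrieves all data in the column basicInfo:name.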

TableName tableName = TableName.valueOf(Bytes.toBytes("step1_stu"));
Table table = conn.getTable(tableName);
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("basicInfo"), Bytes.toBytes("name")); // scan only the basicInfo:name column
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println(result);
}
scanner.close();

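Using a filter takes two steps:

  • Create the filter
  • Set the filter on the query

// Create the filter
Filter filter = new RowFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("row1")));
// Set the filter
scan.setFilter(filter);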

At the bottom of the filter hierarchy are the Filter interface and the FilterBase abstract class, which provide the empty shell and the skeleton for all filters.

Above I used the constructor of the row key filter RowFilter: RowFilter(CompareOperator op, ByteArrayComparable rowComparator). The first argument is the comparison operator, the second is the condition to compare against.

The first argument can take several values to cover different scenarios:

|Operator|Description|
|-|-|
|CompareOperator.LESS|Matches values less than the given value|
|CompareOperator.LESS_OR_EQUAL|Matches values less than or equal to the given value|
|CompareOperator.EQUAL|Matches values equal to the given value|
|CompareOperator.NOT_EQUAL|Matches values not equal to the given value|
|CompareOperator.GREATER_OR_EQUAL|Matches values greater than or equal to the given value|
|CompareOperator.GREATER|Matches values greater than the given value|
|CompareOperator.NO_OP|Excludes every value|

Note: before HBase 2.0 the operator was given with the enum CompareFilter.CompareOp (nested in the comparison filter base class CompareFilter) rather than CompareOperator. Since 2.0 CompareOperator replaces CompareFilter.CompareOp, which is deprecated and scheduled for removal in 3.0.

Row key filter RowFilter

A row key filter is used like this:

Filter filter = new RowFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("row1")));
scan.setFilter(filter);

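Exercise:

  • Query the values of column gender in column family basic_info where the row key is 2018;
  • Query the values of column college in column family school_info where the row key is greater than 2018;
  • Query the values of column name in column family basic_info where the row key is less than or equal to 2020.

Hint: result.listCells() returns the list of Cells, and CellUtil.cloneValue() returns the value as a byte array.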

import java.io.IOException;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.*;
public class Task {
    public void query(String tName) throws Exception {
        // HBaseConfiguration.create() also loads hbase-default.xml and hbase-site.xml
        Configuration config = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(config);
        TableName tableName = TableName.valueOf(tName);
        Table table = conn.getTable(tableName);
        Scan scan1 = new Scan();
        scan1.addColumn(Bytes.toBytes("basic_info"), Bytes.toBytes("gender"));
        Filter filter1 =  new RowFilter(CompareOperator.EQUAL,
                new BinaryComparator(Bytes.toBytes("2018")));
        scan1.setFilter(filter1);
        ResultScanner scanner1 = table.getScanner(scan1);
        System.out.println("row:2018");
        for (Result result : scanner1) {
            for(Cell cell : result.listCells()){
                System.out.println("basic_info:gender " + new String(CellUtil.cloneValue(cell),"utf-8") );
            }
        }
        scanner1.close();
        Scan scan2 = new Scan();
        scan2.addColumn(Bytes.toBytes("school_info"), Bytes.toBytes("college"));
        Filter filter2 =  new RowFilter(CompareOperator.GREATER,
                new BinaryComparator(Bytes.toBytes("2018")));
        scan2.setFilter(filter2);
        ResultScanner scanner2 = table.getScanner(scan2);
        for (Result result : scanner2) {
            System.out.println("row:" + new String(result.getRow(),"utf-8"));
            for(Cell cell : result.listCells()){
                System.out.println("school_info:college " + new String(CellUtil.cloneValue(cell),"utf-8") );
            }
        }
        scanner2.close();
        Scan scan3 = new Scan();
        scan3.addColumn(Bytes.toBytes("basic_info"), Bytes.toBytes("name"));
        Filter filter3 =  new RowFilter(CompareOperator.LESS_OR_EQUAL,
                new BinaryComparator(Bytes.toBytes("2020")));
        scan3.setFilter(filter3);
        ResultScanner scanner3 = table.getScanner(scan3);
        for (Result result : scanner3) {
            System.out.println("row:" + new String(result.getRow(),"utf-8"));
            for(Cell cell : result.listCells()){
                System.out.println("basic_info:name " + new String(CellUtil.cloneValue(cell),"utf-8") );
            }
        }
        scanner3.close();
        conn.close();
    }
}
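
One caveat for the "greater than 2018" and "less than or equal to 2020" queries above: BinaryComparator compares row keys byte by byte, not numerically. The sketch below uses only the JDK to mimic that ordering. Arrays.compareUnsigned stands in for HBase's Bytes.compareTo, and the class name RowKeyOrderDemo and the sample keys are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RowKeyOrderDemo {
    // Compares two row keys the way a byte-wise comparator would:
    // unsigned, byte by byte, with the shorter key first on a shared prefix.
    static int compareKeys(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // "2019" sorts after "2018", as one would expect.
        System.out.println(compareKeys("2019", "2018") > 0);
        // "10000" sorts before "2018" because '1' < '2', although 10000 > 2018 numerically.
        System.out.println(compareKeys("10000", "2018") < 0);
        // "20180" sorts after "2018": same prefix, and the longer key is larger.
        System.out.println(compareKeys("20180", "2018") > 0);
    }
}
```

If the row keys are numeric IDs, zero-padding them to a fixed width keeps byte order and numeric order aligned.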

Matching row keys with regular expressions and substrings

// Row key filter
Filter filter = new RowFilter(CompareOperator.EQUAL,new BinaryComparator(Bytes.toBytes("row1")));

The second argument of the RowFilter constructor is a ByteArrayComparable object. This is a comparator, and it is where the actual condition is set.

The comparator has many subclasses:

|Comparator|Description|
|-|-|
|BinaryComparator|Compares the current value with the threshold using Bytes.compareTo()|
|BinaryPrefixComparator|Like BinaryComparator, but only matches a prefix, starting from the left|
|NullComparator|Does no matching; only checks whether the current value is null|
|BitComparator|Performs a bit-level comparison using the bitwise AND, OR and XOR operations provided by BitwiseOp|
|RegexStringComparator|Takes a regular expression when it is instantiated and matches the table data against it|
|SubstringComparator|Treats the threshold and the table data as Strings and matches with contains()|

Note that the last three comparators, BitComparator, RegexStringComparator and SubstringComparator, can only be used with the EQUAL and NOT_EQUAL operators. Their compareTo() method returns 0 on a match and 1 otherwise, so combining them with GREATER or LESS produces wrong results.
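
To see why a 0-or-1 result cannot work with GREATER or LESS, here is a small JDK-only sketch. It assumes that a substring-style comparator returns 0 on a match and 1 otherwise, and that the result is read as "threshold compared to value". Op, substringCompare and keeps are hypothetical names, not HBase API, and the sign convention is an assumption made for illustration.

```java
class ZeroOneComparatorDemo {
    enum Op { EQUAL, NOT_EQUAL, GREATER, LESS }

    // Mimics a substring-style comparator: 0 when the value contains the
    // needle, 1 otherwise. It never returns a negative number.
    static int substringCompare(String needle, String value) {
        return value.contains(needle) ? 0 : 1;
    }

    // Assumed evaluation rule: a value counts as GREATER than the threshold
    // when the result is negative and as LESS when it is positive.
    static boolean keeps(Op op, int result) {
        switch (op) {
            case EQUAL:     return result == 0;
            case NOT_EQUAL: return result != 0;
            case GREATER:   return result < 0;
            case LESS:      return result > 0;
            default:        throw new IllegalArgumentException("unknown op");
        }
    }

    public static void main(String[] args) {
        String[] rowKeys = {"row10", "row11", "row210"};
        for (Op op : Op.values()) {
            StringBuilder kept = new StringBuilder();
            for (String key : rowKeys) {
                if (keeps(op, substringCompare("10", key))) {
                    kept.append(key).append(' ');
                }
            }
            System.out.println(op + " -> " + kept.toString().trim());
        }
    }
}
```

Under these assumptions GREATER never keeps anything and LESS degenerates into NOT_EQUAL, so neither says anything about ordering.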

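Regex comparator

The regex comparator RegexStringComparator, as the name suggests, uses a regular expression to match the values to filter.

Exercise: get all row keys that end in 5.

Filter filter = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator(".*5$")); // any prefix, ending in 5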
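
A JDK-only sketch of how such patterns behave. It assumes the comparator applies the pattern with java.util.regex find semantics (a match anywhere in the key) rather than requiring the whole key to match. That is my understanding of RegexStringComparator's default engine, so treat it as an assumption and check it against your HBase version. RegexRowKeyDemo and the sample keys are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexRowKeyDemo {
    // Returns the keys in which the pattern is found somewhere (find semantics).
    static List<String> select(String regex, List<String> keys) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String key : keys) {
            if (pattern.matcher(key).find()) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("row5", "row15", "row51", "2195", "105");
        // Keys ending in 5.
        System.out.println(select(".*5$", keys));
        // Without a leading ^, "1.*5$" also hits "2195", which does not start with 1.
        System.out.println(select("1.*5$", keys));
        // Anchoring both ends keeps only keys that start with 1 and end in 5.
        System.out.println(select("^1.*5$", keys));
    }
}
```

When a key has to start with a given character, anchoring the pattern with ^ avoids depending on how the pattern is applied.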

Substring comparator

The substring comparator SubstringComparator checks whether a row key contains a given string.

For example, to find all rows whose key contains 10, create the filter like this:

Filter filter = new RowFilter(CompareOperator.EQUAL, new SubstringComparator("10"));

The substring comparator can only be used with the EQUAL and NOT_EQUAL operators.

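Exercise:

  • Find the row keys that start with 1 and end in 9, and print the values of all columns in those rows;
  • Find the row keys that contain 231, and print the values of all columns in those rows.

import java.io.IOException;
import org.apache.hadoop.conf.*;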
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.*;
public class Task {
    public void query() throws Exception {
        Configuration config = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(config);
        TableName tableName = TableName.valueOf("t2_student_table");
        Table table = conn.getTable(tableName);
        Scan scan1 = new Scan();
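        // Starts with 1 and ends in 9. The leading ^ matters if the pattern is applied with find()
        // semantics (my understanding of RegexStringComparator); without it a key such as 219 would also match.
        Filter filter1 = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator("^1.*9$"));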
        scan1.setFilter(filter1);
        ResultScanner scanner1 = table.getScanner(scan1);
        for (Result result : scanner1) {
            System.out.println("row:" + new String(result.getRow(),"utf-8"));
            for(Cell cell : result.listCells()){
                String family = Bytes.toString(CellUtil.cloneFamily(cell));
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                String value = Bytes.toString(CellUtil.cloneValue(cell));
                System.out.println(family + ":" + qualifier + " " + value);
            }
        }
        scanner1.close();
        Scan scan2 = new Scan();
        Filter filter2 = new RowFilter(CompareOperator.EQUAL,new SubstringComparator("231"));
        scan2.setFilter(filter2);
        ResultScanner scanner2 = table.getScanner(scan2);
        for (Result result : scanner2) {
            System.out.println("row:" + new String(result.getRow(),"utf-8"));
            for(Cell cell : result.listCells()){
                String family = Bytes.toString(CellUtil.cloneFamily(cell));
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                String value = Bytes.toString(CellUtil.cloneValue(cell));
                System.out.println(family + ":" + qualifier + " " + value);
            }
        }
        scanner2.close();
        conn.close();
    }
}

Column family filter

The column family filter FamilyFilter is similar to the row filter, except that it selects results by comparing column families rather than row keys.

Exercise: get the data in row 2018 that belongs to column family basic_info.

Filter filter = new FamilyFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("basic_info")));
Get get = new Get(Bytes.toBytes("2018"));
get.setFilter(filter); // without this call the filter is never applied
Result result = table.get(get);
System.out.println(result);

Column qualifier filter

The column qualifier filter QualifierFilter helps select specific columns.

Exercise: select the data of the name column.

1. Get the name column of every row:

Scan scan = new Scan();
Filter filter = new QualifierFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("name")));
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println(result);
}
scanner.close();

Exercise: get the value of the name column in row 2018.

Get get = new Get(Bytes.toBytes("2018"));
Filter filter = new QualifierFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("name")));
get.setFilter(filter);
Result result = table.get(get);
System.out.println("Result: " + result);

Value filter

The value filter ValueFilter selects cells by their value. Combined with RegexStringComparator it can filter with powerful expressions. Note that some comparators, such as the substring comparator and the regex comparator, can only be paired with certain operators (EQUAL and NOT_EQUAL).

Exercise: find the values that contain Ha.

Filter filter = new ValueFilter(CompareOperator.EQUAL, new SubstringComparator("Ha"));
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    for (Cell cell : result.listCells()) {
        System.out.println(new String(CellUtil.cloneValue(cell), "utf-8"));
    }
}
scanner.close();

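Exercise: in row 2018, find the values that contain 张.

Get get = new Get(Bytes.toBytes("2018"));
Filter filter = new ValueFilter(CompareOperator.EQUAL, new SubstringComparator("张"));
get.setFilter(filter); // apply the filter to the Get
Result result = table.get(get);
for (Cell cell : result.listCells()) {
    // Bytes.toString decodes the bytes; String.valueOf(byte[]) would only print the array reference
    System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
}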

Exercise:

  • Query all columns of column family school_info in row 1019 and print the values;
  • Query all columns in row 2020 whose name contains the letter c and print the values;
  • Find the values containing 张 across all rows of the table and print them.

import java.io.IOException;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.*;
public class Task {
    public void query() throws Exception {
        Configuration config = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(config);
        TableName tableName = TableName.valueOf(Bytes.toBytes("t3_student_table"));
        Table table = conn.getTable(tableName);
        // 1. Row 1019, column family school_info
        Filter filter1 = new FamilyFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("school_info")));
        Get get1 = new Get(Bytes.toBytes("1019"));
        get1.setFilter(filter1);
        Result result1 = table.get(get1);
        System.out.println("row:" + new String(result1.getRow(), "utf-8"));
        for (Cell cell : result1.listCells()) {
            String family = Bytes.toString(CellUtil.cloneFamily(cell));
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println(family + ":" + qualifier + " " + value);
        }
        // 2. Row 2020, columns whose name contains "c"
        Filter filter2 = new QualifierFilter(CompareOperator.EQUAL, new SubstringComparator("c"));
        Get get2 = new Get(Bytes.toBytes("2020"));
        get2.setFilter(filter2);
        Result result2 = table.get(get2);
        System.out.println("row:" + new String(result2.getRow(), "utf-8"));
        for (Cell cell : result2.listCells()) {
            String family = Bytes.toString(CellUtil.cloneFamily(cell));
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println(family + ":" + qualifier + " " + value);
        }
        // 3. All rows, values containing "张"
        Scan scan3 = new Scan();
        Filter filter3 = new ValueFilter(CompareOperator.EQUAL, new SubstringComparator("张"));
        scan3.setFilter(filter3);
        ResultScanner scanner3 = table.getScanner(scan3);
        for (Result result : scanner3) {
            System.out.println("row:" + new String(result.getRow(), "utf-8"));
            for (Cell cell : result.listCells()) {
                String family = Bytes.toString(CellUtil.cloneFamily(cell));
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                String value = Bytes.toString(CellUtil.cloneValue(cell));
                System.out.println(family + ":" + qualifier + " " + value);
            }
        }
        scanner3.close();
        conn.close();
    }
}

    Prefix filter

    The prefix filter PrefixFilter takes a prefix when it is constructed; every row whose key starts with that prefix is returned to the client.

    For example, to select the rows whose key starts with row-1:

    Filter filter = new PrefixFilter(Bytes.toBytes("row-1"));
    Scan scan = new Scan();
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
        for (Cell cell : result.listCells()) {
            String family = Bytes.toString(CellUtil.cloneFamily(cell));
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println(family + ":" + qualifier + " " + value);
        }
    }
    scanner.close();

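    Page filter

    When a table holds a lot of data, say one million rows, fetching all of it in one query is not sensible, so paging becomes necessary; it is also one of the most common ways to query. The page filter PageFilter is one of the most frequently used filters and lets the user page through the results.

    tips: When using the page filter, the client code records the last row of the current scan and uses it to derive the start row of the next scan, keeping the same filter settings, and repeats this page by page. The page size limits how many rows one scan returns; a scan would usually cover more rows than the page size, and once the limit is reached the filter has a mechanism to tell the region server to stop scanning. Because the filter runs separately on each region server, a scan that spans several regions may still return more rows than the page size.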

    byte[] POSTFIX = new byte[] { 0x00 };
    Table table = conn.getTable(tableName);
    Filter filter = new PageFilter(15); // build the filter and set the page size
    int totalRows = 0;
    byte[] lastRow = null;
    while (true) {
        Scan scan = new Scan();
        // Add the filter
        scan.setFilter(filter);
        // Set the start row: the last row of the previous page plus a zero byte,
        // i.e. the smallest key that sorts after it
        if (lastRow != null) {
            byte[] startRow = Bytes.add(lastRow, POSTFIX);
            System.out.println("start row: " + Bytes.toStringBinary(startRow));
            scan.withStartRow(startRow);
        }
        ResultScanner scanner = table.getScanner(scan);
        int localRows = 0;
        Result result;
        while ((result = scanner.next()) != null) {
            System.out.println(localRows++ + ": " + result);
            totalRows++;
            lastRow = result.getRow();
        }
        scanner.close();
        if (localRows == 0) break;
    }
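
The start-row bookkeeping is easy to get wrong, so here is a JDK-only simulation of the loop above. A sorted TreeMap stands in for the table and tailMap for a scan from a start row. PagingDemo, fetchPage, allPages and the sample rows are made up for illustration, and strings replace byte arrays, so "\u0000" plays the role of POSTFIX.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PagingDemo {
    // Returns up to pageSize row keys, starting at startRow (inclusive).
    static List<String> fetchPage(TreeMap<String, String> table, String startRow, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String rowKey : table.tailMap(startRow, true).keySet()) {
            if (page.size() == pageSize) {
                break; // the page is full, stop "scanning"
            }
            page.add(rowKey);
        }
        return page;
    }

    // Walks the whole table page by page and returns the pages in order.
    static List<List<String>> allPages(TreeMap<String, String> table, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String startRow = ""; // the empty key sorts before every other key
        while (true) {
            List<String> page = fetchPage(table, startRow, pageSize);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            // Smallest key strictly greater than the last row: lastRow + 0x00.
            startRow = page.get(page.size() - 1) + "\u0000";
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 7; i++) {
            table.put("row" + i, "value" + i);
        }
        System.out.println(allPages(table, 3));
    }
}
```

Appending the zero byte is what keeps the last row of one page from showing up again as the first row of the next.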

    Column pagination filter

    The column pagination filter ColumnPaginationFilter is similar to PageFilter, but it pages through the columns of a row. Its constructor takes two arguments: ColumnPaginationFilter(int limit, int offset). It skips the first offset columns of each row and then returns at most limit columns.

    Filter filter = new ColumnPaginationFilter(5, 15);
    Scan scan = new Scan();
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
        System.out.println(result);
    }
    scanner.close();
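
Read this way, ColumnPaginationFilter(5, 15) keeps at most 5 columns per row, starting with the column at index 15. The JDK-only sketch below mimics that on a sorted list of qualifiers. ColumnPageDemo and pageOfColumns are hypothetical names, and the zero-based offset is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnPageDemo {
    // Skips the first `offset` columns of a row and keeps at most `limit` of the rest.
    static List<String> pageOfColumns(List<String> sortedColumns, int limit, int offset) {
        int from = Math.min(offset, sortedColumns.size());
        int to = Math.min(from + limit, sortedColumns.size());
        return new ArrayList<>(sortedColumns.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            columns.add(String.format("col-%02d", i));
        }
        // limit = 5, offset = 15 keeps col-15 through col-19.
        System.out.println(pageOfColumns(columns, 5, 15));
        // An offset near the end of the row yields a short page.
        System.out.println(pageOfColumns(columns, 5, 28));
    }
}
```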

    Exercise:

    • Query the rows whose key starts with row5;
    • Run 4 paged queries on the table with a page size of 10 and print the data.

    import java.io.IOException;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.hbase.*;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.*;
    import org.apache.hadoop.hbase.util.*;
    public class Task {
        public void query(String tName) throws Exception {
            Configuration config = HBaseConfiguration.create();
            Connection conn = ConnectionFactory.createConnection(config);
            TableName tableName = TableName.valueOf(Bytes.toBytes("test_tb1"));
            Table table = conn.getTable(tableName);
            // Prefix query
            Filter filter = new PrefixFilter(Bytes.toBytes("row5"));
            Scan scan = new Scan();
            scan.setFilter(filter);
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                System.out.println(Bytes.toString(result.getRow()));
                for (Cell cell : result.listCells()) {
                    String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    String value = Bytes.toString(CellUtil.cloneValue(cell));
                    System.out.println("\t" + family + ":" + qualifier + " " + value);
                }
            }
            scanner.close();

            // Paged query
            byte[] POSTFIX = new byte[] {0};
            Filter filter1 = new PageFilter(10); // build the filter and set the page size
            int totalRows = 0;
            byte[] lastRow = null;
            int i = 4;
            while (i > 0) {
                Scan scan1 = new Scan();
                // Add the filter
                scan1.setFilter(filter1);
                // Set the start row of this page
                if (lastRow != null) {
                    byte[] startRow = Bytes.add(lastRow, POSTFIX);
                    System.out.println("Start paged query");
                    scan1.withStartRow(startRow);
                }
                ResultScanner scanner1= table.getScanner(scan1);
                int localRows = 0;
                Result result;
                while ((result = scanner1.next()) != null) {
                    System.out.println(Bytes.toString(result.getRow()));
                    for(Cell cell : result.listCells()){
                        String family = Bytes.toString(CellUtil.cloneFamily(cell));
                        String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                        String value = Bytes.toString(CellUtil.cloneValue(cell));
                        System.out.println("\t" + family + ":" + qualifier + " " + value);
                    }
                    localRows++;
                    totalRows++;
                    lastRow = result.getRow();
                }
                scanner1.close();
                if (localRows == 0) break;
                i--;
            }
            conn.close();
        }
    }

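    FilterList (a list of filters) combines several filters to achieve what a single filter cannot. Like the single-purpose filters, the FilterList class implements the Filter interface, so it can be used wherever a filter is expected while combining the effects of its members.

    Constructors

    FilterList(List<Filter> rowFilters)
    FilterList(Operator operator)
    FilterList(Operator operator, List<Filter> rowFilters)

    The rowFilters argument supplies the filters as a list, and the operator argument decides how their results are combined. The list is straightforward; the operator is new. It has two possible values, and the default is MUST_PASS_ALL. The values of the FilterList.Operator enum are:

    |Operator|Description|
    |-|-|
    |MUST_PASS_ALL|A value is included in the result only if every filter allows it, that is, no filter rejects it|
    |MUST_PASS_ONE|A value is included in the result as soon as one filter allows it|

    After creating a FilterList instance, filters can be added with the following method:

    void addFilter(Filter filter)

    Each FilterList has exactly one operator, but FilterList instances can be added to an existing FilterList, which builds a multi-level hierarchy of filters in which every level can use its own operator. The order of the filters in the List also controls the order in which they are evaluated; for example, an ArrayList guarantees that the filters are evaluated in the order in which they were added.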

    List<Filter> filters = new ArrayList<>();
    Filter rowFilter1 = new RowFilter(CompareOperator.GREATER_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes("row-3")));
    filters.add(rowFilter1);
    Filter rowFilter2 = new RowFilter(CompareOperator.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("row-6")));
    filters.add(rowFilter2);
    Filter rowFilter3 = new RowFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("row-3")));
    filters.add(rowFilter3);
    FilterList filterList1 = new FilterList(filters);
    Scan scan = new Scan();
    scan.setFilter(filterList1);
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
        for(Cell cell : result.listCells()){
            String family = Bytes.toString(CellUtil.cloneFamily(cell));
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println("\t" + family + ":" + qualifier + " " + value);
        }
    }
    scanner.close();
    FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE,filters);
    scan.setFilter(filterList2);
    ResultScanner scanner2 = table.getScanner(scan);
    for (Result result : scanner2) {
        for(Cell cell : result.listCells()){
            String family = Bytes.toString(CellUtil.cloneFamily(cell));
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println("\t" + family + ":" + qualifier + " " + value);
        }
    }
    scanner2.close();
    conn.close();

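    Explanation:

    The filter list in the first scan removes a lot of data: with the default MUST_PASS_ALL, a row is dropped as soon as any filter in the list rejects it, and only rows that pass every filter are sent back to the client.

    In the second mode (MUST_PASS_ONE), a row only needs to pass one of the filters to be returned.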
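
The two modes boil down to AND versus OR over the individual filters. Below is a JDK-only sketch with java.util.function.Predicate standing in for the three row filters used above. FilterListDemo, passAll, passOne and sampleFilters are illustrative names, and row keys are compared as strings, which agrees with byte-wise order for these ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FilterListDemo {
    // MUST_PASS_ALL: a row is kept only if every filter accepts it.
    static List<String> passAll(List<String> rows, List<Predicate<String>> filters) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (filters.stream().allMatch(f -> f.test(row))) {
                kept.add(row);
            }
        }
        return kept;
    }

    // MUST_PASS_ONE: a row is kept as soon as one filter accepts it.
    static List<String> passOne(List<String> rows, List<Predicate<String>> filters) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (filters.stream().anyMatch(f -> f.test(row))) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Stand-ins for the three RowFilters of the example.
    static List<Predicate<String>> sampleFilters() {
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(row -> row.compareTo("row-3") >= 0); // GREATER_OR_EQUAL row-3
        filters.add(row -> row.compareTo("row-6") <= 0); // LESS_OR_EQUAL row-6
        filters.add(row -> row.equals("row-3"));         // EQUAL row-3
        return filters;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row-1", "row-2", "row-3", "row-4", "row-5", "row-6", "row-7");
        System.out.println(passAll(rows, sampleFilters()));
        System.out.println(passOne(rows, sampleFilters()));
    }
}
```

With these three filters MUST_PASS_ALL leaves only row-3, while MUST_PASS_ONE lets every sample row through, because each row is either greater than or equal to row-3 or less than or equal to row-6.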

    Exercise:

    1. Find the rows whose key ends in 9 and is greater than row50;

    2. Find the data whose row key contains 93 or whose value is value10.

    package step2;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.hbase.*;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.*;
    import org.apache.hadoop.hbase.filter.SubstringComparator;
    import org.apache.hadoop.hbase.util.*;
    public class Task {
        public void query(String tName) throws Exception {
            /********* Begin *********/
            Configuration config = HBaseConfiguration.create();
            Connection conn = ConnectionFactory.createConnection(config);
            TableName tableName = TableName.valueOf("test_tb2");
            Table table = conn.getTable(tableName);
            Filter regFilter = new RowFilter(CompareOperator.EQUAL
                    ,new RegexStringComparator(".*9$"));
            Filter moreThanFilter = new RowFilter(CompareOperator.GREATER
                    , new BinaryComparator(Bytes.toBytes("row50")));
            List<Filter> list = new ArrayList<>();
            list.add(regFilter);
            list.add(moreThanFilter);
            FilterList  filterList1 = new FilterList(list);
            Scan scan1 = new Scan();
            scan1.setFilter(filterList1);
            ResultScanner scanner1 = table.getScanner(scan1);
            for (Result result : scanner1) {
                System.out.println(Bytes.toString(result.getRow()));
                for(Cell cell : result.listCells()){
                    String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    String value = Bytes.toString(CellUtil.cloneValue(cell));
                    System.out.println("\t" + family + ":" + qualifier + " " + value);
                }
            }
            scanner1.close();
            // Second query
            Filter subFilter = new RowFilter(CompareOperator.EQUAL,new SubstringComparator("93"));
            Filter valueFilter = new ValueFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("value10")));
            List<Filter> list2 = new ArrayList<>();
            list2.add(subFilter);
            list2.add(valueFilter);
            FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE,list2);
            Scan scan2 = new Scan();
            scan2.setFilter(filterList2);
            ResultScanner scanner2 = table.getScanner(scan2);
            for (Result result : scanner2) {
                System.out.println(Bytes.toString(result.getRow()));
                for(Cell cell : result.listCells()){
                    String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    String value = Bytes.toString(CellUtil.cloneValue(cell));
                    System.out.println("\t" + family + ":" + qualifier + " " + value);
                }
            }
            scanner2.close();
            conn.close();
            /********* End *********/
        }
    }

     


    渣渣龙. All rights reserved. Original content unless otherwise noted. This site is licensed under BY-NC-SA.