Window Functions

Published 2020-03-28

In Spark 1.6, window functions are only available through Hive support (HiveContext).
Window function syntax:
row_number() over (partition by XXX order by XXX desc) as rank
Ranks start at 1 and restart at 1 within each partition.
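To make the semantics concrete, here is a minimal plain-Java sketch (no Spark needed) that emulates row_number() over (partition by category order by amount desc) on the sample data used later in this post. The Sale record and the computeRanks method are illustrative names, not part of any Spark API.

```java
import java.util.*;
import java.util.stream.*;

public class RowNumberDemo {
    // Mirrors the (leibie, jine) columns of the sales table below
    record Sale(String category, int amount) {}

    // Emulates: row_number() over (partition by category order by amount desc)
    static List<String> computeRanks(List<Sale> sales) {
        // Partition the rows by category
        Map<String, List<Sale>> partitions = sales.stream()
            .collect(Collectors.groupingBy(Sale::category, TreeMap::new, Collectors.toList()));
        List<String> rows = new ArrayList<>();
        partitions.forEach((cat, group) -> {
            // Sort each partition by amount, descending
            group.sort(Comparator.comparingInt(Sale::amount).reversed());
            // Ranks restart at 1 in every partition
            for (int rank = 1; rank <= group.size(); rank++) {
                rows.add(cat + " " + group.get(rank - 1).amount() + " --rank " + rank);
            }
        });
        return rows;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
            new Sale("a", 10), new Sale("b", 20), new Sale("a", 30),
            new Sale("b", 40), new Sale("a", 50), new Sale("b", 60));
        computeRanks(sales).forEach(System.out::println);
    }
}
```

The printed ranking matches the comment block in the Spark job below: each category's rows come out in descending amount order, with the rank counter reset per category.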

package kc;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

public class kaichuang {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("windowfun");
        conf.set("spark.sql.shuffle.partitions", "1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hivecontext = new HiveContext(sc);
        hivecontext.sql("use saprk");
        hivecontext.sql("drop table if exists sales");
        hivecontext.sql("create table if not exists sales (riqi string, leibie string, jine int) "
                + "row format delimited fields terminated by '\t'");
        /*
         * Group by category (leibie):
         *
         * 1 a 10
         * 2 b 20
         * 3 a 30
         * 4 b 40
         * 5 a 50
         * 6 b 60
         *
         * After sorting each group by amount (jine) descending:
         *
         * 5 a 50 --rank 1
         * 3 a 30 --rank 2
         * 1 a 10 --rank 3
         * 6 b 60 --rank 1
         * 4 b 40 --rank 2
         * 2 b 20 --rank 3
         */
        DataFrame result = hivecontext.sql("select riqi,leibie,jine "
                + "from ("
                + "select riqi,leibie,jine,"
                + "row_number() over (partition by leibie order by jine desc) rank "
                + "from sales) t"
                + " where t.rank<=3");
        result.show(100);

        // Save the result to a Hive table
        result.write().mode(SaveMode.Overwrite).saveAsTable("sales_result");
        sc.stop();
    }
}
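The query above wraps row_number() in a subquery and filters on t.rank &lt;= 3, which is the standard "top-N per group" pattern. As a hedged plain-Java sketch of what that filter computes (TopNPerGroup and topN are illustrative names, not Spark APIs): sorting each category's amounts in descending order and keeping the first n is equivalent to keeping the rows whose rank is at most n.

```java
import java.util.*;
import java.util.stream.*;

public class TopNPerGroup {
    // Keep the n largest amounts per category -- the same effect as
    // filtering "t.rank <= n" over row_number() partitioned by category.
    static Map<String, List<Integer>> topN(Map<String, List<Integer>> byCategory, int n) {
        Map<String, List<Integer>> result = new TreeMap<>();
        byCategory.forEach((cat, amounts) -> result.put(cat,
            amounts.stream()
                .sorted(Comparator.reverseOrder())   // order by jine desc
                .limit(n)                            // rank <= n
                .collect(Collectors.toList())));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> sales = Map.of(
            "a", List.of(10, 30, 50, 70),
            "b", List.of(20, 40, 60));
        System.out.println(topN(sales, 3));  // {a=[70, 50, 30], b=[60, 40, 20]}
    }
}
```

Note that in the blog's sample data every category has exactly three rows, so the rank filter keeps everything; with a fourth row per category (as sketched here), the smallest amount would be dropped.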
Run:
/hadoop/spark-1.6.0/bin/spark-submit --master spark://node01:7077,node02:7077,node03:7077 --class kc.kaichuang kc2.jar

By 渣渣龙. Original content unless otherwise noted, licensed under CC BY-NC-SA; please link back to the original post when reposting.