Big Data: Hadoop + Hive + Spark Cluster Installation (Part 3)

陈德兵 · 1 year ago · 3245 views

Continued from: Big Data: Hadoop + Hive + Spark Cluster Installation (Part 2)


1. Hive Commands

Connect through beeline:

    beeline
    !connect jdbc:hive2://localhost:10000

Log in with username hadoop and password 123456789, as configured in hive-site.xml:

    <property>
      <name>hive.jdbc_passwd.auth.hadoop</name>
      <value>123456789</value>
    </property>

Alternatively, run the hive CLI directly on the server.

Create a table:

    CREATE TABLE goods_ids(
      id string,
      goods_id BIGINT,
      source int,
      cat_id_1 int,
      cat_id_2 int,
      cat_id_3 int,
      cat_id_4 int,
      cat_name_1 string,
      cat_name_2 string,
      cat_name_3 string,
      cat_name_4 string,
      category string
    ) row format delimited fields terminated by "," stored as orc;

ORC compresses well and supports fast columnar access; it is an improved version of RCFile and more efficient than it. Just append the "stored as orc" clause to the DDL, as shown at the end of the statement above. Without it, the table defaults to the TextFile format.


If the table uses the TextFile format, a plain-text data file can be loaded into it directly:

    load data local inpath '/data/data/goods_ids.txt' into table goods_ids1;


A common pattern: create the ORC table (goods_ids) plus a TextFile staging table (goods_ids1), load the file into goods_ids1, copy the data into the ORC table with "insert into goods_ids select * from goods_ids1;", then drop the TextFile table.
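Putting the steps above together as one script (a sketch; it assumes the goods_ids ORC table created earlier, and a staging table goods_ids1 with the same columns):

```sql
-- TextFile staging table, same schema as the ORC table goods_ids
CREATE TABLE goods_ids1(
  id string, goods_id BIGINT, source int,
  cat_id_1 int, cat_id_2 int, cat_id_3 int, cat_id_4 int,
  cat_name_1 string, cat_name_2 string,
  cat_name_3 string, cat_name_4 string,
  category string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

-- Load the raw delimited file into the staging table
LOAD DATA LOCAL INPATH '/data/data/goods_ids.txt' INTO TABLE goods_ids1;

-- Copy into the ORC table; Hive rewrites the rows in ORC format on insert
INSERT INTO goods_ids SELECT * FROM goods_ids1;

-- Drop the staging table once the copy has been verified
DROP TABLE goods_ids1;
```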


Indexes:

Create (the index data is not built yet, because of "with deferred rebuild"):

    create index goods_ids_goodsid_index on table goods_ids (goods_id)
    as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    with deferred rebuild
    in table goods_ids_goodsid_index_table;

Build (and rebuild again after new data arrives):

    alter index goods_ids_goodsid_index on goods_ids rebuild;

Show:

    show index on goods_ids;

Drop:

    drop index goods_ids_goodsid_index on goods_ids;

(Note: Hive's built-in indexes were removed in Hive 3.0; columnar formats such as ORC, and materialized views, are the recommended replacements.)



If Hive fails to start with the error below, open the MySQL metastore database (hivedb) and change the SCHEMA_VERSION value in the VERSION table to 2.3.0:

    Exception in thread "main" MetaException(message:Hive Schema version 2.3.0 does not match metastore's schema version 1.2.0 Metastore is not upgraded or corrupt)
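The version fix can be applied in MySQL like this (a sketch; it assumes the metastore database is named hivedb, as set up earlier in this series):

```sql
USE hivedb;
-- The metastore records its schema version in the VERSION table
UPDATE VERSION SET SCHEMA_VERSION = '2.3.0';
```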

Or (as a permanent fix) add to hive-site.xml:

    <property>
      <name>hive.metastore.schema.verification</name>
      <value>false</value>
    </property>


2. Optimization (1)

For large, frequently filtered columns, use partitioning instead of indexes. Indexes are cumbersome to use: the index table does not update automatically when new data arrives and has to be rebuilt by hand.

Partitioning stores the data in separate directories keyed by a virtual partition column.

    CREATE TABLE goods_info_daysales(
      id string,
      goods_id BIGINT,
      goods_name string,
      sales int,
      `date` date
    ) PARTITIONED BY (day string) row format delimited fields terminated by "," stored as orc;

Here day is the virtual partition column.

insert into goods_info_daysales partition(day='2020-08-24') select * from goods_info_daysales1 where `date`='2020-08-24';

    select * from goods_info_daysales where day='2020-08-24';

The query is resolved against the matching partition directory, which makes it much faster.
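To check that the partition exists and that a query is pruned to it, the standard Hive commands below can be used (a sketch against the table defined above):

```sql
-- Lists partitions; each maps to a directory such as .../day=2020-08-24
SHOW PARTITIONS goods_info_daysales;

-- The plan shows the scan restricted to the matching partition
EXPLAIN SELECT * FROM goods_info_daysales WHERE day = '2020-08-24';
```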



3. Maintenance

    -- Drop a database
    drop database if exists db_name;
    -- Force-drop a database together with its tables
    drop database if exists db_name cascade;
    -- Drop a table
    drop table if exists employee;
    -- Truncate a table
    truncate table employee;
    -- Truncate a table, second way
    insert overwrite table employee select * from employee where 1=0;
    -- Drop partitions
    alter table employee_table drop partition (stat_year_month>='2018-01');
    -- Delete rows by condition (the overwrite keeps only the selected rows)
    insert overwrite table employee_table select * from employee_table where id>'180203a15f';


4. Java

Run a class from a jar with its dependencies on the classpath:

    java -cp "yourFile.jar:jars/*" package.className

(The ":" classpath separator is for Linux/macOS; on Windows use ";".)


5. Reference Configuration

Configuration files

