
Hive - How To Efficiently Create Table As Select?

I have a hive table, htable that's partitioned on foo and bar. I want to create a small subset of this table for experiments, so I would think the thing to do would be create tabl

Solution 1:

Add distribute by foo, bar:

insert into new_table partition (foo, bar)
    select * from htable
    where rand() < 0.01 and foo in (a, b)
    distribute by foo, bar

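This reduces memory consumption: distribute by sends all rows for a given (foo, bar) partition to the same reducer, so each task writes to only a few partitions instead of keeping a file writer open for every partition it encounters.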
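For context, here is a minimal end-to-end sketch of how the statement above might be run. It assumes new_table does not exist yet and that dynamic partitioning has not been enabled for the session; the values 'a' and 'b' are placeholders for whichever foo values you want to sample, and the 0.01 sampling fraction is the one from the answer.

```sql
-- Sketch only: 'a' and 'b' are placeholder values for foo, not real data.

-- Create an empty table with the same columns and partitioning as htable.
create table new_table like htable;

-- Let Hive resolve every partition column from the select output.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- select * returns the partition columns (foo, bar) last, which is the
-- order a dynamic partition insert expects.
insert into table new_table partition (foo, bar)
select * from htable
where rand() < 0.01 and foo in ('a', 'b')
distribute by foo, bar;
```

Because rand() is evaluated per row, the sample size is only approximately 1% of the matching rows and will differ between runs.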
