Solution 1:
I did similar work several months ago; hive-testbench is one option. See its
README.md for setup instructions.
You need to configure $HADOOP_HOME/etc/hadoop/core-site.xml to point to your AWS S3 bucket; the data will then be generated directly in S3.
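For illustration, a core-site.xml fragment along these lines might work. This is only a sketch: the answer does not say which properties were actually set, so the use of the S3A connector, the bucket name my-example-bucket, and the credential placeholders are all assumptions. The property names themselves (fs.defaultFS, fs.s3a.access.key, fs.s3a.secret.key) are standard Hadoop settings, and the S3A connector needs the hadoop-aws module on the classpath.

```xml
<!-- Sketch only: bucket name and credentials below are placeholders -->
<configuration>
  <!-- Assumption: make the S3 bucket the default filesystem -->
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://my-example-bucket</value>
  </property>
  <!-- Credentials for the S3A connector -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```

On EC2 or EMR you may be able to leave out the key properties and rely on an instance role instead.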
Pass the data scale parameter to ./tpcds-setup.sh to generate data at different scales.
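As a rough sketch of the last step, the loop below calls the setup script once per scale factor. The values 2, 10 and 100 are just example choices, and it assumes you run it from the root of a hive-testbench checkout after building the generator as the README describes. If the script is not found, it only prints the command it would have run.

```shell
#!/bin/sh
# Example scale factors (assumed values, pick your own).
SCALES="2 10 100"

for scale in $SCALES; do
  if [ -x ./tpcds-setup.sh ]; then
    # The scale factor is passed as the first argument.
    ./tpcds-setup.sh "$scale"
  else
    # Not inside a hive-testbench checkout: show the command instead.
    echo "tpcds-setup.sh not found; would run: ./tpcds-setup.sh $scale"
  fi
done
```

Start with a small scale factor first to confirm that the output really lands in your bucket before generating a large dataset.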