Currently we have 2.5 billion transactions in Pinot, generated by the importer.
We need to write code to either duplicate this data or generate fake data.
The current transactions take up 9800 files, from "transaction_1.avro" through "transaction_9799.avro".
I am going to write a Java program that takes 3 arguments:
input filename
output filename
number of 3-year shifts to apply to the consensus_timestamp field.
For every record in the file, it will apply that offset (e.g. "1" -> all dates get shifted forward by 3 years, "-1" -> all dates get shifted back by 3 years). We will eventually end up making 39 copies of the data: 16 going back in time (the earliest timestamp would be 48 years before September 2019, so September 1971, still after the Unix epoch), and 23 going forward in time (to August 2091). Once the program is written, we can test it on a single file, verify that it works, then decide whether we want to copy all of the existing "transactions" (with a -3-year time shift) or just a fraction of them.
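A minimal sketch of the per-record timestamp arithmetic, leaving out the Avro reading and writing. It rests on assumptions that are not confirmed in this issue: that consensus_timestamp is stored as a long holding nanoseconds since the Unix epoch, and that a "3-year shift" can be treated as a fixed offset of 3 x 365 days. The class name `TimestampShifter` and its method are hypothetical.

```java
final class TimestampShifter {
    // Assumption: consensus_timestamp is nanoseconds since the Unix epoch.
    private static final long NANOS_PER_DAY = 86_400L * 1_000_000_000L;

    // Assumption: one "3-year shift" is a fixed 3 * 365 days, not calendar years.
    private static final long NANOS_PER_SHIFT = 3L * 365L * NANOS_PER_DAY;

    private TimestampShifter() {
    }

    /**
     * Returns the timestamp moved by the given number of 3-year shifts.
     * Positive values move forward in time, negative values move back.
     * Throws ArithmeticException instead of silently overflowing a long.
     */
    static long shift(long consensusTimestampNanos, int shifts) {
        long offset = Math.multiplyExact(NANOS_PER_SHIFT, (long) shifts);
        return Math.addExact(consensusTimestampNanos, offset);
    }

    public static void main(String[] args) {
        // Example input: 2019-09-13T00:00:00Z expressed in nanoseconds.
        long original = 1_568_332_800L * 1_000_000_000L;
        int shifts = args.length > 0 ? Integer.parseInt(args[0]) : -1;
        System.out.println(original + " -> " + shift(original, shifts));
    }
}
```

A fixed offset keeps the mapping one-to-one, so two distinct timestamps in the source never collide after shifting. Shifting by calendar years instead would map February 29 onto February 28 in non-leap target years, which could produce duplicate timestamps. The trade-off is that copies stay non-overlapping only if the source data spans less than 1095 days, which would need to be checked against the real files.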
Eventually, we will want 39 copies of the data ingested into Pinot, but they don't all have to be in one directory at the same time.