Uses goquery to parse HTML, a Bloom filter for data deduplication, and channels for concurrency.
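
To show how the deduplication and concurrency pieces might fit together, here is a minimal sketch using only the standard library. It is not this project's actual code: the names `bloomFilter`, `TestAndAdd`, `crawl`, and `fetch` are hypothetical, the filter size and hash count are arbitrary, and applying the filter to URLs is an assumption. HTML parsing is left out so the example has no third-party dependency; in a real crawler `fetch` would download the page and hand the response body to goquery (for example via `goquery.NewDocumentFromReader`).

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// bloomFilter is a fixed-size Bloom filter (hypothetical type for this sketch).
type bloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per item
}

func newBloomFilter(m, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes returns two base hashes; the k probe positions are derived from
// them by double hashing (h1 + i*h2).
func hashes(s string) (uint64, uint64) {
	a := fnv.New64a()
	a.Write([]byte(s))
	b := fnv.New64()
	b.Write([]byte(s))
	return a.Sum64(), b.Sum64() | 1 // force h2 odd so probes do not collapse
}

// TestAndAdd reports whether s has possibly been seen before, then records it.
// A true result can be a false positive; a false result is always correct.
func (f *bloomFilter) TestAndAdd(s string) bool {
	h1, h2 := hashes(s)
	seen := true
	for i := uint64(0); i < f.k; i++ {
		idx := (h1 + i*h2) % f.m
		word, mask := idx/64, uint64(1)<<(idx%64)
		if f.bits[word]&mask == 0 {
			seen = false
			f.bits[word] |= mask
		}
	}
	return seen
}

// crawl sends each URL that passes the filter to a pool of workers over a
// channel and returns how many URLs were dispatched. Only the producer
// goroutine touches the filter, so it needs no lock.
func crawl(urls []string, workers int, fetch func(string)) int {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				fetch(u)
			}
		}()
	}

	filter := newBloomFilter(1<<16, 4) // sizes chosen arbitrarily for the sketch
	sent := 0
	for _, u := range urls {
		if filter.TestAndAdd(u) {
			continue // probably a duplicate: skip it
		}
		jobs <- u
		sent++
	}
	close(jobs) // lets the workers' range loops end
	wg.Wait()
	return sent
}

func main() {
	urls := []string{
		"https://example.com/a",
		"https://example.com/b",
		"https://example.com/a", // duplicate
	}
	n := crawl(urls, 2, func(u string) {
		fmt.Println("fetching", u) // placeholder for download + goquery parsing
	})
	fmt.Println("dispatched:", n)
}
```

A Bloom filter trades a small false-positive rate for constant memory, so an occasional new URL may be skipped as if it were a duplicate; an exact `map[string]struct{}` avoids that at the cost of memory that grows with the number of URLs.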