Twitter token multiplexing with database #44

Open
wants to merge 252 commits into
base: master
db3d121
Merge branch 'mmou-twitter' of github.com:c4fcm/CivilServant into mmo…
mmou Apr 16, 2017
c4c814a
add twitter models, connect, controller
mmou Apr 17, 2017
6e5612b
make string types bigger, redo alembic, other small changes
mmou Apr 17, 2017
50158a3
add TwitterUserState.PROTECTED
mmou Apr 17, 2017
b2ce785
partial work and notes, committing so i can switch branches
mmou Apr 20, 2017
3ed1b1b
basic work to refactor connect classes. probably doesn't pass all tests
mmou Apr 28, 2017
2dfa0ea
split twitter, lumen code into 4 main controller methods. rough work …
mmou Apr 28, 2017
2d251e3
make query_and_archive_tweets work. nothing else confirmed to work.
mmou May 2, 2017
132a844
5 controller methods seem to work.
mmou May 3, 2017
e948782
updated alembic scripts to keep dev, production, and test synchronized
May 12, 2017
40b3fc3
make twitter lumen code work. add schedule_twitter_jobs.py. tests don…
mmou May 12, 2017
5b21661
fix bugs. redo alembic again...
May 12, 2017
85a0837
fix small bugs in lumen twitter code
mmou May 12, 2017
3c1441d
edit alembic versions. add some log statements.
May 30, 2017
51dd05f
fix fetch_twitter_tweets bugs. add backfill option. fix random typos
mmou Jun 7, 2017
42d12d7
add tweets backfill option. fix other small things.
Jun 15, 2017
a90a8c1
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
Jun 15, 2017
3829fa3
twitter observational analysis basic profiling code
mmou Jun 15, 2017
54129a0
initial work on TwitterObservationalAnalysisController
mmou Jun 20, 2017
741d6bf
data migration code to remove duplicate twitter user records.
mmou Jun 28, 2017
fd8e4fc
fixes to twitter_controller, twitter_observational_analysis_controller.
mmou Jun 28, 2017
7e8234c
merge diffs in twitter_observational_analysis_controller
Jul 10, 2017
bf59e44
fix twitter/lumen bugs. update all tests. all tests pass. anonymize f…
mmou Jul 18, 2017
729d0c7
make twitter queries less expensive.
mmou Jul 19, 2017
03d1d19
start on email_db_report.generate_twitter_report. untested.
mmou Aug 16, 2017
3e40beb
add tests for twitter_observational_analysis_controller. could be ext…
mmou Aug 25, 2017
b7e6949
merge with master. fix alembic. all tests pass.
mmou Aug 25, 2017
761696d
added twitter token multiplexing support
natematias Aug 28, 2017
1593ddd
remove merge text from email_db_report
mmou Aug 28, 2017
0a271cc
Merge branch 'mmou-twitter' of github.com:c4fcm/CivilServant into mmo…
mmou Aug 28, 2017
770024b
updated code to hannahmore status
Aug 28, 2017
92baf3e
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
Aug 28, 2017
32ac011
update twitter_connect to fix rate limit bug
mmou Aug 29, 2017
a9a724f
added model LumenNoticeExpandedURL
natematias Jan 13, 2018
f2c0423
Added requests and requests_futures modules
pushshift Jan 16, 2018
dd2a11e
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
pushshift Jan 16, 2018
2d0d7c8
Code to bulkunshorten URLs
pushshift Jan 16, 2018
7c8103d
Fixed description (dict not array of dicts)
pushshift Jan 16, 2018
017d27f
Replace int where CONST variable belonged
pushshift Jan 16, 2018
61be467
Added normalization of urls and ability to handle relative paths in l…
pushshift Jan 17, 2018
2693875
Removed recursive call -- not needed
pushshift Jan 25, 2018
85bb5fe
Integrated URL unshortening into Lumen Controller
pushshift Jan 25, 2018
58b1b94
updated tests to surface cs job state issue
natematias Feb 17, 2018
ac602dd
consistent LumenNotice job state in case of exception
jonathanzong Feb 17, 2018
1458618
clean up lumen controller and test
jonathanzong Feb 17, 2018
89cb1d7
added fixture data for user lookup test
natematias Feb 17, 2018
35b361f
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Feb 17, 2018
ef3ec21
fix test lumen
jonathanzong Feb 17, 2018
04fb149
consistent job state even when exception for archive twitter users
jonathanzong Feb 17, 2018
7a6557d
clean extra var in twitter controller
jonathanzong Feb 17, 2018
bc3b88f
fixed timeouts for job schedulers and experiment schedulers
natematias Feb 17, 2018
c3237d7
fix tests
jonathanzong Feb 17, 2018
1d5c13d
fixed twitter rate limit token rotation bug
natematias Feb 18, 2018
e04e2bf
fixed token rotation problem
natematias Feb 18, 2018
84a176c
consistent job state for archive tweets
jonathanzong Feb 18, 2018
139dc18
Merge branch 'mmou-twitter' of github.com:mitmedialab/CivilServant in…
jonathanzong Feb 18, 2018
425162e
updated requirements.txt for proper airbrake verison
Feb 18, 2018
ef58410
fixed code to correctly handle multiple lumen notices pointing to a s…
Feb 18, 2018
a24240e
bugfixes. Updated email report script
Feb 19, 2018
9a177fd
load up key files
notconfusing Jul 11, 2018
0327ad1
make auto incrementing
notconfusing Jul 12, 2018
8e19157
token rotation working
notconfusing Jul 15, 2018
c0fbdd2
loading to db test working
notconfusing Jul 16, 2018
03920e9
all three tests working
notconfusing Jul 16, 2018
fee5630
test timing of all exhausted case
notconfusing Jul 16, 2018
67781a1
comments for jnm
notconfusing Jul 18, 2018
9e5507a
committing current changes on civictechai
Jul 23, 2018
d94e047
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Jul 23, 2018
e163ad9
added a script to backfill tweets
natematias Jul 23, 2018
e5258e3
resolved multiple head revisions by adjusting the down_revision for t…
natematias Jul 23, 2018
6c088a2
test_lumen, test_twitter, test_twitter_connect, and relevant test_con…
natematias Jul 24, 2018
32d7008
updated lumen connect to handle API changes
natematias Jul 24, 2018
728cd11
fixed paren problem
natematias Jul 24, 2018
cef096d
added lumen auth example config file
natematias Jul 25, 2018
25b4e30
add seperate check in method to be used by twitter controller and upd…
notconfusing Jul 27, 2018
604d5fd
Schedulable twitter functions now check-in their tokens after running…
notconfusing Jul 27, 2018
4a2504a
Schedulable twitter functions now check-in their tokens after running…
notconfusing Jul 31, 2018
8ec6444
Lumen hack to work around next_page brokenness
notconfusing Aug 1, 2018
83b5465
updated lumen controller to correctly handle errors in URL resolution
natematias Aug 6, 2018
5f19f86
merged updated branch and fixed unicode bug
natematias Aug 8, 2018
7746589
Handle over capacity error with constant backoff
notconfusing Aug 9, 2018
079e307
Insert timing statements
notconfusing Aug 17, 2018
526e5f3
Handle internal error, and in addition refactored handling
notconfusing Aug 20, 2018
5a8364f
Fix None-url error.
notconfusing Aug 20, 2018
aa2601b
start multiquery foundation by creting 'last attempted process' state
notconfusing Aug 21, 2018
50e5a7a
new ORM abstraction confirmed doing same thing
notconfusing Aug 22, 2018
49e47c7
switch query tweets to looping style
notconfusing Aug 23, 2018
0f51c14
connect needs to reraise errors
notconfusing Aug 23, 2018
27d71b9
connect needs to reraise errors
notconfusing Aug 23, 2018
b35dfe5
fix alembic snafu
notconfusing Aug 23, 2018
10fd6f9
remove frontfilling from snapshotting
notconfusing Aug 23, 2018
c2aeea1
forgot type int
notconfusing Aug 23, 2018
4208627
Merge branch 'mmou-twitter' of github.com:mitmedialab/CivilServant in…
notconfusing Aug 24, 2018
09d4daa
concurrent fetch tweets and a CLI for starting and stopping everything
notconfusing Aug 27, 2018
b2eb016
get CS_ENV out of script if not present
notconfusing Aug 27, 2018
c8c8a2d
get CS_ENV out of script if not present
notconfusing Aug 28, 2018
e95e423
add index to twitterstatus record_created_at
notconfusing Aug 28, 2018
030f228
close sessions with checkin to try and avoid server gone away error.
notconfusing Aug 28, 2018
3314e5e
add first draft of email log report
notconfusing Aug 29, 2018
ba6c288
one script to email them all
notconfusing Aug 29, 2018
eb17ab1
make script run from absolute
notconfusing Aug 29, 2018
69dea16
utils mistake
notconfusing Aug 29, 2018
5cb13ec
really fix sourcing problem i hope
notconfusing Aug 29, 2018
e08c4a8
make email log work on server
natematias Aug 29, 2018
65b25a4
really fix sourcing problem i hope
notconfusing Aug 29, 2018
b711dbb
try to catch detachedinstance error
notconfusing Aug 30, 2018
d827407
close session once done
notconfusing Aug 30, 2018
b34f4fe
don't close on checkin
notconfusing Aug 31, 2018
6233088
don't close on checkin
notconfusing Aug 31, 2018
a488471
Merge remote-tracking branch 'origin/mmou-twitter' into mmou-twitter
notconfusing Aug 31, 2018
8edb5f8
introduce invalidation token protocol
notconfusing Sep 1, 2018
3fe0e34
bump up task multicount
natematias Sep 1, 2018
ede4be2
seperate connections for conn and control
notconfusing Sep 1, 2018
db647c7
always get a new key
notconfusing Sep 2, 2018
295671d
seperate connections for conn and control
notconfusing Sep 2, 2018
2364ecc
log and retry db errors
notconfusing Sep 2, 2018
fc56ac4
don't ever stop logging, not even after 20mb
notconfusing Sep 5, 2018
dd939d8
tests for scheduler
notconfusing Sep 6, 2018
dddd8d3
frontfiller is experiment length sensitive
notconfusing Sep 7, 2018
6afcc27
bugfix on thread param passing
notconfusing Sep 7, 2018
2148e21
bugfix on thread param passing
notconfusing Sep 7, 2018
454c608
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Sep 7, 2018
75ea069
add collection seconds statement
notconfusing Sep 8, 2018
1e6456b
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Sep 8, 2018
5339398
lumen syntax creation and more logging for collection_seconds
notconfusing Sep 8, 2018
4a2b2ee
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Sep 8, 2018
728a083
lumen syntax creation and more logging for collection_seconds
notconfusing Sep 8, 2018
79c0ada
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Sep 8, 2018
cce87c9
lumen syntax creation and more logging for collection_seconds
notconfusing Sep 8, 2018
2c3fb39
Merge remote-tracking branch 'origin/mmou-twitter' into mmou-twitter
notconfusing Sep 8, 2018
2d31d76
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Sep 8, 2018
668c23b
restartabe experiments
notconfusing Sep 10, 2018
36af9f1
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
natematias Sep 10, 2018
27fba58
exit if in past, frontfill reports
notconfusing Sep 11, 2018
7b64ebc
create index on twitter_statuses created date
notconfusing Sep 11, 2018
28ef37a
lumen syntax creation and more logging for collection_seconds
notconfusing Sep 12, 2018
ace4f5b
lumen syntax creation and more logging for collection_seconds
notconfusing Sep 14, 2018
e58a36b
email reports host generalizations
notconfusing Sep 14, 2018
6c8091b
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
notconfusing Sep 14, 2018
e00499a
skip fancy things before holiday
notconfusing Sep 14, 2018
73c58d1
better formatting on recusrse
notconfusing Sep 14, 2018
58f9608
* accept set error in initializiation handling
notconfusing Oct 3, 2018
972e03b
fix log report error\n fix logfile error stderr forwarding
notconfusing Oct 3, 2018
3781241
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
notconfusing Oct 3, 2018
d0010e6
better formatting on recusrse
notconfusing Oct 5, 2018
66a989f
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
notconfusing Oct 5, 2018
ac29c97
production to use random tokens not sequential
notconfusing Oct 6, 2018
fb2fc85
stack_trace shouldn't cause error, and email log uses backlogs too.
notconfusing Oct 8, 2018
7a5c2f8
add randomization to users
notconfusing Oct 16, 2018
693e222
stack_trace shouldn't cause error, and email log uses backlogs too.
notconfusing Oct 18, 2018
fd9b1b0
stop logging the compiled sql statements
notconfusing Oct 26, 2018
2814e79
Don't repeat once a job is put on queue.
notconfusing Oct 31, 2018
7a17555
Create fill table in database to verify how many fills are occurring
notconfusing Jan 3, 2019
2c1e7b1
Move fill_start_time to be defined before threads are instantiated.
notconfusing Jan 4, 2019
95568f8
Move fill_start_time to be defined before threads are instantiated.
notconfusing Jan 11, 2019
d8fc037
Experiment languages and job state add
notconfusing Jan 11, 2019
72503d7
url unshortening runner and arithemetic bug fix
notconfusing Mar 28, 2019
7c00189
catch invalid url
notconfusing Apr 8, 2019
97e1cfe
keep user state in redis
notconfusing Apr 9, 2019
aee232f
keep user state in redis
notconfusing Apr 9, 2019
688deb4
Merge remote-tracking branch 'origin/mmou-twitter' into mmou-twitter
notconfusing Apr 9, 2019
de59903
cannot use fstrings
notconfusing Apr 9, 2019
42012d1
remove redis, but find leftovers that bulkunshorten collapsed.
notconfusing Apr 10, 2019
4b746a5
add interface for a single user
notconfusing Apr 10, 2019
e405485
reduce hops limit and timeout to speed things up
notconfusing Apr 10, 2019
f87f5eb
found where the pop was happening, don't rekey redirects to their red…
notconfusing Apr 10, 2019
45c08c3
catch missing schema exception
notconfusing Apr 15, 2019
d4823c4
try speed up with set difference
notconfusing Apr 15, 2019
cc975a6
try speed up with set difference
notconfusing Apr 19, 2019
4dc04c3
try speed up with set difference
notconfusing Apr 24, 2019
1fbe629
no fstrings
notconfusing Apr 24, 2019
1e918fa
no fstrings
notconfusing Apr 24, 2019
ef54b6c
Merge remote-tracking branch 'origin/mmou-twitter' into mmou-twitter
notconfusing Apr 24, 2019
fba2c90
actually check key
notconfusing Apr 24, 2019
accf019
actually check key
notconfusing May 7, 2019
3e4de02
output
notconfusing May 7, 2019
88ecb9e
create new unshortener version that maintains dimensionality
notconfusing Aug 6, 2019
5f1aee8
implement extract twitter urls methods (#47)
mmou Aug 7, 2019
82aaa02
Merge pull request #48 from mitmedialab/unshortener-stable-dims
notconfusing Aug 28, 2019
d6b5188
new unshorten_urls controller to convert expanded to unshortened
notconfusing Aug 28, 2019
4911583
f'ing fstring support
notconfusing Aug 28, 2019
13b7db3
f'ing fstring support
notconfusing Aug 28, 2019
c7246aa
actually check key
notconfusing Aug 28, 2019
eb39fe0
make sure status codes give errors too
notconfusing Aug 28, 2019
d4c4623
close the session properly
notconfusing Aug 29, 2019
9ea0919
add idempotent by default
notconfusing Aug 29, 2019
6ff906e
Revert "add idempotent by default"
notconfusing Aug 29, 2019
a9ca6d9
add idempotent by default
notconfusing Aug 29, 2019
08ada1c
restore development cleanliness
notconfusing Aug 29, 2019
62deb7c
don't call unshorten unnecssarily
notconfusing Aug 29, 2019
1474a92
some daftie used to colons after https::
notconfusing Aug 29, 2019
56c7994
invalid schema catch
notconfusing Aug 29, 2019
96ec479
small defaults tweak to speed up rate
notconfusing Aug 30, 2019
b0d1089
cathc unicode errors too
notconfusing Aug 30, 2019
0d39970
cathc unicode errors too
notconfusing Aug 30, 2019
f7bb89d
new catch everything approach
notconfusing Aug 30, 2019
fdcffa2
add intermediate redirectiong state
notconfusing Aug 30, 2019
5e08845
add get_tlds method (#50)
mmou Oct 1, 2019
9338a97
Auto stash before merge of "mmou-twitter-analysis" and "origin/mmou-t…
notconfusing Nov 29, 2019
761dc39
move ipynbs
notconfusing Nov 29, 2019
662fac5
Merge branch 'mmou-twitter-analysis' into notconfusing-unshortening-a…
notconfusing Nov 29, 2019
4c1204f
Merge branch 'notconfusing-unshortening-again' into mmou-twitter
notconfusing Nov 29, 2019
1ba57d9
first version of comparison onboarder
notconfusing Mar 20, 2020
c6f4689
set up queries to match
notconfusing Mar 20, 2020
4c7c124
write matching code simplest version
notconfusing Mar 26, 2020
73cf061
finishe reporting logic
notconfusing Mar 27, 2020
42f039c
add twitter commands to dmca cmd
notconfusing Apr 14, 2020
634113f
allow generation to happen in a loop
notconfusing Apr 24, 2020
ddc2136
update db email report
notconfusing May 6, 2020
d795505
ugg mysqldb needs to be an old version
notconfusing May 8, 2020
13df32b
need matching args in scheduler code
notconfusing May 22, 2020
cab646b
new way to check for empty copyrighted urls
notconfusing May 22, 2020
6ce2e9d
do some validation
notconfusing May 23, 2020
fd1f8aa
don't log failed tries and report more on ratestate
notconfusing May 29, 2020
a571cbd
give more info about fialing random id users
notconfusing Jun 1, 2020
6960f7f
give more info about fialing random id users
notconfusing Jun 2, 2020
b397257
add retryable logic to twitter branch
notconfusing Jun 3, 2020
6b8c1a8
pipe in retryable
notconfusing Jun 3, 2020
21bb21b
as most recent et speedup
notconfusing Jun 11, 2020
f8d9b33
skinny column version of try fix
notconfusing Jun 11, 2020
e0563e1
abandon tu-et join strategy, use internal guratantee instead
notconfusing Jun 12, 2020
ee1b111
update email db report
notconfusing Jun 12, 2020
2aba958
use <br> to make readable
notconfusing Jun 12, 2020
2ed1416
report fixes and schedule fixes
notconfusing Jun 16, 2020
c6c0752
handle error where twitter response may have changed
notconfusing Jun 25, 2020
cbe38d0
add failed queue count to db report
notconfusing Jun 25, 2020
fe2d301
dynamic control group sizes
notconfusing Jun 29, 2020
be1d36d
fix misspelling
notconfusing Jun 30, 2020
94d126d
need a mu
notconfusing Jun 30, 2020
f4ae664
some extra add retryables
notconfusing Jun 30, 2020
209c271
run the filler more often for les time
notconfusing Jul 3, 2020
cc8c7e8
run the filler more often for les time
notconfusing Jul 3, 2020
1cd4c63
timedelta needs seconds
notconfusing Jul 4, 2020
a5513a9
add retryable to fill query
notconfusing Jul 6, 2020
edca2c0
add fills
notconfusing Jul 8, 2020
10de0d7
add in backfills
notconfusing Jul 8, 2020
1e1c070
Merge remote-tracking branch 'origin/mmou-twitter' into mmou-twitter
notconfusing Jul 8, 2020
99a86c1
log exceptions with log.exceptions
notconfusing Jul 9, 2020
a3796e3
log info on shorter running things
notconfusing Jul 13, 2020
e502b13
start adding things for the new observation study
notconfusing Aug 10, 2020
3212e2c
update cmd
notconfusing Aug 11, 2020
ee23070
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
notconfusing Aug 11, 2020
99d626f
fix header being written on retry
notconfusing Aug 11, 2020
a2dc5ae
rand min and max from commandline
notconfusing Aug 12, 2020
7df5f71
check done users just within range
notconfusing Aug 12, 2020
29594f2
add back in mysteriously missing snapshots
notconfusing Aug 20, 2020
6ae55a9
fetch tweets more often
notconfusing Aug 20, 2020
16cabee
Merge branch 'mmou-twitter' of https://github.com/mitmedialab/CivilSe…
notconfusing Aug 20, 2020
8236729
just archive some processed users, not everyone ever.
notconfusing Aug 20, 2020
409fd6a
more retryability
notconfusing Oct 29, 2020
log exceptions with log.exceptions
notconfusing committed Jul 9, 2020
commit 99a86c1ff7aa5af138680b007291f3b5d335b25e
9 changes: 4 additions & 5 deletions utils/common.py
@@ -92,8 +92,8 @@ def update_all_CS_JobState(row_to_state, field, db_session, log):
         log.info("Updated {0} {1} {2} fields to new CS_JobState.".format(len(row_to_state),
             type(list(row_to_state.keys())[0]), field))
     except:
-        log.error("Error while saving DB Session for updating {0} {1} {2} fields to new CS_JobState.".format(
-            len(row_to_state), type(list(row_to_state.keys())[0]), field), extra=sys.exc_info()[0])
+        log.exception("Error while saving DB Session for updating {0} {1} {2} fields to new CS_JobState.".format(
+            len(row_to_state), type(list(row_to_state.keys())[0]), field))


 def update_CS_JobState(rows, field, to_state, db_session, log):
@@ -107,10 +107,9 @@ def update_CS_JobState(rows, field, to_state, db_session, log):
         db_session.commit()
         log.info("Updated {0} {1} {2} fields to {3}.".format(len(rows), type(rows[0]), field, to_state))
     except:
-        log.error(
+        log.exception(
             "Error while saving DB Session for updating {0} {1} {2} fields to {3}.".format(len(rows), type(rows[0]),
-                field, to_state),
-            extra=sys.exc_info()[0])
+                field, to_state))


 def reset_CS_JobState_In_Progress(rows, field, db_session, log):
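
For readers skimming this commit: the change swaps `log.error(..., extra=sys.exc_info()[0])` for `log.exception(...)` inside the `except` blocks. The standard `logging` module documents `extra` as a dictionary of custom record attributes, while `Logger.exception` logs at ERROR level and appends the active traceback. The sketch below is only an illustration of that difference, not code from this repository: `make_logger`, `failing_commit`, and the logger names are hypothetical stand-ins, and the simulated failure takes the place of `db_session.commit()`.

```python
import io
import logging


def make_logger(name):
    """Build an isolated logger that writes to an in-memory buffer (demo helper)."""
    stream = io.StringIO()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, stream


def failing_commit():
    """Hypothetical stand-in for a db_session.commit() call that fails."""
    raise RuntimeError("simulated commit failure")


def save_with_error(log):
    # Style before the commit: only the message is recorded.
    try:
        failing_commit()
    except Exception:
        log.error("Error while saving DB Session.")


def save_with_exception(log):
    # Style after the commit: the message plus the current traceback is recorded.
    try:
        failing_commit()
    except Exception:
        log.exception("Error while saving DB Session.")


error_log, error_out = make_logger("demo.error_style")
exception_log, exception_out = make_logger("demo.exception_style")

save_with_error(error_log)
save_with_exception(exception_log)

print("log.error output:\n" + error_out.getvalue())
print("log.exception output:\n" + exception_out.getvalue())
```

Both calls emit an ERROR record with the same message; only the `log.exception` variant carries the exception type, message, and stack frames, which is presumably what the commit is after when debugging failed job-state updates.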