Commit Graph

108 Commits

Author SHA1 Message Date
Rishi Gupta
6369d23633 counts.py: Rename ZerverCountQuery to DataCollector.
Not the final form of DataCollector, but the name change causes a big diff
so separating it out.
2017-04-14 11:41:07 -07:00
Rishi Gupta
b3991e2557 counts.py: Move CountStat.group_by into ZerverCountQuery.
Part of a larger refactoring to reduce cyclic dependencies between CountStat
and DataCollector (coming soon).
2017-04-14 11:41:07 -07:00
Rishi Gupta
341e1b54fc counts.py: Remove zerver_table from ZerverCountQuery.
Was only needed for filter_args, which are now gone.
2017-04-14 11:41:07 -07:00
Rishi Gupta
661de6bf25 counts.py: Remove filter_args argument from CountStat definition.
It turned out to not be that useful once we added subgroup. The previous
design of the CountStat object also assumed more reuseability of the *_query
strings than what ended up happening.

The filter_args also had some carrying costs:

* It's hard to be confident that filter_args other than the ones explicitly
  in our tests would have had expected behavior.
* The filter_args/join_args system is the most complex part of the CountStat
  object, and makes understanding the *_query strings unnecessarily
  difficult for a new contributor.
2017-04-14 11:41:07 -07:00
Rishi Gupta
4dfadba244 counts.py: Hardcode is_active=true in count_user_by_realm_query.
A step towards removing filter_args from the CountStat object.
2017-04-14 11:41:07 -07:00
Rishi Gupta
6bb97db136 analytics: Add active_users_audit:is_bot:day. 2017-04-14 11:41:07 -07:00
Rishi Gupta
cc75d83b74 counts.py: Reorder count_stats_ to put similar stats together. 2017-04-14 11:41:07 -07:00
Rishi Gupta
2f74ccabf9 analytics: Add 15day_actives CountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta
9b661ca91f analytics: Replace CountStat.is_gauge with interval.
Groundwork for allowing stats like "Monthly Active Users".

CountStat.interval is no longer as clean a value as before, so removed it
from views.get_chart_data. It wasn't being used by the frontend anyway.

Removing interval from logger calls in counts.py is not a big loss since we
now include the frequency (which is typically also the interval) in
CountStat.property.
2017-04-14 11:41:07 -07:00
Rishi Gupta
d6c5c672d3 analytics: Add minutes_active CountStat. 2017-04-14 11:41:07 -07:00
hollywoodno
dd067c761a analytics: Separate private messages from group private messages.
This makes it possible for our graphs to show the group private
message counts as separate from 1:1 private messages.

Fixes #4102.
2017-03-20 11:46:29 -07:00
Rishi Gupta
7c6f0033ed analytics: Add test for do_drop_all_analytics_tables. 2017-03-14 16:59:54 -07:00
Rishi Gupta
87981a2bf1 analytics: Fix direct import of models in migrations. 2017-03-14 16:59:54 -07:00
Rishi Gupta
ebebd04587 analytics: Fix ValueErrors affecting test coverage.
Pathways that only catch internal code errors should use AssertionError so
that they are not included when computing test coverage.
2017-03-14 16:59:54 -07:00
Rishi Gupta
b18bfe6771 analytics: Standardize format of zerver count queries.
count_message_type_by_user_query is in a different format (no WHERE clause)
from the rest since I'm having a hard time reasoning about how that would
interact with the LEFT JOIN, especially given that there are %(join_args)s.
2017-03-14 16:59:54 -07:00
Rishi Gupta
8feea6c598 analytics: Add LoggingCountStat for number of users. 2017-03-04 16:46:09 -08:00
Raghav Jajodia
a3a03bd6a5 mypy: Added Dict, List and Set imports.
Fixed mypy errors associated with the upgrade.
2017-03-04 14:33:44 -08:00
Rishi Gupta
8bea47d6b5 analytics: Do a stylistic cleanup of TestProcessCountStat. 2017-03-03 16:12:12 -08:00
Rishi Gupta
6c784d6321 analytics: Refactor COUNT_STATS declaration to not repeat itself. 2017-03-03 16:11:28 -08:00
Rishi Gupta
20255e48a4 analytics: Change messages_sent_to_stream to a daily stat.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.

Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
  same as the other messages_sent stats, but they are different (these don't
  include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
  allowable length for a BaseCount.property.

Includes a database migration to remove the old stat from the analytics
tables.
2017-03-03 16:11:28 -08:00
Tim Abbott
d6e38e2a5c lint: Clean up E123 PEP-8 rule. 2017-01-23 21:34:26 -08:00
Rishi Gupta
37bdc7c010 analytics: Remove COUNT_STATS['messages_sent:hour'].
Having both messages_sent:hour and messages_sent:is_bot:day is confusing,
since a single messages_sent:is_bot:hour would have a superset of the
information and take less total space. This commit and its parent together
replace the two stats with a single messages_sent:is_bot:hour.
2017-01-17 15:54:57 -08:00
Rishi Gupta
b593ac9d7c analytics: Change messages_sent:is_bot to hourly frequency.
In preparation for replacing messages_sent.
2017-01-17 15:54:57 -08:00
Rishi Gupta
68fcb4152f analytics: Remove interval field from *Count tables.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
2017-01-17 15:54:57 -08:00
Rishi Gupta
a8f2ebb443 analytics: Include interval in COUNT_STATS property names. 2017-01-17 15:54:57 -08:00
Rishi Gupta
12d277d4f4 analytics: Change messages_sent:client stat to daily frequency.
A few reasons:
* Our two other subgroup'd message stats in UserCount are at CountStat.DAY
  frequency (messages_sent:is_bot and messages_sent:message_type).
* Keeping this stat at hourly frequency would likely double the size of our
  analytics table, given the current stats. (Counterpoint: if there are
  roughly as many active streams as active users, and we keep
  messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat
  is only a 30% or 50% increase).
* We're currently only showing this on the frontend as a pie chart anyway.
2017-01-17 15:54:57 -08:00
Rishi Gupta
3f2a002c6e analytics/lib/counts.py: Fix one of the COUNT_STATS definitions.
Fixes an error in the definition of
COUNT_STATS['messages_sent_to_stream:is_bot']. The CountStat needs a
group_by argument since it is supposed to group by UserProfile.is_bot.
2017-01-10 20:41:07 -08:00
Rishi Gupta
977f5b9178 analytics/lib/counts.py: Fix error in count_message_type_by_user_query.
This query counts the number of messages each user has sent, subgroup'd by
whether the message was a private_message (PM or sent to a huddle), sent to
a 'private_stream', or sent to a 'public_stream'.

We need to join on zerver_stream to find out whether stream messages were
sent to public streams or private streams, but it needs to be a LEFT JOIN
rather than a JOIN so that we preserve the messages sent to non-streams.
2017-01-10 20:41:07 -08:00
Rishi Gupta
552d626ef2 analytics: Fix FillState.last_modified not being updated.
We were updating FillState with FillState.objects.filter(..).update(..),
which does not update the last_modified field (which has auto_now=True).
The correct incantation is the save() method of the actual FillState
object.
2017-01-08 23:36:34 -08:00
Rishi Gupta
190d320afa analytics: Change CountStat.property from Text to str. 2017-01-08 17:24:51 -08:00
Rishi Gupta
f5899dd14b analytics: Add lib/ function to drop all analytics tables. 2017-01-08 17:24:51 -08:00
Rishi Gupta
2211b8b102 analytics: Change count_message_by_stream to join on UserProfile.
It seems unlikely we will need count_message_by_stream without the
UserProfile table in the future, so write count_message_by_stream_and_is_bot
in the usual query form and replace count_message_by_stream with it.
This also has the benefit of shortening our list of "special case" queries
from two to one.

The pathways of the removed test will be covered more thoroughly in the new
TestCountStats tests.
2016-12-20 12:03:23 -08:00
Rishi Gupta
6992f9784c analytics: Update TestCountStat prototype. 2016-12-20 12:03:23 -08:00
Rishi Gupta
93a10a475a counts.py: Fix count_message_type_by_user_query. 2016-12-15 16:02:12 -08:00
Rishi Gupta
4f3e1b2ece analytics/lib/counts.py: Fix messages_sent_to_stream:is_bot.
Adds a new query.
2016-12-15 16:02:12 -08:00
Rishi Gupta
87b47ec283 analytics: Add __unicode__ method to the CountStat object. 2016-12-15 16:02:12 -08:00
anirudhjain75
beaa62cafa mypy: Convert several directories to use typing.Text.
Specifically, these directories are converted: [analytics/, scripts/,
tools/, zerver/management/, zilencer/, zproject/]
2016-12-07 20:51:05 -08:00
nikolay
abc2ff4a06 pep8: Fix many rule E128 violations.
[Tweaked by tabbott to adjust some approaches used in wrapping]
2016-12-03 13:33:31 -08:00
bulat22101
adebc75740 pep8: Fix E502 violations 2016-12-03 10:56:36 -08:00
AZtheAsian
1ba150fa85 pep8: Fix E203 violations 2016-12-01 20:37:57 -08:00
Rafid Aslam
c5316b4002 lint: Fix E127 pep8 violations.
Fix pep8: E127 continuation line over-indented for visual indent
style issue.
2016-12-01 10:23:55 -08:00
umkay
dc8463e09c analytics: Remove incorrect filter args for stat.
The filter args dictionary applies to the X table in a count X by Y query,
which in this case is the zerver_message table. This stat had an incorrect set
of arguments meant for the zerver_userprofile table.
2016-11-10 12:25:21 -08:00
umkay
e6ac8c3543 analytics: Add extra count stats.
Fill in remaining countstats in counts.py for our intended use cases.
2016-11-03 16:50:39 -07:00
umkay
298890d125 analytics: Rename count stats and associated properties.
Our current naming convention is getting unwieldy. The subgroup now goes
on the right side of the colon.
2016-11-03 16:50:39 -07:00
umkay
5490442580 analytics: Replace all joins in raw SQL with natural joins.
We alter the behavior of our queries to no longer write rows with 0 counts
to the db, and pad with 0s in the related views code. As a result we are
also able to combine the where and join clause conditions in the sql
queries. This new behavior is also updated in our tests.
2016-11-03 16:50:39 -07:00
umkay
5e5a0d4db9 analytics: Add user-level count query for messages sent to {PMs, streams}.
Adds a count_X_by_Y_query to counts.py, similar in spirit to a
count_recipient_by_user query, where we would join on the Message,
Recipient, and UserProfile table. Here, we also join on the Stream table in
order to distinguish private and public streams, and we merge the counts for
PM and Huddle type messages into a single subgroup.
2016-11-01 17:00:43 -07:00
umkay
610e92b94e analytics: Add subgroup column to analytics tables.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows

* Grouping sets of similar CountStats in the *Count tables. For instance,
  active{_humans,_bots} will now have the same property, but have different
  subgroup values.

* Combining queries that differ only in their value on 1 filter clause, so
  that we make fewer passes through the zerver tables. For instance, instead
  of running a query for each of messages_sent_to_public_streams and
  messages_sent_to_private_streams, we can now run a single query with a
  group by on Stream.invite_only, and store the group by value in the
  subgroup column.
2016-10-27 16:33:58 -07:00
Rishi Gupta
54016e1096 analytics: Remove outdated comment in counts.py. 2016-10-25 13:42:55 -07:00
umkay
87d22c9e4d analytics: Fix count_stream_by_realm.
Add a join clause on zerver_message in count_stream_by_realm,
otherwise we only output the final total streamcount for a realm
for every time entry.
2016-10-22 19:10:36 -07:00
umkay
906a4e3b26 analytics: Add performance and transaction logging to counts.py.
For each database query made by an analytics function, log time spent and
the number of rows changed to var/logs/analytics.log.
In the spirit of write ahead logging, for each (stat, end_time)
update, log the start and end of the "transaction", as well as time
spent.
2016-10-17 16:10:03 -07:00