mirror of https://github.com/zulip/zulip.git, synced 2025-11-04 05:53:43 +00:00

docs: Use code syntax in analytics subsystem doc for readability.

This doc was using a lot of references to class names etc. without putting them in `<code>`, making it harder to read.

committed by Tim Abbott
parent 34ceafadd5
commit cda25e8a4a

@@ -20,64 +20,63 @@ effectively modify the system.
 
 There are three main components:
 
-- models: The UserCount, StreamCount, RealmCount, and InstallationCount
-  tables (analytics/models.py) collect and store time series data.
-- stat definitions: The CountStat objects in the COUNT_STATS dictionary
-  (analytics/lib/counts.py) define the set of stats Zulip collects.
-- accounting: The FillState table (analytics/models.py) keeps track of what
-  has been collected for which CountStats.
+- models: The `UserCount`, `StreamCount`, `RealmCount`, and `InstallationCount`
+  tables (`analytics/models.py`) collect and store time series data.
+- stat definitions: The `CountStat` objects in the `COUNT_STATS` dictionary
+  (`analytics/lib/counts.py`) define the set of stats Zulip collects.
+- accounting: The `FillState` table (`analytics/models.py`) keeps track of what
+  has been collected for which `CountStat`.
 
 The next several sections will dive into the details of these components.
 
-## The \*Count database tables
+## The `*Count` database tables
 
 The Zulip analytics system is built around collecting time series data in a
 set of database tables. Each of these tables has the following fields:
 
-- property: A human readable string uniquely identifying a CountStat
-  object. Example: "active_users:is_bot:hour" or "messages_sent:client:day".
-- subgroup: Almost all CountStats are further sliced by subgroup. For
-  "active_users:is_bot:day", this column will be False for measurements of
-  humans, and True for measurements of bots. For "messages_sent:client:day",
+- property: A human readable string uniquely identifying a `CountStat`
+  object. Example: `"active_users:is_bot:hour"` or `"messages_sent:client:day"`.
+- subgroup: Almost all `CountStat` objects are further sliced by subgroup. For
+  `"active_users:is_bot:day"`, this column will be `False` for measurements of
+  humans, and `True` for measurements of bots. For `"messages_sent:client:day"`,
   this column is the client_id of the client under consideration.
 - end_time: A datetime indicating the end of a time interval. It will be on
   an hour (or UTC day) boundary for stats collected at hourly (or daily)
-  frequency. The time interval is determined by the CountStat.
-- various "id" fields: Foreign keys into Realm, UserProfile, Stream, or
-  nothing. E.g. the RealmCount table has a foreign key into Realm.
-- value: The integer counts. For "active_users:is_bot:hour" in the
-  RealmCount table, this is the number of active humans or bots (depending
-  on subgroup) in a particular realm at a particular end_time. For
-  "messages_sent:client:day" in the UserCount table, this is the number of
+  frequency. The time interval is determined by the `CountStat`.
+- various "id" fields: Foreign keys into `Realm`, `UserProfile`, `Stream`, or
+  nothing. E.g. the `RealmCount` table has a foreign key into `Realm`.
+- value: The integer counts. For `"active_users:is_bot:hour"` in the
+  `RealmCount` table, this is the number of active humans or bots (depending
+  on subgroup) in a particular realm at a particular `end_time`. For
+  `"messages_sent:client:day"` in the `UserCount` table, this is the number of
   messages sent by a particular user, from a particular client, on the day
-  ending at end_time.
+  ending at `end_time`.
 
-There are four tables: UserCount, StreamCount, RealmCount, and
-InstallationCount. Every CountStat is initially collected into UserCount,
-StreamCount, or RealmCount. Every stat in UserCount and StreamCount is
-aggregated into RealmCount, and then all stats are aggregated from
-RealmCount into InstallationCount. So for example,
-"messages_sent:client:day" has rows in UserCount corresponding to (user,
-end_time, client) triples. These are summed to rows in RealmCount
-corresponding to triples of (realm, end_time, client). And then these are
-summed to rows in InstallationCount with totals for pairs of (end_time,
-client).
+There are four tables: `UserCount`, `StreamCount`, `RealmCount`, and
+`InstallationCount`. Every `CountStat` is initially collected into `UserCount`,
+`StreamCount`, or `RealmCount`. Every stat in `UserCount` and `StreamCount` is
+aggregated into `RealmCount`, and then all stats are aggregated from
+`RealmCount` into `InstallationCount`. So for example,
+`"messages_sent:client:day"` has rows in `UserCount` corresponding to
+`(user, end_time, client)` triples. These are summed to rows in `RealmCount`
+corresponding to triples of `(realm, end_time, client)`. And then these are
+summed to rows in `InstallationCount` with totals for pairs of `(end_time, client)`.
 
 Note: In most cases, we do not store rows with value 0. See
 [Performance strategy](#performance-strategy) below.
 
 ## CountStats
 
-CountStats declare what analytics data should be generated and stored. The
-CountStat class definition and instances live in `analytics/lib/counts.py`.
+`CountStat` objects declare what analytics data should be generated and stored. The
+`CountStat` class definition and instances live in `analytics/lib/counts.py`.
 These declarations specify at a high level which tables should be populated
 by the system and with what data.
 
 ## The FillState table
 
 The default Zulip production configuration runs a cron job once an hour that
-updates the \*Count tables for each of the CountStats in the COUNT_STATS
-dictionary. The FillState table simply keeps track of the last end_time that
+updates the `*Count` tables for each of the `CountStat` objects in the `COUNT_STATS`
+dictionary. The `FillState` table simply keeps track of the last `end_time` that
 we successfully updated each stat. It also enables the analytics system to
 recover from errors (by retrying) and to monitor that the cron job is
 running and running to completion.
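
The bookkeeping role that the hunk above describes for `FillState` can be illustrated with a small, self-contained sketch. This is not Zulip's implementation: the `fill_state` dictionary stands in for the table, and `floor_to_hour`, `pending_end_times`, `process_stat`, and the `collect` callback are hypothetical names chosen for illustration. It only assumes an hourly stat and shows how recording the last successful `end_time` lets a later run resume after a failure without repeating work.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

# Hypothetical in-memory stand-in for the FillState table: maps a stat's
# property string to the last end_time that was processed successfully.
fill_state: dict[str, datetime] = {}


def floor_to_hour(t: datetime) -> datetime:
    """Round a datetime down to the hour boundary."""
    return t.replace(minute=0, second=0, microsecond=0)


def pending_end_times(prop: str, first_end_time: datetime, now: datetime) -> list[datetime]:
    """Return the hourly end_times that have not been processed yet."""
    last_done = fill_state.get(prop)
    next_end = first_end_time if last_done is None else last_done + timedelta(hours=1)
    pending = []
    while next_end <= floor_to_hour(now):
        pending.append(next_end)
        next_end += timedelta(hours=1)
    return pending


def process_stat(
    prop: str,
    first_end_time: datetime,
    now: datetime,
    collect: Callable[[str, datetime], None],
) -> None:
    """Process every pending interval in order, recording progress as we go."""
    for end_time in pending_end_times(prop, first_end_time, now):
        collect(prop, end_time)      # may raise; progress so far is kept
        fill_state[prop] = end_time  # only recorded after a successful collect
```

If `collect` raises partway through, `fill_state` still points at the last interval that completed, so the next hourly run picks up at the failed interval rather than starting over.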
@@ -94,27 +93,27 @@ designed set of tables in PostgreSQL.
 This requires some care to avoid making the analytics tables larger than the
 rest of the Zulip database or adding a ton of computational load, but with
 careful design, we can make the analytics system very low cost to operate.
-Also, note that a Zulip application database has 2 huge tables: Message and
-UserMessage, and everything else is small and thus not performance or
+Also, note that a Zulip application database has 2 huge tables: `Message` and
+`UserMessage`, and everything else is small and thus not performance or
 space-sensitive, so it's important to optimize how many expensive queries we
 do against those 2 tables.
 
 There are a few important principles that we use to make the system
 efficient:
 
-- Not repeating work to keep things up to date (via FillState)
-- Storing data in the \*Count tables to avoid our endpoints hitting the core
-  Message/UserMessage tables is key, because some queries could take minutes
+- Not repeating work to keep things up to date (via `FillState`)
+- Storing data in the `*Count` tables to avoid our endpoints hitting the core
+  `Message`/`UserMessage` tables is key, because some queries could take minutes
   to calculate. This allows any expensive operations to run offline, and
   then the endpoints to server data to users can be fast.
 - Doing expensive operations inside the database, rather than fetching data
   to Python and then sending it back to the database (which can be far
   slower if there's a lot of data involved). The Django ORM currently
-  doesn't support the "insert into .. select" type SQL query that's needed
+  doesn't support the `"insert into .. select"` type SQL query that's needed
   for this, which is why we use raw database queries (which we usually avoid
   in Zulip) rather than the ORM.
 - Aggregating where possible to avoid unnecessary queries against the
-  Message and UserMessage tables. E.g. rather than querying the Message
+  `Message` and `UserMessage` tables. E.g. rather than querying the `Message`
   table both to generate sent message counts for each realm and again for
   each user, we just query for each user, and then add up the numbers for
   the users to get the totals for the realm.
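
As a rough illustration of the "insert into .. select" pattern mentioned in the hunk above, here is a minimal sketch that sums per-user rows into per-realm rows entirely inside the database. It uses Python's built-in `sqlite3` module only so that it runs anywhere (Zulip itself uses PostgreSQL), and the table names `user_count` and `realm_count`, their columns, the string-valued subgroups, and the function `aggregate_user_counts_to_realm` are assumptions made for this example, not Zulip's actual schema or code.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user_count (
            user_id INTEGER, realm_id INTEGER, property TEXT,
            subgroup TEXT, end_time TEXT, value INTEGER
        );
        CREATE TABLE realm_count (
            realm_id INTEGER, property TEXT,
            subgroup TEXT, end_time TEXT, value INTEGER
        );
        """
    )
    return conn


def aggregate_user_counts_to_realm(conn: sqlite3.Connection, prop: str, end_time: str) -> None:
    """Sum per-user rows into per-realm rows without fetching them into Python."""
    conn.execute(
        """
        INSERT INTO realm_count (realm_id, property, subgroup, end_time, value)
        SELECT realm_id, property, subgroup, end_time, SUM(value)
        FROM user_count
        WHERE property = ? AND end_time = ?
        GROUP BY realm_id, property, subgroup, end_time
        """,
        (prop, end_time),
    )


conn = make_demo_db()
prop, end_time = "messages_sent:client:day", "2024-01-02T00:00:00+00:00"
conn.executemany(
    "INSERT INTO user_count VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, prop, "website", end_time, 5),
        (2, 1, prop, "website", end_time, 3),
        (2, 1, prop, "mobile", end_time, 2),
        (3, 2, prop, "website", end_time, 7),
    ],
)
aggregate_user_counts_to_realm(conn, prop, end_time)
realm_rows = conn.execute(
    "SELECT realm_id, subgroup, value FROM realm_count ORDER BY realm_id, subgroup"
).fetchall()
print(realm_rows)
```

The per-user rows are read once and the realm totals are derived from them, which mirrors the last principle in the list: the large source tables never need to be queried a second time for the realm-level numbers.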
@@ -147,18 +146,18 @@ analytics tests, to make sure it stays that way as we refactor.
 
 The system discussed above is designed primarily around the technical
 problem of showing useful analytics about things where the raw data is
-already stored in the database (e.g. Message, UserMessage). This is great
+already stored in the database (e.g. `Message`, `UserMessage`). This is great
 because we can always backfill that data to the beginning of time, but of
 course sometimes one wants to do analytics on things that aren't worth
 storing every data point for (e.g. activity data, request performance
 statistics, etc.). There is currently a reference implementation of a
-"LoggingCountStat" that shows how to handle such a situation.
+`LoggingCountStat` that shows how to handle such a situation.
 
 ## Analytics UI development and testing
 
 ### Setup and testing
 
-The main testing approach for the /stats page UI is manual testing.
+The main testing approach for the `/stats` page UI is manual testing.
 For most UI testing, you can visit `/stats/realm/analytics` while
 logged in as Iago (this is the server administrator view of stats for
 a given realm). The only piece that you can't test here is the "Me"
@@ -178,24 +177,24 @@ the updated graphs.
 
 The relevant files are:
 
-- analytics/views/stats.py: All chart data requests from the /stats page call
+- `analytics/views/stats.py`: All chart data requests from the /stats page call
   get_chart_data in this file.
-- web/src/stats/stats.js: The JavaScript and Plotly code.
-- templates/analytics/stats.html
-- web/styles/stats.css and web/styles/portico.css: We are in the
+- `web/src/stats/stats.js`: The JavaScript and Plotly code.
+- `templates/analytics/stats.html`
+- `web/styles/stats.css` and `web/styles/portico.css`: We are in the
   process of re-styling this page to use in-app css instead of portico css,
   but there is currently still a lot of portico influence.
-- analytics/urls.py: Has the URL routes; it's unlikely you will have to
+- `analytics/urls.py`: Has the URL routes; it's unlikely you will have to
   modify this, including for adding a new graph.
 
 Most of the code is self-explanatory, and for adding say a new graph, the
 answer to most questions is to copy what the other graphs do. It is easy
 when writing this sort of code to have a lot of semi-repeated code blocks
-(especially in stats.js); it's good to do what you can to reduce this.
+(especially in `stats.js`); it's good to do what you can to reduce this.
 
 Tips and tricks:
 
-- Use `$.get` to fetch data from the backend. You can grep through stats.js
+- Use `$.get` to fetch data from the backend. You can grep through `stats.js`
   to find examples of this.
 - The Plotly documentation is at
   <https://plot.ly/javascript/> (check out the full reference, event
@@ -205,11 +204,11 @@ Tips and tricks:
 - Unless a graph has a ton of data, it is typically better to just redraw it
   when something changes (e.g. in the various aggregation click handlers)
   rather than to use retrace or relayout or do other complicated
-  things. Performance on the /stats page is nice but not critical, and we've
+  things. Performance on the `/stats` page is nice but not critical, and we've
   run into a lot of small bugs when trying to use Plotly's retrace/relayout.
 - There is a way to access raw d3 functionality through Plotly, though it
   isn't documented well.
-- 'paper' as a Plotly option refers to the bounding box of the graph (or
+- `'paper'` as a Plotly option refers to the bounding box of the graph (or
   something related to that).
 - You can't right click and inspect the elements of a Plotly graph (e.g. the
   bars in a bar graph) in your browser, since there is an interaction layer
@@ -218,10 +217,10 @@ Tips and tricks:
 
 ### /activity page
 
-- There's a somewhat less developed /activity page, for server
+- There's a somewhat less developed `/activity` page, for server
   administrators, showing data on all the realms on a server. To
   access it, you need to have the `is_staff` bit set on your
-  UserProfile object. You can set it using `manage.py shell` and
-  editing the UserProfile object directly. A great future project is
+  `UserProfile` object. You can set it using `manage.py shell` and
+  editing the `UserProfile` object directly. A great future project is
   to clean up that page's data sources, and make this a documented
   interface.