Compare commits


20 Commits
1.9.1 ... 1.9.2

Author SHA1 Message Date
Tim Abbott
8e7ac21fe0 Release Zulip Server 1.9.2. 2019-01-29 16:33:36 -08:00
Tim Abbott
cbfae3e0d0 import: Fix uploading avatars with S3 upload backend.
This should hopefully be the last commit of this form; ultimately, my
hope is that we'll be able to refactor the semi-duplicated logic in
this file so that less effort goes into keeping it correct.
2019-01-29 16:26:19 -08:00
Tim Abbott
ffeb4340a9 auth: Migrate Google authentication off deprecated name API.
As part of the Google+ shutdown, Google has eliminated support for the
/plus/v1/people/me endpoint.  Replace it with the very similar
/oauth2/v3/userinfo endpoint.
2019-01-29 16:17:27 -08:00
Matthew Wegner
79f781b9ea import: Normalize slackbot string comparison.
In very old Slack workspaces, slackbot can appear as "Slackbot", and
the import script only checks for "slackbot" (a case-sensitive
comparison).  This breaks the import: it trips the assert that
immediately follows the check.  I don't know how common this is, but
it definitely affected our import.

The simple fix is to compare against a lowercased-version of the
user's full name.
2019-01-29 16:14:10 -08:00
Tim Abbott
fd89df63b4 hipchat: Fix importing of private messages.
Apparently a trivial typing issue (string vs. integer IDs) meant that
we broke this a few weeks ago.
2019-01-29 16:13:25 -08:00
Tim Abbott
509d335705 import: Handle corner case around EMAIL_GATEWAY_BOT emails. 2019-01-29 16:12:01 -08:00
Tim Abbott
ce28ccf2bf import: Fix pointer logic for zulip->zulip imports.
Previously, the pointer was almost guaranteed to be an invalid random
value, because we renumber message IDs unconditionally now.
2019-01-29 16:11:56 -08:00
Tim Abbott
4adbeedef6 hipchat: Handle unusual emoticons.json format.
Apparently, hc-migrate can generate emoticons.json files with a
somewhat different format.  Assuming that other files are in the
normal format, we should be able to handle it like this.

See report in #11135.
2019-01-29 16:11:34 -08:00
Tim Abbott
09ed7d5b77 hipchat: Handle case where emoticons.json is not in export.
Apparently, some methods of exporting from HipChat do not include an
emoticons.json file.  We could test for this using the
`include_emoticons` field in `metadata.json`, but we currently don't
even bother to read that file.  Rather than changing that, we just
print a warning and proceed.  This is arguably better anyway, in that
often not having emoticons.json is the result of user error when
exporting, and it's nice to flag that this is happening.

Fixes #11135.
2019-01-29 16:11:30 -08:00
Tim Abbott
f445d3f589 import: Ensure presence of basic avatar images for HipChat.
Our HipChat conversion tool didn't properly handle basic avatar
images, resulting in only the medium-size avatar images being imported
properly.  This fixes that bug by asking the import tool to do the
thumbnailing for the basic avatar image (from the .original file) as
well as the medium avatar image.
2019-01-29 16:11:23 -08:00
Tim Abbott
e8ee374d4f slack import: Import long-inactive users as long-term idle.
This avoids creating UserMessage rows for long-inactive users in
organizations with many thousands of users.
2019-01-29 16:10:59 -08:00
Tim Abbott
9ff5359522 export: Remove assertion on current working directory.
This command hasn't made deep assumptions about CWD for a long time,
and removing the assertion enables users to run it through a symlink,
among other things.

Fixes #10961.
2019-01-29 16:10:26 -08:00
Tim Abbott
a31f56443a import: Avoid unnecessary forks when downloading attachments.
The previous implementation used run_parallel incorrectly, passing it
a set of very small jobs (each was to download a single file), which
meant that we'd end up forking once for every file to download.

The corrected implementation sends each of N threads 1/N of the files
to download, which matches the goal of distributing the download work
across N threads.
2019-01-29 16:09:42 -08:00
rht
0b263d8b8c slack import: Eliminate need to load all messages into memory.
This works by yielding messages sorted based on timestamp.  Because
the Slack exports are broken into files by date, it's convenient to do
a 2-layer sorting process, where we open all the files for a given
day, and then sort their messages by timestamp before yielding them.

Fixes #10930.
2019-01-29 16:09:37 -08:00
Tim Abbott
ad00b02c66 slack import: Fix all messages being imported to one channel.
This was an ugly variable-escape-from-loop regression introduced in
e59ff6e6db.
2019-01-29 16:09:06 -08:00
Tim Abbott
02f2ae4048 slack import: Fix empty values for custom profile fields.
The Slack import process would incorrectly issue
CustomProfileFieldValue entries with a value of "" for users who
didn't have a given CustomProfileField (especially common for the
"skype" and "phone" fields).  This had no user-visible effect, but
certainly added some clutter in the database.
2019-01-29 16:09:02 -08:00
Tim Abbott
56d4426738 gitter: Do something reasonable with invalid fullnames. 2019-01-29 16:08:55 -08:00
Tim Abbott
f0fe7d3887 scripts: Recommend apt update after enabling universe.
One needs to manually run `apt update` after `add-apt-repository`, or
the newly enabled repository won't actually be used.
2019-01-29 16:07:31 -08:00
Sumanth V Rao
21166fbdf9 upgrade-zulip-stage-2: Add argument to skip purging old deployments.
This makes it possible to add --skip-purge-old-deployments in the
deploy_options section of /etc/zulip/zulip.conf, and control whether
old deployments are purged automatically on a system.

We still need to do https://github.com/zulip/zulip/issues/10534 and
probably also to add these arguments to be directly passed into
upgrade-zulip, but that can wait for future work.

Fixes #10946.
2019-01-29 16:06:08 -08:00
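For reference, the zulip.conf usage the commit message describes would look roughly like this (a sketch; the `[deployment]` section name is assumed from Zulip's configuration conventions):
```
[deployment]
deploy_options = --skip-purge-old-deployments
```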
Tim Abbott
b2c865aab5 scripts: Fix incorrect garbage-collection of emoji/node caches.
Apparently, we were incorrectly expressing the paths in the
caches_in_use data structures for these two cache-cleaning algorithms,
resulting in the default threshold_days algorithm controlling which
caches could be garbage-collected.  While the emoji one was just a
performance optimization for upgrade-zulip-from-git, it was possible
for the main `node_modules` cache in use in production to be GCed,
resulting in LaTeX rendering being broken.
2019-01-29 16:05:32 -08:00
18 changed files with 328 additions and 118 deletions


@@ -54,7 +54,7 @@ author = 'The Zulip Team'
# The short X.Y version.
version = '1.9'
# The full version, including alpha/beta/rc tags.
-release = '1.9.1'
+release = '1.9.2'
# This allows us to insert a warning that appears only on an unreleased
# version, e.g. to say that something is likely to have changed.


@@ -7,6 +7,21 @@ All notable changes to the Zulip server are documented in this file.
This section lists notable unreleased changes; it is generally updated
in bursts.
+### 1.9.2 -- 2019-01-29
+
+This release migrates Zulip off a deprecated Google+ API (necessary
+for Google Authentication to continue working past March 7), and
+contains a few bug fixes for the installer and Slack import.  It has
+minimal changes for existing servers not using Google authentication.
+
+- Updated the Google Auth integration to stop using a deprecated and
+  soon-to-be-removed Google+ authentication API.
+- Improved installer error messages for common configuration problems.
+- Fixed several bugs in Slack, Gitter, and HipChat import tools.
+- Fixed a subtle bug in garbage-collection of the node_modules cache.
+- Optimized performance of Slack import for organizations with
+  thousands of users.
### 1.9.1 -- 2018-11-30
This release is primarily intended to improve the experience for new


@@ -37,8 +37,12 @@ upgrading.
If you're using Ubuntu, the
[Ubuntu universe repository][ubuntu-repositories] must be
-[enabled][enable-universe], which is usually just `sudo
-add-apt-repository universe`.
+[enabled][enable-universe], which is usually just:
+```
+sudo add-apt-repository universe
+sudo apt update
+```
[ubuntu-repositories]:
https://help.ubuntu.com/community/Repositories/Ubuntu


@@ -35,7 +35,8 @@ def get_caches_in_use(threshold_days):
# This happens for a deployment directory extracted from a
# tarball, which just has a copy of the emoji data, not a symlink.
continue
-caches_in_use.add(os.readlink(emoji_link_path))
+# The actual cache path doesn't include the /emoji
+caches_in_use.add(os.path.dirname(os.readlink(emoji_link_path)))
return caches_in_use
def main(args: argparse.Namespace) -> None:


@@ -45,7 +45,8 @@ def get_caches_in_use(threshold_days):
# If 'package.json' file doesn't exist then no node_modules
# cache is associated with this setup.
continue
-caches_in_use.add(os.readlink(node_modules_link_path))
+# The actual cache path doesn't include the /node_modules
+caches_in_use.add(os.path.dirname(os.readlink(node_modules_link_path)))
return caches_in_use
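To see why the `os.path.dirname` fix matters, consider a deployment whose node_modules symlink points into the cache (hypothetical paths):
```python
import os

# Hypothetical layout: node_modules is a symlink into the cache:
#   .../deployments/current/node_modules -> /srv/zulip-npm-cache/abc123/node_modules
target = "/srv/zulip-npm-cache/abc123/node_modules"  # what os.readlink() would return
# The directory the garbage-collector must preserve is the target's
# parent, not the target itself:
assert os.path.dirname(target) == "/srv/zulip-npm-cache/abc123"
```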


@@ -110,7 +110,10 @@ if [ "$os_id" = Ubuntu ] && ! apt-cache policy |
cat <<'EOF'
You must enable the Ubuntu Universe repository before installing
-Zulip. You can do this with: `add-apt-repository universe`.
+Zulip. You can do this with:
+    sudo add-apt-repository universe
+    sudo apt update
For more information, see:
https://zulip.readthedocs.io/en/latest/production/requirements.html


@@ -40,6 +40,8 @@ parser.add_argument("--skip-migrations", dest="skip_migrations", action='store_t
help="Skip doing migrations.")
parser.add_argument("--from-git", dest="from_git", action='store_true',
help="Upgrading from git, so run update-prod-static.")
parser.add_argument("--skip-purge-old-deployments", dest="skip_purge_old_deployments",
action="store_true", help="Skip purging old deployments.")
args = parser.parse_args()
deploy_path = args.deploy_path
@@ -168,4 +170,8 @@ logging.info("Restarting Zulip...")
subprocess.check_output(["./scripts/restart-server"], preexec_fn=su_to_zulip)
logging.info("Upgrade complete!")
-subprocess.check_call(["./scripts/purge-old-deployments"])
+if not args.skip_purge_old_deployments:
+    logging.info("Purging old deployments...")
+    subprocess.check_call(["./scripts/purge-old-deployments"])
+else:
+    logging.info("Skipping purging old deployments.")


@@ -1,4 +1,4 @@
ZULIP_VERSION = "1.9.1"
ZULIP_VERSION = "1.9.2"
# Bump the minor PROVISION_VERSION to indicate that folks should provision
# only when going from an old version of the code to a newer version. Bump


@@ -217,8 +217,13 @@ def get_usermentions(message: Dict[str, Any], user_map: Dict[str, int],
for mention in message['mentions']:
if mention.get('userId') in user_map:
gitter_mention = '@%s' % (mention['screenName'])
-zulip_mention = ('@**%s**' %
-                 (user_short_name_to_full_name[mention['screenName']]))
+if mention['screenName'] not in user_short_name_to_full_name:
+    logging.info("Mentioned user %s never sent any messages, so has no full name data" %
+                 mention['screenName'])
+    full_name = mention['screenName']
+else:
+    full_name = user_short_name_to_full_name[mention['screenName']]
+zulip_mention = ('@**%s**' % (full_name,))
message['text'] = message['text'].replace(gitter_mention, zulip_mention)
mentioned_user_ids.append(user_map[mention['userId']])
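A minimal, self-contained illustration of the mention rewriting above (hypothetical message and user data):
```python
# Hypothetical Gitter data: alice never sent a message, so she has no
# full-name entry; the code falls back to her screen name.
text = "hello @alice"
user_short_name_to_full_name = {"bob": "Bob Jones"}

screen_name = "alice"
full_name = user_short_name_to_full_name.get(screen_name, screen_name)
zulip_mention = '@**%s**' % (full_name,)
assert text.replace('@' + screen_name, zulip_mention) == "hello @**alice**"
```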


@@ -313,16 +313,31 @@ def write_emoticon_data(realm_id: int,
fn = 'emoticons.json'
data_file = os.path.join(data_dir, fn)
+if not os.path.exists(data_file):
+    logging.warning("HipChat export does not contain emoticons.json.")
+    logging.warning("As a result, custom emoji cannot be imported.")
+    return []
with open(data_file) as f:
data = ujson.load(f)
-flat_data = [
-    dict(
-        path=d['Emoticon']['path'],
-        name=d['Emoticon']['shortcut'],
-    )
-    for d in data
-]
+if isinstance(data, dict) and 'Emoticons' in data:
+    # Handle the hc-migrate export format for emoticons.json.
+    flat_data = [
+        dict(
+            path=d['path'],
+            name=d['shortcut'],
+        )
+        for d in data['Emoticons']
+    ]
+else:
+    flat_data = [
+        dict(
+            path=d['Emoticon']['path'],
+            name=d['Emoticon']['shortcut'],
+        )
+        for d in data
+    ]
emoji_folder = os.path.join(output_dir, 'emoji')
os.makedirs(emoji_folder, exist_ok=True)
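The two emoticons.json shapes the branch above distinguishes, as abbreviated hypothetical examples:
```python
# Standard HipChat export: a list of {"Emoticon": {...}} wrappers.
standard = [{"Emoticon": {"path": "img/smile.png", "shortcut": "smile"}}]
# hc-migrate export: a dict with a top-level "Emoticons" list.
hc_migrate = {"Emoticons": [{"path": "img/smile.png", "shortcut": "smile"}]}

assert isinstance(hc_migrate, dict) and 'Emoticons' in hc_migrate
assert not (isinstance(standard, dict) and 'Emoticons' in standard)
```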
@@ -506,7 +521,7 @@ def process_message_file(realm_id: int,
)
if is_pm_data:
-if sender_id != fn_id:
+if int(sender_id) != int(fn_id):
# PMs are in multiple places in the Hipchat export,
# and we only use the copy from the sender
return None
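The typing issue in isolation: the export can carry the same ID as a string in one place and an int in another, and a raw comparison then never matches (illustrative values):
```python
sender_id = "8"  # ID arrives as a string in one part of the export
fn_id = 8        # ...and as an int elsewhere
assert sender_id != fn_id             # str vs. int is always unequal in Python 3
assert int(sender_id) == int(fn_id)   # normalizing both sides fixes the check
```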


@@ -3,9 +3,10 @@ import requests
import shutil
import logging
import os
+import traceback
import ujson
-from typing import List, Dict, Any, Optional, Set, Callable
+from typing import List, Dict, Any, Optional, Set, Callable, Iterable, Tuple, TypeVar
from django.forms.models import model_to_dict
from zerver.models import Realm, RealmEmoji, Subscription, Recipient, \
@@ -13,7 +14,7 @@ from zerver.models import Realm, RealmEmoji, Subscription, Recipient, \
from zerver.data_import.sequencer import NEXT_ID
from zerver.lib.actions import STREAM_ASSIGNMENT_COLORS as stream_colors
from zerver.lib.avatar_hash import user_avatar_path_from_ids
-from zerver.lib.parallel import run_parallel
+from zerver.lib.parallel import run_parallel, JobData
# stubs
ZerverFieldsT = Dict[str, Any]
@@ -263,9 +264,15 @@ def build_usermessages(zerver_usermessage: List[ZerverFieldsT],
subscriber_map: Dict[int, Set[int]],
recipient_id: int,
mentioned_user_ids: List[int],
-message_id: int) -> None:
+message_id: int,
+long_term_idle: Optional[Set[int]]=None) -> Tuple[int, int]:
user_ids = subscriber_map.get(recipient_id, set())
+if long_term_idle is None:
+    long_term_idle = set()
+user_messages_created = 0
+user_messages_skipped = 0
if user_ids:
for user_id in sorted(user_ids):
is_mentioned = user_id in mentioned_user_ids
@@ -274,6 +281,12 @@ def build_usermessages(zerver_usermessage: List[ZerverFieldsT],
# It's possible we don't even get PMs from them.
is_private = False
+if not is_mentioned and not is_private and user_id in long_term_idle:
+    # these users are long-term idle
+    user_messages_skipped += 1
+    continue
+user_messages_created += 1
usermessage = build_user_message(
user_id=user_id,
message_id=message_id,
@@ -282,6 +295,7 @@ def build_usermessages(zerver_usermessage: List[ZerverFieldsT],
)
zerver_usermessage.append(usermessage)
+return (user_messages_created, user_messages_skipped)
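The skip condition above, restated as a standalone predicate (a sketch that mirrors the logic, with hypothetical IDs):
```python
from typing import Set

def should_skip(user_id: int, is_mentioned: bool, is_private: bool,
                long_term_idle: Set[int]) -> bool:
    # A long-term idle user gets no UserMessage row unless the message
    # mentions them or is a private message to them.
    return not is_mentioned and not is_private and user_id in long_term_idle

idle = {7}
assert should_skip(7, False, False, idle)      # idle, not mentioned: skipped
assert not should_skip(7, True, False, idle)   # mentioned: kept
assert not should_skip(3, False, False, idle)  # not idle: kept
```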
def build_user_message(user_id: int,
message_id: int,
@@ -388,7 +402,7 @@ def process_avatars(avatar_list: List[ZerverFieldsT], avatar_dir: str, realm_id:
downloaded. For simpler conversions see write_avatar_png.
"""
-def get_avatar(avatar_upload_item: List[str]) -> int:
+def get_avatar(avatar_upload_item: List[str]) -> None:
avatar_url = avatar_upload_item[0]
image_path = os.path.join(avatar_dir, avatar_upload_item[1])
@@ -398,7 +412,6 @@ def process_avatars(avatar_list: List[ZerverFieldsT], avatar_dir: str, realm_id:
with open(image_path, 'wb') as image_file:
shutil.copyfileobj(response.raw, image_file)
shutil.copy(image_path, original_image_path)
-return 0
logging.info('######### GETTING AVATARS #########\n')
logging.info('DOWNLOADING AVATARS .......\n')
@@ -425,7 +438,7 @@ def process_avatars(avatar_list: List[ZerverFieldsT], avatar_dir: str, realm_id:
# Run downloads in parallel
output = []
-for (status, job) in run_parallel(get_avatar, avatar_upload_list, threads=threads):
+for (status, job) in run_parallel_wrapper(get_avatar, avatar_upload_list, threads=threads):
output.append(job)
logging.info('######### GETTING AVATARS FINISHED #########\n')
@@ -458,10 +471,32 @@ def write_avatar_png(avatar_folder: str,
s3_path=image_path,
realm_id=realm_id,
user_profile_id=user_id,
+# We only write the .original file; ask the importer to do the thumbnailing.
+importer_should_thumbnail=True,
)
return metadata
+ListJobData = TypeVar('ListJobData')
+def run_parallel_wrapper(f: Callable[[ListJobData], None], full_items: List[ListJobData],
+                         threads: int=6) -> Iterable[Tuple[int, List[ListJobData]]]:
+    logging.info("Distributing %s items across %s threads" % (len(full_items), threads))
+    def wrapping_function(items: List[ListJobData]) -> int:
+        count = 0
+        for item in items:
+            try:
+                f(item)
+            except Exception:
+                logging.info("Error processing item: %s" % (item,))
+                traceback.print_exc()
+            count += 1
+            if count % 1000 == 0:
+                logging.info("A download thread finished %s items" % (count,))
+        return 0
+    job_lists = [full_items[i::threads] for i in range(threads)]  # type: List[List[ListJobData]]
+    return run_parallel(wrapping_function, job_lists, threads=threads)
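The `full_items[i::threads]` slicing above is a round-robin split; for example:
```python
items = list(range(10))
threads = 3
job_lists = [items[i::threads] for i in range(threads)]
assert job_lists == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
# Each thread receives ~1/N of the items, so run_parallel forks once
# per thread rather than once per item.
```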
def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
threads: int) -> List[ZerverFieldsT]:
"""
@@ -471,7 +506,7 @@ def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
1. upload_list: List of uploads to be mapped in uploads records.json file
2. upload_dir: Folder where the downloaded uploads are saved
"""
-def get_uploads(upload: List[str]) -> int:
+def get_uploads(upload: List[str]) -> None:
upload_url = upload[0]
upload_path = upload[1]
upload_path = os.path.join(upload_dir, upload_path)
@@ -480,7 +515,6 @@ def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
os.makedirs(os.path.dirname(upload_path), exist_ok=True)
with open(upload_path, 'wb') as upload_file:
shutil.copyfileobj(response.raw, upload_file)
-return 0
logging.info('######### GETTING ATTACHMENTS #########\n')
logging.info('DOWNLOADING ATTACHMENTS .......\n')
@@ -493,7 +527,7 @@ def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
# Run downloads in parallel
output = []
-for (status, job) in run_parallel(get_uploads, upload_url_list, threads=threads):
+for (status, job) in run_parallel_wrapper(get_uploads, upload_url_list, threads=threads):
output.append(job)
logging.info('######### GETTING ATTACHMENTS FINISHED #########\n')
@@ -522,7 +556,7 @@ def process_emojis(zerver_realmemoji: List[ZerverFieldsT], emoji_dir: str,
2. emoji_dir: Folder where the downloaded emojis are saved
3. emoji_url_map: Maps emoji name to its url
"""
-def get_emojis(upload: List[str]) -> int:
+def get_emojis(upload: List[str]) -> None:
emoji_url = upload[0]
emoji_path = upload[1]
upload_emoji_path = os.path.join(emoji_dir, emoji_path)
@@ -531,7 +565,6 @@ def process_emojis(zerver_realmemoji: List[ZerverFieldsT], emoji_dir: str,
os.makedirs(os.path.dirname(upload_emoji_path), exist_ok=True)
with open(upload_emoji_path, 'wb') as emoji_file:
shutil.copyfileobj(response.raw, emoji_file)
-return 0
emoji_records = []
upload_emoji_list = []
@@ -555,7 +588,7 @@ def process_emojis(zerver_realmemoji: List[ZerverFieldsT], emoji_dir: str,
# Run downloads in parallel
output = []
-for (status, job) in run_parallel(get_emojis, upload_emoji_list, threads=threads):
+for (status, job) in run_parallel_wrapper(get_emojis, upload_emoji_list, threads=threads):
output.append(job)
logging.info('######### GETTING EMOJIS FINISHED #########\n')


@@ -10,11 +10,13 @@ import logging
import random
import requests
from collections import defaultdict
from django.conf import settings
from django.db import connection
from django.utils.timezone import now as timezone_now
from django.forms.models import model_to_dict
-from typing import Any, Dict, List, Optional, Tuple, Set
+from typing import Any, Dict, List, Optional, Tuple, Set, Iterator
from zerver.forms import check_subdomain_available
from zerver.models import Reaction, RealmEmoji, Realm, UserProfile, Recipient, \
CustomProfileField, CustomProfileFieldValue
@@ -263,6 +265,9 @@ def build_customprofilefields_values(custom_field_map: ZerverFieldsT, fields: Ze
user_id: int, custom_field_id: int,
custom_field_values: List[ZerverFieldsT]) -> int:
for field, value in fields.items():
if value['value'] == "":
# Skip writing entries for fields with an empty value
continue
custom_field_value = CustomProfileFieldValue(
id=custom_field_id,
value=value['value'])
@@ -296,7 +301,7 @@ def get_user_email(user: ZerverFieldsT, domain_name: str) -> str:
else:
raise AssertionError("Could not identify bot type")
return slack_bot_name.replace("Bot", "").replace(" ", "") + "-bot@%s" % (domain_name,)
if get_user_full_name(user) == "slackbot":
if get_user_full_name(user).lower() == "slackbot":
return "imported-slackbot-bot@%s" % (domain_name,)
raise AssertionError("Could not find email address for Slack user %s" % (user,))
@@ -434,9 +439,61 @@ def get_subscription(channel_members: List[str], zerver_subscription: List[Zerve
subscription_id += 1
return subscription_id
+def process_long_term_idle_users(slack_data_dir: str, users: List[ZerverFieldsT],
+                                 added_users: AddedUsersT, added_channels: AddedChannelsT,
+                                 zerver_userprofile: List[ZerverFieldsT]) -> Set[int]:
+    """Algorithmically, we treat users who have sent at least 10 messages
+    or have sent a message within the last 60 days as active.
+    Everyone else is treated as long-term idle, which means they will
+    have a slightly slower first page load when coming back to
+    Zulip.
+    """
+    all_messages = get_messages_iterator(slack_data_dir, added_channels)
+    sender_counts = defaultdict(int)  # type: Dict[str, int]
+    recent_senders = set()  # type: Set[str]
+    NOW = float(timezone_now().timestamp())
+    for message in all_messages:
+        timestamp = float(message['ts'])
+        slack_user_id = get_message_sending_user(message)
+        if not slack_user_id:
+            # Ignore messages without user names
+            continue
+        if slack_user_id in recent_senders:
+            continue
+        if NOW - timestamp < 60:
+            recent_senders.add(slack_user_id)
+        sender_counts[slack_user_id] += 1
+    for (slack_sender_id, count) in sender_counts.items():
+        if count > 10:
+            recent_senders.add(slack_sender_id)
+    long_term_idle = set()
+    for slack_user in users:
+        if slack_user["id"] in recent_senders:
+            continue
+        zulip_user_id = added_users[slack_user['id']]
+        long_term_idle.add(zulip_user_id)
+    # Record long-term idle status in zerver_userprofile
+    for user_profile_row in zerver_userprofile:
+        if user_profile_row['id'] in long_term_idle:
+            user_profile_row['long_term_idle'] = True
+            # Setting last_active_message_id to 1 means the user, if
+            # imported, will get the full message history for the
+            # streams they were on.
+            user_profile_row['last_active_message_id'] = 1
+    return long_term_idle
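A toy run of the activity heuristic above (hypothetical Slack user IDs and counts):
```python
sender_counts = {"U1": 3, "U2": 25}  # total messages per sender
recent_senders = {"U3"}              # U3 sent a message recently
# Senders with more than 10 total messages also count as active:
recent_senders |= {s for s, n in sender_counts.items() if n > 10}
long_term_idle = {"U1", "U2", "U3"} - recent_senders
assert long_term_idle == {"U1"}
```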
def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFieldsT], realm_id: int,
added_users: AddedUsersT, added_recipient: AddedRecipientsT,
added_channels: AddedChannelsT, realm: ZerverFieldsT,
+zerver_userprofile: List[ZerverFieldsT],
zerver_realmemoji: List[ZerverFieldsT], domain_name: str,
output_dir: str,
chunk_size: int=MESSAGE_BATCH_CHUNK_SIZE) -> Tuple[List[ZerverFieldsT],
@@ -448,12 +505,12 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
2. uploads, which is a list of uploads to be mapped in uploads records.json
3. attachment, which is a list of the attachments
"""
-all_messages = get_all_messages(slack_data_dir, added_channels)
-# we sort the messages according to the timestamp to show messages with
-# the proper date order
-all_messages = sorted(all_messages, key=lambda message: message['ts'])
+long_term_idle = process_long_term_idle_users(slack_data_dir, users, added_users,
+                                              added_channels, zerver_userprofile)
+# Now, we actually import the messages.
+all_messages = get_messages_iterator(slack_data_dir, added_channels)
logging.info('######### IMPORTING MESSAGES STARTED #########\n')
total_reactions = [] # type: List[ZerverFieldsT]
@@ -461,8 +518,6 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
total_uploads = [] # type: List[ZerverFieldsT]
# The messages are stored in batches
-low_index = 0
-upper_index = low_index + chunk_size
dump_file_id = 1
subscriber_map = make_subscriber_map(
@@ -470,14 +525,21 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
)
while True:
-message_data = all_messages[low_index:upper_index]
+message_data = []
+_counter = 0
+for msg in all_messages:
+    _counter += 1
+    message_data.append(msg)
+    if _counter == chunk_size:
+        break
+if len(message_data) == 0:
+    break
zerver_message, zerver_usermessage, attachment, uploads, reactions = \
channel_message_to_zerver_message(
realm_id, users, added_users, added_recipient, message_data,
zerver_realmemoji, subscriber_map, added_channels,
-domain_name)
+domain_name, long_term_idle)
message_json = dict(
zerver_message=zerver_message,
@@ -491,26 +553,39 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
total_attachments += attachment
total_uploads += uploads
-low_index = upper_index
-upper_index = chunk_size + low_index
dump_file_id += 1
logging.info('######### IMPORTING MESSAGES FINISHED #########\n')
return total_reactions, total_uploads, total_attachments
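The hand-rolled counter loop above batches an iterator without materializing it; an equivalent formulation with itertools.islice, shown for comparison (a sketch, not the code Zulip uses):
```python
from itertools import islice
from typing import Iterator, List, TypeVar

T = TypeVar('T')

def batches(it: Iterator[T], size: int) -> Iterator[List[T]]:
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

assert list(batches(iter(range(5)), 2)) == [[0, 1], [2, 3], [4]]
```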
-def get_all_messages(slack_data_dir: str, added_channels: AddedChannelsT) -> List[ZerverFieldsT]:
-    all_messages = []  # type: List[ZerverFieldsT]
+def get_messages_iterator(slack_data_dir: str, added_channels: AddedChannelsT) -> Iterator[ZerverFieldsT]:
+    """This function is an iterator that returns all the messages across
+    all Slack channels, in order by timestamp.  It's important to
+    not read all the messages into memory at once, because for
+    large imports that can OOM kill."""
+    all_json_names = defaultdict(list)  # type: Dict[str, List[str]]
for channel_name in added_channels.keys():
channel_dir = os.path.join(slack_data_dir, channel_name)
json_names = os.listdir(channel_dir)
for json_name in json_names:
+all_json_names[json_name].append(channel_dir)
+# Sort json_name by date
+for json_name in sorted(all_json_names.keys()):
+    messages_for_one_day = []  # type: List[ZerverFieldsT]
+    for channel_dir in all_json_names[json_name]:
message_dir = os.path.join(channel_dir, json_name)
messages = get_data_file(message_dir)
+channel_name = os.path.basename(channel_dir)
for message in messages:
# To give every message the channel information
message['channel_name'] = channel_name
-all_messages += messages
-return all_messages
+messages_for_one_day += messages
+# we sort the messages according to the timestamp to show messages with
+# the proper date order
+for message in sorted(messages_for_one_day, key=lambda m: m['ts']):
+    yield message
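A compact illustration of the two-layer ordering the iterator above relies on (hypothetical per-day data; layer 1 sorts the date-named files, layer 2 sorts each day's messages by timestamp):
```python
from typing import Dict, Iterator, List

files_by_day = {
    "2019-01-02.json": [{"ts": "5"}, {"ts": "4"}],
    "2019-01-01.json": [{"ts": "3"}, {"ts": "1"}],
}

def iterate(files: Dict[str, List[Dict[str, str]]]) -> Iterator[Dict[str, str]]:
    for day in sorted(files):  # layer 1: date-named files sort chronologically
        for message in sorted(files[day], key=lambda m: m['ts']):  # layer 2
            yield message

assert [m['ts'] for m in iterate(files_by_day)] == ['1', '3', '4', '5']
```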
def channel_message_to_zerver_message(realm_id: int,
users: List[ZerverFieldsT],
@@ -520,11 +595,12 @@ def channel_message_to_zerver_message(realm_id: int,
zerver_realmemoji: List[ZerverFieldsT],
subscriber_map: Dict[int, Set[int]],
added_channels: AddedChannelsT,
-domain_name: str) -> Tuple[List[ZerverFieldsT],
-List[ZerverFieldsT],
-List[ZerverFieldsT],
-List[ZerverFieldsT],
-List[ZerverFieldsT]]:
+domain_name: str,
+long_term_idle: Set[int]) -> Tuple[List[ZerverFieldsT],
+List[ZerverFieldsT],
+List[ZerverFieldsT],
+List[ZerverFieldsT],
+List[ZerverFieldsT]]:
"""
Returns:
1. zerver_message, which is a list of the messages
@@ -543,6 +619,8 @@ def channel_message_to_zerver_message(realm_id: int,
with open(NAME_TO_CODEPOINT_PATH) as fp:
name_to_codepoint = ujson.load(fp)
+total_user_messages = 0
+total_skipped_user_messages = 0
for message in all_messages:
user = get_message_sending_user(message)
if not user:
@@ -620,14 +698,19 @@ def channel_message_to_zerver_message(realm_id: int,
zerver_message.append(zulip_message)
# construct usermessages
-build_usermessages(
+(num_created, num_skipped) = build_usermessages(
zerver_usermessage=zerver_usermessage,
subscriber_map=subscriber_map,
recipient_id=recipient_id,
mentioned_user_ids=mentioned_user_ids,
message_id=message_id,
+long_term_idle=long_term_idle,
)
+total_user_messages += num_created
+total_skipped_user_messages += num_skipped
+logging.debug("Created %s UserMessages; deferred %s due to long-term idle" % (
+    total_user_messages, total_skipped_user_messages))
return zerver_message, zerver_usermessage, zerver_attachment, uploads_list, \
reaction_list
@@ -802,7 +885,7 @@ def do_convert_data(slack_zip_file: str, output_dir: str, token: str, threads: i
reactions, uploads_list, zerver_attachment = convert_slack_workspace_messages(
slack_data_dir, user_list, realm_id, added_users, added_recipient, added_channels,
-realm, realm['zerver_realmemoji'], domain_name, output_dir)
+realm, realm['zerver_userprofile'], realm['zerver_realmemoji'], domain_name, output_dir)
# Move zerver_reactions to realm.json file
realm['zerver_reaction'] = reactions


@@ -1287,8 +1287,6 @@ def do_export_realm(realm: Realm, output_dir: Path, threads: int,
if not settings.TEST_SUITE:
assert threads >= 1
-assert os.path.exists("./manage.py")
realm_config = get_realm_config()
create_soft_link(source=output_dir, in_progress=True)


@@ -621,6 +621,8 @@ def import_uploads_s3(bucket_name: str, import_dir: Path, processing_avatars: bo
if record['s3_path'].endswith('.original'):
user_profile = get_user_profile_by_id(record['user_profile_id'])
upload_backend.ensure_medium_avatar_image(user_profile=user_profile)
if record.get("importer_should_thumbnail"):
upload_backend.ensure_basic_avatar_image(user_profile=user_profile)
def import_uploads(import_dir: Path, processing_avatars: bool=False,
processing_emojis: bool=False) -> None:
@@ -723,6 +725,10 @@ def do_import_realm(import_dir: Path, subdomain: str) -> Realm:
# Remap the user IDs for notification_bot and friends to their
# appropriate IDs on this server
for item in data['zerver_userprofile_crossrealm']:
+if item['email'].startswith("emailgateway@"):
+    # The email gateway bot's email is customized to a
+    # different domain on some servers.
+    item['email'] = settings.EMAIL_GATEWAY_BOT
logging.info("Adding to ID map: %s %s" % (item['id'], get_system_bot(item['email']).id))
new_user_id = get_system_bot(item['email']).id
update_id_map(table='user_profile', old_id=item['id'], new_id=new_user_id)
@@ -903,6 +909,29 @@ def do_import_realm(import_dir: Path, subdomain: str) -> Realm:
update_model_ids(Reaction, data, 'reaction')
bulk_import_model(data, Reaction)
+for user_profile in UserProfile.objects.filter(is_bot=False, realm=realm):
+    # Since we now unconditionally renumber message IDs, we need
+    # to reset the user's pointer to what will be a valid value.
+    #
+    # For zulip->zulip imports, we could do something clever, but
+    # it should always be safe to reset to first unread message.
+    #
+    # Longer-term, the plan is to eliminate pointer as a concept.
+    first_unread_message = UserMessage.objects.filter(user_profile=user_profile).extra(
+        where=[UserMessage.where_unread()]
+    ).first()
+    if first_unread_message is not None:
+        user_profile.pointer = first_unread_message.message_id
+    else:
+        last_message = UserMessage.objects.filter(user_profile=user_profile).last()
+        if last_message is not None:
+            user_profile.pointer = last_message.message_id
+        else:
+            # -1 is the guard value for new user accounts with no messages.
+            user_profile.pointer = -1
+    user_profile.save(update_fields=["pointer"])
# Do attachments AFTER message data is loaded.
# TODO: de-dup how we read these json files.
fn = os.path.join(import_dir, "attachment.json")


@@ -182,6 +182,9 @@ class ZulipUploadBackend:
def ensure_medium_avatar_image(self, user_profile: UserProfile) -> None:
raise NotImplementedError()
+def ensure_basic_avatar_image(self, user_profile: UserProfile) -> None:
+    raise NotImplementedError()
def upload_realm_icon_image(self, icon_file: File, user_profile: UserProfile) -> None:
raise NotImplementedError()
@@ -443,6 +446,27 @@ class S3UploadBackend(ZulipUploadBackend):
resized_medium
)
+def ensure_basic_avatar_image(self, user_profile: UserProfile) -> None:  # nocoverage
+    # TODO: Refactor this to share code with ensure_medium_avatar_image
+    file_path = user_avatar_path(user_profile)
+    # Also TODO: Migrate to user_avatar_path(user_profile) + ".png".
+    s3_file_name = file_path
+    bucket_name = settings.S3_AVATAR_BUCKET
+    conn = S3Connection(settings.S3_KEY, settings.S3_SECRET_KEY)
+    bucket = get_bucket(conn, bucket_name)
+    key = bucket.get_key(file_path + ".original")
+    image_data = key.get_contents_as_string()
+    resized_avatar = resize_avatar(image_data)  # type: ignore # image_data is `bytes`, boto subs are wrong
+    upload_image_to_s3(
+        bucket_name,
+        s3_file_name,
+        "image/png",
+        user_profile,
+        resized_avatar
+    )
def upload_emoji_image(self, emoji_file: File, emoji_file_name: str,
user_profile: UserProfile) -> None:
content_type = guess_type(emoji_file.name)[0]
@@ -589,6 +613,19 @@ class LocalUploadBackend(ZulipUploadBackend):
resized_medium = resize_avatar(image_data, MEDIUM_AVATAR_SIZE)
write_local_file('avatars', file_path + '-medium.png', resized_medium)
+def ensure_basic_avatar_image(self, user_profile: UserProfile) -> None:  # nocoverage
+    # TODO: Refactor this to share code with ensure_medium_avatar_image
+    file_path = user_avatar_path(user_profile)
+    output_path = os.path.join(settings.LOCAL_UPLOADS_DIR, "avatars", file_path + ".png")
+    if os.path.isfile(output_path):
+        return
+    image_path = os.path.join(settings.LOCAL_UPLOADS_DIR, "avatars", file_path + ".original")
+    image_data = open(image_path, "rb").read()
+    resized_avatar = resize_avatar(image_data)
+    write_local_file('avatars', file_path + '.png', resized_avatar)
def upload_emoji_image(self, emoji_file: File, emoji_file_name: str,
user_profile: UserProfile) -> None:
emoji_path = RealmEmoji.PATH_ID_TEMPLATE.format(


@@ -874,9 +874,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
def test_google_oauth2_success(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=True,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response,
subdomain='zulip', next='/user_uploads/image')
@@ -892,24 +892,15 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
parsed_url.path)
self.assertTrue(uri.startswith('http://zulip.testserver/accounts/login/subdomain/'))
-def test_google_oauth2_no_fullname(self) -> None:
+def test_user_cannot_log_without_verified_email(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(givenName="Test", familyName="User"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=False,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
-result = self.google_oauth2_test(token_response, account_response, subdomain='zulip')
-data = load_subdomain_token(result)
-self.assertEqual(data['email'], self.example_email("hamlet"))
-self.assertEqual(data['name'], 'Test User')
-self.assertEqual(data['subdomain'], 'zulip')
-self.assertEqual(data['next'], '')
-self.assertEqual(result.status_code, 302)
-parsed_url = urllib.parse.urlparse(result.url)
-uri = "{}://{}{}".format(parsed_url.scheme, parsed_url.netloc,
-                         parsed_url.path)
-self.assertTrue(uri.startswith('http://zulip.testserver/accounts/login/subdomain/'))
+result = self.google_oauth2_test(token_response, account_response,
+                                 subdomain='zulip')
+self.assertEqual(result.status_code, 400)
def test_google_oauth2_mobile_success(self) -> None:
self.user_profile = self.example_user('hamlet')
@@ -917,9 +908,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
self.user_profile.save()
mobile_flow_otp = '1234abcd' * 8
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.user_profile.email)])
account_data = dict(name="Full Name",
email_verified=True,
email=self.user_profile.email)
account_response = ResponseMock(200, account_data)
self.assertEqual(len(mail.outbox), 0)
@@ -1137,9 +1128,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
def test_user_cannot_log_into_nonexisting_realm(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=True,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response,
subdomain='nonexistent')
@@ -1148,9 +1139,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
def test_user_cannot_log_into_wrong_subdomain(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=True,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response,
subdomain='zephyr')
@@ -1175,9 +1166,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
email = "newuser@zulip.com"
realm = get_realm("zulip")
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=email)])
account_data = dict(name="Full Name",
email_verified=True,
email=email)
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response, subdomain='zulip',
is_signup='1')
@@ -1267,17 +1258,6 @@ class GoogleLoginTest(GoogleOAuthTest):
self.assertEqual(m.call_args_list[0][0][0],
"Google login failed making API call: Response text")
-def test_google_oauth2_account_response_no_email(self) -> None:
-    token_response = ResponseMock(200, {'access_token': "unique_token"})
-    account_data = dict(name=dict(formatted="Full Name"),
-                        emails=[])
-    account_response = ResponseMock(200, account_data)
-    with mock.patch("logging.error") as m:
-        result = self.google_oauth2_test(token_response, account_response,
-                                         subdomain="zulip")
-    self.assertEqual(result.status_code, 400)
-    self.assertIn("Google oauth2 account email not found:", m.call_args_list[0][0][0])
def test_google_oauth2_error_access_denied(self) -> None:
result = self.client_get("/accounts/login/google/done/?error=access_denied")
self.assertEqual(result.status_code, 302)


@@ -20,6 +20,8 @@ from zerver.data_import.slack import (
do_convert_data,
process_avatars,
process_message_files,
+AddedChannelsT,
+ZerverFieldsT,
)
from zerver.data_import.import_util import (
build_zerver_realm,
@@ -55,7 +57,7 @@ import shutil
import requests
import os
import mock
-from typing import Any, AnyStr, Dict, List, Optional, Set, Tuple
+from typing import Any, AnyStr, Dict, List, Optional, Set, Tuple, Iterator
def remove_folder(path: str) -> None:
if os.path.exists(path):
@@ -405,7 +407,7 @@ class SlackImporter(ZulipTestCase):
self.assertEqual(zerver_usermessage[3]['id'], um_id + 4)
self.assertEqual(zerver_usermessage[3]['message'], message_id)
@mock.patch("zerver.data_import.slack.build_usermessages", return_value = 2)
@mock.patch("zerver.data_import.slack.build_usermessages", return_value = (2, 4))
def test_channel_message_to_zerver_message(self, mock_build_usermessage: mock.Mock) -> None:
user_data = [{"id": "U066MTL5U", "name": "john doe", "deleted": False, "real_name": "John"},
@@ -446,7 +448,7 @@ class SlackImporter(ZulipTestCase):
channel_message_to_zerver_message(
1, user_data, added_users, added_recipient,
all_messages, [], subscriber_map,
-added_channels, 'domain')
+added_channels, 'domain', set())
# functioning already tested in helper function
self.assertEqual(zerver_usermessage, [])
# subtype: channel_join is filtered
@@ -483,14 +485,19 @@ class SlackImporter(ZulipTestCase):
self.assertEqual(zerver_message[3]['sender'], 24)
@mock.patch("zerver.data_import.slack.channel_message_to_zerver_message")
@mock.patch("zerver.data_import.slack.get_all_messages")
def test_convert_slack_workspace_messages(self, mock_get_all_messages: mock.Mock,
@mock.patch("zerver.data_import.slack.get_messages_iterator")
def test_convert_slack_workspace_messages(self, mock_get_messages_iterator: mock.Mock,
mock_message: mock.Mock) -> None:
os.makedirs('var/test-slack-import', exist_ok=True)
added_channels = {'random': ('c5', 1), 'general': ('c6', 2)} # type: Dict[str, Tuple[str, int]]
time = float(timezone_now().timestamp())
zerver_message = [{'id': 1, 'ts': time}, {'id': 5, 'ts': time}]
+def fake_get_messages_iter(slack_data_dir: str, added_channels: AddedChannelsT) -> Iterator[ZerverFieldsT]:
+    import copy
+    return iter(copy.deepcopy(zerver_message))
realm = {'zerver_subscription': []} # type: Dict[str, Any]
user_list = [] # type: List[Dict[str, Any]]
reactions = [{"name": "grinning", "users": ["U061A5N1G"], "count": 1}]
@@ -498,14 +505,15 @@ class SlackImporter(ZulipTestCase):
zerver_usermessage = [{'id': 3}, {'id': 5}, {'id': 6}, {'id': 9}]
-mock_get_all_messages.side_effect = [zerver_message]
+mock_get_messages_iterator.side_effect = fake_get_messages_iter
mock_message.side_effect = [[zerver_message[:1], zerver_usermessage[:2],
attachments, uploads, reactions[:1]],
[zerver_message[1:2], zerver_usermessage[2:5],
attachments, uploads, reactions[1:1]]]
+# Hacky: We should include a zerver_userprofile, not the empty []
test_reactions, uploads, zerver_attachment = convert_slack_workspace_messages(
'./random_path', user_list, 2, {}, {}, added_channels,
-realm, [], 'domain', 'var/test-slack-import', chunk_size=1)
+realm, [], [], 'domain', 'var/test-slack-import', chunk_size=1)
messages_file_1 = os.path.join('var', 'test-slack-import', 'messages-000001.json')
self.assertTrue(os.path.exists(messages_file_1))
messages_file_2 = os.path.join('var', 'test-slack-import', 'messages-000002.json')


@@ -395,7 +395,7 @@ def finish_google_oauth2(request: HttpRequest) -> HttpResponse:
access_token = resp.json()['access_token']
resp = requests.get(
-'https://www.googleapis.com/plus/v1/people/me',
+'https://www.googleapis.com/oauth2/v3/userinfo',
params={'access_token': access_token}
)
if resp.status_code == 400:
@@ -406,21 +406,13 @@ def finish_google_oauth2(request: HttpRequest) -> HttpResponse:
return HttpResponse(status=400)
body = resp.json()
-try:
-    full_name = body['name']['formatted']
-except KeyError:
-    # Only google+ users have a formatted name. I am ignoring i18n here.
-    full_name = '{} {}'.format(
-        body['name']['givenName'], body['name']['familyName']
-    )
-for email in body['emails']:
-    if email['type'] == 'account':
-        break
-else:
-    logging.error('Google oauth2 account email not found: %s' % (body,))
+if not body['email_verified']:
+    logging.error('Google oauth2 account email not verified.')
return HttpResponse(status=400)
-email_address = email['value']
+# Extract the user info from the Google response
+full_name = body['name']
+email_address = body['email']
try:
realm = Realm.objects.get(string_id=subdomain)
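For reference, /oauth2/v3/userinfo returns standard OpenID Connect claims, so the fields the new code reads are top-level (values illustrative):
```python
body = {
    "name": "Full Name",
    "email": "hamlet@zulip.com",  # illustrative address
    "email_verified": True,
}
# The extraction above reduces to:
assert body['email_verified']  # the view returns HTTP 400 when this is False
full_name = body['name']
email_address = body['email']
```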