Compare commits


20 Commits
1.9.1 ... 1.9.2

Author SHA1 Message Date
Tim Abbott
8e7ac21fe0 Release Zulip Server 1.9.2. 2019-01-29 16:33:36 -08:00
Tim Abbott
cbfae3e0d0 import: Fix uploading avatars with S3 upload backend.
This should hopefully be the last commit of this form; ultimately, my
hope is that we'll be able to refactor the semi-duplicated logic in
this file so that less effort goes into keeping it correct.
2019-01-29 16:26:19 -08:00
Tim Abbott
ffeb4340a9 auth: Migrate Google authentication off deprecated name API.
As part of the Google+ shutdown, Google has eliminated support for the
/plus/v1/people/me endpoint.  Replace it with the very similar
/oauth2/v3/userinfo endpoint.
2019-01-29 16:17:27 -08:00
Matthew Wegner
79f781b9ea import: Normalize slackbot string comparison.
In very old Slack workspaces, slackbot can appear as "Slackbot", and
the import script only checks for "slackbot" (a case-sensitive
comparison).  This breaks the import: it trips the assert that
immediately follows the check.  I don't know how common this is, but
it definitely affected our import.

The simple fix is to compare against a lowercased-version of the
user's full name.
2019-01-29 16:14:10 -08:00
Tim Abbott
fd89df63b4 hipchat: Fix importing of private messages.
Apparently a trivial typing issue (string vs. integer IDs) meant that
we broke this a few weeks ago.
2019-01-29 16:13:25 -08:00
Tim Abbott
509d335705 import: Handle corner case around EMAIL_GATEWAY_BOT emails. 2019-01-29 16:12:01 -08:00
Tim Abbott
ce28ccf2bf import: Fix pointer logic for zulip->zulip imports.
Previously, the pointer was almost guaranteed to be an invalid random
value, because we renumber message IDs unconditionally now.
2019-01-29 16:11:56 -08:00
Tim Abbott
4adbeedef6 hipchat: Handle unusual emoticons.json format.
Apparently, hc-migrate can generate emoticons.json files with a
somewhat different format.  Assuming that other files are in the
normal format, we should be able to handle it like this.

See report in #11135.
2019-01-29 16:11:34 -08:00
Tim Abbott
09ed7d5b77 hipchat: Handle case where emoticons.json is not in export.
Apparently, some methods of exporting from HipChat do not include an
emoticons.json file.  We could test for this using the
`include_emoticons` field in `metadata.json`, but we currently don't
even bother to read that file.  Rather than changing that, we just
print a warning and proceed.  This is arguably better anyway, in that
often not having emoticons.json is the result of user error when
exporting, and it's nice to flag that this is happening.

Fixes #11135.
2019-01-29 16:11:30 -08:00
Tim Abbott
f445d3f589 import: Ensure presence of basic avatar images for HipChat.
Our HipChat conversion tool didn't properly handle basic avatar
images, resulting in only the medium-size avatar images being imported
properly.  This fixes that bug by asking the import tool to do the
thumbnailing for the basic avatar image (from the .original file) as
well as the medium avatar image.
2019-01-29 16:11:23 -08:00
Tim Abbott
e8ee374d4f slack import: Import long-inactive users as long-term idle.
This avoids creating UserMessage rows for long-inactive users in
organizations with many thousands of users.
2019-01-29 16:10:59 -08:00
Tim Abbott
9ff5359522 export: Remove assertion on current working directory.
This command hasn't made deep assumptions about CWD for a long time,
and removing the assertion enables users to run it through a symlink,
among other things.

Fixes #10961.
2019-01-29 16:10:26 -08:00
Tim Abbott
a31f56443a import: Avoid unnecessary forks when downloading attachments.
The previous implementation used run_parallel incorrectly, passing it
a set of very small jobs (each was to download a single file), which
meant that we'd end up forking once for every file to download.

The corrected implementation sends each of N threads 1/N of the files
to download, which matches the goal of distributing the download work
across N threads.
2019-01-29 16:09:42 -08:00
rht
0b263d8b8c slack import: Eliminate need to load all messages into memory.
This works by yielding messages sorted based on timestamp.  Because
the Slack exports are broken into files by date, it's convenient to do
a 2-layer sorting process, where we open all the files for a given
day, and then sort their messages by timestamp before yielding them.

Fixes #10930.
2019-01-29 16:09:37 -08:00
Tim Abbott
ad00b02c66 slack import: Fix all messages being imported to one channel.
This was an ugly variable-escape-from-loop regression introduced in
e59ff6e6db.
2019-01-29 16:09:06 -08:00
Tim Abbott
02f2ae4048 slack import: Fix empty values for custom profile fields.
The Slack import process would incorrectly issue
CustomProfileFieldValue entries with a value of "" for users who
didn't have a given CustomProfileField (especially common for the
"skype" and "phone" fields).  This had no user-visible effect, but
certainly added some clutter in the database.
2019-01-29 16:09:02 -08:00
Tim Abbott
56d4426738 gitter: Do something reasonable with invalid fullnames. 2019-01-29 16:08:55 -08:00
Tim Abbott
f0fe7d3887 scripts: Recommend apt update after enabling universe.
One needs to manually run `apt update` after `add-apt-repository`, or
the newly enabled repository won't actually be used.
2019-01-29 16:07:31 -08:00
Sumanth V Rao
21166fbdf9 upgrade-zulip-stage-2: Add argument to skip purging old deployments.
This makes it possible to add --skip-purge-old-deployments in the
deploy_options section of /etc/zulip/zulip.conf, and control whether
old deployments are purged automatically on a system.

We still need to do https://github.com/zulip/zulip/issues/10534 and
probably also to add these arguments to be directly passed into
upgrade-zulip, but that can wait for future work.

Fixes #10946.
2019-01-29 16:06:08 -08:00
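For reference, the zulip.conf usage the commit message describes would look roughly like this (a sketch; the `[deployment]` section name is assumed from Zulip's configuration conventions):
```
[deployment]
deploy_options = --skip-purge-old-deployments
```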
Tim Abbott
b2c865aab5 scripts: Fix incorrect garbage-collection of emoji/node caches.
Apparently, we were incorrectly expressing the paths in the
caches_in_use data structures for these two cache-cleaning algorithms,
resulting in the default threshold_days algorithm controlling which
caches could be garbage-collected.  While the emoji one was just a
performance optimization for upgrade-zulip-from-git, it was possible
for the main `node_modules` cache in use in production to be GCed,
resulting in LaTeX rendering being broken.
2019-01-29 16:05:32 -08:00
18 changed files with 328 additions and 118 deletions


@@ -54,7 +54,7 @@ author = 'The Zulip Team'
# The short X.Y version.
version = '1.9'
# The full version, including alpha/beta/rc tags.
-release = '1.9.1'
+release = '1.9.2'
# This allows us to insert a warning that appears only on an unreleased
# version, e.g. to say that something is likely to have changed.


@@ -7,6 +7,21 @@ All notable changes to the Zulip server are documented in this file.
This section lists notable unreleased changes; it is generally updated
in bursts.
+### 1.9.2 -- 2019-01-29
+
+This release migrates Zulip off a deprecated Google+ API (necessary
+for Google Authentication to continue working past March 7), and
+contains a few bug fixes for the installer and Slack import.  It has
+minimal changes for existing servers not using Google authentication.
+
+- Updated the Google Auth integration to stop using a deprecated and
+  soon-to-be-removed Google+ authentication API.
+- Improved installer error messages for common configuration problems.
+- Fixed several bugs in Slack, Gitter, and HipChat import tools.
+- Fixed a subtle bug in garbage-collection of the node_modules cache.
+- Optimized performance of Slack import for organizations with
+  thousands of users.
### 1.9.1 -- 2018-11-30
This release is primarily intended to improve the experience for new


@@ -37,8 +37,12 @@ upgrading.
If you're using Ubuntu, the
[Ubuntu universe repository][ubuntu-repositories] must be
-[enabled][enable-universe], which is usually just `sudo
-add-apt-repository universe`.
+[enabled][enable-universe], which is usually just:
+```
+sudo add-apt-repository universe
+sudo apt update
+```
[ubuntu-repositories]:
https://help.ubuntu.com/community/Repositories/Ubuntu


@@ -35,7 +35,8 @@ def get_caches_in_use(threshold_days):
# This happens for a deployment directory extracted from a
# tarball, which just has a copy of the emoji data, not a symlink.
continue
-caches_in_use.add(os.readlink(emoji_link_path))
+# The actual cache path doesn't include the /emoji
+caches_in_use.add(os.path.dirname(os.readlink(emoji_link_path)))
return caches_in_use
def main(args: argparse.Namespace) -> None:


@@ -45,7 +45,8 @@ def get_caches_in_use(threshold_days):
# If 'package.json' file doesn't exist then no node_modules
# cache is associated with this setup.
continue
-caches_in_use.add(os.readlink(node_modules_link_path))
+# The actual cache path doesn't include the /node_modules
+caches_in_use.add(os.path.dirname(os.readlink(node_modules_link_path)))
return caches_in_use
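To see why the `os.path.dirname` fix matters, consider a deployment whose node_modules symlink points into the cache (hypothetical paths):
```python
import os

# Hypothetical layout: node_modules is a symlink into the cache:
#   .../deployments/current/node_modules -> /srv/zulip-npm-cache/abc123/node_modules
target = "/srv/zulip-npm-cache/abc123/node_modules"  # what os.readlink() would return
# The directory the garbage-collector must preserve is the target's
# parent, not the target itself:
assert os.path.dirname(target) == "/srv/zulip-npm-cache/abc123"
```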


@@ -110,7 +110,10 @@ if [ "$os_id" = Ubuntu ] && ! apt-cache policy |
cat <<'EOF'
You must enable the Ubuntu Universe repository before installing
-Zulip. You can do this with: `add-apt-repository universe`.
+Zulip. You can do this with:
+    sudo add-apt-repository universe
+    sudo apt update
For more information, see:
https://zulip.readthedocs.io/en/latest/production/requirements.html


@@ -40,6 +40,8 @@ parser.add_argument("--skip-migrations", dest="skip_migrations", action='store_t
help="Skip doing migrations.")
parser.add_argument("--from-git", dest="from_git", action='store_true',
help="Upgrading from git, so run update-prod-static.")
parser.add_argument("--skip-purge-old-deployments", dest="skip_purge_old_deployments",
action="store_true", help="Skip purging old deployments.")
args = parser.parse_args()
deploy_path = args.deploy_path
@@ -168,4 +170,8 @@ logging.info("Restarting Zulip...")
subprocess.check_output(["./scripts/restart-server"], preexec_fn=su_to_zulip)
logging.info("Upgrade complete!")
-subprocess.check_call(["./scripts/purge-old-deployments"])
+if not args.skip_purge_old_deployments:
+    logging.info("Purging old deployments...")
+    subprocess.check_call(["./scripts/purge-old-deployments"])
+else:
+    logging.info("Skipping purging old deployments.")


@@ -1,4 +1,4 @@
ZULIP_VERSION = "1.9.1"
ZULIP_VERSION = "1.9.2"
# Bump the minor PROVISION_VERSION to indicate that folks should provision
# only when going from an old version of the code to a newer version. Bump


@@ -217,8 +217,13 @@ def get_usermentions(message: Dict[str, Any], user_map: Dict[str, int],
for mention in message['mentions']:
if mention.get('userId') in user_map:
gitter_mention = '@%s' % (mention['screenName'])
-zulip_mention = ('@**%s**' %
-                 (user_short_name_to_full_name[mention['screenName']]))
+if mention['screenName'] not in user_short_name_to_full_name:
+    logging.info("Mentioned user %s never sent any messages, so has no full name data" %
+                 mention['screenName'])
+    full_name = mention['screenName']
+else:
+    full_name = user_short_name_to_full_name[mention['screenName']]
+zulip_mention = ('@**%s**' % (full_name,))
message['text'] = message['text'].replace(gitter_mention, zulip_mention)
mentioned_user_ids.append(user_map[mention['userId']])
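A minimal, self-contained illustration of the mention rewriting above (hypothetical message and user data):
```python
# Hypothetical Gitter data: alice never sent a message, so she has no
# full-name entry; the code falls back to her screen name.
text = "hello @alice"
user_short_name_to_full_name = {"bob": "Bob Jones"}

screen_name = "alice"
full_name = user_short_name_to_full_name.get(screen_name, screen_name)
zulip_mention = '@**%s**' % (full_name,)
assert text.replace('@' + screen_name, zulip_mention) == "hello @**alice**"
```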


@@ -313,16 +313,31 @@ def write_emoticon_data(realm_id: int,
fn = 'emoticons.json'
data_file = os.path.join(data_dir, fn)
+if not os.path.exists(data_file):
+    logging.warning("HipChat export does not contain emoticons.json.")
+    logging.warning("As a result, custom emoji cannot be imported.")
+    return []
with open(data_file) as f:
data = ujson.load(f)
-flat_data = [
-    dict(
-        path=d['Emoticon']['path'],
-        name=d['Emoticon']['shortcut'],
-    )
-    for d in data
-]
+if isinstance(data, dict) and 'Emoticons' in data:
+    # Handle the hc-migrate export format for emoticons.json.
+    flat_data = [
+        dict(
+            path=d['path'],
+            name=d['shortcut'],
+        )
+        for d in data['Emoticons']
+    ]
+else:
+    flat_data = [
+        dict(
+            path=d['Emoticon']['path'],
+            name=d['Emoticon']['shortcut'],
+        )
+        for d in data
+    ]
emoji_folder = os.path.join(output_dir, 'emoji')
os.makedirs(emoji_folder, exist_ok=True)
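The two emoticons.json shapes the branch above distinguishes, as abbreviated hypothetical examples:
```python
# Standard HipChat export: a list of {"Emoticon": {...}} wrappers.
standard = [{"Emoticon": {"path": "img/smile.png", "shortcut": "smile"}}]
# hc-migrate export: a dict with a top-level "Emoticons" list.
hc_migrate = {"Emoticons": [{"path": "img/smile.png", "shortcut": "smile"}]}

assert isinstance(hc_migrate, dict) and 'Emoticons' in hc_migrate
assert not (isinstance(standard, dict) and 'Emoticons' in standard)
```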
@@ -506,7 +521,7 @@ def process_message_file(realm_id: int,
)
if is_pm_data:
-if sender_id != fn_id:
+if int(sender_id) != int(fn_id):
# PMs are in multiple places in the Hipchat export,
# and we only use the copy from the sender
return None
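The typing issue in isolation: the export can carry the same ID as a string in one place and an int in another, and a raw comparison then never matches (illustrative values):
```python
sender_id = "8"  # ID arrives as a string in one part of the export
fn_id = 8        # ...and as an int elsewhere
assert sender_id != fn_id             # str vs. int is always unequal in Python 3
assert int(sender_id) == int(fn_id)   # normalizing both sides fixes the check
```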


@@ -3,9 +3,10 @@ import requests
import shutil
import logging
import os
+import traceback
import ujson
-from typing import List, Dict, Any, Optional, Set, Callable
+from typing import List, Dict, Any, Optional, Set, Callable, Iterable, Tuple, TypeVar
from django.forms.models import model_to_dict
from zerver.models import Realm, RealmEmoji, Subscription, Recipient, \
@@ -13,7 +14,7 @@ from zerver.models import Realm, RealmEmoji, Subscription, Recipient, \
from zerver.data_import.sequencer import NEXT_ID
from zerver.lib.actions import STREAM_ASSIGNMENT_COLORS as stream_colors
from zerver.lib.avatar_hash import user_avatar_path_from_ids
-from zerver.lib.parallel import run_parallel
+from zerver.lib.parallel import run_parallel, JobData
# stubs
ZerverFieldsT = Dict[str, Any]
@@ -263,9 +264,15 @@ def build_usermessages(zerver_usermessage: List[ZerverFieldsT],
subscriber_map: Dict[int, Set[int]],
recipient_id: int,
mentioned_user_ids: List[int],
-message_id: int) -> None:
+message_id: int,
+long_term_idle: Optional[Set[int]]=None) -> Tuple[int, int]:
user_ids = subscriber_map.get(recipient_id, set())
+if long_term_idle is None:
+    long_term_idle = set()
+user_messages_created = 0
+user_messages_skipped = 0
if user_ids:
for user_id in sorted(user_ids):
is_mentioned = user_id in mentioned_user_ids
@@ -274,6 +281,12 @@ def build_usermessages(zerver_usermessage: List[ZerverFieldsT],
# It's possible we don't even get PMs from them.
is_private = False
+if not is_mentioned and not is_private and user_id in long_term_idle:
+    # these users are long-term idle
+    user_messages_skipped += 1
+    continue
+user_messages_created += 1
usermessage = build_user_message(
user_id=user_id,
message_id=message_id,
@@ -282,6 +295,7 @@ def build_usermessages(zerver_usermessage: List[ZerverFieldsT],
)
zerver_usermessage.append(usermessage)
+return (user_messages_created, user_messages_skipped)
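The skip condition above, restated as a standalone predicate (a sketch that mirrors the logic, with hypothetical IDs):
```python
from typing import Set

def should_skip(user_id: int, is_mentioned: bool, is_private: bool,
                long_term_idle: Set[int]) -> bool:
    # A long-term idle user gets no UserMessage row unless the message
    # mentions them or is a private message to them.
    return not is_mentioned and not is_private and user_id in long_term_idle

idle = {7}
assert should_skip(7, False, False, idle)      # idle, not mentioned: skipped
assert not should_skip(7, True, False, idle)   # mentioned: kept
assert not should_skip(3, False, False, idle)  # not idle: kept
```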
def build_user_message(user_id: int,
message_id: int,
@@ -388,7 +402,7 @@ def process_avatars(avatar_list: List[ZerverFieldsT], avatar_dir: str, realm_id:
downloaded. For simpler conversions see write_avatar_png.
"""
-def get_avatar(avatar_upload_item: List[str]) -> int:
+def get_avatar(avatar_upload_item: List[str]) -> None:
avatar_url = avatar_upload_item[0]
image_path = os.path.join(avatar_dir, avatar_upload_item[1])
@@ -398,7 +412,6 @@ def process_avatars(avatar_list: List[ZerverFieldsT], avatar_dir: str, realm_id:
with open(image_path, 'wb') as image_file:
shutil.copyfileobj(response.raw, image_file)
shutil.copy(image_path, original_image_path)
-return 0
logging.info('######### GETTING AVATARS #########\n')
logging.info('DOWNLOADING AVATARS .......\n')
@@ -425,7 +438,7 @@ def process_avatars(avatar_list: List[ZerverFieldsT], avatar_dir: str, realm_id:
# Run downloads in parallel
output = []
-for (status, job) in run_parallel(get_avatar, avatar_upload_list, threads=threads):
+for (status, job) in run_parallel_wrapper(get_avatar, avatar_upload_list, threads=threads):
output.append(job)
logging.info('######### GETTING AVATARS FINISHED #########\n')
@@ -458,10 +471,32 @@ def write_avatar_png(avatar_folder: str,
s3_path=image_path,
realm_id=realm_id,
user_profile_id=user_id,
+# We only write the .original file; ask the importer to do the thumbnailing.
+importer_should_thumbnail=True,
)
return metadata
+ListJobData = TypeVar('ListJobData')
+def run_parallel_wrapper(f: Callable[[ListJobData], None], full_items: List[ListJobData],
+                         threads: int=6) -> Iterable[Tuple[int, List[ListJobData]]]:
+    logging.info("Distributing %s items across %s threads" % (len(full_items), threads))
+    def wrapping_function(items: List[ListJobData]) -> int:
+        count = 0
+        for item in items:
+            try:
+                f(item)
+            except Exception:
+                logging.info("Error processing item: %s" % (item,))
+                traceback.print_exc()
+            count += 1
+            if count % 1000 == 0:
+                logging.info("A download thread finished %s items" % (count,))
+        return 0
+    job_lists = [full_items[i::threads] for i in range(threads)]  # type: List[List[ListJobData]]
+    return run_parallel(wrapping_function, job_lists, threads=threads)
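The `full_items[i::threads]` slicing above is a round-robin split; for example:
```python
items = list(range(10))
threads = 3
job_lists = [items[i::threads] for i in range(threads)]
assert job_lists == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
# Each thread receives ~1/N of the items, so run_parallel forks once
# per thread rather than once per item.
```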
def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
threads: int) -> List[ZerverFieldsT]:
"""
@@ -471,7 +506,7 @@ def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
1. upload_list: List of uploads to be mapped in uploads records.json file
2. upload_dir: Folder where the downloaded uploads are saved
"""
-def get_uploads(upload: List[str]) -> int:
+def get_uploads(upload: List[str]) -> None:
upload_url = upload[0]
upload_path = upload[1]
upload_path = os.path.join(upload_dir, upload_path)
@@ -480,7 +515,6 @@ def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
os.makedirs(os.path.dirname(upload_path), exist_ok=True)
with open(upload_path, 'wb') as upload_file:
shutil.copyfileobj(response.raw, upload_file)
-return 0
logging.info('######### GETTING ATTACHMENTS #########\n')
logging.info('DOWNLOADING ATTACHMENTS .......\n')
@@ -493,7 +527,7 @@ def process_uploads(upload_list: List[ZerverFieldsT], upload_dir: str,
# Run downloads in parallel
output = []
-for (status, job) in run_parallel(get_uploads, upload_url_list, threads=threads):
+for (status, job) in run_parallel_wrapper(get_uploads, upload_url_list, threads=threads):
output.append(job)
logging.info('######### GETTING ATTACHMENTS FINISHED #########\n')
@@ -522,7 +556,7 @@ def process_emojis(zerver_realmemoji: List[ZerverFieldsT], emoji_dir: str,
2. emoji_dir: Folder where the downloaded emojis are saved
3. emoji_url_map: Maps emoji name to its url
"""
-def get_emojis(upload: List[str]) -> int:
+def get_emojis(upload: List[str]) -> None:
emoji_url = upload[0]
emoji_path = upload[1]
upload_emoji_path = os.path.join(emoji_dir, emoji_path)
@@ -531,7 +565,6 @@ def process_emojis(zerver_realmemoji: List[ZerverFieldsT], emoji_dir: str,
os.makedirs(os.path.dirname(upload_emoji_path), exist_ok=True)
with open(upload_emoji_path, 'wb') as emoji_file:
shutil.copyfileobj(response.raw, emoji_file)
-return 0
emoji_records = []
upload_emoji_list = []
@@ -555,7 +588,7 @@ def process_emojis(zerver_realmemoji: List[ZerverFieldsT], emoji_dir: str,
# Run downloads in parallel
output = []
-for (status, job) in run_parallel(get_emojis, upload_emoji_list, threads=threads):
+for (status, job) in run_parallel_wrapper(get_emojis, upload_emoji_list, threads=threads):
output.append(job)
logging.info('######### GETTING EMOJIS FINISHED #########\n')


@@ -10,11 +10,13 @@ import logging
import random
import requests
from collections import defaultdict
from django.conf import settings
from django.db import connection
from django.utils.timezone import now as timezone_now
from django.forms.models import model_to_dict
-from typing import Any, Dict, List, Optional, Tuple, Set
+from typing import Any, Dict, List, Optional, Tuple, Set, Iterator
from zerver.forms import check_subdomain_available
from zerver.models import Reaction, RealmEmoji, Realm, UserProfile, Recipient, \
CustomProfileField, CustomProfileFieldValue
@@ -263,6 +265,9 @@ def build_customprofilefields_values(custom_field_map: ZerverFieldsT, fields: Ze
user_id: int, custom_field_id: int,
custom_field_values: List[ZerverFieldsT]) -> int:
for field, value in fields.items():
if value['value'] == "":
# Skip writing entries for fields with an empty value
continue
custom_field_value = CustomProfileFieldValue(
id=custom_field_id,
value=value['value'])
@@ -296,7 +301,7 @@ def get_user_email(user: ZerverFieldsT, domain_name: str) -> str:
else:
raise AssertionError("Could not identify bot type")
return slack_bot_name.replace("Bot", "").replace(" ", "") + "-bot@%s" % (domain_name,)
if get_user_full_name(user) == "slackbot":
if get_user_full_name(user).lower() == "slackbot":
return "imported-slackbot-bot@%s" % (domain_name,)
raise AssertionError("Could not find email address for Slack user %s" % (user,))
@@ -434,9 +439,61 @@ def get_subscription(channel_members: List[str], zerver_subscription: List[Zerve
subscription_id += 1
return subscription_id
+def process_long_term_idle_users(slack_data_dir: str, users: List[ZerverFieldsT],
+                                 added_users: AddedUsersT, added_channels: AddedChannelsT,
+                                 zerver_userprofile: List[ZerverFieldsT]) -> Set[int]:
+    """Algorithmically, we treat users who have sent at least 10 messages
+    or have sent a message within the last 60 days as active.
+    Everyone else is treated as long-term idle, which means they will
+    have a slightly slower first page load when coming back to
+    Zulip.
+    """
+    all_messages = get_messages_iterator(slack_data_dir, added_channels)
+    sender_counts = defaultdict(int)  # type: Dict[str, int]
+    recent_senders = set()  # type: Set[str]
+    NOW = float(timezone_now().timestamp())
+    for message in all_messages:
+        timestamp = float(message['ts'])
+        slack_user_id = get_message_sending_user(message)
+        if not slack_user_id:
+            # Ignore messages without user names
+            continue
+        if slack_user_id in recent_senders:
+            continue
+        if NOW - timestamp < 60:
+            recent_senders.add(slack_user_id)
+        sender_counts[slack_user_id] += 1
+    for (slack_sender_id, count) in sender_counts.items():
+        if count > 10:
+            recent_senders.add(slack_sender_id)
+    long_term_idle = set()
+    for slack_user in users:
+        if slack_user["id"] in recent_senders:
+            continue
+        zulip_user_id = added_users[slack_user['id']]
+        long_term_idle.add(zulip_user_id)
+    # Record long-term idle status in zerver_userprofile
+    for user_profile_row in zerver_userprofile:
+        if user_profile_row['id'] in long_term_idle:
+            user_profile_row['long_term_idle'] = True
+            # Setting last_active_message_id to 1 means the user, if
+            # imported, will get the full message history for the
+            # streams they were on.
+            user_profile_row['last_active_message_id'] = 1
+    return long_term_idle
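A toy run of the activity heuristic above (hypothetical Slack user IDs and counts):
```python
sender_counts = {"U1": 3, "U2": 25}  # total messages per sender
recent_senders = {"U3"}              # U3 sent a message recently
# Senders with more than 10 total messages also count as active:
recent_senders |= {s for s, n in sender_counts.items() if n > 10}
long_term_idle = {"U1", "U2", "U3"} - recent_senders
assert long_term_idle == {"U1"}
```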
def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFieldsT], realm_id: int,
added_users: AddedUsersT, added_recipient: AddedRecipientsT,
added_channels: AddedChannelsT, realm: ZerverFieldsT,
+zerver_userprofile: List[ZerverFieldsT],
zerver_realmemoji: List[ZerverFieldsT], domain_name: str,
output_dir: str,
chunk_size: int=MESSAGE_BATCH_CHUNK_SIZE) -> Tuple[List[ZerverFieldsT],
@@ -448,12 +505,12 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
2. uploads, which is a list of uploads to be mapped in uploads records.json
3. attachment, which is a list of the attachments
"""
-all_messages = get_all_messages(slack_data_dir, added_channels)
-# we sort the messages according to the timestamp to show messages with
-# the proper date order
-all_messages = sorted(all_messages, key=lambda message: message['ts'])
+long_term_idle = process_long_term_idle_users(slack_data_dir, users, added_users,
+                                              added_channels, zerver_userprofile)
+# Now, we actually import the messages.
+all_messages = get_messages_iterator(slack_data_dir, added_channels)
logging.info('######### IMPORTING MESSAGES STARTED #########\n')
total_reactions = [] # type: List[ZerverFieldsT]
@@ -461,8 +518,6 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
total_uploads = [] # type: List[ZerverFieldsT]
# The messages are stored in batches
-low_index = 0
-upper_index = low_index + chunk_size
dump_file_id = 1
subscriber_map = make_subscriber_map(
@@ -470,14 +525,21 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
)
while True:
-message_data = all_messages[low_index:upper_index]
+message_data = []
+_counter = 0
+for msg in all_messages:
+    _counter += 1
+    message_data.append(msg)
+    if _counter == chunk_size:
+        break
+if len(message_data) == 0:
+    break
zerver_message, zerver_usermessage, attachment, uploads, reactions = \
channel_message_to_zerver_message(
realm_id, users, added_users, added_recipient, message_data,
zerver_realmemoji, subscriber_map, added_channels,
-domain_name)
+domain_name, long_term_idle)
message_json = dict(
zerver_message=zerver_message,
@@ -491,26 +553,39 @@ def convert_slack_workspace_messages(slack_data_dir: str, users: List[ZerverFiel
total_attachments += attachment
total_uploads += uploads
-low_index = upper_index
-upper_index = chunk_size + low_index
dump_file_id += 1
logging.info('######### IMPORTING MESSAGES FINISHED #########\n')
return total_reactions, total_uploads, total_attachments
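The hand-rolled counter loop above batches an iterator without materializing it; an equivalent formulation with itertools.islice, shown for comparison (a sketch, not the code Zulip uses):
```python
from itertools import islice
from typing import Iterator, List, TypeVar

T = TypeVar('T')

def batches(it: Iterator[T], size: int) -> Iterator[List[T]]:
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

assert list(batches(iter(range(5)), 2)) == [[0, 1], [2, 3], [4]]
```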
-def get_all_messages(slack_data_dir: str, added_channels: AddedChannelsT) -> List[ZerverFieldsT]:
-    all_messages = []  # type: List[ZerverFieldsT]
+def get_messages_iterator(slack_data_dir: str, added_channels: AddedChannelsT) -> Iterator[ZerverFieldsT]:
+    """This function is an iterator that returns all the messages across
+    all Slack channels, in order by timestamp.  It's important to
+    not read all the messages into memory at once, because for
+    large imports that can OOM kill."""
+    all_json_names = defaultdict(list)  # type: Dict[str, List[str]]
for channel_name in added_channels.keys():
channel_dir = os.path.join(slack_data_dir, channel_name)
json_names = os.listdir(channel_dir)
for json_name in json_names:
+all_json_names[json_name].append(channel_dir)
+# Sort json_name by date
+for json_name in sorted(all_json_names.keys()):
+    messages_for_one_day = []  # type: List[ZerverFieldsT]
+    for channel_dir in all_json_names[json_name]:
message_dir = os.path.join(channel_dir, json_name)
messages = get_data_file(message_dir)
+channel_name = os.path.basename(channel_dir)
for message in messages:
# To give every message the channel information
message['channel_name'] = channel_name
-all_messages += messages
-return all_messages
+messages_for_one_day += messages
+# we sort the messages according to the timestamp to show messages with
+# the proper date order
+for message in sorted(messages_for_one_day, key=lambda m: m['ts']):
+    yield message
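A compact illustration of the two-layer ordering the iterator above relies on (hypothetical per-day data; layer 1 sorts the date-named files, layer 2 sorts each day's messages by timestamp):
```python
from typing import Dict, Iterator, List

files_by_day = {
    "2019-01-02.json": [{"ts": "5"}, {"ts": "4"}],
    "2019-01-01.json": [{"ts": "3"}, {"ts": "1"}],
}

def iterate(files: Dict[str, List[Dict[str, str]]]) -> Iterator[Dict[str, str]]:
    for day in sorted(files):  # layer 1: date-named files sort chronologically
        for message in sorted(files[day], key=lambda m: m['ts']):  # layer 2
            yield message

assert [m['ts'] for m in iterate(files_by_day)] == ['1', '3', '4', '5']
```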
def channel_message_to_zerver_message(realm_id: int,
users: List[ZerverFieldsT],
@@ -520,11 +595,12 @@ def channel_message_to_zerver_message(realm_id: int,
zerver_realmemoji: List[ZerverFieldsT],
subscriber_map: Dict[int, Set[int]],
added_channels: AddedChannelsT,
-domain_name: str) -> Tuple[List[ZerverFieldsT],
-List[ZerverFieldsT],
-List[ZerverFieldsT],
-List[ZerverFieldsT],
-List[ZerverFieldsT]]:
+domain_name: str,
+long_term_idle: Set[int]) -> Tuple[List[ZerverFieldsT],
+List[ZerverFieldsT],
+List[ZerverFieldsT],
+List[ZerverFieldsT],
+List[ZerverFieldsT]]:
"""
Returns:
1. zerver_message, which is a list of the messages
@@ -543,6 +619,8 @@ def channel_message_to_zerver_message(realm_id: int,
with open(NAME_TO_CODEPOINT_PATH) as fp:
name_to_codepoint = ujson.load(fp)
+total_user_messages = 0
+total_skipped_user_messages = 0
for message in all_messages:
user = get_message_sending_user(message)
if not user:
@@ -620,14 +698,19 @@ def channel_message_to_zerver_message(realm_id: int,
zerver_message.append(zulip_message)
# construct usermessages
-build_usermessages(
+(num_created, num_skipped) = build_usermessages(
zerver_usermessage=zerver_usermessage,
subscriber_map=subscriber_map,
recipient_id=recipient_id,
mentioned_user_ids=mentioned_user_ids,
message_id=message_id,
+long_term_idle=long_term_idle,
)
+total_user_messages += num_created
+total_skipped_user_messages += num_skipped
+logging.debug("Created %s UserMessages; deferred %s due to long-term idle" % (
+    total_user_messages, total_skipped_user_messages))
return zerver_message, zerver_usermessage, zerver_attachment, uploads_list, \
reaction_list
@@ -802,7 +885,7 @@ def do_convert_data(slack_zip_file: str, output_dir: str, token: str, threads: i
reactions, uploads_list, zerver_attachment = convert_slack_workspace_messages(
slack_data_dir, user_list, realm_id, added_users, added_recipient, added_channels,
-realm, realm['zerver_realmemoji'], domain_name, output_dir)
+realm, realm['zerver_userprofile'], realm['zerver_realmemoji'], domain_name, output_dir)
# Move zerver_reactions to realm.json file
realm['zerver_reaction'] = reactions


@@ -1287,8 +1287,6 @@ def do_export_realm(realm: Realm, output_dir: Path, threads: int,
if not settings.TEST_SUITE:
assert threads >= 1
-assert os.path.exists("./manage.py")
realm_config = get_realm_config()
create_soft_link(source=output_dir, in_progress=True)


@@ -621,6 +621,8 @@ def import_uploads_s3(bucket_name: str, import_dir: Path, processing_avatars: bo
if record['s3_path'].endswith('.original'):
user_profile = get_user_profile_by_id(record['user_profile_id'])
upload_backend.ensure_medium_avatar_image(user_profile=user_profile)
if record.get("importer_should_thumbnail"):
upload_backend.ensure_basic_avatar_image(user_profile=user_profile)
def import_uploads(import_dir: Path, processing_avatars: bool=False,
processing_emojis: bool=False) -> None:
@@ -723,6 +725,10 @@ def do_import_realm(import_dir: Path, subdomain: str) -> Realm:
# Remap the user IDs for notification_bot and friends to their
# appropriate IDs on this server
for item in data['zerver_userprofile_crossrealm']:
+if item['email'].startswith("emailgateway@"):
+    # The email gateway bot's email is customized to a
+    # different domain on some servers.
+    item['email'] = settings.EMAIL_GATEWAY_BOT
logging.info("Adding to ID map: %s %s" % (item['id'], get_system_bot(item['email']).id))
new_user_id = get_system_bot(item['email']).id
update_id_map(table='user_profile', old_id=item['id'], new_id=new_user_id)
@@ -903,6 +909,29 @@ def do_import_realm(import_dir: Path, subdomain: str) -> Realm:
update_model_ids(Reaction, data, 'reaction')
bulk_import_model(data, Reaction)
+for user_profile in UserProfile.objects.filter(is_bot=False, realm=realm):
+    # Since we now unconditionally renumber message IDs, we need
+    # to reset the user's pointer to what will be a valid value.
+    #
+    # For zulip->zulip imports, we could do something clever, but
+    # it should always be safe to reset to first unread message.
+    #
+    # Longer-term, the plan is to eliminate pointer as a concept.
+    first_unread_message = UserMessage.objects.filter(user_profile=user_profile).extra(
+        where=[UserMessage.where_unread()]
+    ).first()
+    if first_unread_message is not None:
+        user_profile.pointer = first_unread_message.message_id
+    else:
+        last_message = UserMessage.objects.filter(user_profile=user_profile).last()
+        if last_message is not None:
+            user_profile.pointer = last_message.message_id
+        else:
+            # -1 is the guard value for new user accounts with no messages.
+            user_profile.pointer = -1
+    user_profile.save(update_fields=["pointer"])
# Do attachments AFTER message data is loaded.
# TODO: de-dup how we read these json files.
fn = os.path.join(import_dir, "attachment.json")


@@ -182,6 +182,9 @@ class ZulipUploadBackend:
def ensure_medium_avatar_image(self, user_profile: UserProfile) -> None:
raise NotImplementedError()
+def ensure_basic_avatar_image(self, user_profile: UserProfile) -> None:
+    raise NotImplementedError()
def upload_realm_icon_image(self, icon_file: File, user_profile: UserProfile) -> None:
raise NotImplementedError()
@@ -443,6 +446,27 @@ class S3UploadBackend(ZulipUploadBackend):
resized_medium
)
+def ensure_basic_avatar_image(self, user_profile: UserProfile) -> None:  # nocoverage
+    # TODO: Refactor this to share code with ensure_medium_avatar_image
+    file_path = user_avatar_path(user_profile)
+    # Also TODO: Migrate to user_avatar_path(user_profile) + ".png".
+    s3_file_name = file_path
+    bucket_name = settings.S3_AVATAR_BUCKET
+    conn = S3Connection(settings.S3_KEY, settings.S3_SECRET_KEY)
+    bucket = get_bucket(conn, bucket_name)
+    key = bucket.get_key(file_path + ".original")
+    image_data = key.get_contents_as_string()
+    resized_avatar = resize_avatar(image_data)  # type: ignore # image_data is `bytes`, boto subs are wrong
+    upload_image_to_s3(
+        bucket_name,
+        s3_file_name,
+        "image/png",
+        user_profile,
+        resized_avatar
+    )
def upload_emoji_image(self, emoji_file: File, emoji_file_name: str,
user_profile: UserProfile) -> None:
content_type = guess_type(emoji_file.name)[0]
@@ -589,6 +613,19 @@ class LocalUploadBackend(ZulipUploadBackend):
resized_medium = resize_avatar(image_data, MEDIUM_AVATAR_SIZE)
write_local_file('avatars', file_path + '-medium.png', resized_medium)
+def ensure_basic_avatar_image(self, user_profile: UserProfile) -> None:  # nocoverage
+    # TODO: Refactor this to share code with ensure_medium_avatar_image
+    file_path = user_avatar_path(user_profile)
+    output_path = os.path.join(settings.LOCAL_UPLOADS_DIR, "avatars", file_path + ".png")
+    if os.path.isfile(output_path):
+        return
+    image_path = os.path.join(settings.LOCAL_UPLOADS_DIR, "avatars", file_path + ".original")
+    image_data = open(image_path, "rb").read()
+    resized_avatar = resize_avatar(image_data)
+    write_local_file('avatars', file_path + '.png', resized_avatar)
def upload_emoji_image(self, emoji_file: File, emoji_file_name: str,
user_profile: UserProfile) -> None:
emoji_path = RealmEmoji.PATH_ID_TEMPLATE.format(


@@ -874,9 +874,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
def test_google_oauth2_success(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=True,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response,
subdomain='zulip', next='/user_uploads/image')
@@ -892,24 +892,15 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
parsed_url.path)
self.assertTrue(uri.startswith('http://zulip.testserver/accounts/login/subdomain/'))
-def test_google_oauth2_no_fullname(self) -> None:
+def test_user_cannot_log_without_verified_email(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(givenName="Test", familyName="User"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=False,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
-result = self.google_oauth2_test(token_response, account_response, subdomain='zulip')
-data = load_subdomain_token(result)
-self.assertEqual(data['email'], self.example_email("hamlet"))
-self.assertEqual(data['name'], 'Test User')
-self.assertEqual(data['subdomain'], 'zulip')
-self.assertEqual(data['next'], '')
-self.assertEqual(result.status_code, 302)
-parsed_url = urllib.parse.urlparse(result.url)
-uri = "{}://{}{}".format(parsed_url.scheme, parsed_url.netloc,
-                         parsed_url.path)
-self.assertTrue(uri.startswith('http://zulip.testserver/accounts/login/subdomain/'))
+result = self.google_oauth2_test(token_response, account_response,
+                                 subdomain='zulip')
+self.assertEqual(result.status_code, 400)
def test_google_oauth2_mobile_success(self) -> None:
self.user_profile = self.example_user('hamlet')
@@ -917,9 +908,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
self.user_profile.save()
mobile_flow_otp = '1234abcd' * 8
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.user_profile.email)])
account_data = dict(name="Full Name",
email_verified=True,
email=self.user_profile.email)
account_response = ResponseMock(200, account_data)
self.assertEqual(len(mail.outbox), 0)
@@ -1137,9 +1128,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
def test_user_cannot_log_into_nonexisting_realm(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=True,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response,
subdomain='nonexistent')
@@ -1148,9 +1139,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
def test_user_cannot_log_into_wrong_subdomain(self) -> None:
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=self.example_email("hamlet"))])
account_data = dict(name="Full Name",
email_verified=True,
email=self.example_email("hamlet"))
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response,
subdomain='zephyr')
@@ -1175,9 +1166,9 @@ class GoogleSubdomainLoginTest(GoogleOAuthTest):
email = "newuser@zulip.com"
realm = get_realm("zulip")
token_response = ResponseMock(200, {'access_token': "unique_token"})
account_data = dict(name=dict(formatted="Full Name"),
emails=[dict(type="account",
value=email)])
account_data = dict(name="Full Name",
email_verified=True,
email=email)
account_response = ResponseMock(200, account_data)
result = self.google_oauth2_test(token_response, account_response, subdomain='zulip',
is_signup='1')
@@ -1267,17 +1258,6 @@ class GoogleLoginTest(GoogleOAuthTest):
self.assertEqual(m.call_args_list[0][0][0],
"Google login failed making API call: Response text")
-def test_google_oauth2_account_response_no_email(self) -> None:
-    token_response = ResponseMock(200, {'access_token': "unique_token"})
-    account_data = dict(name=dict(formatted="Full Name"),
-                        emails=[])
-    account_response = ResponseMock(200, account_data)
-    with mock.patch("logging.error") as m:
-        result = self.google_oauth2_test(token_response, account_response,
-                                         subdomain="zulip")
-    self.assertEqual(result.status_code, 400)
-    self.assertIn("Google oauth2 account email not found:", m.call_args_list[0][0][0])
def test_google_oauth2_error_access_denied(self) -> None:
result = self.client_get("/accounts/login/google/done/?error=access_denied")
self.assertEqual(result.status_code, 302)


@@ -20,6 +20,8 @@ from zerver.data_import.slack import (
do_convert_data,
process_avatars,
process_message_files,
+AddedChannelsT,
+ZerverFieldsT,
)
from zerver.data_import.import_util import (
build_zerver_realm,
@@ -55,7 +57,7 @@ import shutil
import requests
import os
import mock
-from typing import Any, AnyStr, Dict, List, Optional, Set, Tuple
+from typing import Any, AnyStr, Dict, List, Optional, Set, Tuple, Iterator
def remove_folder(path: str) -> None:
if os.path.exists(path):
@@ -405,7 +407,7 @@ class SlackImporter(ZulipTestCase):
self.assertEqual(zerver_usermessage[3]['id'], um_id + 4)
self.assertEqual(zerver_usermessage[3]['message'], message_id)
@mock.patch("zerver.data_import.slack.build_usermessages", return_value = 2)
@mock.patch("zerver.data_import.slack.build_usermessages", return_value = (2, 4))
def test_channel_message_to_zerver_message(self, mock_build_usermessage: mock.Mock) -> None:
user_data = [{"id": "U066MTL5U", "name": "john doe", "deleted": False, "real_name": "John"},
@@ -446,7 +448,7 @@ class SlackImporter(ZulipTestCase):
channel_message_to_zerver_message(
1, user_data, added_users, added_recipient,
all_messages, [], subscriber_map,
-added_channels, 'domain')
+added_channels, 'domain', set())
# functioning already tested in helper function
self.assertEqual(zerver_usermessage, [])
# subtype: channel_join is filtered
@@ -483,14 +485,19 @@ class SlackImporter(ZulipTestCase):
self.assertEqual(zerver_message[3]['sender'], 24)
@mock.patch("zerver.data_import.slack.channel_message_to_zerver_message")
@mock.patch("zerver.data_import.slack.get_all_messages")
def test_convert_slack_workspace_messages(self, mock_get_all_messages: mock.Mock,
@mock.patch("zerver.data_import.slack.get_messages_iterator")
def test_convert_slack_workspace_messages(self, mock_get_messages_iterator: mock.Mock,
mock_message: mock.Mock) -> None:
os.makedirs('var/test-slack-import', exist_ok=True)
added_channels = {'random': ('c5', 1), 'general': ('c6', 2)} # type: Dict[str, Tuple[str, int]]
time = float(timezone_now().timestamp())
zerver_message = [{'id': 1, 'ts': time}, {'id': 5, 'ts': time}]
+def fake_get_messages_iter(slack_data_dir: str, added_channels: AddedChannelsT) -> Iterator[ZerverFieldsT]:
+    import copy
+    return iter(copy.deepcopy(zerver_message))
realm = {'zerver_subscription': []} # type: Dict[str, Any]
user_list = [] # type: List[Dict[str, Any]]
reactions = [{"name": "grinning", "users": ["U061A5N1G"], "count": 1}]
@@ -498,14 +505,15 @@ class SlackImporter(ZulipTestCase):
zerver_usermessage = [{'id': 3}, {'id': 5}, {'id': 6}, {'id': 9}]
-mock_get_all_messages.side_effect = [zerver_message]
+mock_get_messages_iterator.side_effect = fake_get_messages_iter
mock_message.side_effect = [[zerver_message[:1], zerver_usermessage[:2],
attachments, uploads, reactions[:1]],
[zerver_message[1:2], zerver_usermessage[2:5],
attachments, uploads, reactions[1:1]]]
+# Hacky: We should include a zerver_userprofile, not the empty []
test_reactions, uploads, zerver_attachment = convert_slack_workspace_messages(
'./random_path', user_list, 2, {}, {}, added_channels,
-realm, [], 'domain', 'var/test-slack-import', chunk_size=1)
+realm, [], [], 'domain', 'var/test-slack-import', chunk_size=1)
messages_file_1 = os.path.join('var', 'test-slack-import', 'messages-000001.json')
self.assertTrue(os.path.exists(messages_file_1))
messages_file_2 = os.path.join('var', 'test-slack-import', 'messages-000002.json')


@@ -395,7 +395,7 @@ def finish_google_oauth2(request: HttpRequest) -> HttpResponse:
access_token = resp.json()['access_token']
resp = requests.get(
-'https://www.googleapis.com/plus/v1/people/me',
+'https://www.googleapis.com/oauth2/v3/userinfo',
params={'access_token': access_token}
)
if resp.status_code == 400:
@@ -406,21 +406,13 @@ def finish_google_oauth2(request: HttpRequest) -> HttpResponse:
return HttpResponse(status=400)
body = resp.json()
-try:
-    full_name = body['name']['formatted']
-except KeyError:
-    # Only google+ users have a formatted name. I am ignoring i18n here.
-    full_name = '{} {}'.format(
-        body['name']['givenName'], body['name']['familyName']
-    )
-for email in body['emails']:
-    if email['type'] == 'account':
-        break
-else:
-    logging.error('Google oauth2 account email not found: %s' % (body,))
+if not body['email_verified']:
+    logging.error('Google oauth2 account email not verified.')
return HttpResponse(status=400)
-email_address = email['value']
+# Extract the user info from the Google response
+full_name = body['name']
+email_address = body['email']
try:
realm = Realm.objects.get(string_id=subdomain)
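For reference, /oauth2/v3/userinfo returns standard OpenID Connect claims, so the fields the new code reads are top-level (values illustrative):
```python
body = {
    "name": "Full Name",
    "email": "hamlet@zulip.com",  # illustrative address
    "email_verified": True,
}
# The extraction above reduces to:
assert body['email_verified']  # the view returns HTTP 400 when this is False
full_name = body['name']
email_address = body['email']
```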