Slack import: duplicate messages

Description

Timebox this if it turns out to be a massive timesink!

Initial GitHub report: https://github.com/mattermost/mattermost-server/issues/8348

Summary
Sometimes, when importing Slack exports through the web interface, messages are duplicated.

Steps to reproduce
I'm using a clean Docker image, mattermost/platform (Mattermost 4.7.1, Build Hash: a9d4c7d, Build Date: Sat Feb 17 00:47:16 UTC 2018), for every import attempt.

My Slack workspace is relatively large: it contains more than 30k messages. The archive with attachments (obtained with this tool) is 145 MB.

So, I start the Docker image, navigate to http://172.17.0.2:8065/ (for some reason, --publish 8065:80 doesn't work for me), create a new empty team in Mattermost, raise the limit of users per team, and upload the zip file through the web interface.

Observed behavior (that appears unintentional)
When importing through the web interface, messages are very likely to be duplicated (in fact, I don't think I have ever managed to import the archive without duplicates). Usually there are only two copies of each message, but on one occasion there were 9 copies of each message.
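To put a number on the duplication, posts can be grouped by content and counted. The following is only a minimal sketch: it assumes the posts have already been fetched from the server or database as dictionaries, that the keys channel_id, user_id, create_at and message match Mattermost's post fields, and that copies created by repeated imports share the same create_at value. The helper name copy_counts is hypothetical.

```python
from collections import Counter


def copy_counts(posts):
    """Return {post key: number of copies} for every post seen more than once.

    Each post is a dict; the key fields used here are an assumption about
    how duplicated imports look, not something confirmed by the report.
    """
    counts = Counter(
        (p["channel_id"], p["user_id"], p["create_at"], p["message"])
        for p in posts
    )
    # Keep only keys that occur at least twice, i.e. actual duplicates.
    return {key: n for key, n in counts.items() if n > 1}
```

If every value in the result is the same number, that number is likely how many times the archive was processed.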

Possible fixes
I don't have any idea why this happens, but I have a couple of observations.

First, it seems to never happen when importing through the CLI (./bin/platform import slack ...).

Pruning the history a bit by opening the archive and removing JSONs with old posts results in an archive that imports correctly. So this issue likely depends on the size of the archive.
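The pruning step can be scripted so that archives of different sizes are easy to produce while narrowing down the threshold. This is a rough sketch, assuming the standard Slack export layout where each channel directory holds one file per day named YYYY-MM-DD.json; the function name and the cutoff parameter are made up for illustration.

```python
import re
import zipfile

# Slack exports store one file per channel per day, e.g. "general/2017-03-05.json".
DAILY_FILE = re.compile(r"^[^/]+/(\d{4}-\d{2}-\d{2})\.json$")


def prune_slack_export(src_path, dst_path, keep_from):
    """Copy src_path to dst_path, skipping daily post files dated before keep_from.

    keep_from is an ISO date string such as "2017-01-01"; ISO dates compare
    correctly as plain strings. Returns the number of entries dropped.
    """
    dropped = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            match = DAILY_FILE.match(info.filename)
            if match and match.group(1) < keep_from:
                dropped += 1
                continue
            # Everything else (users.json, channels.json, attachments) is kept.
            dst.writestr(info, src.read(info.filename))
    return dropped
```

For example, prune_slack_export("export.zip", "export-pruned.zip", "2017-01-01") would keep only posts from 2017 onwards (both file names are placeholders).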

I carefully read the log from the run where messages were duplicated 9 times, and noticed an interesting pattern. At the end of a Slack import, Mattermost purges its caches (a.InvalidateAllCaches(), mattermost-server/app/slackimport.go, line 682 in c5e8cb2), which results in a logged message ([2018/02/21 17:02:07 UTC] [INFO] Purging all caches). In the log this line was repeated exactly 9 times, and after each one (except the last) the same warnings about some user missing an e-mail address were logged. It looks as if, for some reason, the archive was processed 9 times in a row.
As for the import report, it only included messages from the "last" import attempt, i.e. all users and all channels were merged with existing ones.
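Counting the cache purge lines gives a quick estimate of how many times the archive was processed in a given run. The sketch below assumes the log line format quoted above; the function name is hypothetical. Caches may also be purged by actions other than a Slack import, so the count is best treated as an upper bound.

```python
import re

# Matches e.g. "[2018/02/21 17:02:07 UTC] [INFO] Purging all caches".
PURGE_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[INFO\] Purging all caches$")


def purge_timestamps(log_lines):
    """Return the timestamp of every 'Purging all caches' line, in log order."""
    stamps = []
    for line in log_lines:
        match = PURGE_LINE.match(line.strip())
        if match:
            stamps.append(match.group("ts"))
    return stamps
```

The length of the returned list is the number of purges; the timestamps show how far apart the repeated import passes were.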

QA Test Steps

No testing required.

Mana

2

Assignee

George Goldberg

QA Assignee

Lindy Isherwood

Reporter

Lindy Isherwood

Epic Link

None

Fix versions

Mattermost Team

Platform

Sprint

None

Labels

None

QA Testing Areas

None

GitHub Issue

None

Components

None

Severity

None