Migrate From Solo
On this page we will try several methods of migrating ClickHouse from a single host to a running cluster.
First, let's load a big data set onto our test solo server.
# Create a folder for the test data
mkdir clickhouse_test_data
cd clickhouse_test_data
# Download ~8 GB of archives
wget -O- https://zenodo.org/records/7923702 | grep -oP 'https://zenodo.org/records/7923702/files/flightlist_\d+_\d+\.csv\.gz' | xargs wget
Now create the table:
CREATE TABLE opensky
(
callsign String,
number String,
icao24 String,
registration String,
typecode String,
origin String,
destination String,
firstseen DateTime,
lastseen DateTime,
day DateTime,
latitude_1 Float64,
longitude_1 Float64,
altitude_1 Float64,
latitude_2 Float64,
longitude_2 Float64,
altitude_2 Float64
) ENGINE = MergeTree ORDER BY (origin, destination, callsign);

Import the data (you'll need about 32 GB of RAM on the ClickHouse server).
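The import command itself is not shown here; a minimal sketch, assuming the archives sit in the current directory and `clickhouse-client` runs on the same host:

```bash
# Decompress each archive and stream it into the opensky table.
# --date_time_input_format best_effort lets ClickHouse parse the CSV timestamps.
ls -1 flightlist_*.csv.gz | xargs -I{} bash -c \
  'gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"'
```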
Now run checks in the client:
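The check queries are missing from the page; a minimal sketch of what one might run in `clickhouse-client`:

```sql
-- Row count of the freshly loaded table
SELECT count() FROM opensky;

-- Its compressed size on disk
SELECT formatReadableSize(sum(bytes_on_disk))
FROM system.parts
WHERE table = 'opensky' AND active;
```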
If you want the migration to be consistent, block client SQL writes on the solo server. To do that, set the "readonly" setting for the corresponding SQL users (be careful with the default user: it is also used by distributed tables and replicas).
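The original statement is missing here; a hedged sketch, assuming users are managed with SQL (the user name `app_writer` is hypothetical):

```sql
-- readonly = 1 allows only read queries for this user; writes are rejected.
ALTER USER app_writer SETTINGS readonly = 1;
```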
Then wait about an hour of idle time while user sessions close, and proceed.
Method 1. Select from Remote
This method is good when you need compatibility between different ClickHouse versions, your tables are small, and the connection is reliable. Remember that ClickHouse does not have atomic transactions.
First, create the tables on your cluster: a Replicated one and a Distributed one on top of it.
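The DDL is not shown on the page; a sketch, assuming a cluster named `my_cluster` with 2 shards (the cluster name, ZooKeeper path, and table names are assumptions):

```sql
-- Local replicated table, created on every node of the cluster
CREATE TABLE opensky_local ON CLUSTER my_cluster
(
    callsign String,
    number String,
    icao24 String,
    registration String,
    typecode String,
    origin String,
    destination String,
    firstseen DateTime,
    lastseen DateTime,
    day DateTime,
    latitude_1 Float64,
    longitude_1 Float64,
    altitude_1 Float64,
    latitude_2 Float64,
    longitude_2 Float64,
    altitude_2 Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/opensky', '{replica}')
ORDER BY (origin, destination, callsign);

-- Distributed table routing reads and writes across the shards
CREATE TABLE opensky_dist ON CLUSTER my_cluster AS opensky_local
ENGINE = Distributed(my_cluster, default, opensky_local, rand());
```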
Next, from one node of the ClickHouse cluster, run:
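The query itself is missing; a sketch, assuming the solo server is reachable as `solo-host` and the tables are named as in the previous step (host, credentials, and table names are assumptions):

```sql
-- Pull all rows from the solo server into the distributed table
INSERT INTO opensky_dist
SELECT * FROM remote('solo-host:9000', default.opensky, 'migration_user', 'password');
```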
Now check that your data is loaded and properly sharded. Run on a cluster node:
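The check queries are not shown; a sketch, assuming distributed and local tables named `opensky_dist` and `opensky_local`:

```sql
SELECT count() FROM opensky_dist;   -- cluster-wide total, should match the solo server
SELECT count() FROM opensky_local;  -- this replica set only, roughly total / number of shards
```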

On my 10 Gbit channel the migration took about 80 seconds and 18 GB of traffic.
You can see above that the distributed table has the same record count as the solo server, while the replicated table holds about half of that, as it should when data is sharded across 2 replica sets.
By the way, if your connection drops permanently or you press Ctrl+C, some rows will already have been copied anyway (a partial insert is not rolled back).

If you have an unstable connection, you can raise the default timeout from 5 minutes to 15 minutes in the query, like this:
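The example query is missing; a sketch, assuming the 5-minute default refers to the `send_timeout`/`receive_timeout` settings (300 seconds by default) and the same hypothetical host and table names as above:

```sql
INSERT INTO opensky_dist
SELECT * FROM remote('solo-host:9000', default.opensky, 'migration_user', 'password')
SETTINGS send_timeout = 900, receive_timeout = 900;
```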
ClickHouse will continue downloading data if the connection is restored within this period.
Method 2. Export Import
Another version-independent way is to export the table data and schema and import them into the target system, with some schema editing if needed.
First, we will export the data.
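The export command is not shown; a sketch using ClickHouse's compact Native format (the file name is an assumption):

```bash
# Dump the table in Native format and compress the stream
clickhouse-client --query "SELECT * FROM opensky FORMAT Native" | gzip > opensky.native.gz
```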
You can see the size difference:
Now export the corresponding schema for each table you need:
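A sketch of the schema export; `TSVRaw` keeps the newlines in the DDL unescaped (the file name is an assumption):

```bash
clickhouse-client --query "SHOW CREATE TABLE opensky FORMAT TSVRaw" > opensky_schema.sql
```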
Do not forget to edit these SQL files: change the destination database and the engine if needed!
Now import: create the schema and upload the data (don't forget to create the database beforehand).
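A sketch of the import side, assuming files named as in the export sketches above (names are assumptions):

```bash
# Create the database, apply the schema, then stream the data back in
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS default"
clickhouse-client --database default < opensky_schema.sql
gzip -d -c opensky.native.gz | clickhouse-client --query "INSERT INTO opensky FORMAT Native"
```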
That's it. You can see the table being imported.
