Data migration is not always as smooth as copy-paste. The complexity peaks when the data volume is large and the error tolerance is minimal. In this article, I share some of my experience migrating data to Redshift. I hope it helps.
The basic steps are as follows:
- Export a delimited flat file from SQL Server
- Upload the source file to S3
- Execute the COPY command
1. Export CSV from SQL Server
There are many ways this can be done, but you must be careful and know some Redshift restrictions before preparing the source file. Below I have listed some common considerations:
- Redshift only supports ASCII and UTF-8.
Recheck your data for any junk characters. The Redshift COPY command will fail if the source file contains any unsupported character, so cleaning up the data first is recommended for a successful COPY.
- Redshift does not support datetime the way SQL Server does. Redshift's "date" type stores only the date part. If the time part is important to you, datetime data will have to be transformed to timestamp.
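Both considerations above can be checked up front with a small script. Here is a minimal Python sketch (the tab delimiter, sample rows, and the timestamp format are my assumptions, not from the article): it drops rows that do not decode as UTF-8, and formats datetime values in a "YYYY-MM-DD HH:MM:SS" form that Redshift can load into a timestamp column.

```python
from datetime import datetime

def clean_utf8_lines(raw_lines):
    """Keep only rows that decode cleanly as UTF-8; collect the rest."""
    good, bad = [], []
    for line in raw_lines:
        try:
            good.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            bad.append(line)
    return good, bad

def to_redshift_timestamp(dt):
    """Format a datetime as a string Redshift accepts for timestamp columns."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")

# Illustrative usage with in-memory rows (the second row has a junk byte):
rows = [b"1\tAlice\n", b"2\tBob\xff\n"]
good, bad = clean_utf8_lines(rows)  # 1 clean row, 1 rejected row
ts = to_redshift_timestamp(datetime(2015, 3, 1, 14, 30, 5))
```

Running the rejected rows through a report before the export saves a failed COPY later.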
Here I prefer two methods to export a flat file from SQL Server.
1.1 SQL Server Export Wizard
Log in using SQL Server Management Studio, right-click the database, and you will find Export Data under the Tasks menu. Do not forget to select UTF-8 as the encoding for the destination file.
1.2 BCP utility
Another excellent technique is to use BCP to export the flat file. Below is a sample BCP command to run at the command prompt; the data will be exported as a tab-delimited file.
cmd> bcp "SELECT * FROM [database].dbo.[tablename]" queryout D:\data.txt -S [HostIP] -U [DbUser] -P [DbPassword] -c
2. Upload CSV to S3
Gzip the file and then upload it to S3 using the AWS console.
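The gzip step can be scripted as well. Here is a minimal Python sketch (the file paths are illustrative); the upload itself is done in the AWS console as described above, or could be scripted with boto3, which is not shown here:

```python
import gzip
import shutil

def gzip_file(src_path, dst_path):
    """Compress the exported flat file so the S3 upload and COPY are smaller."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

# Example: gzip_file("data.txt", "data.txt.gz"), then upload data.txt.gz to S3.
```

Remember to keep the GZIP option in the COPY command when loading a gzipped file.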
3. Execute Copy command
Now build your COPY command and execute it using a Redshift client. Below is a basic COPY command; the Redshift documentation has plenty of material on building a custom COPY command.
COPY tablename FROM 's3://[bucket]/data.txt' CREDENTIALS 'aws_access_key_id=[key];aws_secret_access_key=[key]' DELIMITER '\t' GZIP IGNOREHEADER 1
You may not be lucky enough to succeed with the COPY command on the first attempt :-). If you get an error, check the Redshift error log using the query below. After fixing the problem, execute the COPY command again.
select * from stl_load_errors order by starttime desc limit 100
The COPY command terminates during the analysis phase if there is any error in the source data. This means a single bad row prevents any data from being imported. In some cases, though, you may need to import data while ignoring invalid rows; in that case, add "MAXERROR 1000" at the end of the COPY command. MAXERROR is actually helpful for identifying invalid rows, because Redshift logs all ignored rows in stl_load_errors. After identifying and fixing the errors, execute the COPY command without "MAXERROR 1000".
COPY tablename FROM 's3://testdata/data.csv' CREDENTIALS 'aws_access_key_id=[key];aws_secret_access_key=[key]' DELIMITER ',' GZIP IGNOREHEADER 1 MAXERROR 1000
Best wishes!