Tuesday, April 8, 2014

Unknown/unsupported storage engine: InnoDB


140408 16:36:12 [ERROR] Unknown/unsupported storage engine: InnoDB
140408 16:36:12 [ERROR] Aborting

This usually means the existing InnoDB log files no longer match the current configuration (for example after an upgrade or a change to innodb_log_file_size). Remove the stale log files and restart:

$ sudo rm /var/lib/mysql/ib_logfile*

InnoDB recreates the log files on the next start, and the error goes away.
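For reference, the complete sequence would look something like this (assuming the default /var/lib/mysql data directory and the service init script; removing the logs is only safe if the server was shut down cleanly):

$ sudo service mysql stop
$ sudo rm /var/lib/mysql/ib_logfile*
$ sudo service mysql start
$ mysql -u root -p -e "SHOW ENGINES;"    # InnoDB should show Support = YES or DEFAULT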

Setup MariaDB Spider Engine

1. Install MariaDB (version 10.0.4 or later includes the Spider engine).
1) Add the following to /etc/yum.repos.d/MariaDB.repo, or generate a repository file for your distribution at http://mariadb.org/mariadb/repositories/
# MariaDB 10.0 CentOS repository list - created 2014-04-08 20:31 UTC
# http://mariadb.org/mariadb/repositories/
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1

2) Install the packages:
$ sudo yum install MariaDB*

2. Activate the Spider engine

$ mysql -uroot -p
mysql> source /usr/share/mysql/install_spider.sql
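You can double-check that the engine is registered; it should be reported with Support = YES:

mysql> SELECT engine, support FROM information_schema.engines WHERE engine = 'SPIDER';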

3. Modify privileges so the databases can be accessed remotely

$ sudo service mysql stop

1) add the following to /etc/my.cnf
[mariadb]                                                                            
port=3306 
2) start MariaDB again: $ sudo service mysql start
3) figure out your mysql client path (for example with "which mysql"); mine is
/usr/bin/mysql
4) set privileges for remote login (this is how the servers communicate with one another; run these commands on each node)
I have three nodes: dbnode1, dbnode2, and dbnode3.
$ mysql -u root -p                                                                   
Enter password:                                                                      
mysql> use mysql                                                                     
mysql> GRANT ALL ON *.* to root@'dbnode1' IDENTIFIED BY 'your-root-password';  
mysql> GRANT ALL ON *.* to root@'dbnode2' IDENTIFIED BY 'your-root-password'; 
mysql> GRANT ALL ON *.* to root@'dbnode3' IDENTIFIED BY 'your-root-password';  
mysql> FLUSH PRIVILEGES;  
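A quick way to verify the grants is to log in to each node remotely from one of the others, for example from dbnode3:

$ mysql -u root -p --host=dbnode1 --port=3306 -e "SELECT 1;"
$ mysql -u root -p --host=dbnode2 --port=3306 -e "SELECT 1;"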


4. Set aliases (for easy access to the other servers/nodes)
alias backend1='/usr/bin/mysql --user=root --password=your-pass --host=dbnode1 --port=3306'
alias backend2='/usr/bin/mysql --user=root --password=your-pass --host=dbnode2 --port=3306'
alias backend3='/usr/bin/mysql --user=root --password=your-pass --host=dbnode3 --port=3306'
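With the aliases loaded (e.g. from ~/.bashrc), any node is one command away:

$ backend1 -e "SHOW DATABASES;"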


5. Create the databases and tables

I'm using dbnode3 as my Spider server; dbnode1 and dbnode2 act as ordinary data nodes:
1) on dbnode1 and dbnode2
CREATE DATABASE backend;

CREATE TABLE backend.sbtest (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  k int(10) unsigned NOT NULL DEFAULT '0',
  c char(120) NOT NULL DEFAULT '',
  pad char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=InnoDB;

2) on dbnode3 (the Spider server)
CREATE SERVER backend1                                                             
  FOREIGN DATA WRAPPER mysql                                                         
OPTIONS(                                                                             
  HOST 'dbnode1',                                                                    
  DATABASE 'backend',                                                                
  USER 'root',                                                                       
  PASSWORD 'your-pass',                                                               
  PORT 3306                                                                          
);    

CREATE SERVER backend2                                                               
  FOREIGN DATA WRAPPER mysql                                                         
OPTIONS(                                                                             
  HOST 'dbnode2',                                                                    
  DATABASE 'backend',                                                                
  USER 'root',                                                                       
  PASSWORD 'your-pass',                                                               
  PORT 3306                                                                          
);                                                                                   
                                                                                     
CREATE DATABASE IF NOT EXISTS backend;                                               
                                                  
CREATE  TABLE backend.sbtest                                                         
(                                                                                    
  id int(10) unsigned NOT NULL AUTO_INCREMENT,                                       
  k int(10) unsigned NOT NULL DEFAULT '0',                                           
  c char(120) NOT NULL DEFAULT '',                                                   
  pad char(60) NOT NULL DEFAULT '',                                                  
  PRIMARY KEY (id),                                                                  
  KEY k (k)                                                                          
) ENGINE=spider COMMENT='wrapper "mysql", table "sbtest"'                            
 PARTITION BY KEY (id)
(
  PARTITION pt1 COMMENT = 'srv "backend1"',
  PARTITION pt2 COMMENT = 'srv "backend2"'
);


Now it should work.
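A quick sanity check: insert a few rows through the Spider node and confirm they land on the backends (the values below are just examples).

On dbnode3:
mysql> INSERT INTO backend.sbtest (k, c, pad) VALUES (1, 'aaa', 'x'), (2, 'bbb', 'y'), (3, 'ccc', 'z');
mysql> SELECT COUNT(*) FROM backend.sbtest;

Then from the shell, the per-backend counts should add up to the total seen on the Spider node:
$ backend1 -e "SELECT COUNT(*) FROM backend.sbtest;"
$ backend2 -e "SELECT COUNT(*) FROM backend.sbtest;"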

Important: check the firewall on each node; the servers need to reach each other on the MySQL port (3306).


If you want to install MariaDB Galera Cluster (5.5 series), take a look at this link:
http://matthewcasperson.blogspot.ca/2013/07/setting-up-galera-cluster-in-centos-6.html

Make sure you install MariaDB-Galera-Server rather than MariaDB-Server.


Wednesday, April 2, 2014

Amazon Redshift Experience

The purpose is to analyse a big SQL table. The table I am going to query has 90 million rows with 17 columns each; the raw data, spread across 20K+ CSV files, takes about 13 GB of disk space.

I looked into Redshift's performance online and compared several well-known analytics frameworks. Every comparison I found suggested Redshift was my best choice, and performance improves as you add cluster nodes.

Important:
When loading data from Amazon S3 into Redshift, the S3 bucket and the Redshift cluster MUST be in the same region, or the COPY will fail with an error.


Loading data from local disk to S3
Before uploading to S3, I split my 13 GB of data into 20K+ files and compressed each with gzip (Redshift can import gzip files directly, which I like). I wrote a Python script with the boto library to gzip all the files and upload them to S3. The options for getting data into S3 are listed below (a rough s3cmd-based shell sketch follows the list):

  • s3cmd
  • boto python library
  • Amazon S3 upload option
  • Amazon S3 CLI
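I used boto, but a rough shell equivalent with s3cmd would look like this (the bucket and prefix are placeholders):

for f in split_files/*.csv; do
    gzip "$f"                                      # produces $f.gz
    s3cmd put "$f.gz" s3://my-bucket/mytable/      # my-bucket/mytable is a placeholder
done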


Loading data from S3 into Redshift
Install the PostgreSQL 8.4 client to communicate with the Redshift database.
The following command connects to the database: -h is the Redshift endpoint, -U the master user name, -d the database name, and -p the port number
$ psql -h ... -U ... -d ... -p ...

An alternative is SQL Workbench/J, which did not work on my Ubuntu 12.04.

I wrote another Python script to load the data from S3 into Redshift. One thing to know: there is no psql option to pass the password on the command line, so you cannot supply it to "psql -h ... -U ... -d ... -p ..." directly.
I solved this in a slightly hacky way, by exporting it through the PGPASSWORD environment variable that psql reads:

$ export PGPASSWORD=yourpassword
$ psql -h ... -U ... -d ... -p ... -c "copy ....delimiter... gzip;"

Then loop the COPY command over all the files; psql will no longer prompt for the password.
-c runs the quoted SQL query and exits. If you import a CSV file the field separator is a comma, so you have to provide DELIMITER ','; if the file is gzip-compressed you have to add GZIP at the end. You also need to provide the AWS access key id and secret key in the COPY command (see the sketch below).
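Roughly, the loop boils down to something like this (the endpoint, user, database, bucket, key file, and credentials are all placeholders; 5439 is the default Redshift port):

export PGPASSWORD=yourpassword          # psql reads PGPASSWORD, so it will not prompt

for key in $(cat s3_keys.txt); do       # one S3 key per line, e.g. mytable/part-0001.csv.gz
    psql -h mycluster.abc123.us-east-1.redshift.amazonaws.com -U masteruser -d mydb -p 5439 \
        -c "COPY my_table FROM 's3://my-bucket/$key' CREDENTIALS 'aws_access_key_id=AKIA...;aws_secret_access_key=...' DELIMITER ',' GZIP;"
done

Note that a single COPY pointed at the common prefix (s3://my-bucket/mytable/) loads all matching files in parallel, which is usually faster than looping one file at a time.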

Performance:
1. At the beginning, on a single-node cluster (dw2.large), the queries were:
select count(*) from my_table;
select count(distinct columnX) from my_table;
select count(distinct columnY) from my_table where data_date like '2013-%';
They took 5 seconds, 18 seconds, and 1 minute 55 seconds respectively.

2. After resizing to a 2-node cluster (both dw2.large), the third query dropped to 46 seconds.

Amazon Redshift provides a UI to monitor the cluster; you can easily check the status, performance, query history, and load history.