Category Archives: MySQL

If we want to customize the ids (or any other columns) of our tables, Hibernate lets us do that without adding extra tables or columns.

Here we create a simple example using org.hibernate.usertype.UserType to generate a customized id.

First, we create a simple table in the MySQL database:

Following is the Coupon bean:

We will use the following CustomId value object to generate our own id type:

We want to store data in the COUPON table with a customized id type. For that we need to implement the org.hibernate.usertype.UserType interface, and specify the value class used to generate the custom id via the interface's returnedClass() method.

Following is the CustomIdType class, which implements the UserType interface.

A brief description of the above methods follows:

sqlTypes: This method returns an array of int, telling Hibernate which SQL column types to use for persisting the entity properties.

returnedClass: This method specifies which Java value type is mapped by this custom type.

assemble: Hibernate calls this method when the instance is fetched from the second-level cache and converted back from its serialized binary form to object form.

disassemble: Hibernate may cache any value-type instance in its second-level cache. For this purpose, Hibernate calls this method to convert the value-type instance to the serialized binary form. Return the current instance if the value type implements the interface; otherwise, convert it to a Serializable object.

deepCopy: This method creates a copy of the value type if the value type is mutable; otherwise, it returns the current instance. Note that when you create a copy of an object, you should also copy the object associations and collections.

equals: This method compares two instances of the value type mapped by this custom type to check whether they are equal.

hashCode: This method returns a hashcode for the instance, consistent with persistence equality.

isMutable: This method specifies whether the value type is mutable. Since immutable objects cannot be updated or deleted by the application, defining the value type as immutable allows Hibernate to do some minor performance optimization.

nullSafeGet: This method constructs the value-type instance when the instance is retrieved from the database. resultset is the JDBC ResultSet object containing the instance values, names is an array of the column names queried, and owner is the persistent object associated with the value-type instance. Note that you should handle the possibility of null values.

nullSafeSet: This method is called when the value-type instance is written to a prepared statement to be stored or updated in the database. Handle the possibility of null values. A multi-column type should be written to parameters starting from index.

replace: Assume that the application maintains an instance of the value type that its associated session has already closed. Such objects are not tracked and managed by Hibernate, so they are called detached. Hibernate lets you merge the detached object with a session-managed persistent object through the session’s merge() method. Hibernate calls the replace() method when two instances, detached and session-managed, are merged. The first and second arguments of this method are value-type instances associated with a detached and session-managed persistent object, respectively. The third argument represents the owner object, the persistent object that owns the original value type: Coupon in our case. This method replaces the existing (target) value in the persistent object we are merging with a new (original) value from the detached persistent object we are merging. For immutable objects or null values, return the first argument. For mutable objects, return at least a copy of the first argument through the deepCopy() method.

We need to configure the Hibernate configuration file. An example configuration file follows:


We need to write the mapping configuration for the Coupon class to the COUPON table, as follows:


We also need to write the mapping for CustomIdType, to tell Hibernate that we are using CustomId as one of the types in our application, as follows:



OK, all set. Time to test our application; following is a test client program using a Java main method.


Hey, don’t forget to have the Hibernate cfg file available on the classpath.

All The Best 🙂

If you are aware of the “big data” initiative with Hadoop, one of your first questions would be related to cluster sizing. What is the right hardware to choose in terms of price/performance? How much hardware do you need to handle your data and your workload?

Cluster Size:
The cluster you want to use should be planned for X TB of usable capacity, where X is the amount you’ve calculated based on
your business needs. Don’t forget to take into account data growth rate and data retention period you need.
This is not a complex exercise so I hope you have at least a basic understanding of how much data you want to host on your Hadoop cluster.
To host X TB of data with the default replication factor of 3 you would need 3*X TB of raw storage.

Formula to calculate HDFS nodes storage (H)
Below is the formula to calculate the HDFS storage size required when building a new Hadoop cluster:

H = C * R * S / (1 - i)

C = compression ratio. It depends on the type of compression used (Snappy, LZO, …) and the size of the data. When no compression is used, C = 1.
R = replication factor. It is usually 3 in a production cluster.
S = initial size of the data to be moved to Hadoop. This could be a combination of historical data and incremental data.
(Here we also need to consider the growth rate of the initial data, at least for the next 3-6 months. For example, if we have 500 TB of data now, 50 TB is expected to be ingested in the next three months, and output files from MR jobs may create at least 10% of the initial data, then we need to consider 600 TB as the initial data size.)

i = intermediate data factor, usually 1/3 or 1/4. This is Hadoop’s intermediate working space, dedicated to storing intermediate results of map tasks and any temporary storage used in Pig or Hive. This is a common guideline for many production applications; even Cloudera recommends 25% for intermediate results.
120% – or 1.2 times the above total size: we have to allow room for the filesystem underlying HDFS. This is usually ext3 or ext4, which gets very, very unhappy at much above 80% fill.
For example, if your cluster total size is 1200 TB, it is recommended to use only up to 1000 TB.

With no compression (i.e. C = 1), a replication factor of 3, and an intermediate factor of 0.25 (= 1/4):
H = 1*3*S/(1-1/4) = 3*S/(3/4) = 4*S

With the assumptions above, the Hadoop storage is estimated to be 4 times the size of the initial data size.
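The sizing rule above can be sketched as a small Python helper (the function name and defaults are illustrative, not from the original post):

```python
def hdfs_storage(s_tb, c=1.0, r=3, i=0.25):
    """Raw HDFS storage (TB) needed for s_tb TB of initial data.

    c: compression ratio (1.0 = no compression)
    r: replication factor (usually 3 in production)
    i: intermediate-data factor (working space for MR jobs)
    """
    return c * r * s_tb / (1 - i)

# No compression, replication 3, intermediate factor 1/4:
# storage comes out to 4x the initial data, matching H = 4*S above.
print(hdfs_storage(100))  # 400.0
# Multiply by ~1.2 if you also want the 120% headroom for the
# underlying filesystem (ext3/ext4) mentioned earlier.
```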
Formula to calculate the number of data nodes:
Number of data nodes (n): n = H / d

where d = disk space available per node. Here we also need to consider the RAM, IOPS bandwidth, and CPU configuration of the nodes as well.

RAM Considerations:
We need to have enough RAM for our own processes to run, as well as buffer space for transferring data through the shuffle step.
Small memory means that we can’t run as many mappers in parallel as our CPU hardware will support, which will slow down our processing.
The number of reducers is often limited more by how much random I/O the reducers cause on the source nodes than by memory, but some reducers are very memory hungry.

64 GB should be enough for moderate sized dual socket motherboards, but there are definitely applications where
128 GB would improve speed. Moving above 128 GB is unlikely to improve speed for most motherboards.
Network speed is moderately important during normal operations.

The calculation works for data nodes, but assumes that:

No node or storage failure occurs.
Only MapReduce, Hive, or Pig is running (probably not a fair way of putting it), but we might need to revisit the formula when analytic tools that create 3x+ the processed data as intermediate storage are being used in the cluster.
Other performance characteristics (processor, memory) are not relevant.
Adding new hardware is instantaneous.

Avg data-per-node usage (in GB) on the Hive cluster:

We need to provide the Vertica team an estimate of the amount of data to be moved over.
DFS Used: 1084955316224 (1010.44 GB)
DFS Used: 1090360705024 (1015.48 GB)
DFS Used: 1088533073920 (1013.78 GB)
DFS Used: 1048273481728 (976.28 GB)
so total data volume approx = (1010 + 1015 + 1013 + 976)/3 ≈ 1.3 TB on disk

DFS Used: 4312122576896 (3.92 TB)
3.92/3 = 1.3 TB
We divide by the replication factor because that is the number of times we copy each bit of data; the number of nodes doesn’t matter.
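The same arithmetic as a quick Python check, using the cluster-wide "DFS Used" total reported above:

```python
# Cluster-wide "DFS Used" in bytes (the 3.92 TB total reported above)
dfs_used_bytes = 4312122576896
replication = 3

# Divide by the replication factor to get the actual (logical) data size,
# then convert bytes to TB
actual_data_tb = dfs_used_bytes / replication / 1024**4
print(round(actual_data_tb, 1))  # 1.3
```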

Problem: Try to create/run the queries below.

Example 1:

ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

Example 2:

ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

Why are we getting the above errors?
We are trying to create an index that is a combination of two columns (campaign + network), so we get error 1071.
If you’ve ever tried to add an index that includes a long varchar or multiple columns to an InnoDB table in MySQL, you may have seen this error.

By default, the index key prefix length limit is 767 bytes. For example, you might hit this limit with a column prefix index of more than 255 characters on a TEXT or VARCHAR column, assuming a utf8mb3 character set and the maximum of 3 bytes for each character.
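A quick way to see where these limits bite, sketched in Python (the bytes-per-character figures are the MySQL worst-case values for each character set):

```python
# Worst-case bytes per character for common MySQL character sets
BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}

def max_prefix_chars(limit_bytes, charset):
    """Longest column prefix (in characters) that fits the index key limit."""
    return limit_bytes // BYTES_PER_CHAR[charset]

print(max_prefix_chars(767, "utf8mb3"))   # 255 -> anything longer hits ERROR 1071
print(max_prefix_chars(3072, "utf8mb4"))  # 768 -> limit once large prefixes are enabled
```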

Solution:

When the innodb_large_prefix configuration option is enabled, the index key prefix length limit is raised to 3072 bytes for InnoDB tables that use the DYNAMIC or COMPRESSED row format. innodb_large_prefix is ON by default from MySQL 5.7.x onwards.

set global innodb_large_prefix=true;
set global innodb_file_format = BARRACUDA;

mysql> set global innodb_large_prefix=true;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> set global innodb_file_format = BARRACUDA;
Query OK, 0 rows affected, 1 warning (0.00 sec)

After setting the above values, we are able to create the unique index:

Query OK, 0 rows affected (0.08 sec)

Query OK, 0 rows affected (0.06 sec)
Prior to MySQL 5.5, the only workaround for this is to reduce the column size while creating the index.
Here is the example.

for more reference:

To know the version of the MySQL server, use the following commands:

Barracuda: The code name for an InnoDB file format that supports the COMPRESSED row format that enables InnoDB table compression, and the DYNAMIC row format that improves the storage layout for long variable-length columns.

ETL Testing is quite different from conventional testing. There are many challenges we faced while performing ETL testing. Here is a list of a few ETL testing challenges I have experienced on the Care project.

ETL Testing Challenges

1. Verify Incompatible data.

2. Loss of data during ETL process.

3. Volume and complexity of data is very huge.

4. Missing business flow information.

5. Verify the data on the both source and destination systems.

6. Need to verify the NOT NULL condition of the field values.

Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that the business information is exact, consistent, and reliable.

It also minimizes the hazard of data loss in production.

ETL vs Database Testing

There is a popular misunderstanding that database testing and data warehouse (ETL + BI) testing are similar, while the fact is that both take different directions in testing.

1. Database testing is done using a smaller scale of data, normally with OLTP (online transaction processing) databases, while data warehouse testing is done with large volumes of data involving OLAP (online analytical processing) databases.

2. In database testing, data is normally injected consistently from uniform sources, while in data warehouse testing most of the data comes from different kinds of data sources which are often inconsistent.

3. We generally perform only CRUD (create, read, update, and delete) operations in database testing, while in data warehouse testing we use read-only (select) operations.

There are a number of universal verifications that have to be carried out for any kind of data warehouse testing. Below is the list of items that are treated as essential for validation in ETL testing.

1. Verify that data transformation from source to destination works as expected.

2. Verify that expected data is added in target system.

3. Verify that all the DB fields (columns) and fields data is loaded without any truncation.

4. Verify data checksum for record count match.

5. Verify NULL value fields.

6. Verify if there are any duplicates in the loaded data.

7. Verify that the data is incrementally getting updated.
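A few of the checks above (record counts, duplicates, NULL keys) can be automated; this Python sketch is illustrative, with hypothetical row dictionaries standing in for query results:

```python
def validate_load(source_rows, target_rows, key="id"):
    """Basic ETL load checks: record count, duplicate keys, and NULL keys."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append("record count mismatch")
    keys = [r[key] for r in target_rows]
    if len(keys) != len(set(keys)):
        issues.append("duplicate keys in target")
    if any(k is None for k in keys):
        issues.append("NULL key values in target")
    return issues

src = [{"id": 1}, {"id": 2}]
tgt = [{"id": 1}, {"id": 1}]  # loaded row 1 twice, lost row 2
print(validate_load(src, tgt))  # ['duplicate keys in target']
```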

These are the main differences between ETL and database testing.

Before we learn about ETL testing, it’s important to know the terms business intelligence and data warehouse.
Let’s get started with these two terms.

Business Intelligence (BI): BI is the process of collecting raw data or business data and turning it into information that is useful and more meaningful. The raw data is the record of the daily transactions of an organization, such as interactions with customers, administration of finances, and so on. This data is used for reporting and analysis purposes.

Data warehouse(DW):A data warehouse is a database that is designed for query and analysis rather than for transaction processing. The data warehouse is constructed by integrating the data from multiple heterogeneous sources.

It enables the company and organizations to consolidate data from several sources and separates analysis workload from the transaction workload.

Data is turned into high quality information to meet all enterprise reporting requirements for all levels of users.

Why do organizations need a data warehouse?

Data is the most important part of any organization, whether everyday data or historical data. Data is the backbone of any report, and reports are the baseline on which all important management decisions are taken. Most companies are taking a step forward by constructing a data warehouse to store and monitor real-time as well as historical data.

Designing an efficient data warehouse is not an easy job. ETL tools are employed in order to make a flawless integration between the different data sources and the final destination system.

The ETL tool works as an integrator: extracting data from the different sources, transforming it into the preferred format based on the business transformation rules, and loading it into a database known as the data warehouse.

What is ETL?

ETL stands for Extract-Transform-Load, and it is the process by which data is loaded from the source system into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse system. Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets.

Diagram (ETL):

ETL Image

Process of ETL: In the above diagram, ETL (Extract-Transform-Load) extracts the data from the different sources (source1, source2, and source3) and loads it into the centralized database called the ‘data warehouse’.

The process of extracting the data from the different source systems, transforming it, and finally loading it into the final destination system is the ETL process; verifying that this process works correctly is called ETL testing.

Once the data is successfully loaded into the final destination system (the data warehouse), we can use Business Intelligence (BI) tools, such as iReports or Tableau, to get the final results in different formats, e.g. reports. With these reports, business people can take decisions easily. The process of verifying the reports generated from the data warehouse system is called Business Intelligence (BI) testing.

The combination of both ETL testing and BI testing is called ‘data warehouse testing’.

ETL (Extract-Transform-Load): During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Very often it is not possible to identify the specific subset of interest, so more data than necessary has to be extracted, and the identification of the relevant data is done at a later point in time.

After the data is extracted, it has to be physically transported to the target system, or to an intermediate system for further processing. Depending on the chosen way of transportation, some transformations can be done during this process. Finally, the data is loaded into the destination system.
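As a toy illustration of the extract-transform-load flow described above (the source layouts and field names are made up):

```python
# Minimal ETL sketch: extract rows from two hypothetical sources with
# different schemas, transform them to a common schema, load into a target.
source1 = [{"cust": "alice", "amt": "10.5"}]     # extract from source 1
source2 = [{"customer": "bob", "amount": "7"}]   # extract from source 2

def transform(row):
    # Normalize field names and convert amounts from strings to numbers
    name = row.get("cust") or row.get("customer")
    amount = float(row.get("amt") or row.get("amount"))
    return {"customer": name, "amount": amount}

warehouse = [transform(r) for r in source1 + source2]  # load step
print(warehouse)
# [{'customer': 'alice', 'amount': 10.5}, {'customer': 'bob', 'amount': 7.0}]
```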

ETL or data warehouse testing is categorized into the below types, irrespective of the technology or ETL tool used.

New data warehouse testing: a new DW is built and verified from scratch. Data input is taken from the customer requirements and the different data sources, and the new data warehouse is built and verified with the help of ETL tools.

Report testing: reports are the end result of any data warehouse and the basic purpose for which the DW (data warehouse) is built.

Reports must be tested by validating the layout, the data in the report, and the calculations.

ETL Testing Techniques:

1. Verify that data is transformed correctly according to various business requirements and rules.

2. Make sure that all projected data is loaded into the data warehouse without any data loss and truncation.

3. Make sure that the ETL process appropriately rejects invalid data and replaces it with default values.

Below are some of the well-known ETL tools.

The most well-known commercial tools are Ab Initio, IBM InfoSphere DataStage, Informatica, Oracle Data Integrator, and SAP Data Integrator.

And there are few open source ETL tools like CloverETL, Pentaho and Talend.

Note: Challenges for an ETL Tester and Differences between Database testing (DB) and Data warehouse testing (DWH) will be on next blog.


We will be setting up a Ruby on Rails development environment on Ubuntu 14.04.

The reason we’re going to be using Ubuntu is because the majority of code you write will run on a Linux server. Ubuntu is one of the easiest Linux distributions to use with lots of documentation so it’s a great one to start with.

Note: Before starting the installation, please make sure that you have root privileges.

Installing Ruby Dependencies:The first step is to install some dependencies for Ruby.

   sudo apt-get update
   sudo apt-get install git-core curl zlib1g-dev build-essential libssl-dev libreadline-dev     libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt1-dev libcurl4-openssl-dev python-software-properties libffi-dev

Next we’re going to be installing Ruby using rvm. You can install from source as well.

Install Ruby Using RVM:

    sudo apt-get install libgdbm-dev libncurses5-dev automake libtool bison libffi-dev
    gpg --keyserver hkp:// --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
    curl -sSL | bash -s stable
    source ~/.rvm/scripts/rvm
    rvm install 2.2.3
    rvm use 2.2.3 --default
    ruby -v

Installing Rails Dependencies:Since Rails ships with many dependencies, we’re going to need to install a Javascript runtime like NodeJS. This lets you use Coffeescript and the Asset Pipeline in Rails which combines and minifies your javascript to provide a faster production environment.

    curl -sL | sudo -E bash -
    sudo apt-get install -y nodejs

We can use various Rails versions with each Ruby by creating gemsets and then installing Rails within those using the normal gem commands:

    rvm gemset create starterapp
    rvm 2.2.3@starterapp
    gem install rails -v 4.2.4
  • “rvm gemset create starterapp” command is to create a gemset name called starterapp.
  • “rvm 2.2.3@starterapp” command is to specify Ruby version and our new gemset.
  • “gem install rails -v 4.2.4” command is to install specific Rails version.

Now that you’ve installed Rails, you can run the rails -v command to make sure you have everything installed correctly:

     rails -v
     # Rails 4.2.4

The last step is to install Bundler.

     gem install bundler

Configuring Git:(If you already installed git please ignore this step)
We’ll be using Git for our version control system so we’re going to set it up to match our Github account. If you don’t already have a Github account, make sure to register. It will come in handy for the future.

Replace my name and email address in the following steps with the ones you used for your Github account.

    git config --global color.ui true
    git config --global "YOUR NAME"
    git config --global ""
    ssh-keygen -t rsa -b 4096 -C ""

The next step is to take the newly generated SSH key and add it to your Github account: copy the output of the following command and paste it into the SSH keys section of your Github settings.

    cat ~/.ssh/

Once you’ve done this, you can check and see if it worked:

    ssh -T

You should get a message like this:

    Hi excid3! You've successfully authenticated, but GitHub does not provide shell access.

Setting Up MySQL:

Rails ships with sqlite3 as the default database. Chances are you won’t want to use it because it’s stored as a simple file on disk. You’ll probably want something more robust like MySQL or PostgreSQL. If you’re coming from PHP, you may already be familiar with MySQL.

You can install MySQL server and client from the packages in the Ubuntu repository. As part of the installation process, you’ll set the password for the root user. This information will go into your Rails app’s database.yml file in the future.

    sudo apt-get install mysql-server mysql-client libmysqlclient-dev

Installing the libmysqlclient-dev gives you the necessary files to compile the mysql2 gem which is what Rails will use to connect to MySQL when you setup your Rails app.

Setting Up PostgreSQL:For PostgreSQL, we’re going to add a new repository to easily install a recent version of Postgres.

    sudo sh -c "echo 'deb precise-pgdg main' > /etc/apt/sources.list.d/pgdg.list"
    wget --quiet -O - | sudo apt-key add -
    sudo apt-get update
    sudo apt-get install postgresql-common
    sudo apt-get install postgresql-9.5 libpq-dev

The postgres installation doesn’t setup a user for you, so you’ll need to follow these steps to create a user with permission to create databases. Feel free to replace vijay with your username.

    sudo -u postgres createuser vijay -s

    # If you would like to set a password for the user, you can do the following
    sudo -u postgres psql
    postgres=# \password vijay

Final Steps: Let’s create your first Rails application:

    #### If you want to use SQLite (not recommended)
    rails new myapp

    #### If you want to use MySQL
    rails new myapp -d mysql

    #### If you want to use Postgres
    # Note that this will expect a postgres user with the same username
    # as your app, you may need to edit config/database.yml to match the
    # user you created earlier
    rails new myapp -d postgresql

    # Move into the application directory
    cd myapp

    # If you setup MySQL or Postgres with a username/password, modify the
    # config/database.yml file to contain the username/password that you specified

    # Create the database
    rake db:create

    rails server

Note: Lines starting with ‘#’ are comments, not commands.

You can now visit http://localhost:3000 to view your new website. Now that you’ve got your machine setup, it’s time to start building some Rails applications.

If you received an error that said Access denied for user ‘root’@’localhost’ (using password: NO) then you need to update your config/database.yml file to match the database username and password.

Reference: gorails

Thanks for reading this Article. If you have any questions, feel free to post your comments and we’ll get back to you.

We have 4 different tables.

Table 1: Member
Table 2: employer
Table 3: employer_custom_field
Table 4: employer_custom_value

The Member table contains member information like memberid, datesignup, empid, etc.
The employer table contains information like empid, memberid, empname, domainname, etc.
employer_custom_field contains employer custom field details like field1, field2, field3, etc.
employer_custom_value contains the employer custom fields’ value details like value1, value2, value3, etc.

Member has 1-1 relation with employer.
employer has 1-N relation with employer_custom_field.
employer has 1-N relation with employer_custom_value.
employer_custom_field has 1-1 relation with employer_custom_value.
Member has 1-N relation with employer_custom_value.



Problem statement:

Create a new table from the above tables.
That table should contain member details with empid, empname, field1, field2, field3 and value1, value2, value3 in a row.

The output table should be like below:
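Since the expected output did not survive here, this is a hypothetical in-memory sketch of the shape of the result: one row per member with the employer's custom field/value pairs pivoted into columns (the real solution would be a SQL join across the four tables; all names below are illustrative):

```python
# Toy rows standing in for the four tables
member = {"memberid": 1, "empid": 10}
employer = {"empid": 10, "empname": "Acme"}
fields = [{"fieldid": 1, "name": "field1"}, {"fieldid": 2, "name": "field2"}]
values = [{"fieldid": 1, "value": "v1"}, {"fieldid": 2, "value": "v2"}]

# Join member -> employer, then pivot field/value pairs into columns
row = {**member, "empname": employer["empname"]}
value_by_field = {v["fieldid"]: v["value"] for v in values}
for f in fields:
    row[f["name"]] = value_by_field.get(f["fieldid"])
print(row)
# {'memberid': 1, 'empid': 10, 'empname': 'Acme', 'field1': 'v1', 'field2': 'v2'}
```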



1. Go to the root folder where you installed Protractor.


2. Use npm to install mysql library to connect from Protractor tests.

   D:\test> npm install mysql

3. Go to D:\test\node_modules\protractor and create a directory:

 D:\test\node_modules\protractor> mkdir tests 

This directory contains the ‘.js’ files in which we write our test suite.


 describe('angularjs homepage todo list', function() {
    it('should add a todo', function() {

        element(by.model('todoList.todoText')).sendKeys('write first protractor test');

        var todoList = element.all(by.repeater('todo in todoList.todos'));
        expect(todoList.get(2).getText()).toEqual('write first protractor test');

        console.log("tests tests");

        // You wrote your first test, cross it off the list
        var completedAmount = element.all(by.css('.done-true'));
        expect(completedAmount.count()).toEqual(2);
    });
});

4. For the database connection details

As we created the tests directory, under the same path create a directory named dbconnection:

  D:\test\node_modules\protractor> mkdir dbconnection 

This directory contains a ‘.js’ file with the DB connection details.


function ConnectDatabase() {
    var mysql = require('../../mysql');

    this.connection = mysql.createConnection({
        host: '********',
        user: '********',
        password: '********',
        database: '*********'
    });
}

module.exports = ConnectDatabase;

5. Update the example.js file, creating a DB connection object.

In the below spec there are two tests:
* an E2E test
* a DB interaction test


var ConnectDatabase = require("../dbconnection/connectDatabase");
var connectDatabase = new ConnectDatabase();

describe('angularjs homepage todo list', function() {

    // E2E test

    it('should add a todo', function(done) {

        element(by.model('todoList.todoText')).sendKeys('write first protractor test');

        var todoList = element.all(by.repeater('todo in todoList.todos'));
        expect(todoList.get(2).getText()).toEqual('write first protractor test');

        // You wrote your first test, cross it off the list
        var completedAmount = element.all(by.css('.done-true'));
        done();
    });

    // Test which interacts with the MySQL DB

    it('Should add', function(done) {

        var sql = 'SELECT * FROM MEMBER LIMIT 5';
        connectDatabase.connection.query(sql, function(err, rows) {

            if (!err) {
                console.log("solution is :", rows);
            }
            done();
        });
    });
});

6. conf.js (from which the execution starts)

  exports.config = {
    directConnect: true,

    // Capabilities to be passed to the webdriver instance.
    capabilities: {
        'browserName': 'chrome'
    },

    specs: ['../tests/example.js'],

    // Options to be passed to Jasmine-node.
    jasmineNodeOpts: {
        showColors: true,
        defaultTimeoutInterval: 50000
    }
};

7. Run

 D:\test\node_modules\protractor\example>protractor conf.js  

8. Check the command prompt: the tests pass and the query returns the result.