Tuesday, October 28, 2014

How to use dynamic sql and use its return value



SET @sql = NULL;
SET @cout = '';
SELECT CONCAT('SELECT CONCAT(',GROUP_CONCAT(c.COLUMN_NAME, ',","'),') as v1 INTO @cout FROM 
   student where enrolment_month =given_month() and enrolment_year=given_year()')
INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_NAME IN ('client_group_types') ;
PREPARE sql_statement FROM @sql;
EXECUTE sql_statement;
DEALLOCATE PREPARE sql_statement; 
SET @Param2 = @cout;
SELECT @Param2;

Can use @Param2 as return type of a function

Thursday, October 9, 2014

Activity Streams for Social Media




Today most of the social media data is stored in No SQL databases as Activity Streams. Below given is some of useful activity streams I found over the internet


Discussion
{
"_id" : "4d19fb9ddhd4000005",
"_t" : "Discussion",
"created" : "Tue, 28 Dec 2010 07:00:35 GMT",
"GroupID" : 129242,
"eIndividualID" : "22872F9",
"firstName" : "Danial",
"lastName" : "Micheal",
"discussionStart" : "How is the meeting.",
"comments": [
{
"_id" : "4d19fe329aceye400000a",
"eIndividualID" : "FFE4251",
"created" : "2010-12-28T15:11:46.6760000Z",
"firstName" : "Robin",
"lastName" : "Tally",
"commentText" : "Went alright. Thank you"
},
{
"_id" : "4d19feyey027400000a",
"eIndividualID" : "FFE441",
"created" : "2010-12-28T15:11:46.6760000Z",
"firstName" : "Robin",
"lastName" : "Tally",
"commentText" : "That is good"
},
{
"_id" : "4d1deheh000003",
... } ]
}

Live Chat 
{
"_id": "4cd261dsdfshsd000006",
"senderEIndividualID": "1B0D6E8",
"senderFirstName": "Danial",
"senderLastName": "Tally",
"sendDate": "Thu, 04 Nov 2010 00:31:32 GMT",
"groupID": 119420,
"message": "way faster then "
}
{
"_id": "4cd261a69ac0900c2400000a",
"senderEIndividualID": "B656641",
"senderFirstName": "Steve",
"senderLastName": "Chen",
"sendDate": "Thu, 04 Nov 2010 00:32:54 GMT",
"groupID": 119420,
"message": "ohhhhh"

{
"_id": "4cd2621b9ac0900c2400001f",
"senderEIndividualID": "1B0D6E8",
"senderFirstName": "Danial",
"senderLastName": "Tally",
"sendDate": "Thu, 04 Nov 2010 00:34:51 GMT",
"groupID": 119420,
"message": "I think it made"
}


User Data 
{ "_id": 4252992,
 "events" : {
 "all_ids": [ 116706, 179487, 16389, 827496 ],
 "curr_ids": [ 827496 ],
 },
 "nns" : [
 [ 2816442, 0.2 ],
 [ 1615962, 0.047619047619047616 ],
 ],
 "facebook" : {
 "_id" : 4808871, 
 "name" : "Danial Tally",
 "location" : "Melbourne, Victoria",
 "mutualfriends" : [ 56341525, 262659792 ],
 "allfriends" : [ 56341525, 562526567, 262659792 ],
 },
}

Sunday, October 5, 2014

Performance Tuning on MongoDB

If your writing huge amount of data to MongoDB it is really important to have a proper performance tuning method with that. I will discuss one of the performance tuning method, but this might not be the only one

The explain() method returns a document that describes the process used to return the query results. If the mango has a collection name inventory and inventory is having 1500000 records, and this how it explain

db.inventory.find().explain()

{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 1500000,
    "nscannedObjects" : 1500000,
    "nscanned" : 1500000,
    "nscannedObjectsAllPlans" : 1500000,
    "nscannedAllPlans" : 1500000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 22,
    "indexBounds" : {},
    "server" : "myServerIp"
}

cursor displays BasicCursor to indicate a collection scan. n displays 1500000 to indicate that the query matches and returns three document.
nscanned and nscannedObjects display 1500000 to indicate that MongoDB had to scan ten documents 

db.inventory.find({"verb":"getInventoryPrice"}).explain()

{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 302,
    "nscannedObjects" : 1500000,
    "nscanned" : 1500000,
    "nscannedObjectsAllPlans" : 1500000,
    "nscannedAllPlans" : 1500000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 65,
    "indexBounds" : {},
    "server" : "myServerIp"
}

The difference between the number of matching documents and the number documents scanned may suggest that, to improve efficiency, the query might benefit from the use of an index.

Adding Indexes
Good performance starts with Indexing
Indexes can improve the performance by 2 to 3 times or 1000 ms query down to <1ms, but that is for good indexing

Key commands on indexing

db.inventory.ensureIndex( { verb: 1 } )
db.inventory.dropIndex( { verb: 1 } )

ascending 1
descending -1


db.inventory.getIndexKeys()
{
    "0" : {
        "_id" : 1
    },
    "1" : {
        "verb" : 1
    }
}
db.inventory.getIndexes()
{
    "0" : {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "inventory.inventory",
        "name" : "_id_"
    },
    "1" : {
        "v" : 1,
        "key" : {
            "verb" : 1
        },
        "ns" : "inventory.inventory",
        "name" : "verb_1"
    }
}

scanAndOrder is a very bad thing in MongoDB
MongoDB sorts documents in-memory is very very expensive and with out an index large result sets can be rejected with an error

Explain After the above change

{
    "cursor" : "BtreeCursor verb_1",
    "isMultiKey" : false,
    "n" : 302,
    "nscannedObjects" : 302,
    "nscanned" : 302,
    "nscannedObjectsAllPlans" : 302,
    "nscannedAllPlans" : 302,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 2,
    "indexBounds" : {
        "verb" : [ 
            [ 
                "getInventoryPrice", 
                "getInventoryPrice"
            ]
        ]
    },
    "server" : "myServerIp"
}
BtreeCursor indicates that the query used an index. The cursor includes name of the index. When a query uses an index, the output of explain() includes indexBounds details.

Compare the Performance
To manually compare the performance of a query using more than one index, you can use the hint() method in conjunction with the explain() method.
db.inventory.find({"verb":"getInventoryPrice"}).explain().hint({"verb": 1})
{
    "cursor" : "BtreeCursor verb_1",
    "isMultiKey" : false,
    "n" : 302,
    "nscannedObjects" : 302,
    "nscanned" : 302,
    "nscannedObjectsAllPlans" : 302,
    "nscannedAllPlans" : 302,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "verb" : [ 
            [ 
                "getInventoryPrice", 
                "getInventoryPrice"
            ]
        ]
    },
    "server" : "myServerIp"
}

The ordering of fields in an Index should be
fields where extract exact values
fields where will sort
fields where query for range vallues like $in, $gt, $lt...Etc

Need to be very careful if you have
Many query patterns
Use mongoDb like RDBMS
Have many indexes for each collection

How to identify problematic quires by profiling
// Will profile all queries that take 100 ms
db.setProfilingLevel(1, 100);

// Will profile all queries
db.setProfilingLevel(2);

// Will disable the profiler
db.setProfilingLevel(0);

// Find the most recent profile entries
db.system.profile.find().sort({$natural:-1});
    
// Find all queries that took more than 5ms
db.system.profile.find( { millis : { $gt : 5 } } );
    
// Find only the slowest queries
db.system.profile.find().sort({millis:-1});

db.system.profile.find({op:{$in: ["query", "update", "command"]}})


So far we have discussed how to do indexing in manual methods. Now lets discuss about some easy ways
DEX- Dex is a MongoDB performance tuning tool that compares queries to the available indexes in the queried collection(s) and generates index suggestions based on simple heuristics. Currently you must provide a connection URI for your database.
https://github.com/mongolab/dex/blob/master/README.md

Identify Indexing with DEX
A Indexed query helps performance by several ways. The trick is to identify an ideal indexes. Even for experts, hand-crawling through the MongoDB logs for slow queries is a laborious process.
Dex helps you in this. Dex is a tool that looks at your slow queries and tells you exactly which indexes you need to make those queries fast.

Running Dex is easy! For up-to-date instructions, refer to our README on GitHub. However, what you do with Dex's recommendations depends on your situation.

Install Dex
>pip install dex # in sudo mode or use sudo pip install dex

How to find the Location on EC2

> vi /etc/mongod.conf

Location of mongodb log can be found with logpath=/var/log/mongo/mongod.log

dex -f /var/log/mongo/mongod.log mongodb://ec2-<myec2ip>.ap-southeast-2.compute.amazonaws.com:<mongo port>/inventory

> dex -f /var/log/mongo/mongod.log mongodb://ec2-<myec2ip>.ap-southeast-2.compute.amazonaws.com:<mongo port>/inventory

{
    'runStats': {
        'linesRecommended': 150,
        'linesProcessed': 169,
        'linesPassed': 66594
    },
    'results': [
        {
            'queryMask': '{"$query":{"verb":"<val>"}}',
            'namespace': 'mydb.inventory',
            'recommendation': {
                'index': '{"date": 1}',
                'namespace': 'mydb.inventory',
                'shellCommand': 'db["inventory"].ensureIndex({"date": 1}, {"background": true})'
            },
            'details': {
                'count': 95,
                'totalTimeMillis': 10676,
                'avgTimeMillis': 112
            }
        },
        ...............
    ]
}


Filter by query time (millis)
dex -f /var/log/mongo/mongod.log mongodb://ec2-<myec2ip>.ap-southeast-2.compute.amazonaws.com:<mongo port>/inventory -s 400

In more details 
dex -f /var/log/mongo/mongod.log mongodb://ec2-<myec2ip>.ap-southeast-2.compute.amazonaws.com:<mongo port>/inventory -v

Even Dex guide to find the required Indexes final decision in on you to find the required one



References
http://mongolab.org/dex/
http://blog.mongolab.com/2012/06/cardinal-ins/