Redis : Basics

Recently I have been playing around with Redis. In particular, I have been working out how to integrate Redis with web services in Azure.

Redis is really more of a data structure server: it supports several interesting data structures, along with operations on them.

Below I list the main data structures, with notes on when to use some of them.

  1. Strings
  2. Lists
  3. Sets
  4. Sorted Sets
    • Useful when fast access to the middle of a large collection of elements is important
  5. Hashes
    • Hashes are maps between string fields and string values, so they are the perfect data type to represent objects.
  6. Bit arrays
  7. HyperLogLog
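
To make items 4 and 5 concrete, here is a minimal Python sketch. It assumes a client object that exposes redis-py style methods (hset, hgetall, zcard, zrange); the helper functions and the key naming scheme are hypothetical, made up purely for illustration.

```python
def save_user(client, user_id, fields):
    """Store an object as a Redis hash: one key, many string fields."""
    key = f"user:{user_id}"            # hypothetical key naming scheme
    client.hset(key, mapping=fields)   # HSET key field value [field value ...]
    return key


def load_user(client, user_id):
    """Read the whole object back with HGETALL."""
    return client.hgetall(f"user:{user_id}")


def middle_of_sorted_set(client, key, width=1):
    """Fetch the members around the middle rank of a sorted set.

    ZCARD gives the size and ZRANGE reads by rank (end index inclusive),
    so the middle can be reached without walking the collection.
    """
    size = client.zcard(key)
    if size == 0:
        return []
    mid = size // 2
    return client.zrange(key, max(mid - width, 0), mid + width)


# Usage with redis-py (assumed installed, with a server on localhost):
#   import redis
#   r = redis.Redis(decode_responses=True)
#   save_user(r, 1000, {"name": "Ada", "visits": "3"})
#   r.zadd("leaderboard", {"a": 10, "b": 20, "c": 30})
#   middle_of_sorted_set(r, "leaderboard")
```

Passing the client in as an argument keeps the helpers independent of how the connection is created, which also makes them easy to exercise against a stand-in object.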

 

DECR, DECRBY, DEL, EXISTS, EXPIRE, GET, GETSET, HDEL, HEXISTS, HGET, HGETALL, HINCRBY, HKEYS, HLEN, HMGET, HMSET, HSET, HVALS, INCR, INCRBY, KEYS, LINDEX, LLEN, LPOP, LPUSH, LRANGE, LREM, LSET, LTRIM, MGET, MSET, MSETNX, MULTI, PEXPIRE, RENAME, RENAMENX, RPOP, RPOPLPUSH, RPUSH, SADD, SCARD, SDIFF, SDIFFSTORE, SET, SETEX, SETNX, SINTER, SINTERSTORE, SISMEMBER, SMEMBERS, SMOVE, SORT, SPOP, SRANDMEMBER, SREM, SUNION, SUNIONSTORE, TTL, TYPE, ZADD, ZCARD, ZCOUNT, ZINCRBY, ZRANGE, ZRANGEBYSCORE, ZRANK, ZREM, ZREMRANGEBYSCORE, ZREVRANGE, ZSCORE

 

Code on GitHub:

References:

Web Services : Data flow between the UI view and the backend APIs

These articles helped me a lot in understanding how to pass data between the UI view and the backend APIs.

References:

  1. https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Forms/Sending_and_retrieving_form_data
  2. http://www.asp.net/web-api/overview/advanced/sending-html-form-data-part-1
  3. http://carehart.org/blog/client/index.cfm/2007/1/2/form_self_post

 

Azure Machine Learning : Model Retraining

References:

  1. https://azure.microsoft.com/en-us/documentation/articles/machine-learning-retrain-models-programmatically/
  2. https://azure.microsoft.com/en-us/blog/retraining-and-updating-azure-machine-learning-models-with-azure-data-factory/
  3. https://gallery.cortanaanalytics.com/Tutorial/No-code-Batch-Scoring-and-Retraining-1
  4. https://azure.microsoft.com/en-us/documentation/articles/machine-learning-consume-web-services/

Async Await in Web Service Design

I have been wanting to understand the async/await pattern in all its glory, and recently got the opportunity to finally dive in head first.

Tips:

  • Async Await & Threading :
    • Async methods are intended to be non-blocking operations. An await expression in an async method doesn’t block the current thread while the awaited task is running. Instead, the expression signs up the rest of the method as a continuation and returns control to the caller of the async method. The async and await keywords don’t cause additional threads to be created. Async methods don’t require multithreading because an async method doesn’t run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active. You can use Task.Run to move CPU-bound work to a background thread, but a background thread doesn’t help with a process that’s just waiting for results to become available.
    • https://msdn.microsoft.com/en-us/library/hh191443.aspx

 

  • The request context of an ASP.NET application (its SynchronizationContext) needs special handling. Otherwise, code that blocks on an async call can deadlock
    • http://blog.ciber.no/2014/05/19/using-task-configureawaitfalse-to-prevent-deadlocks-in-async-code/
    • Interestingly, I found that console apps avoid this issue on their own: with no SynchronizationContext to return to, the continuation simply runs on a different (thread pool) thread. ASP.NET apps require you to add .ConfigureAwait(false) explicitly
    • e.g. result = await DoSomeTask().ConfigureAwait(false);
    • The reason console apps behave differently is explained very well in the link below:
      • All of the UI application types you can create in Visual Studio will end up having a special SynchronizationContext published on the UI thread. Windows Forms, Windows Presentation Foundation, Metro style apps… they all have one. But there’s one common kind of application that doesn’t have a SynchronizationContext: console apps.
      • http://blogs.msdn.com/b/pfxteam/archive/2012/01/20/10259049.aspx

 

  • For debugging: use Debug.WriteLine and watch the ‘Output’ window in Visual Studio
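
The first tip above (an await does not block the thread and does not create new threads) can be illustrated with a small sketch. Note the assumption: this uses Python's asyncio as an analogue, not the .NET API, and there is no counterpart to ConfigureAwait here. The function names and delays are made up for illustration.

```python
import asyncio
import threading


async def fetch(name, delay, log):
    """Simulate I/O-bound work; `await` yields control instead of blocking."""
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # the thread is free to run other coroutines here
    log.append((name, "done", threading.get_ident()))
    return name


async def main():
    log = []
    # Both coroutines are started together and share one event loop.
    results = await asyncio.gather(fetch("slow", 0.2, log), fetch("fast", 0.05, log))
    return results, log


def run_demo():
    """Return the results, the order of events, and the set of thread ids seen."""
    results, log = asyncio.run(main())
    order = [(name, event) for name, event, _ in log]
    thread_ids = {tid for _, _, tid in log}
    return results, order, thread_ids
```

Calling run_demo() shows "fast" finishing while "slow" is still waiting, even though every log entry carries the same thread id: the waiting is interleaved on one thread rather than spread over several.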

 

Code:

References:

Web Services : Learning POST by Example. Invoking Cognitive Service APIs by Microsoft.

I was recently working with some of the APIs from Microsoft's Project Oxford. I was especially interested in the vision APIs:

https://www.projectoxford.ai/demo/vision#Analysis

As I write this post on web service POSTs (pun unintended), the interesting thing I noticed while working with the vision APIs is how simple they have made the whole process of invoking their web service via POST requests.

Requests:

  1. Request URL
  2. Request Parameters
  3. Request Headers
  4. Request Body

Response:

  1. Response 200
  2. Response 400
  3. Response 415
  4. Response 500
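
The four request parts and the four response codes above can be sketched in Python's standard library. This is only an illustration under stated assumptions: the endpoint URL, the parameter name, the key header and the body are all hypothetical placeholders rather than the real service's values, and nothing is actually sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Standard HTTP meanings of the status codes listed above.
STATUS_MEANING = {
    200: "OK: the request succeeded",
    400: "Bad Request: malformed parameters or body",
    415: "Unsupported Media Type: wrong Content-Type header",
    500: "Internal Server Error: failure on the service side",
}


def build_post(base_url, params, headers, body):
    """Assemble URL, parameters, headers and body into a POST request."""
    url = f"{base_url}?{urlencode(params)}"   # request URL + request parameters
    data = json.dumps(body).encode("utf-8")   # request body, JSON-encoded
    return Request(url, data=data, headers=headers, method="POST")


request = build_post(
    "https://example.invalid/vision/analyze",    # hypothetical endpoint
    {"visualFeatures": "Categories"},            # hypothetical parameter
    {"Content-Type": "application/json",
     "X-Api-Key": "<your-key>"},                 # hypothetical key header
    {"url": "https://example.invalid/cat.jpg"},  # hypothetical body
)
```

Sending it would be a call to urllib.request.urlopen(request); a non-2xx answer raises urllib.error.HTTPError, whose code attribute can be looked up in STATUS_MEANING.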

They even provide code samples in several languages:

  1. Curl
  2. C#
  3. Java
  4. JavaScript
  5. ObjC
  6. PHP
  7. Python
  8. Ruby

References:

–> very helpful. check it out

Code


Web Services : Understanding C# HttpClient

C# has an HttpClient class, which is meant to be easier to use, supports async programming, and lets users set any header without resorting to workaround code (cf. WebClient and its underlying classes).

It is necessary to understand this class properly in order to do serious coding for Web Service development.

References:

  1. http://d-fens.ch/2014/04/12/httpclient-and-how-to-use-headers-content-type-and-postasync/
  2. http://stackoverflow.com/questions/10679214/how-do-you-set-the-content-type-header-for-an-httpclient-request
  3. https://msdn.microsoft.com/en-us/library/hh944521(v=vs.118).aspx
  4. https://www.jayway.com/2012/03/13/httpclient-makes-get-and-post-very-simple/

 

Learning Curves. What to try next in ML ?

A very interesting problem in ML is: what to try next? Andrew Ng has some very interesting insights on this topic (see the reference section below).

  • Nowadays most ML platforms, e.g. Azure ML, offer parameter sweeps.
    • Most of the time they also do cross-validation during the sweep.
    • This simplifies model selection: the platform automatically selects the parameters that give the best accuracy/AUC on the cross-validation dataset.
    • This is usually the first thing to do for pretty much any ML problem.

 

  • However, an interesting question still remains, especially from a practical standpoint:
    • Should I focus more on feature engineering, i.e. adding more features, or on getting more data?
    • For these cases I generally use learning curves.
    • There are some nuances, so let me explain what I usually do.

 

  • Plot the training error vs. the cross-validation error.
    • This usually indicates whether I am currently suffering from a high bias (underfit) problem or a high variance (overfit) problem.
    • High bias (underfit):
      • High training error, high generalization (CV) error
    • High variance (overfit):
      • Low training error, high generalization (CV) error
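
The two rules above can be written down as a small check. This is a rough sketch, not a standard recipe: the function name and the acceptable_error threshold that separates "low" from "high" are assumptions, and the threshold has to be chosen per problem.

```python
def diagnose(train_error, cv_error, acceptable_error=0.10):
    """Classify a model from its training and cross-validation error rates.

    acceptable_error is a hypothetical cut-off between "low" and "high" error.
    """
    high_train = train_error > acceptable_error
    high_cv = cv_error > acceptable_error
    if high_train and high_cv:
        return "high bias (underfit)"
    if not high_train and high_cv:
        return "high variance (overfit)"
    return "no clear bias/variance problem"
```

For example, diagnose(0.30, 0.32) reports high bias, while diagnose(0.02, 0.25) reports high variance.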

[Figure: learning curve]

  • High variance (overfitting): plot how the error/accuracy varies with increasing data.
    • A good idea here is to use a log-base-2 scale on the x-axis.
    • With a log-base-2 scale, each step along the x-axis is a doubling of the data, which gives a good sense of how much the error/accuracy will decrease/increase with more data.
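
A minimal sketch of the log-base-2 idea, with hypothetical helper names: one function picks training-set sizes that double at each step, and the other turns measured errors into "change per doubling", which is the slope you read off the log-scale plot.

```python
import math


def log2_sizes(n_total, n_min=64):
    """Training-set sizes that double each step, capped at n_total."""
    sizes = []
    n = n_min
    while n < n_total:
        sizes.append(n)
        n *= 2
    sizes.append(n_total)  # always finish with the full dataset
    return sizes


def change_per_doubling(sizes, errors):
    """Error change per unit of log2(size), i.e. per doubling of the data."""
    pairs = list(zip(sizes, errors))
    return [
        (e2 - e1) / math.log2(s2 / s1)
        for (s1, e1), (s2, e2) in zip(pairs, pairs[1:])
    ]
```

With made-up CV errors of 0.30, 0.25 and 0.22 at 64, 128 and 256 examples, change_per_doubling returns roughly -0.05 and -0.03: each doubling still helps, but the gain is shrinking.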

[Figure: log plot]

  • Based on the intuition above the following steps can be taken. 

 

| What to try next? | Underfit (high bias) | Overfit (high variance) |
|---|---|---|
| Getting more training examples | No | Yes |
| Try smaller set of features | No | Yes. But first see if you can get more training examples. |
| Additional features | Yes | Maybe. If we get a feature that gives a strong signal, then yes, add it. But also invest in more data collection in parallel. |

 

Code:

 

References:

  1. https://class.coursera.org/ml-005/lecture