802.3 v/s 802.11

This paper gives a nice overview of the differences between Ethernet and Wi-Fi at the protocol level:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.9874&rep=rep1&type=pdf

  • The crux of the problem is this: “The CSMA/CD protocol is not used in a wireless environment due to the user has no capability to sense/listen to the channel for collision while sending the packet” [12].
  • This necessitates Collision Avoidance (CSMA/CA) techniques for Wi-Fi, which impose limits on how fast packets can be transmitted in a given frequency band, leading to slower speeds. A toy sketch of the backoff idea follows.
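For intuition, here is a toy Python sketch (not a faithful 802.11 model) of the binary exponential backoff at the heart of CSMA/CA. The channel_busy and transmit callbacks and the contention-window numbers are made up for illustration:

import random

CW_MIN, CW_MAX = 15, 1023  # illustrative 802.11-style contention-window bounds

def send_with_csma_ca(channel_busy, transmit, max_retries=7):
    """channel_busy() -> bool; transmit() -> bool (True if an ACK came back)."""
    cw = CW_MIN
    for _ in range(max_retries):
        backoff = random.randint(0, cw)   # pick a random slot in [0, cw]
        while backoff > 0:
            if not channel_busy():        # count down only during idle slots
                backoff -= 1
        if transmit():                    # success is known only via the ACK,
            return True                   # never by sensing a collision mid-send
        cw = min(2 * cw + 1, CW_MAX)      # widen the window and retry
    return False

# Demo: channel busy 30% of slots; transmissions succeed 80% of the time.
ok = send_with_csma_ca(lambda: random.random() < 0.3,
                       lambda: random.random() < 0.8)
print("delivered" if ok else "gave up")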

 

Making REST calls to send data to an Azure Event Hub

I recently encountered a situation where I had to use pure REST calls to send data to an Azure Event Hub.

Tips:

  • If you are used to the client libraries (C#, Python), be aware that they do a lot behind the scenes. It's not trivial to go from using a library to making pure REST calls.
  • My first approach, using Fiddler to capture the library's traffic and re-purposing those calls, failed.
    • I am not sure why the calls fail to show up in Fiddler. I tried a few things, like enabling HTTPS decryption, but I wasn't able to get the outgoing traffic to appear.
  • The references below give a good idea of how I made progress.

REST call to send data:

I finally got it to work with something like this:

POST https://simplexagpmeh.servicebus.windows.net/simplexagpmeh/messages?timeout=60&api-version=2014-01 HTTP/1.1
User-Agent: Fiddler
Authorization: SharedAccessSignature sr=http%3a%2f%2fsimplexagpmeh.servicebus.windows.net%2f&sig=RxvSkhotfGEwERdiaA8oLr7X9u5XLeDI8TCK5DhDPP8%3d&se=1476214239&skn=RootManageSharedAccessKey
Content-Type: application/atom+xml;type=entry;charset=utf-8
Host: simplexagpmeh.servicebus.windows.net
Content-Length: 153
Expect: 100-continue

{ "DeviceId" : "ArduinoYun",
  "SensorData" : [ { "SensorId" : "awk",
        "SensorType" : "temperature",
        "SensorValue" : 24.5
      } ]
}
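
For reference, here is a minimal Python sketch of how I understand the SharedAccessSignature generation to work (HMAC-SHA256 over the URL-encoded resource URI plus a Unix expiry timestamp). The namespace and key name mirror the capture above, the key value is a placeholder, and it assumes the third-party requests package; note the captured token signed the http:// form of the URI while this sketch signs https://:

import base64
import hashlib
import hmac
import time
import urllib.parse

import requests  # third-party; pip install requests

# Placeholders matching the captured request above; substitute your own values.
NAMESPACE = "simplexagpmeh"
EVENT_HUB = "simplexagpmeh"
KEY_NAME = "RootManageSharedAccessKey"
KEY_VALUE = "<your-shared-access-key>"

def make_sas_token(resource_uri, key_name, key_value, ttl_seconds=3600):
    """Sign the URL-encoded resource URI plus an expiry with HMAC-SHA256."""
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    digest = hmac.new(key_value.encode("utf-8"), string_to_sign,
                      hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("utf-8"))
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, signature, expiry, key_name)

resource_uri = "https://{}.servicebus.windows.net/{}".format(NAMESPACE, EVENT_HUB)
url = resource_uri + "/messages?timeout=60&api-version=2014-01"
headers = {
    "Authorization": make_sas_token(resource_uri, KEY_NAME, KEY_VALUE),
    "Content-Type": "application/atom+xml;type=entry;charset=utf-8",
}
body = ('{"DeviceId": "ArduinoYun", "SensorData": '
        '[{"SensorId": "awk", "SensorType": "temperature", "SensorValue": 24.5}]}')
resp = requests.post(url, headers=headers, data=body)
print(resp.status_code)  # expect 201 Created on success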

References:

Code:

 

Getting Started with Apache Kafka

I'm doing some toe-dipping into Apache Kafka.

Linux:

Windows:

Commands from the Apache Kafka quick start documentation:

  • This gave me a good overview of what the system is doing.
> bin/zookeeper-server-start.sh config/zookeeper.properties
> bin/kafka-server-start.sh config/server.properties
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> bin/kafka-topics.sh --list --zookeeper localhost:2181
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties
[Now edit these new files and set the following properties:]
config/server-1.properties:
    broker.id=1
    listeners=PLAINTEXT://:9093
    log.dir=/tmp/kafka-logs-1
config/server-2.properties:
    broker.id=2
    listeners=PLAINTEXT://:9094
    log.dir=/tmp/kafka-logs-2
> bin/kafka-server-start.sh config/server-1.properties &
> bin/kafka-server-start.sh config/server-2.properties &
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
> ps | grep server-1.properties
> kill -9 7564
[7564 is the leader broker's pid from the ps output above. Leadership has switched to one of the followers and node 1 is no longer in the in-sync replica set:]
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
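
To go beyond the console tools, here is a minimal produce/consume sketch against the same quick-start broker. It assumes the third-party kafka-python package; the topic and port match the commands above:

from kafka import KafkaProducer, KafkaConsumer

# Produce one message to the 'test' topic created above.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("test", b"hello from kafka-python")
producer.flush()  # block until the message is actually on the broker

# Consume from the beginning, like the console consumer's --from-beginning.
consumer = KafkaConsumer("test",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)  # stop once drained
for message in consumer:
    print(message.topic, message.offset, message.value)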

Bandit Problems: an ‘Experiment Strategy’ or an ‘ML Algorithm’?

Do a simple Google search: ‘how do bandit algorithms work?’

Do the results look confusing? Some links (here1, here2) say bandits are better than A/B testing; other links say otherwise (here3, here4).

In fact, when one hears about bandit problems, there are a couple of questions to think about:

Questions:

1. Is it an ‘Experiment Strategy’?

  • Multi-Armed Bandits (MAB) get compared with A/B tests. So is it an ‘experiment strategy’ like A/B testing?

2. Is it an ‘ML Algorithm’?

  • Bandit algorithms select the optimal ‘action’. So is it fundamentally an ML algorithm?
  • If yes, what is the relation between these ‘bandit problems’ and supervised ML algorithms like Logistic Regression and Decision Trees?

3. Where do algorithms like epsilon-greedy, UCB, etc. fit in?

 

Thoughts:

  • The right way to look at bandit problems is as optimization problems for online interactive systems.
    • The goal of a bandit algorithm is to select the policy that maximizes rewards. The space of policies is extremely large (or infinite).
  • In the literature, bandit problems are treated in different settings:
    • Multi-Armed Bandit setting
    • Contextual Bandit setting
  • Multi-Armed Bandit setting.
    • In the MAB setting, there are a few known approaches for selecting the best policy (see the epsilon-greedy sketch after this list):
      • Naive
      • Epsilon-Greedy
      • Upper Confidence Bounds
  • Contextual Bandit.
    • In one of my previous posts I noted the ML reduction stack in VW for the contextual bandit problem. In a separate post, I have also noted some thoughts on the use of the IPS score for counterfactual evaluation (a toy IPS sketch follows this list).
    • In the Full Information Setting, the task of selecting the best policy is mapped to a cost-sensitive classification problem where:
      • context <-> example
      • action <-> label/class
      • policy <-> classifier
      • reward <-> gain / (negative) cost
    • This mapping lets us use known supervised techniques like Decision Trees and Logistic Regression to solve the cost-sensitive classification problem.
      • This was an interesting insight for me, and helped me answer question #2 above.
    • In the Partial Information (aka Bandit) setting, there are two more issues we would like to handle:
      • Filling in missing data.
      • Overcoming Bias.
  • The Partial Information (aka Bandit) setting can further be looked at in two different ways:
    • Online.
      • In the online setting, the problem has been solved in different ways:
        • Epsilon-Greedy / Epoch-Greedy [Langford & Zhang]
        • “Monster” Algorithm [Dudik, Hsu, Kale, Langford]
      • They mostly vary in how they optimize regret and/or computational efficiency.
    • Offline.
      • This is where Counterfactual Evaluation and Learning comes in.
  • Bandit algorithms are not just an alternate ‘experiment strategy’ that is ‘better’ or ‘worse’ than A/B tests.
    • The objectives behind an A/B test are different from the objectives of a bandit system (which does continuous optimization).
  • Typical issues to consider for bandit problems:
    • Explore-Exploit
      • exploit what has been learned
      • explore to learn which behaviour might give the best results
    • Context
      • In the contextual setting (‘contextual bandit’) there are many more choices available; you are unlikely to see the same context twice.
    • Selection bias
      • exploitation introduces bias that must be accounted for
    • Efficiency.
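
As referenced in the MAB bullet above, here is a toy epsilon-greedy sketch: explore a random arm with probability epsilon, otherwise exploit the best empirical mean. The three-armed Bernoulli environment is made up for the demo:

import random

def epsilon_greedy(pull, n_arms, rounds=10000, epsilon=0.1):
    """pull(arm) -> reward. Returns pull counts and empirical means per arm."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                    # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
    return counts, means

# Made-up environment: three Bernoulli arms with hidden payoff rates.
true_payoffs = [0.3, 0.5, 0.7]
counts, means = epsilon_greedy(
    lambda a: 1.0 if random.random() < true_payoffs[a] else 0.0, n_arms=3)
print(counts)  # the best arm (index 2) should dominate the pull counts
print(means)   # the means should roughly recover true_payoffs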
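And, as referenced in the contextual-bandit bullet, a toy sketch of offline (counterfactual) evaluation with the IPS estimator: each logged action that matches the new policy contributes its reward re-weighted by the inverse of the logging propensity. The logged data here is hypothetical:

def ips_estimate(logged, policy):
    """logged: (context, action, reward, propensity) tuples from the logging
    policy; policy(context) -> action. Returns the IPS value estimate."""
    total = 0.0
    for context, action, reward, propensity in logged:
        if policy(context) == action:     # only matching actions contribute,
            total += reward / propensity  # re-weighted by logging probability
    return total / len(logged)

# Hypothetical log: contexts are device types, actions are layout ids,
# and the logging policy picked each of 2 layouts with probability 0.5.
logged = [("mobile", 0, 1.0, 0.5), ("desktop", 1, 0.0, 0.5),
          ("mobile", 1, 0.0, 0.5), ("desktop", 0, 1.0, 0.5)]
print(ips_estimate(logged, lambda context: 0))  # value of 'always layout 0'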

References: