Java, Maven, Scala, SBT Concepts

I am dipping my toes into Maven, trying to make sense of how it fits together with the IDE, command-line Maven, POM files, and so on.

tips:

  • IntelliJ has built-in support for both Maven and SBT, so as long as we are not running mvn and sbt from the command line, we should be good.

java fundamental concepts:

maven:

base scala in intellij:

scala with SBT:

Ubuntu Learnings

Remote to Ubuntu 16.04 from Windows 10:

Copy On Select:

Install Java

profile and bashrc files

abgoswam@abgoswam-ubuntu:~$ ls .bash* -lh
-rw------- 1 abgoswam abgoswam 4.1K Oct 23 15:20 .bash_history
-rw-r--r-- 1 abgoswam abgoswam  220 Oct 21 22:44 .bash_logout
-rw-r--r-- 1 abgoswam abgoswam 3.7K Oct 21 22:44 .bashrc

abgoswam@abgoswam-ubuntu:~$ ls .profile* -lh
-rw-r--r-- 1 abgoswam abgoswam 655 Oct 21 22:44 .profile

abgoswam@abgoswam-ubuntu:~$ ls -lh /etc/profile*
-rw-r--r-- 1 root root  670 Oct 23 14:55 /etc/profile
-rw-r--r-- 1 root root  884 Oct 23 15:18 /etc/profile.save

/etc/profile.d:
total 20K
-rw-r--r-- 1 root root   40 Nov 30  2015 appmenu-qt5.sh
-rw-r--r-- 1 root root  101 Jun 29 12:03 apps-bin-path.sh
-rw-r--r-- 1 root root  663 May 18 02:19 bash_completion.sh
-rw-r--r-- 1 root root 1003 Dec 29  2015 cedilla-portuguese.sh
-rw-r--r-- 1 root root 1.9K Mar 16  2016 vte-2.91.sh

abgoswam@abgoswam-ubuntu:~$ ls -lh /etc/bash*
-rw-r--r-- 1 root root 2.2K Aug 31  2015 /etc/bash.bashrc
-rw-r--r-- 1 root root   45 Aug 12  2015 /etc/bash_completion

Gvim

Making Gvim auto open in new tab:

 

Spark:

Windowing Operations in Azure Stream Analytics

Windowing is a very common operation in stream analytics.

Beneath the surface, a lot of complex data structures are at work to support the windowing operations. I would love to dig deeper into these someday.

Example:

Here is an example of a query I wrote recently using windowing operators in Azure Stream Analytics. It shows three interesting things:
1. Windowing
2. CTEs
3. Aggregation over string columns (using TopOne)

WITH ContextReward AS (
    SELECT 
        eventid,
        TopOne() OVER (ORDER BY [EventEnqueuedUtcTime] ASC) CR,
        MAX (reward) AS reward
    FROM Input
    GROUP BY eventid, HoppingWindow(Duration(hour, 2), Hop(hour, 1))
)

SELECT 
    reward,
    eventid, 
    CR.actionname AS actionname,
    CR.age AS age,
    CR.gender AS gender,
    CR.weight AS weight,
    CR.actionprobability
INTO OutputWindow
FROM ContextReward

SELECT * INTO Output FROM Input 
SELECT * INTO OutputCSV FROM Input
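
To make the HoppingWindow(Duration(hour, 2), Hop(hour, 1)) clause more concrete, here is a minimal Python sketch of which windows a single event lands in. It is only an illustration, not how Stream Analytics is implemented: it assumes timestamps are plain seconds, that window ends are aligned to multiples of the hop, and that each window excludes its start and includes its end. The function name `hopping_windows` is made up for this example.

```python
HOUR = 3600  # seconds


def hopping_windows(ts, duration, hop):
    """Return the (start, end) windows that contain timestamp `ts`.

    Assumption: a window covers start < ts <= end, and window ends fall
    on multiples of `hop`. All values are in seconds.
    """
    # Smallest multiple of `hop` that is >= ts (ceiling division).
    end = -(-ts // hop) * hop
    windows = []
    # Keep moving the window forward until its start passes the event.
    while end - duration < ts:
        windows.append((end - duration, end))
        end += hop
    return windows


# An event at 01:30 falls into two overlapping 2-hour windows.
for start, end in hopping_windows(int(1.5 * HOUR), 2 * HOUR, 1 * HOUR):
    print(start // HOUR, "h ->", end // HOUR, "h")
```

With a 2-hour duration and a 1-hour hop, every event is counted in two windows, which is why the same eventid can appear in more than one output row of the query above.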

 

References:

802.3 vs. 802.11

This gives a nice overview of the differences between Ethernet and Wi-Fi at the protocol level.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.9874&rep=rep1&type=pdf

  • The crux of the problem is this: “The CSMA/CD protocol is not used in a wireless environment due to the user has no capability to sense/listen to the channel for collision while sending the packet [12].”
  • This necessitates Collision Avoidance techniques for Wi-Fi, which limit how fast you can transmit packets in a given frequency band and so lead to slower speeds.
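
To give a feel for why collision avoidance costs airtime, here is a rough Python sketch of the random backoff idea: before sending, a station waits a random number of idle slots, and the range it draws from doubles after each failed attempt. This is a simplification under stated assumptions, not taken from the paper above. The defaults of 15 and 1023 slots are illustrative values, and the function names are made up.

```python
import random


def contention_window(retries, cw_min=15, cw_max=1023):
    """Upper bound of the backoff range after `retries` failed attempts.

    The window roughly doubles on each retry and is capped at cw_max.
    cw_min and cw_max are assumed, illustrative values.
    """
    return min((cw_min + 1) * 2 ** retries - 1, cw_max)


def backoff_slots(retries, rng=random):
    """Pick how many idle slots to wait before the next transmit attempt."""
    return rng.randint(0, contention_window(retries))


for attempt in range(8):
    print(attempt, contention_window(attempt), backoff_slots(attempt))
```

Every slot spent waiting is time the channel carries no data, which is one reason a shared wireless channel delivers less than its nominal rate.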

 

REST Calls in Python. JSON. Pandas.

I recently had to make REST calls in Python for sending data to Azure EventHub.

In this particular case I could not use the Python SDK to talk to EventHub. As I wrote the code to make the raw REST calls, I came across several gems, which I list below.

Tips:

  • Use the Python ‘requests’ library.
    • I have yet to figure out how to make async calls. Can I use this library for async as well, or would I have to use something else?
  • Sending JSON is the way to go.
    • Don’t even try sending anything else.
  • Pandas has great functionality to convert Series/DataFrames to JSON.
    • The ‘to_json’ function is very flexible, including orienting the output by ‘records’ (one JSON object per row).
  • Python has an awesome library called ‘json’ to deal with JSON data.
    • To deserialize a JSON string, use json.loads().
    • To convert a dict to JSON, use json.dumps().
    • Note: to preserve key order when deserializing, pass object_pairs_hook=collections.OrderedDict to json.loads().
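
On the open question about async: ‘requests’ itself is a blocking library, so one common workaround is to run the blocking calls on a thread pool. Below is a minimal sketch of that idea, not a definitive answer. The helper names `post_json` and `send_all` and the URL in the usage note are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def post_json(url, payload, timeout=10):
    """POST one JSON-serializable payload and return the HTTP status code."""
    # Third-party library; imported here so send_all() works without it.
    import requests

    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response.status_code


def send_all(payloads, send, max_workers=4):
    """Run a blocking `send` callable over all payloads on a thread pool.

    Results come back in the same order as `payloads`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, payloads))
```

Usage would look something like `send_all(records, lambda p: post_json("https://example.invalid/ingest", p))`, where the URL is a placeholder. The calls overlap in time, but each one still blocks its own thread.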

Check this out:


import collections
import json

myj = '[{"reward":30,"actionname":"x","age":60,"gender":"M","weight":150,"Scored Labels":30.9928596354},{"reward":20,"actionname":"y","age":60,"gender":"M","weight":150,"Scored Labels":19.0217225957}]'

myj_l = json.loads(myj, object_pairs_hook=collections.OrderedDict)

myj_l
Out[177]:
[OrderedDict([(u'reward', 30), (u'actionname', u'x'), (u'age', 60), (u'gender', u'M'), (u'weight', 150), (u'Scored Labels', 30.9928596354)]),
 OrderedDict([(u'reward', 20), (u'actionname', u'y'), (u'age', 60), (u'gender', u'M'), (u'weight', 150), (u'Scored Labels', 19.0217225957)])]

for item in myj_l:
    print(json.dumps(item))

{"reward": 30, "actionname": "x", "age": 60, "gender": "M", "weight": 150, "Scored Labels": 30.9928596354}
{"reward": 20, "actionname": "y", "age": 60, "gender": "M", "weight": 150, "Scored Labels": 19.0217225957}

References:

Code:

5 Skills.

One person recently pointed out to me that a strong ML team needs 5 skills.

I was glad he pointed it out so clearly. Sometimes it’s easy to know things, but hearing it from someone else and crystallizing it helps a lot.

The 5 skills (in no particular order) are:

  1. Research
  2. Engineering
  3. Data Science
  4. Program Management
  5. Systems Architecture

The other interesting thing he mentioned, which I really liked, was about prioritization. There is so much cool technology out there nowadays that prioritizing becomes super important; it gives one a strong sense of direction.

Making REST calls to send data to an Azure EventHub

I recently encountered a situation where I had to use pure REST Calls to send data to an Azure Event Hub.

Tips:

  • If you are used to using libraries (C#, Python), you will find that the libraries do a lot behind the scenes. It’s not trivial to go from using the library to making pure REST calls.
  • The first approach, using Fiddler to capture the traffic and repurpose those calls, failed.
    • I am not sure why the calls failed to show up in Fiddler. I tried a few things, like enabling HTTPS decryption, but I wasn’t able to get the outgoing traffic to show up.
  • The references below give a good idea of how I made some progress.

REST Call to send data:

I finally got it to work with something like this:

POST https://simplexagpmeh.servicebus.windows.net/simplexagpmeh/messages?timeout=60&api-version=2014-01 HTTP/1.1
User-Agent: Fiddler
Authorization: SharedAccessSignature sr=http%3a%2f%2fsimplexagpmeh.servicebus.windows.net%2f&sig=RxvSkhotfGEwERdiaA8oLr7X9u5XLeDI8TCK5DhDPP8%3d&se=1476214239&skn=RootManageSharedAccessKey
Content-Type: application/atom+xml;type=entry;charset=utf-8
Host: simplexagpmeh.servicebus.windows.net
Content-Length: 153
Expect: 100-continue

{ "DeviceId" : "ArduinoYun",
  "SensorData" : [ { "SensorId" : "awk",
        "SensorType" : "temperature",
        "SensorValue" : 24.5
      } ]
}
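
For reference, here is a hedged Python sketch of how the same request could be put together using only the standard library. It is not the code I used. It assumes the usual SharedAccessSignature layout seen in the Authorization header above (sr, sig, se, skn), with the signature computed as an HMAC-SHA256 over the URL-encoded resource URI, a newline, and the expiry. The names `make_sas_token` and `build_send_request`, and the namespace, hub and key in the usage note, are placeholders.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request


def make_sas_token(resource_uri, key_name, key, expiry=None, ttl_seconds=3600):
    """Build a SharedAccessSignature token for the given resource URI."""
    if expiry is None:
        expiry = int(time.time()) + ttl_seconds
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # String to sign: URL-encoded resource URI, newline, expiry (epoch seconds).
    string_to_sign = "{}\n{}".format(encoded_uri, expiry).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, signature, expiry, key_name)


def build_send_request(namespace, hub, token, event):
    """Prepare (but do not send) the POST that publishes one JSON event."""
    url = ("https://{}.servicebus.windows.net/{}/messages"
           "?timeout=60&api-version=2014-01").format(namespace, hub)
    headers = {
        "Authorization": token,
        "Content-Type": "application/atom+xml;type=entry;charset=utf-8",
    }
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To actually send, pass the prepared request to `urllib.request.urlopen(...)`. For example, build a token with `make_sas_token("https://example-ns.servicebus.windows.net/example-hub", "RootManageSharedAccessKey", "<your key>")` and hand it to `build_send_request("example-ns", "example-hub", token, event)`.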

References:

Code:

 

Ubuntu Linux Installation Troubleshoot

I tried installing Ubuntu 16.04 on my Lenovo X1 Carbon. It’s interesting how hard it was to install Linux on this machine.

USB stick:

  • Not possible! I spent several frustrating hours trying it out. Looks like the option to boot from a USB stick is not there in the BIOS.
  • The Ubuntu installation guide does mention that there might be issues installing from USB stick.
  • https://help.ubuntu.com/16.04/installation-guide/

NetBoot: