Today I'm posting about keyword extraction with lexical chains. It's something I first looked into during college, but it has resurfaced recently and I've used it for a couple of projects. The original paper I read is a couple of years old: "Efficient text summarization using lexical chains", written by Gregory Silber and Kathleen McCoy in 2000. A link to the text can be found here.
Recently, I've been doing a little bit of JavaScript and have enjoyed playing with jQuery. I don't write JS very often, so maybe what I'm describing in this blog entry is common knowledge to all of you (I beg your pardon...), but it took me the better part of an afternoon, so I figured I'd share my findings...
Looking at my portfolio today, I think I've done pretty well sticking with Ben Graham's strategy of value investing. Given that I started building out my portfolio in March 2008 and had invested 60% of my liquid assets before October/November 2008, the results look pretty good. Overall, I'm up from March 2008. While I usually stick to long-term investments, I unloaded YHOO at $14.50, along with a couple of other companies that I felt weren't worth holding given either bad business decisions or warning signals in their fundamentals, such as a high debt-to-equity ratio - which for me is a no-go.
I am looking at a couple of companies that make up the "risky" part of my portfolio - these include the golf club shaft manufacturer "ALDA", a communications firm "ATGN" and the mobile service provider "LVWR". The financials of these companies might not be super impressive, but given their liquid asset / market cap ratio and the growth opportunities in their respective sectors, these companies seemed like a bargain. I still firmly believe in two companies - pharma empire "MRK", which seems undervalued compared to its sector, and "MKL", an insurance company that has very savvy management and seems to be fairly cheap as well. Let's talk about the "bad" investments I've made. I bet strongly on BRK-B and will continue to do so, although I'm down quite a bit there. I think Buffett got a good deal with GS and this will be reflected when he sells the warrants. I'm also down with BAC and GE. UAHC (health care) is down too - though I believe they know how to deal with the government (Medicare partner), so should Obama's plan (healthcare reform) come through, they'll have a huge advantage...
I'm up with SAP (software) and FMS. FMS is a company that "owns" the dialysis market - it still seems to be very cheap. That's my biggest bet - I figure that people will always have health problems associated with the wrong diet, and there seems to be a correlation between obesity (which, as we know, is steadily increasing) and kidney failure. This is backed by a lot of research (#1, #2). What gives me hope for FMS is that people who undergo dialysis want to trust the equipment and facilities - it's not like you're eating at a new restaurant and worst case you'll get some old french fries... A friend who is on dialysis has also confirmed that FMS is a really great company with awesome customer service and highly skilled staff. FMS often owns the entire "supply chain", from manufacturing to "sales" and "maintenance" - they manufacture the equipment such as dialysis machines, they operate the dialysis centers and train the staff - so people feel safe and return.
--
DISCLAIMER: I am NOT giving any financial advice!
References:
#1 American Society Of Nephrology. "Obesity Triples The Risk Of Chronic Kidney Failure." ScienceDaily 13 May 2006. 22 September 2009 <http://www.sciencedaily.com/releases/2006/05/060513122553.htm>
#2 Journal of the American Society of Nephrology, "Obesity: What Does It Have to Do with Kidney Disease?", 2004, 22 September 2009 <http://jasn.asnjournals.org/cgi/content/full/15/11/2768>
I have recently gotten more into R - I'm loving it.
R is really cool - apart from being an awesome free tool for all sorts of calculations, it allows rapid analysis and visualization of small to mid-sized data sets (anything that fits in memory, which in practice means up to roughly millions of entries, depending on the type of data). It takes a bit to get used to the mathy way of mapping names to vectors - but it's really powerful!
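As a small illustration of that vector-centric style, here is a minimal sketch. The variable names and the three values are just picked for the example; nothing here is specific to any library.

```r
# A name is bound to a whole vector, not to a single value
speeds <- c(765.3, 882.9, 901.2)

# Arithmetic applies element-wise to the entire vector
halved <- speeds / 2

# Logical indexing selects elements without an explicit loop
fast <- speeds[speeds > 800]

# Elements can carry names and be looked up by them
names(speeds) <- c("user1", "user2", "user3")
speeds["user2"]
```

Once this clicks, most of R's data handling (columns of a table, results of a model) turns out to be the same idea applied to bigger vectors.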
While R can do almost anything (there are so many libraries available!), I do my preprocessing on the Unix command line on my MBP. "tr", "awk", "cut" and "sed" are invaluable.
Let's assume we're hosting a web app primarily used by mobile users and, magically, we know the connection speeds at which content is delivered to our customers. It's weird, because some of our customers complain about page loading and others don't. We don't have much data, but we know the average download speed and the maximum download speed of our clients. We also manually added the zip code for each client. Each row of the file below holds the user, the average speed, the maximum speed and the zip code:
user1,765.3,1498.2,66333
user2,882.9,1200.0,66342
user3,901.2,980.8,77878
user4,587.2,640,77879
user5,1327.5,1924.4,77878
user6,45.2,55.3,23923
user7,22.2,58.3,99993
user8,29.3,44.9,92399
user9,13.3,19.4,23923
user10,12.4,45.3,99992
user11,12.2,23.2,99994
user13,11.4,22.9,99992
user14,66.1,69.9,99972
If you copy this file to your disk (e.g. to "/tmp/conn_speeds.csv"), you can do the following to load the data into R.
> x <- read.table(file='/tmp/conn_speeds.csv', sep=',')
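By default read.table names the columns V1 to V4. If you prefer descriptive names, something like the following sketch should work. The column names are my own choice for illustration, and the first three rows are inlined via the text argument so the snippet runs without a file on disk.

```r
# First three rows of the sample data, inlined so no file is needed
csv_text <- "user1,765.3,1498.2,66333
user2,882.9,1200.0,66342
user3,901.2,980.8,77878"

# col.names replaces the default V1..V4; these names are illustrative only
conn <- read.table(text = csv_text, sep = ",",
                   col.names = c("user", "avg_speed", "max_speed", "zip"),
                   stringsAsFactors = FALSE)

str(conn)              # inspect the column types R inferred
mean(conn$avg_speed)   # columns are now addressable by name
```

The rest of this post sticks with the default V1..V4 names.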
Then try to run some basic statistical analysis over the data:
> summary(x)
      V1          V2                V3                V4
 user1  :1   Min.   :  11.4   Min.   :  19.4   Min.   :23923
 user10 :1   1st Qu.:  13.3   1st Qu.:  44.9   1st Qu.:66342
 user11 :1   Median :  45.2   Median :  58.3   Median :77879
 user13 :1   Mean   : 359.7   Mean   : 506.4   Mean   :77423
 user14 :1   3rd Qu.: 765.3   3rd Qu.: 980.8   3rd Qu.:99992
 user2  :1   Max.   :1327.5   Max.   :1924.4   Max.   :99994
 (Other):7
Et voilà, there is some basic analysis.
Maybe we come up with the hypothesis that certain areas don't have 3G access yet, and hence there should be clusters around each zipcode with similar speeds. To verify this, we'll run kmeans over the average connection speed and then plot the zip & avg_speed where the color is determined by which cluster each data point lies in.
> zip <- x$V4
> avg_speed <- x$V2
> km <- kmeans(avg_speed, 2)
> plot(zip, avg_speed, col=km$cluster, lwd=4)
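One thing to keep in mind: kmeans starts from randomly chosen centers, so the cluster numbering (and therefore the colors in the plot) can change from run to run. Here is a minimal sketch of a repeatable variant. The seed value and nstart = 10 are arbitrary choices of mine, and the speeds are typed in directly so the snippet runs on its own.

```r
# Average speeds from the sample data above
avg_speed <- c(765.3, 882.9, 901.2, 587.2, 1327.5, 45.2, 22.2,
               29.3, 13.3, 12.4, 12.2, 11.4, 66.1)

set.seed(42)   # arbitrary seed, only there to make the run repeatable

# nstart = 10 tries ten random starts and keeps the best result
km <- kmeans(avg_speed, centers = 2, nstart = 10)

km$centers          # mean speed of each cluster
table(km$cluster)   # number of users per cluster
```

Looking at km$centers next to the plot makes it easier to tell which color stands for the slow group and which for the fast one.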

This is just one example of how useful R is!