Timely Portfolio: 2015

Friday, May 22, 2015

visNetwork, Currencies, and Minimum Spanning Trees

Just because I’m ignorant doesn’t mean I won’t try things. Feel free to correct any ignorance that follows. More than anything I would like to feature the new htmlwidget visNetwork. I thought the example from Minimum Spanning Trees in R applied to currency data (similar to this research paper Minimum Spanning Tree Application in the Currency Market) would be a good way to demonstrate this fancy new widget. We’ll grab the currency data from FRED using quantmod code from this old post Eigen-who?.

Code


# get MST using code from this post
#   https://mktstk.wordpress.com/2015/01/03/minimum-spanning-trees-in-r/

library(quantmod)
# #get currency data from the FED FRED data series
Korea <- getSymbols("DEXKOUS",src="FRED",auto.assign=FALSE) #load Korea
Malaysia <- getSymbols("DEXMAUS",src="FRED",auto.assign=FALSE) #load Malaysia
Singapore <- getSymbols("DEXSIUS",src="FRED",auto.assign=FALSE) #load Singapore
Taiwan <- getSymbols("DEXTAUS",src="FRED",auto.assign=FALSE) #load Taiwan
China <- getSymbols("DEXCHUS",src="FRED",auto.assign=FALSE) #load China
Japan <- getSymbols("DEXJPUS",src="FRED",auto.assign=FALSE) #load Japan
Thailand <- getSymbols("DEXTHUS",src="FRED",auto.assign=FALSE) #load Thailand
Brazil <- getSymbols("DEXBZUS",src="FRED",auto.assign=FALSE) #load Brazil
Mexico <- getSymbols("DEXMXUS",src="FRED",auto.assign=FALSE) #load Mexico
India <- getSymbols("DEXINUS",src="FRED",auto.assign=FALSE) #load India
USDOther <- getSymbols("DTWEXO",src="FRED",auto.assign=FALSE) #load US Dollar Other Trading Partners
USDBroad <- getSymbols("DTWEXB",src="FRED",auto.assign=FALSE) #load US Dollar Broad
#combine all the currencies into one big currency xts
currencies<-merge(Korea, Malaysia, Singapore, Taiwan,
                  China, Japan, Thailand, Brazil, Mexico, India,
                  USDOther, USDBroad)

currencies<-na.omit(currencies)

colnames(currencies)<-c("Korea", "Malaysia", "Singapore", "Taiwan",
                        "China", "Japan", "Thailand", "Brazil", "Mexico", "India",
                        "USDOther", "USDBroad")
#get daily percent changes
currencies <- currencies/lag(currencies)-1  
currencies[1,] <- 0

cor.distance <- cor(currencies)
corrplot::corrplot(cor.distance)

library(igraph)
g1 <- graph.adjacency(cor.distance, weighted = T, mode = "undirected", add.colnames = "label")
mst <- minimum.spanning.tree(g1)
plot(mst)

library(visNetwork)
mst_df <- get.data.frame( mst, what = "both" )
visNetwork( 
  data.frame(
    id = 1:nrow(mst_df$vertices) 
    ,label = mst_df$vertices
  )
  , mst_df$edges
) %>%
  visOptions( highlightNearest = TRUE, navigation = T )

Wednesday, March 11, 2015

Extracting Heatmap

Inspired by this tweet, I wanted to try to do something similar in JavaScript.

Very cool hack: Extracting the original data from a heatmap image with R vector ops #rstats http://t.co/Lbi6FCXdrI pic.twitter.com/LCabkMGjXY
— Gregory Piatetsky (@kdnuggets) March 6, 2015

Fortunately, I had this old post Chart from R + Color from Javascript to serve as a reference, and I got lots of help from these links.

In a couple of hours, I got this crude but working rendering complete with a d3.js brush to get the scale. Then since this is sort of a finance blog, I imagined we found an old correlation heatmap like the one in Pretty Correlation Map of PIMCO Funds. Although, we could guess at the correlation values, I thought it would be a lot more fun to get live values. Try it out below.

Brush over the scale / legend
Input scale min and max
Mouseover color areas in the chart

As I said, it is rough, but it works. It needs a little UI work :)

Thursday, March 5, 2015

Is Time Series Clustering Meaningless? (lots of dplyr)

A kind reader directed me in a comment on Experiments in Time Series Clustering to this paper.

Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research

Eamonn Keogh and Jessica Lin

Computer Science & Engineering Department University of California – Riverside

http://www.cs.ucr.edu/~eamonn/meaningless.pdf

As I said in my last post, I don’t know what I’m doing, so I have no basis for discussing or arguing time series clustering. After reading the paper a couple of times, I think I understand their points, and I do not think what I am doing is “meaningless”. In their financial time series examples, they use prices and speak of trying to find patterns. I simply want to classify which years are most alike by various characteristics, such as autocorrelation of returns not prices, distribution of returns, and all sorts of other classifiers.

More than anything this whole exercise gave me a good excuse to dig much, much deeper. Iongtime readers might be wondering where are the interactive plots. I wanted to share what I have done so far hoping that readers might elaborate, argue, or point me in good directions.

Regardless of your interest in time series clustering, you might enjoy the dplyr and piping that I used to generate the results. Also, I have not seen dplyr do applied to autocorrelation ACF, so you might want to check that out in the last snippet of code.

All of the code for this post and last post is in this Github repo.


library(TSclust)
library(quantmod)
library(dplyr)
library(pipeR)
library(tidyr)

sp5 <- getSymbols("^GSPC",auto.assign=F,from="1900-01-01")[,4]

sp5 %>>%
  # dplyr doesn't like xts, so make a data.frame
  (
    data.frame(
      date = index(.)
      ,price = .[,1,drop=T]
    )
  ) %>>%
  # add a column for Year
  mutate( year = as.numeric(format(date,"%Y"))) %>>%
  # group by our new Year column
  group_by( year ) %>>%
  # within each year, find what day in the year so we can join
  mutate( pos = rank(date) ) %>>%
  mutate( roc = price/lag(price,k=1) - 1 ) %>>%
  # can remove date
  select( -c(date,price) ) %>>%
  as.data.frame %>>%
  # years as columns as pos as row
  spread( year, roc ) %>>%
  # remove last year since assume not complete
  ( .[,-ncol(.)] ) %>>%
  # remove pos since index will be same
  select( -pos ) %>>%
  # fill nas with previous value
  na.fill( 0 ) %>>%
  t %>>%
(~sp_wide) %>>%
  # use TSclust diss; notes lots of METHOD options
  diss( METHOD="ACF" ) %>>%
  hclust %>>%
(~hc) %>>%
  ape::as.phylo() %>>% 
  treewidget #%>>%
  #htmlwidgets::as.iframe(file="index.html",selfcontained=F,libdir = "./lib")

library(lattice)
library(ggplot2)
# get wide to long the hard way
#  could have easily changed to above pipe to save long
#  as an intermediate step
#  but this makes for a fun lapply
#  and also we can add in our cluster here
sp_wide %>>%
  (
    lapply(
      rownames(.)
      ,function(yr){
        data.frame(
          year = as.Date(paste0(yr,"-01-01"),"%Y-%m-%d")
          ,cluster = cutree(hc,10)[yr]
          ,pos = 1:length(.[yr,])
          ,roc = .[yr,]
        )
      }
    )
  ) %>>%
  (do.call(rbind,.)) %>>%
(~sp_long)


sp_long %>>%
  ggplot( aes( x = roc, group = year, color = factor(cluster) ) ) %>>%
  + geom_density() %>>%
  + facet_wrap(  ~ cluster, ncol = 1 ) %>>%
  + xlim(-0.05,0.05) %>>%
  + labs(title='Density of S&P 500 Years Clustered by TSclust') %>>%
  + theme_bw() %>>%
  # thanks to my friend Zev Ross for his cheatsheet
  + theme( plot.title = element_text(size=15, face="bold", hjust=0) ) %>>%
  + theme( legend.position="none" )  %>>%
  + scale_color_brewer( palette="Paired" )


# explore autocorrelations
sp5 %>>%
  # dplyr doesn't like xts, so make a data.frame
  (
    data.frame(
      date = index(.)
      ,price = .[,1,drop=T]
    )
  ) %>>%
  # add a column for Year
  mutate( year = as.numeric(format(date,"%Y"))) %>>%
  # group by our new Year column
  group_by( year ) %>>%
  # within each year, find what day in the year so we can join
  mutate( pos = rank(date) ) %>>%
  mutate( roc = price/lag(price,k=1) - 1 ) %>>%
  # can remove date
  select( -c(date,price) ) %>>%
  as.data.frame %>>%
  # years as columns as pos as row
  spread( year, roc ) %>>%
  # remove last year since assume not complete
  ( .[,-ncol(.)] ) %>>% t -> sP

sp_long %>>%
  group_by( cluster, year ) %>>%
  do(
    . %>>%
    (
      clustd ~ 
      acf(clustd$roc,plot=F) %>>%
        (a ~
          data.frame(
            cluster = clustd[1,2]
            ,year = clustd[1,1]
            ,lag = a$lag[-1]
            ,acf = a$acf[-1]
          )
        )
    )
  ) %>>%
  as.data.frame %>>%
  ggplot( aes( x = factor(cluster), y = acf, color = factor(cluster) ) ) %>>%
    + geom_point() %>>%
    + facet_wrap( ~lag, ncol = 4 ) %>>%
    + labs(title='ACF of S&P 500 Years Clustered by TSclust') %>>%
    + theme_bw() %>>%
    # thanks to my friend Zev Ross for his cheatsheet
    + theme(
        plot.title = element_text(size=15, face="bold", hjust=0)
        ,legend.title=element_blank()
    ) %>>%
    + theme(legend.position="none")  %>>%
    + scale_color_brewer(palette="Paired")

If you’ve made it this far, I would love to hear from you.

Monday, March 2, 2015

Experiments in Time Series Clustering

Last night I spotted this tweet about the R package TSclust.

Thank you Pablo and Jose for #TSclust - time series clustering package in #rstats ! http://t.co/GBQtQnQ8Lr
— Pasha Roberts (@pasharoberts) March 2, 2015

I should start by saying that I really don’t know what I’m doing, so be warned. I thought it would interesting to apply TSclust to the S&P 500 price time series. I took the 1-day simple rate of change, grouped by year with dplyr, and then indexed by the day of the year all in one pipeR pipeline. Since the TSclust paper

TSclust: An R Package for Time Series Clustering

Journal of Statistical Software, Volume 62, Issue 1

November 2014

http://www.jstatsoft.org/v62/i01/paper

demonstrates interoperability with hclust in their OECD interest rate example ( Section 5.2 ), I thought I could visualize the results nicely with treewidget from the epiwidgets package. Just because the htmlwidget was designed for phylogeny doesn’t mean we can’t use it for finance. Here is the result.

For reference and searching, I’ll copy the code below, but all of this can be found in this Github repo.


library(TSclust)
library(quantmod)
library(dplyr)
library(pipeR)
library(tidyr)
library(epiwidgets)

sp5 <- getSymbols("^GSPC",auto.assign=F,from="1900-01-01")[,4]

sp5 %>>%
  # dplyr doesn't like xts, so make a data.frame
  (
    data.frame(
      date = index(.)
      ,price = .[,1,drop=T]
    )
  ) %>>%
  # add a column for Year
  mutate( year = as.numeric(format(date,"%Y"))) %>>%
  # group by our new Year column
  group_by( year ) %>>%
  # within each year, find what day in the year so we can join
  mutate( pos = rank(date) ) %>>%
  mutate( roc = price/lag(price,k=1) - 1 ) %>>%
  # can remove date
  select( -c(date,price) ) %>>%
  as.data.frame %>>%
  # years as columns as pos as row
  spread( year, roc ) %>>%
  # remove last year since assume not complete
  ( .[,-ncol(.)] ) %>>%
  # remove pos since index will be same
  select( -pos ) %>>%
  # fill nas with previous value
  na.fill( 0 ) %>>%
  t %>>%
  # use TSclust diss; notes lots of METHOD options
  diss( METHOD="ACF" ) %>>%
  hclust %>>%
  ape::as.phylo() %>>% 
  treewidget

Tuesday, February 3, 2015

Financial Charts | Pan and Zoom

The htmlwidget for Week 2 over at Building Widgets claims to add pan and zoom interactivity to almost all R charts. Since their were no tests on financial charts, I thought I would try it out on a couple. It really does work.

Here is an example on an efficient frontier plotted from fPortfolio.

When we combine pipeR and htmlwidgets, we get a solid result from what I think is fairly elegant and understandable code.

svgPanZoom(
  svgPlot({
    returns %>>%
      (cumprod( 1 + . )) %>>%
      (.[endpoints(.,"months")]) %>>%
      ( ./lag(.,k=1) - 1 ) %>>%
      chart.SnailTrail(
        colorset = RColorBrewer::brewer.pal(9,"Set1")[-6]
        ,add.names="none"
        ,width = 36
        ,step = 36
        ,legend.loc = "topright"
      )
  },height= 10, width = 16)
)

An even more challenging test was chartSeries, and svgPanZoom still passed the test beautifully. See if it works on your machine.


getSymbols("SPY")
svgPanZoom(svgPlot({chartSeries(SPY)},width = 12, height = 8))

If you would like to reproduce the plots, all the code is in this Gist.

Friday, January 2, 2015

Will I fail?

I have committed to building an htmlwidget a week in 2015. To isolate and separate the commitment from this blog, I set up a new site Building Widgets and Github repo. The first post Can I Commit? provides meta introspection on commitment.

Can I commit to building an htmlwidget a week in the year 2015?

It seems we humans all struggle internally with commitment, and at the beginning of each year, we often become even more aware of this struggle in the form of New Year's Resolutions. This site is not really a New Year's Resolution. It is more a resolution that coincidentally falls at the beginning of the year, since htmlwidgets was released December 17.

I know through plenty of experiences with commitment failure that the pattern of commitment failure will assert itself throughout the life of this project… Building Widgets “Can I Commit?”

I promise this will be the only crosspost. Any future posts on this blog about htmlwidgets will only be application of the widgets, most likely for finance.