Wednesday, February 12, 2014

Avoiding XPath : Part III


(under construction)

Time sequenced lattice chart of  417,834 Windows Security log entries.

This cruft is similar to my post  Avoiding XPath : Part II.  However, here I am (laboriously) converting EVTX to CSV with Powershell 3.0. The files  are sizeable, taking about an hour to convert to CSV. The charts take a non-negligible amount of time to load on my i5 4GB laptop.


# Powershell Archive Security listing

a---          2/5/2012   3:18 PM  614993920 Archive-Security-2012-02-05-23-16-29-596.evtx
a---         2/10/2014   8:13 PM  422948414 Archive-Security-2012-02-05-23-16-29-596.evtx.csv
a---          2/7/2012   1:37 PM  614993920 Archive-Security-2012-02-07-21-35-54-163.evtx
a---         2/10/2014   9:04 PM  411968863 Archive-Security-2012-02-07-21-35-54-163.evtx.csv
a---         2/13/2012   9:25 AM  614993920 Archive-Security-2012-02-13-17-24-39-674.evtx
a---         2/10/2014  10:03 PM  425452871 Archive-Security-2012-02-13-17-24-39-674.evtx.csv
a---         2/13/2012   6:41 PM  614993920 Archive-Security-2012-02-14-02-40-08-681.evtx
a---         2/10/2014  10:11 PM   33347712 Archive-Security-2012-02-14-02-40-08-681.evtx.csv


I create an array of files (as below) and use a function to mass convert them:

[array]$filelist =
" Archive-Security-2012-02-05-23-16-29-596.evtx",
"Archive-Security-2012-02-07-21-35-54-163.evtx",
"Archive-Security-2012-02-13-17-24-39-674.evtx"

Function Convert-Logs3 {
[cmdletbinding()]
Param(
$filelist=$NULL
)
$filelist | foreach-object {
Get-WinEvent -Path "$PSItem"| Select RecordID,ID,TimeCreated, Message | export-csv -notypeinformation -path $(write "$PSItem.csv");
[System.gc]::collect();
}
}

I load one of the converted CSV files into R, parse the Message field and produce charts (far below). One of the newer laptops with fourth gen i7 and 16GB of DDR3 RAM would be useful. First I load the file and strip the Message file of tabs and returns.

# R
read.csv("Archive-Security-2012-02-05-23-16-29-596.evtx.csv", as.is=TRUE)
d <- read.csv("Archive-Security-2012-02-05-23-16-29-596.evtx.csv", as.is=TRUE)
h <- gsub('\n\t',' ',d$Message,fixed=TRUE)
h <- gsub('\n\n',' ',h,fixed=TRUE)
h <- gsub('\t\t',' ',h,fixed=TRUE)
h <- gsub('\n',' ',h,fixed=TRUE)
h <- gsub('\t',' ',h,fixed=TRUE)
h <- gsub('%%','',h,fixed=TRUE)
d$Message <- h

The result is below. Click to enlarge. I won't be parsing the Message field anymore in this post.


The R packages below are needed. Note the field names are different in an EVTX than in a live event log (RecordID, ID, Time Generated):

library(plyr) # 'count'
library(data.table) # 'data.table'
library(lattice) # 'barchart'
library(ggplot2)# 'arrange'

DF <- d
DT <- data.table(DF)
setkey(DT,RecordId)

DF_count <- data.frame(count(DT$Id))
arrange(DF_count,freq)

DF_arrange <- (arrange(DF_count,freq))
barplot(DF_arrange$freq,names.arg=(DF_arrange$x),xlab="Event IDs", ylab="Event IDs Count" )
barchart(~as.numeric(DF_arrange$freq) | as.factor(DF_arrange$x),xlab="Event Index")
dotplot(~as.numeric(DF$RecordId) | as.factor(DF$Id), xlab="Event Index")

DF4624 <- data.frame(DT[Id=="4624"])
DF4688 <- data.frame(DT[Id=="4688"])
DF5154 <- data.frame(DT[Id=="5154"])
DF5157 <- data.frame(DT[Id=="5157"])

DF4624_4688 <- merge.data.frame(DF4624,DF4688,all=TRUE)
DF5154_5157 <- merge.data.frame(DF5154,DF5157,all=TRUE)
DF4624_4688_5154_5157 <- merge.data.frame(DF4624_4688,DF5154_5157,all=TRUE)
DF_TG <- DF4624_4688_5154_5157
dotplot(~as.numeric(DF_TG$RecordId) | as.factor(DF_TG$Id), xlab="Event Index")

nrow(DF_TG)
DF_TG <- data.frame(DF_TG[16500:24500,1:3])
dotplot(~as.numeric(DF_TG$RecordId) | as.factor(DF_TG$Id), xlab="Events (Time)")

Here's the run:

> DF <- d
> DT <- data.table(DF)
> setkey(DT,RecordId)
>
> DF_count <- data.frame(count(DT$Id))
> arrange(DF_count,freq)
      x   freq
1  1100      1
2  1104      1
3  1105      1
4  4698      1
5  4699      1
6  4724      1
7  4738      1
8  4779      1
9  8222      1
10 4702      2
11 4904      2
12 4905      2
13 5140      2
14 5143      2
15  521      3
16 4647      3
17 4608      4
18 4610      4
19 4614      4
20 4902      4
21 4944      4
22 5024      4
23 5033      4
24 5056      4
25 5449      4
26 5450      4
27 1101      5
28 5142      5
29 4663      7
30 4625      8
31 4800     10
32 4801     10
33 5448     12
34 5031     13
35 4690     14
36 5145     14
37 4648     16
38 4674     18
39 4634     20
40 4950     20
41 5444     20
42 4776     24
43 5442     24
44 4622     36
45 4956     37
46 4719     91
47 4672     97
48 4611    102
49 4624    120
50 5446    120
51 4985    136
52 5441    156
53 4945    408
54 4673    491
55 4957   3118
56 5154   6328
57 4658  13790
58 4689  14110
59 4688  14485
60 5157  18149
61 4656  32222
62 5158  42381
63 5152  49172
64 5156  58538
65 5447 163442
>
> DF_arrange <- (arrange(DF_count,freq))
> barplot(DF_arrange$freq,names.arg=(DF_arrange$x),xlab="Event IDs", ylab="Event IDs Count" )
> barchart(~as.numeric(DF_arrange$freq) | as.factor(DF_arrange$x),xlab="Event Index")
> dotplot(~as.numeric(DF$RecordId) | as.factor(DF$Id), xlab="Event Index")
>
> DF4624 <- data.frame(DT[Id=="4624"])
> DF4688 <- data.frame(DT[Id=="4688"])
> DF5154 <- data.frame(DT[Id=="5154"])
> DF5157 <- data.frame(DT[Id=="5157"])
>
> DF4624_4688 <- merge.data.frame(DF4624,DF4688,all=TRUE)
> DF5154_5157 <- merge.data.frame(DF5154,DF5157,all=TRUE)
> DF4624_4688_5154_5157 <- merge.data.frame(DF4624_4688,DF5154_5157,all=TRUE)
> DF_TG <- DF4624_4688_5154_5157
> dotplot(~as.numeric(DF_TG$RecordId) | as.factor(DF_TG$Id), xlab="Event Index")
>
> nrow(DF_TG)
[1] 39082
> DF_TG <- data.frame(DF_TG[16500:24500,1:3])
> dotplot(~as.numeric(DF_TG$RecordId) | as.factor(DF_TG$Id), xlab="Events (Time)")

A general barplot is is not so useful for 417K events.

So a barchart (lattice) gives us a better idea of amounts.

But a dotplot (lattice) can give us Events in proportion through time.

We can filter specific events and then zoom in on the timeline.

No comments:

Post a Comment