Big Data/Hadoop: R Programming 2 : Loading Data

Wednesday, January 14, 2015

R Programming 2 : Loading Data

Loading a txt file from Linux into R:
Place the file in your home directory, e.g. /home/username/
d = read.table("/home/username/diva.txt", sep="\t")
print(d)

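OR, for a file in the current working directory (check it with getwd()), with named columns and leading/trailing white space stripped from text fields:

d = read.table("foobar.txt", sep="\t", col.names=c("id", "name"), fill=FALSE,
               strip.white=TRUE)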
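A self-contained sketch of the calls above. Because diva.txt and foobar.txt may not exist on your machine, it writes a small tab-separated file with hypothetical contents to a temporary path (tempfile()) and reads it back. With fill=FALSE, rows with a different number of fields cause an error instead of being padded with blanks.

```r
# Hypothetical tab-separated file; the names carry stray spaces
path <- tempfile(fileext = ".txt")
writeLines(c("1\t  Ram  ", "2\tDiva "), path)

# Tab separator, no header row, explicit column names,
# and white space stripped from unquoted text fields
d <- read.table(path, sep = "\t", col.names = c("id", "name"),
                fill = FALSE, strip.white = TRUE)
print(d)
```

Without strip.white=TRUE the name column would keep the surrounding spaces ("  Ram  ").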

Loading a CSV file:
data <- read.csv(file.choose(), header=TRUE)

file.choose() opens a file dialog so the user can select the file from any location; header=TRUE treats the first line as column names.
data
  User First.Name     Sal
1   53          R   50000
2   73         Ra   76575
3   72         An  786776
4   71         Aa    5456
5   68         Ni 7867986
Here there are 5 observations of 3 variables.
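file.choose() only works in an interactive session. A minimal script-friendly sketch, assuming you pass the path directly: it recreates the example data above as a CSV at a temporary path and checks the result with dim() and str().

```r
# Recreate the example data as a CSV file at a temporary path
path <- tempfile(fileext = ".csv")
writeLines(c("User,First.Name,Sal",
             "53,R,50000",
             "73,Ra,76575",
             "72,An,786776",
             "71,Aa,5456",
             "68,Ni,7867986"), path)

# Same call as above, but with an explicit path instead of file.choose()
data <- read.csv(path, header = TRUE)
dim(data)   # rows (observations), columns (variables)
str(data)   # column types and first values
```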

The sep argument sets the field separator, e.g. "," or "|" (read.csv already uses "," by default):
data2 <- read.csv(file.choose(), header=TRUE, sep=",")
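For a pipe-delimited file, only sep changes. A small sketch with a hypothetical file written to a temporary path:

```r
# Hypothetical pipe-delimited file
path <- tempfile(fileext = ".txt")
writeLines(c("User|First.Name|Sal", "53|R|50000", "73|Ra|76575"), path)

# read.csv with "|" as the field separator instead of the default ","
data3 <- read.csv(path, header = TRUE, sep = "|")
print(data3)
```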

----------------------------------------------------
dim: returns the dimensions of an object in R, that is, the number of rows and the number of columns.

dim(cars)
[1] 50 2

Here cars (a dataset built into R) has 50 rows and 2 columns.
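dim() returns rows first, then columns; nrow() and ncol() return each part separately. A short sketch on the built-in cars data:

```r
dims <- dim(cars)   # c(number of rows, number of columns)
n_rows <- nrow(cars)  # same as dims[1]
n_cols <- ncol(cars)  # same as dims[2]
print(dims)
```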
---------------------
head and tail commands:
head(cars): head returns the first 6 rows of an object by default.
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

tail returns the last 6 rows by default.
tail(cars)
   speed dist
45    23   54
46    24   70
47    24   92
48    24   93
49    24  120
50    25   85
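Both functions take an n argument to change the number of rows returned. A sketch on cars; note that the original row names are kept, so the last rows are still labelled 49 and 50:

```r
first3 <- head(cars, n = 3)  # first 3 rows instead of the default 6
last2 <- tail(cars, n = 2)   # last 2 rows
print(first3)
print(last2)
rownames(last2)              # original row labels are preserved
```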
-----------------------------------------
Basic Commands to Explore data:
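data2[c(1,2,3),]   # rows 1 to 3, all columns
data2[5:9,]        # rows 5 to 9, all columns
names(cars)        # column names
mean(cars$dist)    # mean of the dist column
attach(cars)       # columns can now be used by name, e.g. speed instead of cars$speed
detach(cars)       # undo attach(cars)
summary(cars)      # summary statistics for each column (summary is lowercase)
class(gender)      # class of an object, e.g. "factor" for a gender-like variable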
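A runnable sketch of these commands on the built-in cars data. The gender variable is not defined in this post, so a hypothetical one is created here as a factor; with() is shown as an alternative to attach()/detach().

```r
first_rows <- cars[c(1, 2, 3), ]  # rows 1-3, all columns
cols <- names(cars)               # column names
avg <- mean(cars$dist)            # mean stopping distance
avg2 <- with(cars, mean(dist))    # same result without attach()/detach()
summary(cars)

# Hypothetical gender variable stored as a factor
gender <- factor(c("M", "F", "F", "M"))
class(gender)                     # "factor"
levels(gender)                    # categories, sorted alphabetically
```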

Merge Data:
By default, merge keeps only the rows whose key values occur in both datasets:
mydata <- merge(mydata1, mydata3, by=c("country","year"))
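A minimal sketch with two small hypothetical data frames (mydata1 and mydata3 are not shown in this post, so their contents here are made up). Only the (country, year) pairs present in both survive:

```r
# Hypothetical GDP by country and year
mydata1 <- data.frame(country = c("A", "A", "B", "C"),
                      year    = c(2000, 2001, 2000, 2000),
                      gdp     = c(10, 11, 20, 30))
# Hypothetical population by country and year
mydata3 <- data.frame(country = c("A", "B", "B"),
                      year    = c(2000, 2000, 2001),
                      pop     = c(5, 7, 8))

# Inner join on both key columns: only (A, 2000) and (B, 2000) match
mydata <- merge(mydata1, mydata3, by = c("country", "year"))
print(mydata)
```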

Adding all=TRUE keeps all rows from both datasets, filling columns with NA where there is no match:
mydata <- merge(mydata1, mydata3, by=c("country","year"), all=TRUE)
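The same hypothetical data frames with all=TRUE: unmatched rows from either side are kept, and the columns coming from the other dataset are filled with NA.

```r
# Hypothetical data, as in the inner-join sketch
mydata1 <- data.frame(country = c("A", "A", "B", "C"),
                      year    = c(2000, 2001, 2000, 2000),
                      gdp     = c(10, 11, 20, 30))
mydata3 <- data.frame(country = c("A", "B", "B"),
                      year    = c(2000, 2000, 2001),
                      pop     = c(5, 7, 8))

# Full outer join: every (country, year) pair from either dataset
mydata <- merge(mydata1, mydata3, by = c("country", "year"), all = TRUE)
print(mydata)
```

Use all.x=TRUE or all.y=TRUE instead to keep unmatched rows from only one side.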

Many to one: each row of mydata4 (one per country) is matched to every row of mydata1 with the same country:
mydata <- merge(mydata1, mydata4, by=c("country"))
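A sketch of the many-to-one case, assuming mydata4 holds one row per country (hypothetical region labels here). The single row for country A is repeated for each of A's years:

```r
# Hypothetical yearly data: several rows per country
mydata1 <- data.frame(country = c("A", "A", "B"),
                      year    = c(2000, 2001, 2000),
                      gdp     = c(10, 11, 20))
# Hypothetical lookup table: one row per country
mydata4 <- data.frame(country = c("A", "B"),
                      region  = c("North", "South"))

mydata <- merge(mydata1, mydata4, by = c("country"))
print(mydata)
```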

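Sort the merged data by country, then by year (the columns must be referenced as mydata$... unless mydata is attached):
mydata_sorted <- mydata[order(mydata$country, mydata$year), ]

attach(mydata_sorted)   # columns usable by name
detach(mydata_sorted)   # undo attach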
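order() returns the row indices that would sort the data, so indexing rows with it reorders the data frame. A small sketch with a hypothetical data frame; negating a numeric key sorts that key in descending order:

```r
# Hypothetical unsorted data
mydata <- data.frame(country = c("B", "A", "B", "A"),
                     year    = c(2001, 2001, 2000, 2000),
                     gdp     = c(21, 11, 20, 10))

# Ascending by country, then by year within each country
mydata_sorted <- mydata[order(mydata$country, mydata$year), ]
print(mydata_sorted)

# Descending year within each country: negate the numeric key
mydata_desc <- mydata[order(mydata$country, -mydata$year), ]
print(mydata_desc)
```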
