Thursday, 29 November 2012

What is Data?



Data are measurements or observations that are collected as a source of information. They can be numbers, words, measurements, observations, or just descriptions of things. Examples: 36 degrees, cold, number of hospital beds, height, weight, age, level of severity of disease.

A variable is a characteristic or attribute that describes a person, place, thing, or idea and can assume different values. The value of the variable can "vary" from one entity to another.
Example: the temperature of a room is a variable; so is a person's hair color, which could have the value "blond" for one person and "brunette" for another.




Based on the level of measurement, we can distinguish between two kinds of variable:
Quantitative Variable
A quantitative variable is one in which the variates differ in magnitude, e.g. income, age, etc.
Qualitative/Categorical Variable
A qualitative variable is one in which the variates differ in kind rather than in magnitude, e.g. marital status, gender, nationality, etc.
Qualitative Data
Overview:
  • Deals with descriptions.
  • The variates differ in kind rather than magnitude.
  • Data can be observed but not measured.
  • Colors, textures, smells, tastes, appearance, type, etc.
  • Qualitative -> Quality
Example: Latte
  • robust aroma
  • frothy appearance
  • strong taste
  • burgundy cup

Quantitative Data
Overview:
  • Deals with numbers.
  • The variates differ in magnitude.
  • Data can be measured.
  • Length, height, area, volume, weight, speed, time, temperature, humidity, sound levels, cost, members, ages, etc.
  • Quantitative -> Quantity
Example: Latte
  • 12 ounces of latte
  • serving temperature 150 °F
  • serving cup 7 inches in height
  • cost $4.95
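
The latte example can be sketched in code. This is only an illustration in Python: the record and its field names are made up for this post, and the rule "numbers are quantitative, text is qualitative" is a simplifying assumption (a numeric code such as a postal code would still be qualitative).

```python
# Hypothetical record for the latte example; the field names are illustrative.
latte = {
    "aroma": "robust",
    "appearance": "frothy",
    "taste": "strong",
    "cup_color": "burgundy",
    "volume_oz": 12,
    "serving_temp_f": 150,
    "cup_height_in": 7,
    "cost_usd": 4.95,
}


def split_by_kind(record):
    """Separate text (qualitative) fields from numeric (quantitative) fields."""
    qualitative = {}
    quantitative = {}
    for name, value in record.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            quantitative[name] = value
        else:
            qualitative[name] = value
    return qualitative, quantitative


qualitative, quantitative = split_by_kind(latte)
print("Qualitative:", sorted(qualitative))
print("Quantitative:", sorted(quantitative))
```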
 
 
Based on their role in a statistical model, there are two kinds of variable:
 
Response Variable
The outcome of a study. A variable you would be interested in predicting or forecasting. Often called a dependent variable, because its value depends on the predictor variable.
Explanatory Variable
Any variable that explains the response variable. Often called an independent variable or predictor variable.
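
To make the two roles concrete, here is a minimal Python sketch that fits a straight line by ordinary least squares. The data (hours studied and exam score) are invented for the illustration, and `fit_line` is a hypothetical helper name, not something from a statistics package.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    x is the explanatory (independent) variable,
    y is the response (dependent) variable.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: the scores were constructed as 50 + 2 * hours.
hours = [1, 2, 3, 4, 5]        # explanatory variable
score = [52, 54, 56, 58, 60]   # response variable

slope, intercept = fit_line(hours, score)
predicted = intercept + slope * 6  # forecast the response for 6 hours
print(slope, intercept, predicted)
```

The response variable is the one being forecast on the last line; the explanatory variable is the one we plug in.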
Based on the number of variables in a study, we have the following types of data:

Univariate Data
Involves a single variable and does not deal with causes or relationships. The major purpose of univariate analysis is to describe the data, using:

·         Central tendency - mean, mode, median
·         Dispersion - range, variance, max, min, quartiles, standard deviation
·         Frequency distribution
·         Bar graph, histogram, pie chart, line graph, box-and-whiskers plot
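
The descriptive measures listed above can be computed with Python's standard library, as a rough sketch. The sample values are made up; `statistics.variance` and `statistics.stdev` are the sample versions (dividing by n - 1).

```python
import statistics
from collections import Counter

ages = [2, 4, 4, 5, 7, 9, 11]  # made-up sample of a single variable

summary = {
    # Central tendency
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    # Dispersion
    "min": min(ages),
    "max": max(ages),
    "range": max(ages) - min(ages),
    "variance": statistics.variance(ages),  # sample variance
    "stdev": statistics.stdev(ages),        # sample standard deviation
}

# Frequency distribution: how often each value occurs.
frequency = Counter(ages)

for name, value in summary.items():
    print(name, value)
print(dict(frequency))
```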

Bivariate Data

Involves two variables and deals with causes and relationships. The major purpose of bivariate analysis is to explain.

·         Analysis of two variables simultaneously
·         Correlation, comparisons, causes, relationships, explanations
·         Tables where one variable is contingent on the values of the other variable
·         Independent and dependent variables
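
Two of the items above, correlation and contingency tables, can be sketched in plain Python. The numbers and the (gender, marital status) pairs are invented for the example, and `pearson_r` is a hypothetical helper written out by hand so the formula is visible.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two quantitative variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Two quantitative variables measured on the same five entities (made up).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))

# Two qualitative variables: a contingency table counts each combination.
pairs = [
    ("F", "married"),
    ("F", "single"),
    ("M", "married"),
    ("M", "married"),
    ("F", "married"),
]
table = Counter(pairs)
print(table[("F", "married")], table[("M", "single")])
```

Note that a correlation on its own describes a relationship; it does not show which variable causes the other.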
 

Data Types 
A data type, or simply type, is a classification that determines the possible values for that type, the operations that can be done on values of that type, the meaning of the data, and the way values of that type can be stored.

Almost all programming languages explicitly include the notion of data type, though different languages may use different terminology. Common data types may include:
  • Integers
  • Booleans
  • Characters
  • Floating-point numbers
  • Alphanumeric strings
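
As a small illustration of these common types, here is how they look in Python. This is just one language's take: Python has no separate character type, so a single character is a string of length 1.

```python
# One example value per common data type from the list above.
values = [42, True, "A", 3.14, "room 101"]
type_names = [type(v).__name__ for v in values]

for v, name in zip(values, type_names):
    print(repr(v), "is of type", name)

# The type decides which operations make sense and what they mean.
number_sum = 3 + 4        # integer addition
text_sum = "3" + "4"      # string concatenation
print(number_sum, text_sum)
```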
Basic Data Types in R

Every individual data value has a data type that tells us what sort of value it is. The most common data types are NUMBERS, which R calls numeric values, and TEXT, which R calls character values. R also has LOGICAL values.



SAS Data Types

Internally, SAS supports two data types for storing data:
  • CHAR: fixed-length character data, 32,767-character maximum
  • NUM: double-precision floating-point number
Note: If the data field is longer than 254 characters, the SAS ODBC Driver processes it as the ODBC data type SQL_VARCHAR.
By using SAS format information, the SAS ODBC Driver is able to represent other ODBC data types, both when responding to queries and in CREATE TABLE statements.

CREATE TABLE Data Type Name    ODBC Data Type    SAS Data Type
char(w)                        SQL_CHAR          CHAR(w)
num(w, d)                      SQL_DOUBLE        NUM
num(w, d)                      SQL_FLOAT         NUM
integer                        SQL_INTEGER       NUM FORMAT=11.0
date9x                         SQL_DATE          NUM FORMAT=DATE9X.
datetime19x                    SQL_TIMESTAMP     NUM FORMAT=DATETIME19X.
time8x                         SQL_TIME          NUM FORMAT=TIME8X.


JMP Data Types

Every field in a JMP file has a name and a data type. The data type indicates how much physical storage to set aside for the field and the format in which the data is stored.
CHARACTER
specifies a field for character string data. The maximum length is 255 characters. Characters can be letters, digits, spaces, or special characters.
META
specifies how metadata contained in the specified data set is processed.
NUMERIC
specifies an 8-byte floating point number. This is also called a double precision number. When you are reading data, this maps directly to the SAS double precision number. When you are writing data, all SAS numeric variables (regardless of length) become JMP numeric variables.
ROWSTATE
specifies an integer variable that takes on the value of 1 or missing. When you are reading data, this maps to a SAS double precision number.
DATE
specifies the date format. When you are reading data, the date values are mapped to a SAS number and scaled to the base date. The JMP date display format maps to the appropriate SAS date display format. When you are writing data, the SAS output format for the numeric variable is checked to determine whether it is a date format. If so, the SAS numeric value is scaled to a JMP date value with the appropriate date display format.
DATETIME
specifies the datetime format. When you are reading data, the datetime values are mapped to a SAS number and scaled to the base datetime. The JMP datetime display format maps to the appropriate SAS datetime display format. When you are writing data, the SAS output format for the numeric variable is checked to determine whether it is a datetime format. If so, the SAS numeric value is scaled to a JMP datetime value with the appropriate datetime display format.
TIME
specifies the time format. When you are reading data, the time values are mapped to a SAS number and scaled to the base time. The JMP time display format maps to the appropriate SAS time display format. When you are writing data, the SAS output format for the numeric variable is checked to determine whether it is a time format. If so, the SAS numeric value is scaled to a JMP time value with the appropriate time display format.


Data Types in Stata

Each element of data is said to be either type string or type numeric. Associated with each data type is a storage type.
STRING: Strings are stored as str1, str2, str3, and so on, where the number is the maximum length of the string.
NUMBER: Numbers are stored as byte, int, long, float, or double, with the default being float. byte, int, and long are said to be of integer type in that they can hold only integers.
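
Why the storage type matters can be shown with a short sketch. It is written in Python, not Stata, and it assumes that a 4-byte float behaves like an IEEE 754 single-precision value and a double like Python's own 8-byte float; `to_float32` is a hypothetical helper name.

```python
import struct


def to_float32(x):
    """Round a Python float (8 bytes) to the nearest 4-byte float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


as_double = 0.1
as_float = to_float32(0.1)

# The 4-byte version keeps only about 7 significant digits,
# so it is not the same number as the 8-byte version.
print(as_float == as_double)

# Large whole numbers can also lose precision in a 4-byte float.
big = 16777217.0  # 2**24 + 1
print(to_float32(big) == big)
```

Both comparisons come out false, which is one reason identifiers and other large whole numbers are better kept in an integer or double storage type.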

 

Data Structures

A data structure is an actual implementation of a particular abstract data type.
Different languages approach the construction of data structures differently.

Data structure refers to the way data is organized and manipulated, with the aim of making data access more efficient. When dealing with data structures, we focus not on one piece of data but on different sets of data and how they relate to one another in an organized manner.

Examples: arrays, lists, iterators, stacks and queues, binary trees, min and max heaps, graphs, hash tables, and sets.
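
A few of these structures are easy to try out. The sketch below uses Python's built-in list as a stack, `collections.deque` as a queue, a dict as a hash table, and a set; the values are made up for the illustration.

```python
from collections import deque

# Stack: last in, first out.
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
top = stack.pop()          # removes the most recently added item

# Queue: first in, first out.
queue = deque()
queue.append("a")
queue.append("b")
queue.append("c")
first = queue.popleft()    # removes the oldest item

# Hash table: look up a value by its key.
beds = {"ward A": 12, "ward B": 8}
ward_a_beds = beds["ward A"]

# Set: each value is stored at most once.
colors = {"blond", "brunette", "blond"}

print(top, first, ward_a_beds, len(colors))
```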

Basic data structures in R

Vectors 
A collection of values that all have the same data type. The elements of a vector are all numbers, giving a numeric vector, or all character values, giving a character vector.
A vector can be used to represent a single variable in a data set.
Image numvec


Factors  
A collection of values that all come from a fixed set of possible values. A factor is similar to a vector, except that the values within a factor are limited to a fixed set of possible values.
A factor can be used to represent a categorical variable in a data set.
Image factor
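
The idea behind a factor, a fixed set of allowed values plus a code for each observation, can be sketched in Python. This is not how R implements factors; `make_factor` is a hypothetical helper, and the codes here start at 0 whereas R's start at 1.

```python
def make_factor(values, levels):
    """Store each value as an integer code into a fixed list of levels."""
    index = {level: i for i, level in enumerate(levels)}
    codes = []
    for value in values:
        if value not in index:
            raise ValueError(f"{value!r} is not one of the allowed levels")
        codes.append(index[value])
    return {"levels": list(levels), "codes": codes}


status = make_factor(
    ["single", "married", "married", "single"],
    levels=["single", "married", "divorced"],
)
print(status["codes"])

# Values outside the fixed set are rejected.
try:
    make_factor(["widowed"], levels=["single", "married", "divorced"])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```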
Matrices  
A two-dimensional collection of values that all have the same type. The values are arranged in rows and columns.
There is also an array data structure that extends this idea to more than two dimensions.
Image matrix


Data frames
 
A collection of vectors that all have the same length. This is like a matrix, except that each column can contain a different data type.
A data frame can be used to represent an entire data set.
Image df
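
A data frame can be pictured as named columns of equal length. The following is a rough Python sketch of that idea only, not R's data frame; `make_data_frame` and `row` are hypothetical helpers and the data are invented.

```python
def make_data_frame(columns):
    """Accept a dict of column name -> list, requiring equal lengths."""
    lengths = {len(column) for column in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return dict(columns)


def row(frame, i):
    """Return observation i as a dict of column name -> value."""
    return {name: column[i] for name, column in frame.items()}


# Each column holds one variable and may have its own data type.
patients = make_data_frame({
    "age": [34, 51, 28],                        # numeric
    "gender": ["F", "M", "F"],                  # character
    "admitted": [True, False, True],            # logical
})

second = row(patients, 1)
print(second)
```

Each column is one variable and each row is one observation, which is why a single data frame can represent an entire data set.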


Lists
 
A collection of data structures. The components of a list can be simply vectors--similar to a data frame, but with each column allowed to have a different length. However, a list can also be a much more complicated structure.
This is a very flexible data structure. Lists can be used to store any combination of data values together.
Image list





