Table of Contents
Published:
This post is left for holding table of contents.
Published:
Google Data Analytics: Prepare Data for Exploration - Week 3
Published:
Google Data Analytics: Prepare Data for Exploration - Week 1 & Week 2
Published:
Workflow of Keras (TensorFlow). Tutorial from Keras on the Kaggle Cats vs Dogs binary classification dataset.
Published:
For non-deep-learning models, scikit-learn can be a good choice. For deep learning models, it’s better to use a deep learning framework like pytorch, keras or tensorflow (non-keras version).
Published:
Scikit-Learn contains a broad selection of standard supervised and unsupervised machine learning methods with tools for model selection and evaluation, data transformation, data loading, and model persistence. These models can be used for classification, clustering, regression, dimensionality reduction, and other common tasks. Let’s learn the basics of scikit-learn for classification and regression problems through several simple examples. For clustering and dimensionality reduction examples, see Clustering Analysis and Factor and Principal Component Analysis, respectively.
Published:
patsy is a Python package for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. It is used in many projects to provide a high-level interface to the statistical code, including:
Published:
matplotlib is a fairly low-level tool. pandas itself has built-in methods that simplify creating visualizations from DataFrame and Series objects, and seaborn simplifies the procedure further. Unfortunately, seaborn doesn’t have built-in support for 3D plots. However, we can still use the seaborn style for 3D matplotlib plots. Let’s learn the basics of pandas and seaborn plotting through some simple examples.
Published:
Basics of matplotlib.
Published:
Aggregations refer to any data transformation that produces scalar values from arrays.
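As a minimal sketch of this idea, assuming a small made-up array and DataFrame (the column names `key` and `value` are illustrative, not taken from the post), each aggregation below reduces an array of values to a single scalar, either for the whole array or once per group:

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
values = np.array([1.0, 2.0, 3.0, 4.0])

total = values.sum()      # whole array reduced to one scalar
average = values.mean()   # another array -> scalar reduction

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": values})

# One scalar per group: each group's values are reduced by the aggregation.
group_means = df.groupby("key")["value"].mean()
```

Here `group_means` is a Series indexed by the group keys, with one aggregated value for each key.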
Published:
Group operations involve three stages:
Published:
Several functions are useful for reshaping and pivoting the tables:
DataFrame.stack(): Stack the prescribed level(s) from columns to index.
DataFrame.unstack(): Pivot a level of the (necessarily hierarchical) index labels.
DataFrame.pivot(): Return reshaped DataFrame organized by given index / column values.
DataFrame.melt(): Unpivot a DataFrame from wide to long format.
DataFrame.explode(): Transform each element of a list-like to a row, replicating index values.
Published:
DataFrames can be merged, concatenated or combined in a number of ways:
DataFrame.merge() or pandas.merge() merges DataFrame or named Series objects with a database-style join.
DataFrame.join() joins columns of another DataFrame, which is similar to DataFrame.merge().
DataFrame.update() modifies in place using non-NA values from another DataFrame.
pandas.concat() concatenates pandas objects along a particular axis with optional set logic along the other axes.
DataFrame.combine() performs a column-wise combine with another DataFrame.
DataFrame.combine_first() updates null elements with the value in the same location in the other DataFrame.
Published:
The multi-indexing feature provides a way to work with higher dimensional data in a lower dimensional form.
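A minimal sketch of what this looks like in pandas, assuming a made-up two-level index (the level names `year` and `city` and all values are illustrative only): a one-dimensional Series with a MultiIndex holds what is logically a two-dimensional table.

```python
import pandas as pd

# Hypothetical two-level index: every (year, city) pair labels one value.
index = pd.MultiIndex.from_product(
    [[2020, 2021], ["A", "B"]], names=["year", "city"]
)
s = pd.Series([1, 2, 3, 4], index=index)

# Select a single value with a tuple of level values.
value = s.loc[(2021, "A")]

# Move the "city" level into the columns to recover the 2-D view.
table = s.unstack("city")
```

The Series `s` stays one-dimensional, while `table` is the equivalent DataFrame with years as rows and cities as columns.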
Published:
Some examples from https://docs.python.org/3/library/re.html#regular-expression-examples.
Published:
Simple string operations can be handled with built-in string methods. For more complex patterns, regular expressions are a powerful tool.
Published:
Pandas provides a number of APIs for removing, replacing, renaming and transforming data. This post sketches these functions.
Published:
Pandas uses the floating-point value NaN (Not a Number, np.nan) to represent missing data, and it provides several API functions related to missing data handling: dropna(), fillna(), isnull() and notnull().
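A minimal sketch of these four methods on a Series, assuming made-up data and an arbitrary fill value of 0.0:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing entry.
s = pd.Series([1.0, np.nan, 3.0])

mask = s.isnull()        # True where the value is missing
present = s.notnull()    # the inverse mask
dropped = s.dropna()     # missing entries removed
filled = s.fillna(0.0)   # missing entries replaced by 0.0
```

None of these calls modifies `s`; each returns a new object.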
Published:
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R. This post covers the basics of DataFrame; more functions will be explored in practice.
Published:
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
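A minimal sketch of these terms, using a small made-up 2 x 3 array:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

n_axes = a.ndim      # 2: the array has two axes
shape = a.shape      # (2, 3): axis 0 has length 2, axis 1 has length 3
element = a[1, 2]    # 5: indexed by a tuple of non-negative integers

# Reductions take an axis argument naming the axis to collapse.
col_sums = a.sum(axis=0)   # collapses axis 0 -> [3, 5, 7]
row_sums = a.sum(axis=1)   # collapses axis 1 -> [3, 12]
```

All elements share a single type, available as `a.dtype`.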
Published:
The principal built-in types in Python are numerics, sequences, mappings, classes, instances and exceptions. See [this] site for a complete description. This post will cover some of the commonly used Python built-in types.
Published:
A function pointer is a pointer that denotes a function rather than an object. Like any other pointer, a function pointer points to a particular type. A function’s type is determined by its return type and the types of its parameters.
Published:
When there is no exact match for overloading functions, type conversions might be necessary. In order to determine the best match, the compiler ranks the conversions that could be used to convert each argument to the type of its corresponding parameter. These type conversions are summarized in this post.
Published:
Functions that have the same name but different parameter lists and that appear in the same scope are overloaded. When we call these functions, the compiler can deduce which function we want based on the argument types we pass. This raises a question: when do two parameter lists differ?
Published:
Defining a function makes the code easier to read and understand. However, one potential drawback of making a function is that calling a function is apt to be slower than evaluating the equivalent expressions.
Published:
Some examples of passing and returning pointers and references to and from functions have been discussed in the post Passing and Returning References and Pointers. Some practical aspects of argument passing mentioned in C++ Primer (5th Edition) will be summarized in this post.
Published:
Rvalue references were introduced in C++11 to support move operations. They allow programmers to avoid logically unnecessary copying and to provide perfect forwarding functions. They are primarily meant to aid in the design of higher performance and more robust libraries.
Published:
The code for this post can be found in the folder ReferencePointer.
Published:
The definitions of references and pointers may be easy to understand. However, it is another thing to be able to use them in practice. Here are some examples that may help to understand references and pointers.
Published:
In process … …
Published:
There are various ways of communicating between local devices. Depending on the target devices, we can either connect two devices directly using cables and transfer data using protocols like CAN, Serial, TCP, UDP etc., or transfer data wirelessly using protocols like Bluetooth, ZigBee, WiFi etc.
Published:
The code for this post can be found in the folder DataSharingBetweenProcesses.
Published:
In process … …
Published:
The code for this post can be found in the folder DataSharingBetweenFiles.
Published:
The code for this post can be found in the file DataSharingBetweenFunctions.cpp.
Published:
The World Model is used to visualize which real world objects the user’s gaze intersects with. The World Model is constructed out of simple objects such as planes, spheres, boxes, etc. that describe the real world.
Published:
The Smart Eye Pro system is a head and gaze tracking system, which measures the subject’s head pose and gaze direction in 3D.
Published:
The Smart Eye Pro system is a head and gaze tracking system, which measures the subject’s head pose and gaze direction in 3D. In addition, eyelid opening and pupil dilation can be measured.
Published:
This series of posts is a set of notes from the Applied Statistics course given by Prof. LIANG Heng of Tsinghua University.
Published:
Clustering analysis is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
Published:
In practice, data may contain many variables, but not all of them have a significant influence on the results we want to analyze or predict. Factor analysis and principal component analysis are two basic methods that aim to reduce the dimension of the data to make it easier to understand and analyze.
Published:
The purpose of a multiple regression model is to estimate the dependent variable (response variable) using multiple independent variables (explanatory variables).
Published:
Basics of simple (single-variable) regression, especially linear regression.
Published:
Problem set of Monte Carlo.
Published:
General practice of converting a uniform distribution to a general distribution.
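One common way to do this is inverse transform sampling; whether the post uses exactly this method is an assumption. The sketch below draws from an exponential distribution by passing uniform samples through the inverse of its CDF. The function name `sample_exponential`, the rate of 2.0 and the seed are all illustrative choices.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one Exponential(rate) sample by inverting its CDF."""
    u = rng.random()                   # u ~ Uniform[0, 1)
    # For F(x) = 1 - exp(-rate * x), the inverse is F^{-1}(u) = -ln(1 - u) / rate.
    return -math.log(1.0 - u) / rate


rng = random.Random(0)   # fixed seed so the sketch is reproducible
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]

# The sample mean should be close to the theoretical mean 1 / rate = 0.5.
sample_mean = sum(samples) / len(samples)
```

The same pattern applies to any distribution whose CDF can be inverted in closed form.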
Published:
Several techniques to reduce variance of the estimation, including antithetic variates, control variates, stratified sampling and importance sampling.
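As a minimal sketch of one of these techniques, antithetic variates, the example below estimates the integral of exp(x) over [0, 1], whose exact value is e - 1. The integrand, sample size and seed are illustrative assumptions, not taken from the post. Each antithetic estimate pairs a uniform draw u with 1 - u, and the resulting per-pair variance is compared with that of two independent draws.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_pairs = 10_000

plain = []
antithetic = []
for _ in range(n_pairs):
    u1 = rng.random()
    u2 = rng.random()
    # Plain estimator: average of two independent evaluations.
    plain.append(0.5 * (math.exp(u1) + math.exp(u2)))
    # Antithetic estimator: pair u1 with 1 - u1, which is negatively correlated with it.
    antithetic.append(0.5 * (math.exp(u1) + math.exp(1.0 - u1)))

estimate = statistics.fmean(antithetic)          # approximates e - 1
var_plain = statistics.variance(plain)
var_antithetic = statistics.variance(antithetic)
```

Because exp is monotone, exp(u) and exp(1 - u) are negatively correlated, so `var_antithetic` comes out smaller than `var_plain` for the same number of function evaluations.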
Published:
The Monte Carlo (MC) technique is a numerical method that makes use of random numbers to solve mathematical problems for which an analytical solution is not known.
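A minimal sketch of the idea, using the classic estimate of pi. This example is chosen only because the exact answer is known, so the estimate can be checked. The function name `estimate_pi`, the sample size and the seed are illustrative assumptions.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi / 4 of the unit square.
    return 4.0 * inside / n_samples


pi_hat = estimate_pi(200_000)
```

The error of such an estimate shrinks roughly in proportion to one over the square root of the number of samples.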
Published:
Problem set of parameter estimation.
Published:
Problem set of probabilities.
Published:
Summary of some of the commonly used experimental design methods, including completely randomized designs, randomized block designs, full factorial designs and fractional factorial designs.
Published:
Python realization of $\chi^2$ tests, including the one-way $\chi^2$ test and the $\chi^2$ independence test.
Published:
$t$ test and $F$ test, including 1 sample $t$ test, 2 independent sample $t$ test, paired sample $t$ test and one-way and two-way ANOVA.
Published:
Data representation and manipulation in Python.
Published:
Cheat sheet of statistical tests.
Published:
In statistics, analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance is partitioned into components due to different explanatory variables. In its simplest form ANOVA gives a statistical test of whether the means of several groups are all equal, and therefore generalizes Student’s two-sample $t$ test to more than two groups.
Published:
Basics of test.
Published:
The $t$ test is usually used to compare two groups of samples, including two independent samples and paired samples.
Published:
Basics of parameter hypothesis test.
Published:
Point estimation estimates an unknown parameter with a single value, while interval estimation estimates it with an interval.
Published:
Properties of point estimation, mean squared error and minimum variance unbiased estimation.
Published:
Given samples from a population, point estimation refers to constructing certain statistical quantities that can be used to estimate the distribution of the population. The method is not unique; the most commonly used methods are the method of moments and the method of maximum likelihood.
Published:
Conditional distributions and expectation of discrete and continuous random variables.
Published:
Bivariate and multivariate normal distributions.
Published:
Two simple examples of order statistics.
Published:
Expectation, Variance and Covariance (Correlation).
Published:
This post will summarize some of the commonly used continuous distributions, including
Published:
This post will summarize some of the commonly used discrete distributions, including
Published:
In 1985, Shakespearean scholar Gary Taylor discovered a nine-stanza poem in a bound folio volume that was attributed to Shakespeare (called the Taylor poem). The size of the newly discovered poem is small relative to the size of Shakespeare’s total work, only 429 total words. Can we prove that the poem was actually written by Shakespeare or not?
Published:
Probability and statistics are two closely related fields in mathematics which concern themselves with analyzing the relative frequency of events. Still, there are fundamental differences in the way they see the world.