Blog Posts

2100

1 minute read

Published: January 01, 2100

This post is a placeholder that holds the table of contents.

2022

Data Preparation [02]: Database Basics

2 minute read

Published: February 19, 2022

Google Data Analytics: Prepare Data for Exploration - Week 3

Data Preparation [01]: Data Collection Considerations

4 minute read

Published: February 18, 2022

Google Data Analytics: Prepare Data for Exploration - Week 1 & Week 2

Data Modeling [04]: Keras and Tensorflow

19 minute read

Published: February 17, 2022

Workflow of Keras (Tensorflow). Tutorial from Keras on the Kaggle Cats vs Dogs binary classification dataset.

Data Modeling [03]: Pytorch

14 minute read

Published: February 16, 2022

For non-deep-learning models, scikit-learn can be a good choice. For deep learning models, it is better to use a deep learning framework such as PyTorch, Keras, or TensorFlow (the non-Keras API).

Data Modeling [02]: Scikit-Learn

7 minute read

Published: February 15, 2022

Scikit-Learn contains a broad selection of standard supervised and unsupervised machine learning methods with tools for model selection and evaluation, data transformation, data loading, and model persistence. These models can be used for classification, clustering, regression, dimensionality reduction, and other common tasks. Let's learn the basics of scikit-learn for classification and regression problems through several simple examples. For clustering and dimensionality reduction examples, see Clustering Analysis and Factor and Principle Component Analysis respectively.

Data Modeling [01]: Patsy and Statsmodels

4 minute read

Published: February 14, 2022

patsy is a Python package for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. It is used in many projects to provide a high-level interface to the statistical code, including:

statsmodels: Estimation of statistical models, statistical tests, and statistical data exploration
HDDM: Hierarchical Bayesian parameter estimation of Drift Diffusion Models (DDM)

Data Visualization [02]: Pandas and Seaborn Basics

6 minute read

Published: February 13, 2022

matplotlib is a fairly low-level tool. pandas itself has built-in methods that simplify creating visualizations from DataFrame and Series objects, and seaborn simplifies the procedure further. Unfortunately, seaborn doesn't have built-in support for 3D plotting. However, we can still use the seaborn style for 3D matplotlib plots. Let's learn the basics of pandas and seaborn through some simple examples.

Data Visualization [01]: Matplotlib Basics

7 minute read

Published: February 12, 2022

Basics of matplotlib.

Data Aggregation and Grouping [02]: Data Aggregation

6 minute read

Published: February 11, 2022

Aggregations refer to any data transformation that produces scalar values from arrays.

Data Aggregation and Grouping [01]: GroupBy Method

6 minute read

Published: February 10, 2022

Group operations involve three stages:

split: the object is split into groups based on one or more keys
apply: a function is applied to each group, producing a new value
combine: the results of the function applications are combined into a result object
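
The three stages above can be sketched by hand and compared with the built-in groupby call. This is only a minimal illustration, assuming pandas is installed; the column names key and value and the sample numbers are made up for the example.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# split: iterating over a GroupBy object yields (group name, sub-frame) pairs
groups = {name: part for name, part in df.groupby("key")}

# apply: compute one value (here a sum) for each group
sums = {name: int(part["value"].sum()) for name, part in groups.items()}

# combine: assemble the per-group results into a single object
manual = pd.Series(sums, name="value")

# The same three stages expressed in a single pandas call
builtin = df.groupby("key")["value"].sum()
```

Both manual and builtin hold one sum per key; the single-call form is what would normally be used in practice.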

Data Wrangling [03]: Reshaping and Pivot Tables

6 minute read

Published: February 09, 2022

Several functions are useful for reshaping and pivoting the tables:

DataFrame.stack(): Stack the prescribed level(s) from columns to index.
DataFrame.unstack(): Pivot a level of the (necessarily hierarchical) index labels.
DataFrame.pivot(): Return reshaped DataFrame organized by given index / column values.
DataFrame.melt(): Unpivot a DataFrame from wide to long format.
DataFrame.explode(): Transform each element of a list-like to a row, replicating index values.
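
A small round trip may help show how some of these functions relate to each other. The sketch below assumes pandas is installed and uses invented column names (id, x, y, tags); it covers melt, pivot, unstack and explode only.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per variable.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# melt: wide -> long, one row per (id, variable) pair
long = wide.melt(id_vars="id", var_name="variable", value_name="value")

# pivot: long -> wide again, using id as the index and variable as the columns
back = long.pivot(index="id", columns="variable", values="value")

# unstack: move the inner index level (variable) into the columns
unstacked = long.set_index(["id", "variable"])["value"].unstack()

# explode: one row per list element, with the original index value repeated
tagged = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})
exploded = tagged.explode("tags")
```

Here pivot and unstack arrive at the same wide layout from the long table, which is one way to see that pivot is a convenience built on hierarchical indexing.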

Data Wrangling [02]: Merging DataFrames

12 minute read

Published: February 08, 2022

DataFrames can be merged, concatenated or combined in a number of ways, where

DataFrame.merge() or pandas.merge() merges DataFrame or named Series objects with a database-style join
DataFrame.join() joins columns of another DataFrame, which is similar to DataFrame.merge()
DataFrame.update() modifies in place using non-NA values from another DataFrame.
pandas.concat() concatenates pandas objects along a particular axis with optional set logic along the other axes
DataFrame.combine() performs column-wise combine with another DataFrame
DataFrame.combine_first() updates null elements with value in the same location in other
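
Three of the functions listed above can be sketched briefly. This is a minimal example, assuming pandas is installed; the frames left, right, primary and fallback and their contents are hypothetical.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge: database-style inner join on the shared key
inner = left.merge(right, on="key", how="inner")

# concat: stack objects along the row axis and renumber the index
stacked = pd.concat([left, left], ignore_index=True)

# combine_first: fill null entries with values at the same location in the other frame
primary = pd.DataFrame({"v": [1.0, None, 3.0]})
fallback = pd.DataFrame({"v": [10.0, 20.0, 30.0]})
filled = primary.combine_first(fallback)
```

Only keys present in both frames (b and c) survive the inner join; changing how to "left", "right" or "outer" keeps unmatched rows and fills the missing side with NaN.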

Data Wrangling [01]: Multi-Indexing

6 minute read

Published: February 07, 2022

The multi-indexing feature provides a way to work with higher dimensional data in a lower dimensional form.

Data Cleaning [04]: Regular Expression Examples

4 minute read

Published: February 06, 2022

Some examples from https://docs.python.org/3/library/re.html#regular-expression-examples.

Data Cleaning [03]: String Manipulation

8 minute read

Published: February 06, 2022

Simple string operations can be handled with the built-in string methods. For more complex patterns, regular expressions are a powerful tool.
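
As a rough illustration of the difference, the sketch below splits the same messy string twice, once with built-in string methods and once with the standard re module. The input string is invented for the example.

```python
import re

# Hypothetical raw input with inconsistent separators and spacing.
raw = "  Alice, Bob ;Carol  "

# Built-in methods: normalize the separator, split, then strip each piece
with_methods = [part.strip() for part in raw.replace(";", ",").split(",")]

# Regular expression: split on a comma or semicolon plus any surrounding whitespace
with_regex = re.split(r"\s*[,;]\s*", raw.strip())
```

Both approaches give the same three names here; the regular expression tends to scale better as the set of separators grows.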

Data Cleaning [02]: Data Transformation

7 minute read

Published: February 05, 2022

Pandas provides many APIs for removing, replacing, renaming, and transforming data. This post gives a sketch of these functions.

Data Cleaning [01]: Handling Missing Data

6 minute read

Published: February 04, 2022

Pandas uses the floating-point value NaN (Not a Number, np.nan) to represent missing data, and it provides several API functions related to missing data handling: dropna(), fillna(), isnull() and notnull().
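
The four functions named above can be tried on a tiny Series. This is a minimal sketch, assuming pandas and NumPy are installed; the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing entry.
s = pd.Series([1.0, np.nan, 3.0])

missing_mask = s.isnull()    # True where the value is missing
present_mask = s.notnull()   # the complement of isnull()
dropped = s.dropna()         # remove the missing entries
filled = s.fillna(0.0)       # replace the missing entries with a constant
```

None of these calls modifies s in place; each one returns a new object.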

Data Types [03]: Pandas DataFrame Basics

7 minute read

Published: February 03, 2022

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R. This post covers the basics of DataFrame; more functions will be explored in practice.

Data Types [02]: Numpy Array Basics

4 minute read

Published: February 02, 2022

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
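
A short sketch may make the notion of axes concrete, assuming NumPy is installed. The 2 x 3 array is just an example.

```python
import numpy as np

# A 2 x 3 array: axis 0 runs over the rows, axis 1 over the columns.
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

n_axes = a.ndim                  # number of axes (dimensions)
shape = a.shape                  # length of the array along each axis
element = a[1, 2]                # indexed by a tuple of non-negative integers

column_sums = a.sum(axis=0)      # collapse axis 0: one sum per column
row_sums = a.sum(axis=1)         # collapse axis 1: one sum per row
```

The axis argument names the axis that is reduced away, which is why summing over axis 0 leaves one value per column.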

Data Types [01]: Python Built-In Types

3 minute read

Published: February 01, 2022

The principal built-in types in Python are numerics, sequences, mappings, classes, instances and exceptions. See [this] site for a complete description. This post will cover some of the commonly used Python built-in types.

C++ Basics [08]: Pointers to Functions

2 minute read

Published: January 12, 2022

A function pointer is a pointer that denotes a function rather than an object. Like any other pointer, a function pointer points to a particular type. A function’s type is determined by its return type and the types of its parameters.

C++ Basics [07]: Argument Type Conversions

6 minute read

Published: January 12, 2022

When there is no exact match for overloading functions, type conversions might be necessary. In order to determine the best match, the compiler ranks the conversions that could be used to convert each argument to the type of its corresponding parameter. These type conversions are summarized in this post.

C++ Basics [06]: Functions Overloading and Matching

4 minute read

Published: January 12, 2022

Functions that have the same name but different parameter lists and that appear in the same scope are overloaded. When we call these functions, the compiler can deduce which function we want based on the argument types we pass. This raises a question: when do two parameter lists differ?

C++ Basics [05]: Inline and Constexpr Functions

3 minute read

Published: January 11, 2022

Defining a function makes the code easier to read and understand. However, one potential drawback of making a function is that calling a function is apt to be slower than evaluating the equivalent expressions.

C++ Basics [04]: Function Basics

9 minute read

Published: January 10, 2022

Some examples of passing and returning pointers and references to and from functions have been discussed in the post Passing and Returning References and Pointers. Some practical aspects of argument passing mentioned in C++ Primer (5th Edition) will be summarized in this post.

C++ Basics [03]: Rvalue and Rvalue References

5 minute read

Published: January 09, 2022

Rvalue references were introduced in C++11 to support move semantics, which allow programmers to avoid logically unnecessary copying, and to support perfect forwarding functions. They are primarily meant to aid in the design of higher performance and more robust libraries.

C++ Basics [02]: Passing/Returning References/Pointers

4 minute read

Published: January 08, 2022

The code for this post can be found in the folder ReferencePointer.

C++ Basics [01]: References and Pointers Basics

8 minute read

Published: January 07, 2022

The definitions of references and pointers may be easy to understand. However, it is another thing to be able to use them in practice. Here are some examples that may help to understand references and pointers.

Data Sharing [C++] [06]: Between Remote Devices

less than 1 minute read

Published: January 06, 2022

In progress … …

Data Sharing [C++] [05]: Between Local Devices

less than 1 minute read

Published: January 05, 2022

There are various ways to communicate between local devices. Depending on the target devices, we can either connect two devices directly using cables and transfer data using protocols like CAN, Serial, TCP, UDP etc., or we can transfer data wirelessly using protocols like Bluetooth, ZigBee, WiFi etc.

Data Sharing [C++] [04]: Between Processes

7 minute read

Published: January 04, 2022

The code for this post can be found in the folder DataSharingBetweenProcesses.

Data Sharing [C++] [03]: Between Threads

less than 1 minute read

Published: January 03, 2022

In progress … …

Data Sharing [C++] [02]: Between Files

2 minute read

Published: January 02, 2022

The code for this post can be found in the folder DataSharingBetweenFiles.

Data Sharing [C++] [01]: Between Functions Within A File

3 minute read

Published: January 01, 2022

The code for this post can be found in the file DataSharingBetweenFunctions.cpp.

2021

Smart Eye [03]: The World Model

5 minute read

Published: December 03, 2021

The World Model is used to visualize which real world objects the user's gaze intersects with. The World Model is constructed out of simple objects, such as planes, spheres, and boxes, that describe the real world.

Smart Eye [02]: Head and Gaze Tracking

4 minute read

Published: December 02, 2021

The Smart Eye Pro system is a head and gaze tracking system, which measures the subject’s head pose and gaze direction in 3D.

Smart Eye [01]: Getting Started

7 minute read

Published: December 01, 2021

The Smart Eye Pro system is a head and gaze tracking system, which measures the subject's head pose and gaze direction in 3D. In addition, eyelid opening and pupil dilation can be measured.

Statistics [31]: Summary

less than 1 minute read

Published: January 31, 2021

This series of posts is a set of notes from the Applied Statistics course given by Prof. LIANG Heng of Tsinghua University.

Statistics [30]: Clustering Analysis

5 minute read

Published: January 30, 2021

Clustering analysis is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

Statistics [29]: Factor and Principle Component Analysis

17 minute read

Published: January 29, 2021

In practice, data may contain many variables, but not all of them have a significant influence on the results we want to analyze or predict. Factor analysis and principal component analysis are two basic methods that aim to reduce the dimension of the data to make it easier to understand and analyze.

Statistics [28]: Multiple Regression Model

10 minute read

Published: January 28, 2021

The purpose of a multiple regression model is to estimate the dependent variable (response variable) using multiple independent variables (explanatory variables).

Statistics [27]: Unitary Regression Model

5 minute read

Published: January 27, 2021

Basics of unitary regression, especially linear regression.

Statistics [26]: Problem Set [03] - Monte Carlo

less than 1 minute read

Published: January 26, 2021

Problem set of Monte Carlo.

Statistics [25]: From Uniform to General Distributions

1 minute read

Published: January 25, 2021

General practice of converting a uniform distribution to a general distribution.
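
One common way to do this is inverse transform sampling: draw $U$ uniformly on $[0, 1)$ and return $F^{-1}(U)$, where $F$ is the target cumulative distribution function. Whether the post uses exactly this method is an assumption; the sketch below applies it to an exponential distribution, with a hypothetical helper name and an arbitrary rate.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one Exponential(rate) sample by inverting F(x) = 1 - exp(-rate * x)."""
    u = rng.random()                    # uniform on [0, 1)
    return -math.log(1.0 - u) / rate    # 1 - u lies in (0, 1], so the log is defined

rng = random.Random(0)                  # fixed seed so the run is repeatable
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]

# The sample mean should be close to the theoretical mean 1 / rate = 0.5.
sample_mean = sum(samples) / len(samples)
```

The same recipe works for any distribution whose cumulative distribution function can be inverted in closed form.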

Statistics [24]: Variance Reducing Techniques

2 minute read

Published: January 24, 2021

Several techniques to reduce variance of the estimation, including antithetic variates, control variates, stratified sampling and importance sampling.
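
Of the techniques listed, antithetic variates is perhaps the shortest to sketch. The example below estimates $E[e^U]$ for $U$ uniform on $[0, 1)$, whose exact value is $e - 1$, once with independent draws and once with antithetic pairs $(U, 1 - U)$. The integrand, function names and sample sizes are illustrative choices, not taken from the post.

```python
import math
import random

def plain_estimate(n_pairs, rng):
    """Average of exp(U) over 2 * n_pairs independent uniform draws."""
    draws = [math.exp(rng.random()) for _ in range(2 * n_pairs)]
    return sum(draws) / len(draws)

def antithetic_estimate(n_pairs, rng):
    """Average over n_pairs antithetic pairs (U, 1 - U), using the same number of evaluations."""
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        # exp(U) and exp(1 - U) are negatively correlated, so their average varies less
        total += 0.5 * (math.exp(u) + math.exp(1.0 - u))
    return total / n_pairs

exact = math.e - 1.0
plain = plain_estimate(50_000, random.Random(0))
antithetic = antithetic_estimate(50_000, random.Random(0))
```

Because the integrand is monotone in $U$, the paired terms are negatively correlated, which is the condition under which antithetic variates reduce the variance.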

Statistics [23]: Monte Carlo

4 minute read

Published: January 23, 2021

The Monte Carlo (MC) technique is a numerical method that makes use of random numbers to solve mathematical problems for which an analytical solution is not known.
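
As a toy illustration of the idea, the sketch below estimates $\pi$ from the fraction of random points in the unit square that fall inside the quarter circle. The answer is of course known in this case; the function name and sample size are arbitrary choices for the example.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi as 4 times the fraction of uniform points with x^2 + y^2 <= 1."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

pi_estimate = estimate_pi(100_000)
```

The error of such an estimate shrinks roughly like one over the square root of the number of samples, so each extra digit of accuracy costs about a hundred times more samples.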

Statistics [22]: Problem Set [02] - Parameter Estimation

less than 1 minute read

Published: January 22, 2021

Problem set of parameter estimation.

Statistics [21]: Problem Set [01] - Probabilities

less than 1 minute read

Published: January 21, 2021

Problem set of probabilities.

Statistics [20]: Experimental Design

11 minute read

Published: January 20, 2021

Summary of some of the commonly used experimental design methods, including completely randomized designs, randomized block designs, full factorial designs and fractional factorial designs.

Statistics [19]: Python [03] - Chi-Squared Test

2 minute read

Published: January 19, 2021

Python implementation of $\chi^2$ tests, including the one-way $\chi^2$ test and the $\chi^2$ independence test.

Statistics [18]: Python [02] - t Test & F Test

9 minute read

Published: January 18, 2021

$t$ test and $F$ test, including the one-sample $t$ test, the independent two-sample $t$ test, the paired-sample $t$ test, and one-way and two-way ANOVA.

Statistics [17]: Python [01] - Data Representation

3 minute read

Published: January 17, 2021

Data representation and manipulation in Python.

Statistics [16]: Summary of Statistical Tests

3 minute read

Published: January 16, 2021

Cheat sheet of statistical tests.

Statistics [15]: Analysis of Variance - F test

4 minute read

Published: January 15, 2021

In statistics, analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance is partitioned into components due to different explanatory variables. In its simplest form ANOVA gives a statistical test of whether the means of several groups are all equal, and therefore generalizes Student’s two-sample $t$ test to more than two groups.

Statistics [14]: Chi-Squared Test

3 minute read

Published: January 14, 2021

Basics of $\chi^2$ test.

Statistics [13]: t Test - Comparing Two Samples

1 minute read

Published: January 13, 2021

The $t$ test is usually used to compare two groups of samples, including two independent samples and paired samples.

Statistics [12]: Parameter Hypothesis Test

4 minute read

Published: January 12, 2021

Basics of parameter hypothesis test.

Statistics [11]: Parameter Interval Estimation

3 minute read

Published: January 11, 2021

Point estimation estimates an unknown parameter with a single value computed from the samples, whereas interval estimation estimates the unknown parameter with an interval.

Statistics [10]: Evaluation of Point Estimation

4 minute read

Published: January 10, 2021

Properties of point estimation, mean square error and minimum variance unbiased estimation.

Statistics [09]: Parameter Point Estimation

1 minute read

Published: January 09, 2021

Assume $X_1,X_2,...,X_n$ are samples from a population. Point estimation refers to constructing certain statistical quantities $\hat{\theta} = \theta(X_1,X_2,...,X_n)$ that can be used to estimate the distribution of the population. The method is not unique; the most commonly used methods are the method of moments and the method of maximum likelihood.

Statistics [08]: Conditional Distributions and Expectation

1 minute read

Published: January 08, 2021

Conditional distributions and expectation of discrete and continuous random variables.

Statistics [07]: Multivariate Normal Distributions

1 minute read

Published: January 07, 2021

Bivariate and multivariate normal distributions.

Statistics [06]: Order Statistics

1 minute read

Published: January 06, 2021

Two simple examples of order statistics.

Statistics [05]: Statistical Quantities

1 minute read

Published: January 05, 2021

Expectation, Variance and Covariance (Correlation).

Statistics [04]: Some Common Continuous Distributions

3 minute read

Published: January 04, 2021

This post will summarize some of the commonly used continuous distributions, including

Uniform distribution
Exponential distribution
Weibull distribution
Normal distribution
$\chi^2$ distribution
Student’s t-distribution
F-distribution
Gamma distribution
Beta Distribution

Statistics [03]: Some Common Discrete Distributions

4 minute read

Published: January 03, 2021

This post will summarize some of the commonly used discrete distributions, including

Uniform distribution
Bernoulli distribution
Binomial distribution
Geometric distribution
Negative binomial distribution
Poisson distribution
Hypergeometric distribution
Multinomial distribution

Statistics [02]: Shakespeare's New Poem

7 minute read

Published: January 02, 2021

In 1985, Shakespearean scholar Gary Taylor discovered a nine-stanza poem in a bound folio volume that was attributed to Shakespeare (called the Taylor poem). The size of the newly discovered poem is small relative to the size of Shakespeare’s total work, only 429 total words. Can we prove that the poem was actually written by Shakespeare or not?

Statistics [01]: Probability vs Statistics

3 minute read

Published: January 01, 2021

Probability and statistics are two closely related fields in mathematics which concern themselves with analyzing the relative frequency of events. Still, there are fundamental differences in the way they see the world.

Chao Huang
