Posts by Tags

Array

Data Types [02]: Numpy Array Basics

4 minute read

Published:

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
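As a minimal sketch of these ideas (the array values and variable names are illustrative, not taken from the post), a small 2-D array shows axes, tuple indexing and an axis-wise reduction:

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) and 3 columns (axis 1), all elements one type.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

ndim = a.ndim              # number of axes
shape = a.shape            # size along each axis
element = a[1, 2]          # indexed by a tuple of non-negative integers
col_sums = a.sum(axis=0)   # reduce along axis 0: one sum per column
```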

C++

C++ Basics [08]: Pointers to Functions

2 minute read

Published:

A function pointer is a pointer that denotes a function rather than an object. Like any other pointer, a function pointer points to a particular type. A function’s type is determined by its return type and the types of its parameters.

C++ Basics [07]: Argument Type Conversions

6 minute read

Published:

When no overloaded function matches a call exactly, type conversions may be necessary. To determine the best match, the compiler ranks the conversions that could be used to convert each argument to the type of its corresponding parameter. This post summarizes these type conversions.

C++ Basics [06]: Functions Overloading and Matching

4 minute read

Published:

Functions that have the same name but different parameter lists and that appear in the same scope are overloaded. When we call these functions, the compiler can deduce which function we want based on the argument types we pass. This raises a question: when do two parameter lists differ?

C++ Basics [05]: Inline and Constexpr Functions

3 minute read

Published:

Defining a function makes the code easier to read and understand. However, one potential drawback of defining a function is that calling it is apt to be slower than evaluating the equivalent expression.

C++ Basics [03]: Rvalue and Rvalue References

5 minute read

Published:

Rvalue references were introduced in C++11 to support move semantics, which allow programmers to avoid logically unnecessary copying, and to provide perfect forwarding functions. They are primarily meant to aid in the design of higher-performance and more robust libraries.

C++ Basics [01]: References and Pointers Basics

8 minute read

Published:

The definitions of references and pointers may be easy to understand. However, it is another thing to be able to use them in practice. Here are some examples that may help to understand references and pointers.

Data Sharing [C++] [05]: Between Local Devices

less than 1 minute read

Published:

There are various ways to communicate between local devices. Depending on the target devices, we can either connect two devices directly with cables and transfer data using protocols such as CAN, Serial, TCP or UDP, or transfer data wirelessly using protocols such as Bluetooth, ZigBee or WiFi.

Concatenating

Data Wrangling [02]: Merging DataFrames

12 minute read

Published:

DataFrames can be merged, concatenated or combined in a number of ways:

  • DataFrame.merge() or pandas.merge() merges DataFrame or named Series objects with a database-style join
  • DataFrame.join() joins columns of another DataFrame, which is similar to DataFrame.merge()
  • DataFrame.update() modifies in place using non-NA values from another DataFrame
  • pandas.concat() concatenates pandas objects along a particular axis with optional set logic along the other axes
  • DataFrame.combine() performs a column-wise combine with another DataFrame
  • DataFrame.combine_first() updates null elements with the value in the same location in the other DataFrame
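A short sketch of three of these calls may help; the frames, column names and values below are made up for illustration and do not come from the post:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Database-style inner join on the shared "key" column.
merged = left.merge(right, on="key", how="inner")

# Stack the two frames along the row axis; columns are unioned.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

# Fill the null entries of one frame from the same location in another.
primary = pd.DataFrame({"v": [1.0, None, 3.0]})
backup = pd.DataFrame({"v": [10.0, 20.0, 30.0]})
filled = primary.combine_first(backup)
```

Only the missing entry in `primary` is taken from `backup`; existing values are left untouched.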

Data Aggregation

Data Aggregation and Grouping [01]: GroupBy Method

6 minute read

Published:

Group operations involve three stages:

  1. split: the object is split into groups based on one or more keys
  2. apply: a function is applied to each group, producing a new value
  3. combine: the results are combined into a result object
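The three stages can be sketched as follows, first with a single groupby call and then spelled out by hand. The `team` and `score` columns are hypothetical example data:

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "b", "a", "b"], "score": [1, 2, 3, 4]})

# All three stages in one expression.
totals = df.groupby("team")["score"].sum()

# The same stages made explicit.
pieces = {}
for key, group in df.groupby("team"):      # split: one sub-frame per key
    pieces[key] = group["score"].sum()     # apply: reduce each group
manual = pd.Series(pieces)                 # combine: assemble the result
```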

Data Bias

Data Cleaning

Data Cleaning [03]: String Manipulation

8 minute read

Published:

Simple string operations can be handled with built-in string methods. For more complex patterns, regular expressions are a powerful tool.
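As a rough illustration of the two approaches (the input string and the pattern are invented for this sketch), built-in methods handle the simple splitting and casing, while a regular expression extracts the bracketed addresses:

```python
import re

raw = "  Alice <alice@example.com>, bob <BOB@Example.org>  "

# Built-in string methods: split on commas, trim, take the first word, fix case.
names = [part.strip().split()[0].title() for part in raw.split(",")]

# Regular expression: capture whatever sits between angle brackets and contains "@".
emails = [m.lower() for m in re.findall(r"<([^<>\s]+@[^<>\s]+)>", raw)]
```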

Data Cleaning [02]: Data Transformation

7 minute read

Published:

Pandas provides a range of APIs for removing, replacing, renaming and transforming data. This post gives a sketch of these functions.

Data Cleaning [01]: Handling Missing Data

6 minute read

Published:

Pandas uses the floating-point value NaN (Not a Number, np.nan) to represent missing data, and it provides several API functions related to missing data handling: dropna(), fillna(), isnull() and notnull().
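A minimal sketch of the four functions named above, applied to a small made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

mask = s.isnull()            # True where the value is missing
n_present = s.notnull().sum()  # count of non-missing values
dropped = s.dropna()         # remove missing entries
filled = s.fillna(0.0)       # replace missing entries with a constant
```

None of these calls modifies `s`; each returns a new object.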

Data Ethics

Data Preparation

Data Structures

Data Transformation

Data Cleaning [02]: Data Transformation

7 minute read

Published:

Pandas provides a range of APIs for removing, replacing, renaming and transforming data. This post gives a sketch of these functions.

Data Types

Data Types [01]: Python Built-In Types

3 minute read

Published:

The principal built-in types in Python are numerics, sequences, mappings, classes, instances and exceptions. Refer to [this] site for a complete description. This post covers some of the commonly used Python built-in types.
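For orientation, here is a small sketch touching the numeric, sequence and mapping categories; the variable names and values are purely illustrative:

```python
# Numerics
count = 7 // 2            # int (floor division)
ratio = 7 / 2             # float (true division)
z = complex(1, 2)         # complex

# Sequences
items = [3, 1, 2]         # list: mutable
point = (1.0, 2.0)        # tuple: immutable
evens = range(0, 10, 2)   # range: lazy arithmetic sequence
text = "data"             # str: immutable text sequence

# Mappings
ages = {"ann": 31}
ages["bob"] = 25          # dict: insert a new key
```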

Data Visualization

Data Visualization [02]: Pandas and Seaborn Basics

6 minute read

Published:

matplotlib is a fairly low-level tool. pandas itself has built-in methods that simplify creating visualizations from DataFrame and Series objects, and seaborn simplifies the procedure further. Unfortunately, seaborn doesn’t have built-in support for 3D plots. However, we can still use the seaborn style for 3D matplotlib plots. Let’s learn the basics of pandas and seaborn through some simple examples.

Data Wrangling

Data Wrangling [03]: Reshaping and Pivot Tables

6 minute read

Published:

Several functions are useful for reshaping and pivoting the tables:

  • DataFrame.stack(): Stack the prescribed level(s) from columns to index.
  • DataFrame.unstack(): Pivot a level of the (necessarily hierarchical) index labels.
  • DataFrame.pivot(): Return reshaped DataFrame organized by given index / column values.
  • DataFrame.melt(): Unpivot a DataFrame from wide to long format.
  • DataFrame.explode(): Transform each element of a list-like to a row, replicating index values.
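The functions listed above can be sketched on a tiny table. The `city`, year and `tags` columns are hypothetical data chosen only to keep the shapes easy to follow:

```python
import pandas as pd

wide = pd.DataFrame({"city": ["X", "Y"], "2020": [1, 2], "2021": [3, 4]})

# Wide to long: one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="value")

# Long back to wide: cities as the index, years as the columns.
back = long.pivot(index="city", columns="year", values="value")

# Move the column level into the index, then back out again.
stacked = back.stack()
unstacked = stacked.unstack()

# One row per list element, with the original index values repeated.
tags = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})
exploded = tags.explode("tags")
```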

Data Wrangling [02]: Merging DataFrames

12 minute read

Published:

DataFrames can be merged, concatenated or combined in a number of ways:

  • DataFrame.merge() or pandas.merge() merges DataFrame or named Series objects with a database-style join
  • DataFrame.join() joins columns of another DataFrame, which is similar to DataFrame.merge()
  • DataFrame.update() modifies in place using non-NA values from another DataFrame
  • pandas.concat() concatenates pandas objects along a particular axis with optional set logic along the other axes
  • DataFrame.combine() performs a column-wise combine with another DataFrame
  • DataFrame.combine_first() updates null elements with the value in the same location in the other DataFrame

Data Wrangling [01]: Multi-Indexing

6 minute read

Published:

The multi-indexing feature provides a way to work with higher-dimensional data in a lower-dimensional form.
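As a sketch of what this means in practice (the `year` and `quarter` level names and the values are assumptions made for this example), a one-dimensional Series with a two-level index can hold what is conceptually a two-dimensional table:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["2020", "2021"], ["Q1", "Q2"]], names=["year", "quarter"]
)
sales = pd.Series([10, 20, 30, 40], index=index)

single = sales.loc[("2021", "Q1")]   # full key: one value
year_2021 = sales.loc["2021"]        # partial key: all quarters of one year
table = sales.unstack("quarter")     # recover the 2-D view explicitly
```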

DataFrame

Data Types [03]: Pandas DataFrame Basics

7 minute read

Published:

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R. This post covers the basics of DataFrame; more functions will be explored in practice.
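A minimal sketch of a DataFrame whose columns hold different types; the column names and values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob"],      # text column
    "age": [31, 25],             # integer column
    "height": [1.70, 1.82],      # floating-point column
})

shape = df.shape                 # (rows, columns)
dtypes = df.dtypes               # one dtype per column
second_name = df.loc[1, "name"]  # select by row label and column label
```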

Database

Function

C++ Basics [08]: Pointers to Functions

2 minute read

Published:

A function pointer is a pointer that denotes a function rather than an object. Like any other pointer, a function pointer points to a particular type. A function’s type is determined by its return type and the types of its parameters.

C++ Basics [07]: Argument Type Conversions

6 minute read

Published:

When no overloaded function matches a call exactly, type conversions may be necessary. To determine the best match, the compiler ranks the conversions that could be used to convert each argument to the type of its corresponding parameter. This post summarizes these type conversions.

C++ Basics [06]: Functions Overloading and Matching

4 minute read

Published:

Functions that have the same name but different parameter lists and that appear in the same scope are overloaded. When we call these functions, the compiler can deduce which function we want based on the argument types we pass. This raises a question: when do two parameter lists differ?

C++ Basics [05]: Inline and Constexpr Functions

3 minute read

Published:

Defining a function makes the code easier to read and understand. However, one potential drawback of defining a function is that calling it is apt to be slower than evaluating the equivalent expression.

GroupBy

Data Aggregation and Grouping [01]: GroupBy Method

6 minute read

Published:

Group operations involve three stages:

  1. split: the object is split into groups based on one or more keys
  2. apply: a function is applied to each group, producing a new value
  3. combine: the results are combined into a result object

Keras

Matplotlib

Merging

Data Wrangling [02]: Merging DataFrames

12 minute read

Published:

DataFrames can be merged, concatenated or combined in a number of ways:

  • DataFrame.merge() or pandas.merge() merges DataFrame or named Series objects with a database-style join
  • DataFrame.join() joins columns of another DataFrame, which is similar to DataFrame.merge()
  • DataFrame.update() modifies in place using non-NA values from another DataFrame
  • pandas.concat() concatenates pandas objects along a particular axis with optional set logic along the other axes
  • DataFrame.combine() performs a column-wise combine with another DataFrame
  • DataFrame.combine_first() updates null elements with the value in the same location in the other DataFrame

Missing Data

Data Cleaning [01]: Handling Missing Data

6 minute read

Published:

Pandas uses the floating-point value NaN (Not a Number, np.nan) to represent missing data, and it provides several API functions related to missing data handling: dropna(), fillna(), isnull() and notnull().

Modeling

Data Modeling [03]: Pytorch

14 minute read

Published:

For non-deep-learning models, scikit-learn can be a good choice. For deep learning models, it’s better to use a deep learning framework like PyTorch, Keras or TensorFlow (non-Keras version).

Data Modeling [02]: Scikit-Learn

7 minute read

Published:

Scikit-Learn contains a broad selection of standard supervised and unsupervised machine learning methods with tools for model selection and evaluation, data transformation, data loading, and model persistence. These models can be used for classification, clustering, regression, dimensionality reduction, and other common tasks. Let’s learn the basics of scikit-learn for classification and regression problems through several simple examples. For clustering and dimensionality reduction examples, see Clustering Analysis and Factor and Principle Component Analysis, respectively.

Data Modeling [01]: Patsy and Statsmodels

4 minute read

Published:

patsy is a Python package for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. It is used in many projects to provide a high-level interface to the statistical code, including:

  • statsmodels: Estimation of statistical models, statistical tests, and statistical data exploration (link)
  • HDDM: Hierarchical Bayesian parameter estimation of Drift Diffusion Models (DDM) (link)

Multi-Indexing

Data Wrangling [01]: Multi-Indexing

6 minute read

Published:

The multi-indexing feature provides a way to work with higher-dimensional data in a lower-dimensional form.

Numpy

Data Types [02]: Numpy Array Basics

4 minute read

Published:

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.

Pandas

Data Visualization [02]: Pandas and Seaborn Basics

6 minute read

Published:

matplotlib is a fairly low-level tool. pandas itself has built-in methods that simplify creating visualizations from DataFrame and Series objects, and seaborn simplifies the procedure further. Unfortunately, seaborn doesn’t have built-in support for 3D plots. However, we can still use the seaborn style for 3D matplotlib plots. Let’s learn the basics of pandas and seaborn through some simple examples.

Data Types [03]: Pandas DataFrame Basics

7 minute read

Published:

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R. This post covers the basics of DataFrame; more functions will be explored in practice.

Patsy

Data Modeling [01]: Patsy and Statsmodels

4 minute read

Published:

patsy is a Python package for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. It is used in many projects to provide a high-level interface to the statistical code, including:

  • statsmodels: Estimation of statistical models, statistical tests, and statistical data exploration (link)
  • HDDM: Hierarchical Bayesian parameter estimation of Drift Diffusion Models (DDM) (link)

Pivot Table

Pivot Tables

Data Wrangling [03]: Reshaping and Pivot Tables

6 minute read

Published:

Several functions are useful for reshaping and pivoting the tables:

  • DataFrame.stack(): Stack the prescribed level(s) from columns to index.
  • DataFrame.unstack(): Pivot a level of the (necessarily hierarchical) index labels.
  • DataFrame.pivot(): Return reshaped DataFrame organized by given index / column values.
  • DataFrame.melt(): Unpivot a DataFrame from wide to long format.
  • DataFrame.explode(): Transform each element of a list-like to a row, replicating index values.

Pointer

C++ Basics [08]: Pointers to Functions

2 minute read

Published:

A function pointer is a pointer that denotes a function rather than an object. Like any other pointer, a function pointer points to a particular type. A function’s type is determined by its return type and the types of its parameters.

C++ Basics [01]: References and Pointers Basics

8 minute read

Published:

The definitions of references and pointers may be easy to understand. However, it is another thing to be able to use them in practice. Here are some examples that may help to understand references and pointers.

Python

Data Types [03]: Pandas DataFrame Basics

7 minute read

Published:

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R. This post covers the basics of DataFrame; more functions will be explored in practice.

Data Types [02]: Numpy Array Basics

4 minute read

Published:

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.

Data Types [01]: Python Built-In Types

3 minute read

Published:

The principal built-in types in Python are numerics, sequences, mappings, classes, instances and exceptions. Refer to [this] site for a complete description. This post covers some of the commonly used Python built-in types.

Pytorch

Data Modeling [03]: Pytorch

14 minute read

Published:

For non-deep-learning models, scikit-learn can be a good choice. For deep learning models, it’s better to use a deep learning framework like PyTorch, Keras or TensorFlow (non-Keras version).

Reference

C++ Basics [03]: Rvalue and Rvalue References

5 minute read

Published:

Rvalue references were introduced in C++11 to support move semantics, which allow programmers to avoid logically unnecessary copying, and to provide perfect forwarding functions. They are primarily meant to aid in the design of higher-performance and more robust libraries.

C++ Basics [01]: References and Pointers Basics

8 minute read

Published:

The definitions of references and pointers may be easy to understand. However, it is another thing to be able to use them in practice. Here are some examples that may help to understand references and pointers.

Regular Expression

Data Cleaning [03]: String Manipulation

8 minute read

Published:

Simple string operations can be handled with built-in string methods. For more complex patterns, regular expressions are a powerful tool.

Reshaping

Data Wrangling [03]: Reshaping and Pivot Tables

6 minute read

Published:

Several functions are useful for reshaping and pivoting the tables:

  • DataFrame.stack(): Stack the prescribed level(s) from columns to index.
  • DataFrame.unstack(): Pivot a level of the (necessarily hierarchical) index labels.
  • DataFrame.pivot(): Return reshaped DataFrame organized by given index / column values.
  • DataFrame.melt(): Unpivot a DataFrame from wide to long format.
  • DataFrame.explode(): Transform each element of a list-like to a row, replicating index values.

SQL

Scikit-Learn

Data Modeling [02]: Scikit-Learn

7 minute read

Published:

Scikit-Learn contains a broad selection of standard supervised and unsupervised machine learning methods with tools for model selection and evaluation, data transformation, data loading, and model persistence. These models can be used for classification, clustering, regression, dimensionality reduction, and other common tasks. Let’s learn the basics of scikit-learn for classification and regression problems through several simple examples. For clustering and dimensionality reduction examples, see Clustering Analysis and Factor and Principle Component Analysis, respectively.

Seaborn

Data Visualization [02]: Pandas and Seaborn Basics

6 minute read

Published:

matplotlib is a fairly low-level tool. pandas itself has built-in methods that simplify creating visualizations from DataFrame and Series objects, and seaborn simplifies the procedure further. Unfortunately, seaborn doesn’t have built-in support for 3D plots. However, we can still use the seaborn style for 3D matplotlib plots. Let’s learn the basics of pandas and seaborn through some simple examples.

Smart Eye

Smart Eye [03]: The World Model

5 minute read

Published:

The World Model is used to visualize which real-world objects the user’s gaze intersects. It is constructed from simple objects such as planes, spheres and boxes that describe the real world.

Smart Eye [02]: Head and Gaze Tracking

4 minute read

Published:

The Smart Eye Pro system is a head and gaze tracking system, which measures the subject’s head pose and gaze direction in 3D.

Smart Eye [01]: Getting Started

7 minute read

Published:

The Smart Eye Pro system is a head and gaze tracking system, which measures the subject’s head pose and gaze direction in 3D. In addition, eyelid opening and pupil dilation can be measured.

Statistics

Statistics [31]: Summary

less than 1 minute read

Published:

This series of posts is a set of notes from Applied Statistics, given by Prof. LIANG Heng of Tsinghua University.

Statistics [30]: Clustering Analysis

5 minute read

Published:

Clustering analysis is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

Statistics [29]: Factor and Principle Component Analysis

17 minute read

Published:

In practice, data may contain many variables, but not all of them have a significant influence on the results we want to analyze or predict. Factor analysis and principal component analysis are two basic methods that aim to reduce the dimension of the data to make it easier to understand and analyze.

Statistics [28]: Multiple Regression Model

10 minute read

Published:

The purpose of a multiple regression model is to estimate the dependent variable (response variable) using multiple independent variables (explanatory variables).

Statistics [24]: Variance Reducing Techniques

2 minute read

Published:

This post covers several techniques for reducing the variance of an estimate, including antithetic variates, control variates, stratified sampling and importance sampling.
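To make one of these concrete, here is a hedged sketch of antithetic variates only. The target quantity, E[exp(U)] with U uniform on (0, 1), and the helper `pair_estimates` are assumptions chosen for illustration, not taken from the post. Each estimate averages two samples; the antithetic version pairs u with 1 - u so that the two terms are negatively correlated:

```python
import math
import random
import statistics


def pair_estimates(n_pairs, antithetic, seed=0):
    """Return n_pairs two-sample estimates of E[exp(U)], U ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_pairs):
        u1 = rng.random()
        # Antithetic variates reuse 1 - u1; the plain version draws a fresh sample.
        u2 = 1.0 - u1 if antithetic else rng.random()
        estimates.append((math.exp(u1) + math.exp(u2)) / 2.0)
    return estimates


plain = pair_estimates(10_000, antithetic=False)
anti = pair_estimates(10_000, antithetic=True)

mean_anti = statistics.fmean(anti)       # should be close to e - 1
var_plain = statistics.variance(plain)   # per-pair variance, independent samples
var_anti = statistics.variance(anti)     # per-pair variance, antithetic samples
```

Because exp is monotone, exp(u) and exp(1 - u) move in opposite directions, which is why the antithetic pairs have the smaller variance here.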

Statistics [23]: Monte Carlo

4 minute read

Published:

The Monte Carlo (MC) technique is a numerical method that makes use of random numbers to solve mathematical problems for which an analytical solution is not known.
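A classic toy illustration is estimating π from random points in the unit square. The answer is of course known analytically, so this only sketches the mechanics; the function name `estimate_pi` and the sample size are assumptions for this example:

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi as 4 times the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


pi_hat = estimate_pi(100_000)
```

The error of such an estimate typically shrinks in proportion to one over the square root of the number of samples.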

Statistics [20]: Experimental Design

11 minute read

Published:

This post summarizes some of the commonly used experimental design methods, including completely randomized designs, randomized block designs, full factorial designs and fractional factorial designs.

Statistics [15]: Analysis of Variance - F test

4 minute read

Published:

In statistics, analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance is partitioned into components due to different explanatory variables. In its simplest form, ANOVA gives a statistical test of whether the means of several groups are all equal, and therefore generalizes Student’s two-sample t-test to more than two groups.

Statistics [11]: Parameter Interval Estimation

3 minute read

Published:

Point estimation estimates an unknown parameter with a single value, whereas interval estimation estimates it with an interval.

Statistics [09]: Parameter Point Estimation

1 minute read

Published:

Given samples from a population, point estimation refers to constructing certain statistical quantities that can be used to estimate the distribution of the population. The method is not unique; the most commonly used methods are the method of moments and the method of maximum likelihood.

Statistics [04]: Some Common Continuous Distributions

3 minute read

Published:

This post will summarize some of the commonly used continuous distributions, including

  • Uniform distribution
  • Exponential distribution
  • Weibull distribution
  • Normal distribution
  • Chi-squared distribution
  • Student’s t-distribution
  • F-distribution
  • Gamma distribution
  • Beta distribution

Statistics [03]: Some Common Discrete Distributions

4 minute read

Published:

This post will summarize some of the commonly used discrete distributions, including

  • Uniform distribution
  • Bernoulli distribution
  • Binomial distribution
  • Geometric distribution
  • Negative binomial distribution
  • Poisson distribution
  • Hypergeometric distribution
  • Multinomial distribution

Statistics [02]: Shakespear’s New Poem

7 minute read

Published:

In 1985, Shakespearean scholar Gary Taylor discovered a nine-stanza poem in a bound folio volume that was attributed to Shakespeare (called the Taylor poem). The size of the newly discovered poem is small relative to the size of Shakespeare’s total work, only 429 total words. Can we prove that the poem was actually written by Shakespeare or not?

Statistics [01]: Probability vs Statistics

3 minute read

Published:

Probability and statistics are two closely related fields in mathematics which concern themselves with analyzing the relative frequency of events. Still, there are fundamental differences in the way they see the world.

Statsmodels

Data Modeling [01]: Patsy and Statsmodels

4 minute read

Published:

patsy is a Python package for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. It is used in many projects to provide a high-level interface to the statistical code, including:

  • statsmodels: Estimation of statistical models, statistical tests, and statistical data exploration (link)
  • HDDM: Hierarchical Bayesian parameter estimation of Drift Diffusion Models (DDM) (link)

String Manipulation

Data Cleaning [03]: String Manipulation

8 minute read

Published:

Simple string operations can be handled with built-in string methods. For more complex patterns, regular expressions are a powerful tool.

Tensorflow