• Taichi - Physics Engines

           

          Some supplement contents about Taichi

          If overlapping contents: just add on the original GAMES201 notes. The new contents: add at the last of the notes individually.

          • Exporting Results (Lecture 2)
          • OOP and Metaprogramming (Lecture 3)
          • Diff. Programming (Lecture 3)
          • Visualization (Lecture 3)

           

           

          Lecture 1 Introduction

           

          Keyword: Discretization / Efficient solvers / Productivity / Performance / Hardware architecture / Data structures / Parallelization / Differentiability (DiffTaichi)

           

          Taichi Programming Language

          Initialization

          Required every time using taichi

          Use ti.init, for spec. cpu or gpu: use ti.cpu (default) or ti.gpu

          Data

          Data Types

          signed integers ti.i8 (~ 64, default i32) / unsigned integers ti.u8 (~64) / float-point numbers ti.f32(default) (~ 64)

          Modify Default

          The default can be changed via ti.init

          Type Promotions

          Switch to high percision automatically

          Type Casts

          Tensor

          Multi-dimensional arrays(高维数组)

          Field

          ti.field: A global N-d array of elements

          heat_field = ti.field(dtype=ti.f32, shape=(256, 256))

          Other Examples:

          Kernels

          Must be decorated with @ti.kernel (Compiled, statically-typed, lexically-scoped, parallel and differentiable - faster)

          For loops at the outermost scope in a Taichi kernel is automatically parallelized (if inside - serial)

          If a outermost scope is not wanted to parallelize for and an inside scope is, write the unwanted one in the python scope and call the other one as the outermost in the kernel.

          Arguments

          Return Value

          Functions

          Decorated with @ti.func, usually for high freq. used func.

          Taichi's function can be called in taichi's kernel but can't be called by python (not global)

          Function 可以被 kernel 调用但不能被 py 调用,kernel 也不能调用 kernel,但是 function可以调用 function

          image-20211005212840697

          Arguments

          Scalar Math

          Similar to python. But Taichi also support chaining comparisons (a < b <= c != d)

          Specifically, a / b outputs floating point numbers, a // b ouputs integer number (as in py3)

          Matrices and Linear Algebra

          ti.Matrix is for small matrices only. For large matrices, 2D tensor of scalars should be considered.

          Note that element-wise product in taichi is * and mathematically defined matrix product is @ (like MATLAB)

          Common Operations:(返回矩阵 A 经过变换的结果而非变换 A 本身)

          Parallel for-loops

          (Automatically parallalized)

          Range-for loops

          Same as python for loops

          Struct-for loops

          Iterates over (sparse) tensor elements. Only lives at the outermost scope looping over a ti.field

          for i,j in x: can be used

          Example:

          Atomic Operations

          In Taichi, augmented assignments (e.g. x[i] += 1) are automatically atomic.

          (+=: the value on the RHS is directly summed on the current value of the var. and the referncing var.(array))

          (Atomic operation: an operation that will always be executed without any other process being able to read or change state that is read or changed during the operation. It is effectively executed as a single step, and is an important quality in a number of algorithms that deal with multiple independent processes, both in synchronization and algorithms that update shared data without requiring synchronization.)

          When modifying global var. in parallel, make sure to use atomic operations.

          Taichi-scope vs. Python-scope

          Taichi-scope: Everything decorated with ti.kernel and ti.func

          Code in Taichi-scope will be compiled by the Taichi compiler and run on parallel devices (Attention that in parallel devices the order of print may not be guaranteed)

          Python-scope: Code outside the Taichi-scope

          Code in Python-scope is simple py code and will be executed by the py interpreter

          Phases

          1. Initialization: ti.init()

          2. Tensor allocation: ti.var, ti.Vector, ti.Matrix

            Only define tensors in this allocation phase and never define in the computation phase

          3. Computation (launch kernel, access tensors in Py-scope)

          4. Optional: Restart (clear memory, destory var. and kernels): ti.reset()

          Practical Example (fractal)

          Debug Mode

           

          Lecture 2 Lagrangian View

           

          Two Views (Def)

          Lagrangian View

          Sensors that move passively with the simulated material(节点跟随介质移动)

          粒子会被额外记录位置,会随(被模拟的)介质不断移动

          Euler View

          Still sensors that never moves(穿过的介质的速度)

          网格中每个点的位置不会被特别记录

          Mass-Spring Systems

          弹簧 - 质点模型 (Cloth / Elastic objects / ...)

          Hooke's Law

          fij=k(xixj2lij)(xix^j)fi=jjifij

          (f - force, k - Spring stifness, Li,j - Spring rest length between particle i and j, ^ - normalization, xix^j​ - direction vector of paticle i to j)

          Newton's Second Law of Motion

          vit=1mifi;xit=vi

          Time Integration

          Common Types of Integrators

          Comparison

          Implementing a Mass-Spring System with Symplectic Euler

          Steps

          Demo

          Backward Euler Implicit Implement

          Steps

          Solving the system:

          A=[IΔt2M1fx(xt)]b=vt+ΔtM1f(xt)Avt+1=b

          Demo

          Unifying Explicit and Implicit Integrators

          [IβΔt2M1fx(xt)]vt+1=vt+ΔtM1f(xt)

          Solve Faster

          For millions of mass points and springs

          Lagrangian Fluid Simulation (SPH)

          Smoothed Particle Hydrodynamics

          Use particles carrying samples of physical quntities, and a kernel W​, to approximate continuous fields (A can be almost any spatially varying phisical attributes: density / pressure / ...)

          A(x)=iAimiρiW(xxj2,h)

          image-20210724122234613

          在 x 这点的物理场的值(即 A(x)​​​)等于核函数(如密度 / 压力等)加权平均了周围的粒子(本质是光滑了)的求和。核函数中间高四周低,使得更靠近该点影响更大,远离该点的影响变小(超过 support 半径 h​ 则为 0)

          不需要 mesh,适合模拟自由表面 free-surface flows 流体(如水和空气接触的表面,反之,不适合烟雾(空间需要被填满)等),可以使用 “每个粒子就是以小包水” 理解

          Equation of States (EOS)

          aka. Weakly Compressible SPH (WCSPH)

          Momentum Equation (ρ - density (ρ0 - ideal), B - bulk modulus, γ - constant (usually ~ 7) )

          (Actually the Navier-Stoke's without considering viscosity)

          DvDt=1ρp+g,p=B((ρρ0)γ1)A(x)=iAimiρiW(xxj2,h),ρi=jmjW(xixj2,h)

          ρi​ 通过质量加权, D - Lagrange derivative

          Gradient in SPH

          A(x)=iAimiρiW(xxj2,h)Ai=ρijmj(Aiρi2+Ajρj2)xiW(xixj2,h)

          Not really accurate but at least symmetric and momentum conserving (to add viscosity etc. Laplacian should be introduced)

          SPH Simulation Cycle

          Variant of SPH

          Paper: SPH Fluids in Computer Graphics, Smooted Particle Hydrodynamics Techniques for Physics Based Simulation of Fluids and Solids (2019)

          Courant-Friedrichs-Lewy (CFL) Conditions (Explicit)

          One upper bound of time step size: (def. using velocity other than stiffness)

          C=uΔtΔxCmax1

          (C - Courant number, CFL number, u - maximum velocity, Δt - time step, Δx - length interval (e.g. particle radius and grid size))

          Application: estimating allowed time step in (explicit) time integration.

          SPH: ~ 0.4; MPM (Material Point Method): 0.3 ~ 1; FLIP fluid (smoke): 1 ~ 5+

          Build spatial data structure such as voxel grids O(n2)O(n)

          Neighborhood search with hashing

          精确找到距离不超过 h​​​​ 的粒子,每个 voxel 维护一个 list(所有位置在这个 voxel 里的粒子),每个需要查找的粒子先确定相应 voxel 再查找周围的 voxel(枚举所有包含的粒子,通常是常数)- SPH 的瓶颈

          image-20210726000447459

          Ref: Compact hashing

          Other Paricle-based Simulation Methods

          Exporting Results

          Exporting Videos

           

          Lecture 3 Lagrangian View (2)

           

          弹性、有限元基础、Taichi 高级特性

          Elasticity

          Deformation

          Deformation Map ϕ​​ (to describe): A (vector to vector) function that relates rest material position and deformed material position

          xdeformed=ϕ(xrest)

          Deformation gradient F: F=xdeformedxrest (Translational invariant, same deformation gradients for ϕ1 and ϕ1+c)

          Deform / rest volume ratio: J=det(F) F[None] = [[2, 0], [0, 1]](横向拉伸了))

          Elasticity

          Hyperelastic material

          Hyperelastic material: Stress-strain relationship is defined by a strain energy density function ψ=ψ(F)

          ψ​ is a potential function that penalizes deformation; F represents the strain (means deformation gradient)

          Stress Tensor

          Relationship: τ=jσ=PFT; P=JσFT; Traction t=σTn (考虑的是截面的法向量,转置为 -T)​

          Elastic Moduli (Isotropic Materials)

          Lame parameters

          Conversions:(通常指定 Young’s Modulus & Poisson’s Ratio,或 EνK=E3(12ν) ; λ=Eν(1+ν)(12ν) ; μ=E2(1+ν)

          Hyperelastic Material Models

          Popular in graphics:

          Lagrangian Finite Elements on Linear Tetrahedral Meshes

          The Finite Element Method (FEM)

          Discretization sheme to build discrete equations using weak formulations of continuous PDEs

          Linear Tetrahedral (Triangular) FEM

          Assumption: The deformation map ϕ​ is affine and thereby the deformation gradient F​ is constant

          xdeformed=F(xrest)+b b - offset,平移产生的形不产生弹性,不影响 gradient)

          For every element e​, the elastic potential energy U(e)=eψ(F(x))dx=Veψ(Fe)​​ (对每个元素这么设定,计算其弹性势能)

          Computing Fe (for 2D triangles)

          Explicit Linear Triangular (FEM) Simulation

          Semi-implicit Euler time integration scheme:

          vt+1,i=vt,i+Δtft,imixt+1,i=xt,i+Δtvt+1,i

          xt,i​ and vt,i​​ are stored on the vertices of finite elements (triangles / tetrahedrons), Ve - constant volumes

          ft,i=Uxi=eU(e)xi=eVeψ(Fe)FeFexi=eVeP(Fe)Fexi

          Taichi’s AutoDiff system can use to compute p(Fe)

          Implicit Linear Triangular FEM Simulation

          Backward Euler Time Integration: [IβΔt2M1fx(xt)]vt+1=vt+ΔtM1f(xt)

          (M - diagonal matrix to record the mass)

          Compute force differentials: fx=2ψx2 (second order DE are not available in taichi)

          Higher Level of Taichi

          Objective Data-Oriented Programming

          OOP -> Extensibility / Maintainability

          An “object” contains its own data (py-var. / ti.field) and method (py-func. / @ti.func / @ti.kernel)

          OOP

          Python OOP in a Nutshell

          If want to add some features, this method can inherit all past features and add on them (easy to maintain and extent)

          Using Classes in Taichi

          Hybrid scheme: Objective Data-Oriented Programming (ODOP)

          -> More data-oriented -> usually use more vectors / matrices in the class

          Important Decorators:

          Caution: if the variable is passed from Python scope, the self.xxx will still regard as a constant

          Features:

          Demo: ti example odop_solar: a=GMr/||r||23

          Metaprogramming 元编程

          Allow to pass almost anything (including tensors) to Taichi kernels; Improve run-time performance by moving run-time costs to compile time; Achieve dimensionality independence (2D / 3D simulation codes simultaneously); etc. 很多计算可以在编译时间完成而非运行时间完成 (kernels are lazily instantiated)

          Metaprogramming -> Reusability:

          Programming technique to generate other programs as the program’s data

          Metaprogramming

          In Taichi the “Codes to write” section is actually ti.templates and ti.statics

          Template

          Allow to pass anything supported by Taichi (if directly pass something like a = [43, 3.14] (python list) -> error; need to modify as Taichi’s types a = ti.Vector([43, 3.14]))

          Taichi kernels are instantiated whenever seeing a new parameter (even same typed)

          frequently used of templates will cause higher time costs

          Pass-by-reference:

          The template initialization process could cause high overhead

          Dimensionlity-independent Programming
          Tensor-size Reflection

          Fetch tensor dimensionlity info as compile time constants:

          Static

          Compile-time Branching

          Using compile-time evaluation will allow certain computations to happen when kernel are being instantiated. Saves the overhead of the computations at runtime (C++17: if constexpr)

          Forced Loop-unrolling

          Use ti.static(range(...)) : Unroll the loops at compile time(强制循环展开)

          Reduce the loop overhead itself; loop over vector / matrix elements (in Taichi matrices must be compile-time constants)

          Variable Aliasing

          Creating handy aliases for global var. and func. w/ cumbersome names to improve readability

          Metadata

          Data generated data (usually used to check whether the shape / size is the same or not for copy)

          Differentiable Programming

          Forward programs evaluate f(x), differentiable programs evaluates f(x)x

          Taichi supports reverse-mode automatic differentiation (AutoDiff) that back-propagates gradients w.r.t. a scalar (loss) function f(x) (关于每个元素的导数)

          Gradient-based Optimization

          minxL(x)=12i=0n1(xiyi)2

          Results: Loss exp. decreases to near 0

          (also use for gradient descend method for fixing curves, FEM and MPM (with bg. grids to deal with self collision))

          Application 1: Forces from Potential Energy Gradients

          fi=ϕ(x)xi

          (Demo: ti example mpm_lagrangian_forces)

          Application 2: Differentiating a Whole Physical Process

          Use AutoDiff for the whole physical process derivative

          Not used for basic differentiation but optimization for initial conditions(初始状态的优化 / 反向模拟)/ controller

          Need to record all tensors in the whole timestep in ti.Tape() ~ Requires high memory (for 1D tensor needs to use 2D Tape)

          ~ Use checkpointing to reduce memory consumption

          Visualization

          Print

          Visualizing 2D Results

          Apply Taichi’s GUI system:

          Visualizing 3D Results

           

          Lecture 4 Eulerian View (Fluid Simulation)

           

          回答问题:介质流过的速度

          Overview

          Material Derivatives

          Connection of Lagrangian and Eulerian

          (Use D for material derivatives. t​ - Eulerian part, no spatial velocity, varies with time; u​ - material movement, change by velocity part in scalar field (u - material (fluid) velocity / advective / Lag. / particle derivative))

          DDt:=t+u

          For example:

          Temperature: DTDt=Tt+uTx-Velocity: DuxDt=uxt+uux

          (Incompressible) Navier-Stokes Equations

          ρDuDt=p+μ2u+ρgorDuDt=1ρp+ν2u+g;u=0

          Usually in graphics want low viscosity (delete the viscosity part) except for high viscous materials (e.g., honey)

          DuDt=1ρp+g;u=0

          Operator Splitting

          Split the equation (PDEs with time) into 3 parts: (α - any physical property (temperature / color / smoke density / ...))

          DuDt=0,DαDt=0(advection)ut=g(external forces, optional)ut=1ρps.t.u=0(projection)

          Time Discretization

          (for each time step using the splitting above)

          Grid

          Spatial Discretization

          Using Cell-centered Grids

          (evenly distributed)

          image-20210805202511587

          ux, uy and p​ are stored at the center of the cells (horizontal and vertical offset = 0.5 cell)

          Using Staggered grids

          Stored in various location (Red - ux; Green: uy; Blue - p)

          image-20210805202532004 Red: 3x4; Green: 4x3; Blue: 3x3

          Bilinear Interpolation

          Interpolate (x,y) with the 4 points by weighted average

          image-20210805233100492

          Advection

          Different Schemes: Trade-off between numerical viscosity, stability, performance and complexity

          Semi-Lagrangian Advection

          Scheme

          image-20210805233608015

          Velocity field constant, for very short time step Δt​, velocity assumed to be constant (p -> x)

          Problem

          velocity field not constant! -> Cause magnification / smaller / blur

          image-20210805234531975

          The real trajectory of material parcels can be complex (Red: a naive estimation of last position; Light gree: the true previous position.)

          Solutions

          (-> initial value problem for ODE, simply use the naive algorithm = forward Euler (RK1); to solve the problem can use RK2 scheme)

          ~ Blur: Usually because of the use of bilinear interpolation (numerical viscosity / diffusion) (causes energy reduction) -> BFECC

          BFECC / MacCormack

          Scheme

          BFECC: Back and Forth Error Compensation and Correction (Good reduction of blur especially for static region boundaries)

          Problem: Overshooting

          ~ Gibbs Phenomen at boundary because of this correction: 0.5 * (x[I] - new_x_aux[I])

          Idea: Introduce a clipping function

          image-20210806002649157 (Artifacts)

          Chorin-Style Projection

          To ensure the velocity field is divergence free after projection - Constant Volume (need linear solver - MGPCG)

          Expand using finite difference in time (u - the projected velocity field, p - The divergence degree of the gradient of the pressure)

          Divergence (div=​​): the total generation / sink of the fluid (+ for out, - for in). For actual (imcompressible) fluid flow: F=0​​ $\cur​​

          Curl (curl=×​): Clockwise - ×F<0​; Counter clockwise - ×F>0

          (REF: https://www.youtube.com/watch?v=rB83DpBJQsE)

          uu=Δtρp s.t. u=0u=uΔtρp s.t. u=0u=(uΔtρp)0=uΔtρpp=ρΔtu

          Want: find a p for the last formula

          Poisson’s Equation

          p=forΔp=f

          (Δ here represents 2=​, as the Laplace operator)

          If f=0​​, the equation becomes Laplace’s Equation (in this course generally not 0)

          Spatial Discretization (2D)

          Discretize on a 2D grid: (central differential, 5 points)

          (AP)i,j=(p)i,j=1Δx2(4pi,j+pi+1,j+pi1,j+pi,j1+pi,j+1)bi,j=(ρΔtu)i,j=ρΔtΔx(ui+1,jxui,jx+ui,j+1yui,jy)

          Linear System: Anm×nmpnm=bnm​​ (very high dimension (especially for 3D), but sparse)​

          u

          Divergency of velocity(速度的散度): Quantity of fluids flowing in / out: Flows in = Negative; Flows out = Positive

          image-20210806104706358

          Remind: Staggered Grid: x-componenets of u - vertical boundaries; y-comp: horizontal boundaries

          (u)i,j=(ui+1,jxui,jx+ui,j+1yui,jy)

          p

          image-20210806105359859

          (p)i,j=1Δx2(4pi,j+pi+1,j+pi1,j+pi,j1+pi,j+1)

          Boundary Conditions

          Dirichlet and Neumann boundaries

          Solving Large-scale Linear Systems

          Ax=b

          (Good numerical solver are usually composed of different solvers)

          Matrix Storage

          A: often sparse, symmetric and positive-defined (SPD) (Actually 2-D arrays)

          Store A:

          Krylov-subspace Solvers

          Conjugate gradients

          Basic Algorithm: (energy minimization)

          r0=bAx0 # r - Residual p0=r0 # Estimation (x should be add a ? p to approach the solution) k=0 while True:

          αk=rkrkpkApk

          xk+1=xk+αkpk rk+1=rkαkApk if rk+1 is sufficiently small, break

          βk=rk+1rk+1rkrk pk+1=rk+1+βkpk k=k+1 return xk+1

          Eigenvalues and Condition Numbers

          Convergence related to condition numbers

          Remind:

          A smaller condition number causes faster convergence

          Warm Starting

          Starting with an closer initial guess results in fewer interations needed

          Using p​ from the last frame as the initial guess of the current frame (Jacobi / GS / CG work well but not good for MGPCG)

          Preconditioning

          Find an approximate operator M that is close to A but easier to convert

          Ax=bM1Ax=M1b

          M1A may have a smaller condition number

          Common Preconditioners:

          Multigrid preconditioned conjugate gradients (MGPCG)

          Residual reduction very fast in several iterations

          A. McAdams, E. Sifakis, and J. Teran (2010). “A Parallel Multigrid Poisson Solver for Fluids Simulation on Large Grids.”. In: Symposium on Computer Animation, pp. 65–73.

          Taichi demo: 2D/3D multigrd: ti example mgpcg_advanced

           

          Lecture 5 Poisson’s Equation and Fast Method

           

          Poisson’s Equation and its Fundamental Solution

          Using the view of PDE

          2ϕ=ρϕ(x=0)=0ϕ(x)=Ωρ(y)4π||xy||2dyϕj=j=1Nmj4πRij

          For example: Gravitational Problem (势场满足密度的泊松方程)

          f(x)=ϕf(x)=j=1,jiNxi(ρjνj4π||xixj||2)fi=j=1,jiNρjνj4π||xixj||23

          If N particles in M points computaiton -> Required O(NM)

          Fast Summation

          快速多级展开

          M2M Transform

          (Electrons - Multipoles and Multipole - Multipole)

          For M2M Transform: Both z1 and z2 are far from Z; For M2L (Local pole ~ interpolation) Transform: z1 near Z

          M2L Transform

          If we know M-Expansion at c1 (M1 = {c1, Q, Qk}), what is the polynomial at z1, so that potentials at neighbor z can be evaluated.

          image-20210813233116830

          ϕ(Z)=Qlog(ZC)+k=1Pbk(ZC)k=Qlog(Z1C+ZZ1)+k=1Pbk(Z1C+ZZ1)k=Qlog(Z1C)+k=1Pbk(Z1C)kϕ(Z1)+l=1Pbl(ZZ1)lH.O.T.

          where H.O.T. is high order turbulence, from multipole to local pole

          bl=Ql(CZ1)l+1(CZ1)lk=1Pbk(Z1C)k(l+k1k1)

          L2L Transform

          If we know L-Expansion at c1 (L1 = {c1, B}), what is the polynomial at c2, so that potentials at neighbor z around c2 can be evaluated? Honer Scheme

          image-20210813234942106

          k=0Pak(ZZ0)kk=0Pbk(Z)k

          Comparison

          O(N) Alogarithm: lCPN4l=O(CPN)

           


          26:00

           

           

           

          Taichi Graphic Course S1 (NEW)

          Lecture 0-2 is in the above notes

           

          Lecture 3 Advanced Data Layout (21.10.12)

          Advanced Dense Data Layouts

          (ti.field() is dense and @ti.kernel is optimized for field => Data Oriented => focus on data-access / decouple the ds from computations)

          CPU -> wall of computations; GPU / Parallel Frame -> wall of data access (memory access)

          Packed Mode

          Optimized for Data-access

          Upgrade ti.field()

          From shape to ti.root

          change to:

          image-20211012192916022

          One step futher:

          image-20211012192924233

          A Taichi script uses dense equiv. to the above C/C++ codes

          Row & Column Majored Fields

          Col: image-20211012200823340 Row:image-20211012201031533

          Access: struct for (the access order will altered for row and col. -majored)

          Example: loop over ti.j first (row-majored)

          Special Case (dense after dense)

          Hierachical 1-D field (block storage)

          image-20211012193228785

          Block major

          e.g. for 9-point stencil

          image-20211019175844706

          image-20211012193352701

          Compare to flat layouts:

          Array of Structure (AoS) and Structure of Arrays (SoA)

          image-20211012193933727

          Switching: ti.root.dense(ti.i, 8).place(x,y) -> ti.root.dense(ti.i, 8).place(x) + ti.root.dense(ti.i, 8).place(y)

          Sparse Data Layouts

          SNode Tree (Extended)

          Dense SNode-Tree:

          image-20211012194553239

          Pointer

          But the space occupation rate could be low -> use pointer (when no data in the space a pointer points -> set to 0)

          image-20211012194725213

          Activation

          Once writing an inactive cell: x[0,0] = 1 (activate the whole block the pointer points to (cont. memory))

          If print => inactivate will not access and return 0

          Pointer in Pointer

          Actually ok but can be a waste of memory (pointer = 64 bit / 8 bytes). can also break cont. in space

          Bitmasked

          image-20211012195127835

          The access / call is similar to a dense block. Cost 1-bit-per-cell extra but can skip some …

          API

          Put Together

          Example

           

          Lecture 4 Sparse Matrix, Debugging and Code Optimization (21.10.19)

          Sparse Matrix and Sparse Linear Algebra

          Naive Taichi Implementation: hard to maintain

          Build a Sparse Matrix

          Sparse Matrix Operations

          Sparse Linear Solver

          Example: Linear Solver(鸡兔同笼)

          [1124][nChickensnRabbits]=[nHeadsnLegs]

          Given the amount of heads and legs to compute the amount of chicken and rabbit

          If the problem is solved intuitively as a linear system Ax=b, the time complexity will be at the level of O(n3) (even in sparse matrices)

          -> Use x = solver.solve(b)

          Example: Linear Solver (Diffusion)

          Tt=κ2T ;2T=xx2T+yy2T

          Finite Differential Method with FTCS Scheme

          Matrix Representation: (Explicit)

          Tn+1TnΔt=κΔx2DTnTn+1=(I+ΔtκΔx2D)Tn

          κ is the conductivity rate, higher κ for higher propagation of the heat. But too high κ can have wrong wrong results. (for example: the Torigin is 100 degree and κ=500, the next step T will be -300 => oscillation results, unstable)

          D=[2+1+1+13+1+1+12+1+13+1+1+1+14+1+1+1+13+1+12+1+1+13+1+1+12]

          The temperature transfer function can be easier to express:

          Matrix Representation: (Implicit)

          Tn+1TnΔt=κΔx2DTn+1Tn+1=(IΔtκΔx2D)1Tn

          New Features (0.8.3+)

          Sparse Matrix related features to ti.linalg

          Debugging a Taichi Project

          Taichi still not supports break point debugging

          Visualize Fields

          Print a Taichi field / numpy array will truncate your results

          This method is hard to visualize -> use GUI / GGUI (normal dir / vel. dir / physics / …)

          Debug Mode

          The debug mode is off by default

          Other Problems

          Optimizing Taichi Code

          Performance Profiling

          -> Amdahl’s Law (for higher run-time cost optimization will be more valueable)

          Profiler

          Demo

          Performance Tuning Tips

          BG: Thread hierarchy of Taichi in GPU

          image-20211022211251885

          The Block Local Storage (BLS)

          image-20211022211606701

          Block Size of an hierarchically defined field (SNode-tree):

          bls_size = 4 x 4 (x 4 Bytes) = 64 Bytes (Fast)

          image-20211022211820891

          Decide the size of the blocks

          ti.block_dim() before a parallel for-looop

          (default_block_dim = 256 Bytes)

          Cache Most Freq-Used Data in BLS Manually

          ti.block_local()

          when a data is very important (actually in our case some neighbor will be counted as well)

          bls_size = 5 x 6 (x 4 Bytes) = 120 Bytes (the overlapped part will be cached in different bls)

          image-20211022212537538

           

          Lecture 5 Procedural Animation (21.10.26)

          Procedure Animation in Taichi

          Steps

          1. Setup the canvas
          2. Put colors on the canvas
          3. Draw a basic unit
          4. Repeat the basic units (tiles / fractals)
          5. Animate the pictures
          6. Introduce some randomness (chaos)

          Demo: Basic Canvas Creation

          Colors

          Add color via a for-loop

          Basic Unit

          Draw a circle

          Helper Functions

          Demo: Basic Unit with Blur

          image-20211102155632535

          Repeat the Basic Units: Tiles

          image-20211102155844798

          Repeat the Basic Units: Fractals

          image-20211102160145427

          Animate the Picture

          image-20211102160717368

          Introduce randomness (chaos)

          The Balance: Perlin noise (huge randomness but continuous in smaller scales)

          -> shadertoy.com

           

          Lecture 6 Ray Tracing (21.11.2)

          Basis of Ray Tracing

          Rendering Types

          Assumptions of Light Rays

          Applying Ray Tracing (Color)

          In color finding, option 1-2 usually use rasterization than RT. The classical RT is option 3 and the modern RT is actually option 4 (path tracing)

          Option 1: The Color of the Object

          For 256x128:

          image-20211104110341752

          Flat-looking Results

          Cannot tell the materials using their colors

          image-20211104110631893

          What we see = color * brightness

          image-20211104110836230

          Option 2: Color + Shading

          Lambertian Reflectance Model

          image-20211104111217754

          Brightness = cosθ

          image-20211104111341624

          The larger cosθ, the higher energy/area

          Results with Lambertian

          Looks like 3D, but still lack of the specular surfaces

          image-20211104111638925

          Phong Reflectance Model

          image-20211104111858045

          Brightness = (VR)α=(cos(θ))α; α is the hardness (intensity)

          For higher α the visible angle will be smaller (Left: α=2; Right: α=5, more like a pulse)

          image-20211104112114526 image-20211104112126747

          Blinn-Phong Reflectance Model

          image-20211104112508191

          Brightness = (NH)α=(cos(θ))α, H=V+L||V+L||, α>α is the hardness in Blinn-Phong

          Blinn-Phong Shading Model

          image-20211104112601169

          Results with Blinn-Phong

          Shining but floating, no glassy like

          image-20211104112711501

          Option 3: The Whitted-Style Ray Tracer

          -> Shadow / Mirror / Dielectric

          The Whitted-Style

          image-20211104115105152

          Img

          Since we have partially tranparent objects (the one with a gray shadow)

          Refraction + Reflection (Recursive)

          image-20211104151508399

          image-20211104151822500

          -> not easy to program

          Problems

          Option 4: Path Tracer (Modern)

          From Previous Approaches

          Global Illumination (GI)

          With GI, diffuse surface will still scatter rays as well. Without GI, if not shined from the light source directly, it has fully dark shadow

          An “unified” model for different surfaces:

          Diffuse (Monte Carlo Method)SpecularDielectric
          image-20211104153150409image-20211104153218601image-20211104153236293
          Lo=1Nk=1NLi,k cos(θk)Lo=LiLo=1Nk=1NLi,k

          Monte Carlo Methdo: rely on repeated random sampling to obtain numerical results

          All the collected weighted average color show the final color of a point

          Problem in Sampling

          Problems in Stop Criterion

          The current stop criterion

          -> Problem: Infinity loops

          Core Ideas Summary

          Further Readings

           

          Lecture 7 Ray Tracing 2 (21.11.9)

          Recap

          Ray-casting from Camera/Eye

          Ray

          Ray is a line def by its origin (O) and dir (d) (or a point it passed by)

          P=O+td ;d=PO||PO||

          image-20211110220057903

          Camera/Eye and Monitor

          Ray-casting

          image-20211110224200599

          A pixel has its size as well -> + 0.5 pixels

          Ray-object Intersection

          Sphere

          Sphere Intersection

          Cornell Box

          Actually formed by 4 huge spheres other than using planes (more convenient)

          image-20211111003937029

          Plane

          Ray-plane Intersection

          Ray-object Intersection

          Sampling

          Want to sample the directions of rays uniformly

          Coordinates

          image-20211111101215134

          Sampling the Hemisphere Uniformly

          1. Attempt one: ϕ=rand(0,2π), θ=rand(0,π), r=1 => Not uniform after mapping to the polar coord

            image-20211111101937097 image-20211111101951187

            -> Uniform?: The probability of sampling is proportional to the surface area (differential surface element: dA=r2sinθ dθdϕ)

            -> probability density function (p.d.f.) f(θ)=02πf(ϕ,θ) dϕ=sin(θ)2

          2. The corrected attempt: ϕ=rand(0,2π), θ=arccos(rand(1,1)), r=1 (inverse transform sampling) => Much more uniform

            image-20211111104749948 image-20211111104802197

          3. Negate the direction if against the normal

            Sampling a Sphere

          Reflection v.s. Refraction

          Total Reflection

          Happens when n1>n2 (e.g., from glass to air)

          Snell’s law may fail to give sin(θt)=n1n2sin(θi)>1 => fail to solve arcsin(θt) (only reflection)

          image-20211111112839174

          Reflection Coefficient R

          At a steep angle => Reflection Coefficient R (how much been reflected, material (n1 & n2) and view point (θ) dep)

          Refraction Coefficient T=1R

          Fresnel’s Equation

          Schlick’s Approximation

          Material and angle

          R(θi)=R0+(1R0)(1cos(θi))5 ;R0=(n1n2n1+n2)2

          Path Tracing with R

          image-20211111133711072

          Recursion in Taichi

          Call functions: temp var. -> storage in stack => higher pressure

          Better solution -> using loops for tail-resursion

          Optimization: Remain the fronter part of breaking loops; modify the tail-recursion part:

          Anti-Aliasing

          Zig-zag artifacts -> softening the edges

          In this case we can use 4 times of random sampling -> anti-aliasing

           

          Lecture 8 Deformable Simulation 01: Spatial and Temporal Discretization (21.11.16)

          Laws of Physics

          Equation of Motion

          Integration in Time

          Equation of Motion

          Use h as the step size

          Explicit (forward) Euler Integration

          image-20211118154017210

          Extremely fast, but increase the system enery gradually (leads to explode) -> seldom used => Symplectic Euler Integration

          Symplectic Euler Integration

          Also very fast. Update the velocity first -> momentum preserving, oscillating system Hamiltonian (could be unstable) -> Widely used in accuracy centric applications (astronomy simulation / molecular dynamics /…)

          Implicit (backward) Euler Integration

          Often expensive. Energy declines, damping the Hamitonian from the osscillating components. Often stable for large timesteps -> Widely used in performance-centric applications (game / MR / design / animation)

          Integration in Space

          Mass-Spring System

          Deformation

          image-20211118164920620

          Demo

          Compute force

          Time integration

          Applications

          Cloth sim / Hair sim

          Not the best choice when sim continuum area/volume: Area/volume gets inverted without any penalty => Linear FEM

          Constitutive Models

          Deformation Map

          A continuous model to describe deformation: x=ϕ(X) at every point (Deformation map)

          image-20211118171109581

          Generally: For X near X: (with 1st order Taylor)

          ϕ(X)ϕX(XX)+ϕ(X)=ϕXFX+(ϕ(X)ϕXX)tFX+t

          image-20211118171844688

          (F - Deformation gradient, F=[x1/X1x1/X2x2/X1x2/X1])

          After approx, close to a translation in very small region: F=ϕX=I in translation; F=R in rotation; F=S in scaling

          => A non-rigid deformation gradient shall end up with a non-zero deformation energy

          Energy Density

          Ψ(x)=Ψ(ϕ(X))Ψ(FX+t)Ψ(FX)Ψ(F)

          => Deformation gradient is NOT the best quantity to describe deformation

          Strain Tensor

          Strain tensor: ϵ(F)

          Samples in different constitutive models:

          From energy density function to energy: E(x)=ΩΨ(F) dX => Spatial Discretization

          Linear Finite Element Method (FEM)

          image-20211118175112445

          E=(The whole area)=(A piece of triangle)

          Linear FEM Energy

          Find deformation gradient: Original pose (Uppercase) => Current pose xi=FXi+t (share the same F and T)

          image-20211118191850161

          [x1x4x2x4x3x4]Ds=F[X1X4X2X4X3X4]Dm ;F=DsDm1

          Find the gradient of Ψ(F(x))

          Chain rule: (Note: B:A=A:B=i,jAijBij=tr(ATB))

          Ψx=Fx:ΨF

          1st Piola-Kirchhoff stress tensor for hyperelastic matrial: P=ΨF

          Some 1st Piola-Kirchhoff stress tensors:

          Linear FEM

          For Ψxj(k)=Fxj(k):P

          -> Taichi: Autodiff

          Demo

          Revisit Mass-spring System

           

          Lecture 9 Deformable Simulation 02: The Implicit Integration Methods (21.11.23)

          The Implicit Euler Integration

          Time Step

          Sub-(time)-stepping: nsub: hsim=hdispnsub

          image-20211124213141086

          The smaller nsub, the larger hsim (less accurate)

          Numerical Recipes for Implicit Integrations

          The Baraff and Witkin Style Solution

          One iter of Newton’s Method, referred as semi-implicit

          Reformulating the Implicit Euler Problem

          To reduce the red part: add another step:

          Minimization / Non-linear Root-finding:

          Convex Minimization of ming(x)

          The general descent method:

          image-20211125172532811

          Problem

          But most deformable bodies have non-convex energies: unsteady equilibrium.

          image-20211125213834479

          Steps

          g(x) is contains 2 terms: dynamics and elastic

          g(x)=12x(xn+hvn)M2+h2E(x)
          1. Init guess: x=xn or x=xn+hvn

          2. While loop: while not converge:

            • Descent dir: dx=xg(x) or dx=(x2g(x))1xg(x)

              • Gradient: xg(x)=M(x(xn+hvn))+h2xE(x) (linear + non-linear) (Model dep)
              • Hessian (Matrix): x2g(x=M+h2x2E(x) (Model dep)
            • Line search: det the stepsize t

            • Update: x=x+tdx

          Definiteness-fixed Newton’s Method

          For general cases:

          in Step 2 (while), After compute gradient dir and Hessian, add another substep:

          Definiteness-fix

          H~=fix(H)

          After this definiteness-fix: (Newton’s)

          image-20211126011246122

          Linear Solvers

          Linear Solvers for Ax=b

          Factorization

          For A=LLT (Assume already p.s.d. proceeded)

          Current APIs:

          Conjugate Gradient (CG) Method

          Properties:

          Demo: (Python scope - control flow)

          Accelerate the CG Method

           

          Lecture 10 Fluid Simulation 01: The Particle-based (Lagrangian) Methods (21.11.30)

          Lagrangian Method = Particle-Based

          Incompressible Fluid Dynamics

          usually compressible for explosion / shock wave / … ; incompressible for slower dynamics

          Forces for Incompressible Fluids

          ma=f=fext+fpres+fvisc

          image-20211202093424163

          Incompressible Navier–Stokes Equations

          The Navier-Stokes Equation (p=k(ρ0ρ))

          {ρDvDtma/V=ρgmg/Vpadv+μ2vdiff(Momentum Equation) v=0DρDt=ρ( v)=0 (Divergence free -- Mass conserving)

          The Spatial Derivative Operators

          Time Discretization

          Operator Splitting

          General Steps

          Separate the N-S Eqn into 2 parts and use half the timestep to solve each part (Advection-Projection)

          Time Integration

          Given xn & vn

          Integration with the Weakly Compressible (WC) Assumption

          Storing the density ρ as an individual var that advect with the vel field

          DvDt=g1ρp+v2v ; v=0 (weak compressible)

          image-20211202105945198

          Time Integration

          Symplectic Euler Integration: In this case step 1 and 2 can be combined since they are both explicit (no order diff)

          Spatial Discretization (Lagrangian View)

          • Previous knowledge using Lag. View: Mesh-based simulation (FEM)
          • Today: Mesh-free simulation => a simular example: marine balls

          Basic Idea

          Cont. view -> Discrete view (using particles):

          dvidt=g1ρp(xi)+ν2v(xi)ai , where ν=μρ0

          Time integration (Symplectic Euler):

          vi=vi+Δt ai & xi=xi+Δt vi

          But still need to evaluate ρ(xi), p(xi) & 2v(xi)

          Smoothed Particle Hydrodynamics (SPH)

          Dirac Delta

          Trivial Indentity

          The Dirac function only tends to infinity at 0 but equals to 0 otherwise. And its overall integral is 1.

          f(r)=f(r)δ(rr) drδ(r)={+, if r=00, otherwise  and δ(r) dr=1
          Widen the Dirac Delta
          f(r)f(r)W(rr,h)dr, where limh0W(r,h)=δ(r)

          image-20211202114735115

          W(r,h) - kernel funciton:

          e.g. W(r,h)={12h, if |r|<h0,otherwise (Error: O(h2), can be reduced by decreasing the sampling radius h)

          image-20211202114217762

          Finite Probes: Summation
          f(r)f(r)W(rr,h) drjVjf(rj)W(rrj,h)

          image-20211202114202834

          Smoother Kernel Function

          Still use the summation: f(r)jVjf(rj)W(rrj,h). But “trust” the closer probes (the orange & yellow ones) better => Smoother W function

          image-20211202115507144

          Use a C3 cont. spline: e.g. (The figure below shows the 1st / 2nd / 3rd order der)

          W(r,h)=σd{6(q3q2)+1 for 0q122(1q)3 for 12q10 otherwise  ;with q=1hr,σ1=43h,σ2=407πh2,σ3=8πh3

          image-20211202115720769

          Smoothed Particle Hydrodynamics (SPH)

          Discretization

          1D:

          f(r)jVjf(rj)W(rrj,h)=jmjρjf(rj)W(rrj,h)

          2D:

          image-20211202120133759

          Par stored in every particles

          Evaluate 2D fields using the SP
          f(r1)m2ρ2f(r2)W(r1r2,h)+m3ρ3f(r3)W(r1r3,h)+m4ρ4f(r4)W(r1r4,h)

          image-20211202124713890

          SPH Spatial Derivatives

          The operators will affect only on the kernel func (W)

          Improving Approximations for Spatial Derivatives

          image-20211202131525157

          Anti-symmetric Form

          Better for projection / divergence / …

          Symmetric Form

          Better for forces

          Implementation Details (WCSPH)

          Simulation Pipeline

          Boundary Conditions

          image-20211202153846215

          Problems of Boundaries

          Navie search: O(n2) time

          => Use background grid: Common grid size = h (same support radius in SPH)

          (Each neighbor search takes 9 grids in 2D and 27 grids in 3D)

          image-20211202154832585

           

          Lecture 11 Fluid Simulation 02: The Grid-based (Eulerian) Methods (21.12.6)

          N-S Equations and Their Time Integration

          Operator Splitting

          A toy example:

          Integrate dqdt=1+2 (for some quantity q ) => Theoretical sol: qn+1=qn+3Δt

          Operator Splitting: q~=qn+1Δt ; qn+1=q~+2Δt

          Operation Splitting for N-S Eqn

          (q can be velocity / density / temperature / …)

          DvDt=g1ρp+ν2v ; v=0{Advection:DqDt=0Add Forces:vt=g+ν2vProjection:vt=1ρp s.t.  v=0

          One Numerical Time-Stepping for N-S Eqn

          Given qn:

          Step 1: Advection: (Solve intuitive formula) => in Lag (trivial), in Euler (not trivial)

          qn+1=advect(vn,Δt,qn)

          v~=advect(vn,Δt,vn)

          Step 2: Applying forces: (The term of viscosity could be neglected when simulate gas; gravity can be neglected when the gas has similar density as air)

          v~~=v~+Δt(g+ν2v~)

          Step 3: Projection: (Solve for pressure and ensure divergence free)

          vn+1=project(Δt,v~~)

          Return vn+1, qn+1

          From the Lagrangian View to the Eulerian View

          In Eulerian grids: Need to record: grid index / velocity / density / temperature / grid size / …

          Spatial Derivatives Using Finite Difference

          Spatial derivative: The dimensions can be decoupled when computing the spatial derivatives due to the structural grid

          qi,j,k=[qi,j,k/xqi,j,k/yqi,j,k/z]

          To compute q/x

          image-20211209002832506

          MAC (Marker-and-Cell) grid

          Staggered Grid

          image-20211209003840801

          {vi,j=[vxi1/2,j+vxi+1/2,j2,vyi,j1/2+vyi,j+1/22]vi+1/2,j=[vxi+1/2,j,vyi,j1/2+vyi,j+1/2+vyi+1,j1/2+vyi+1,j+1/24]vi,j+1/2=[vyi1/2,j+vyi+1/2,j+vyi1/2,j+1+vyi+1/2,j+14,vyi,j+1/2]

          For a row x col grid (3x3) storage:

          Staggered Grid in Stokes Theorem

          Exterior calculus

          image-20211209004509483

          Advection

          Material Derivative

          DfDt=ft+v f ;where f=f(x,t)Total derivative w.r.t. time:  ddtf(x,t)=fx+dxdt:fx=ft+v f

          Material Derivative of Vectors

          If q is a vector q=[qx,qy,qz]T

          DqDt=qt+vq=qt+v:[[qxxqyxqzx][qxyqyyqzy][qxzqyzqzz]]=qt+vx[qxxqyxqzx]+vy[qxyqyyqzy]+vz[qxzqyzqzz]=qt+[vxqxx+vyqxy+vzqxzvxqyx+vyqyy+vzqyzvxqzx+vyqzy+vzqzz]=[qxt+vqxqyt+vqyqzt+vqz]

          For velocity (self-advection)

          DvDt=qt+vv=[qxt+vvxqyt+vvyqzt+vvz]

          Quantity Advection

          In Eulerian view, quantities flow with the velocity field for DqDt=qt+vq=0

          image-20211209105154349

          Attempt 1: Finite Difference

          qin+1qinΔt+vnqi+1nqi1n2Δx=0qin+1=qinΔtvnqi+1nqi1n2Δx

          image-20211209105638323

          1-D Advection: Unconditionally Unstable w/. FTCS

          Attempt 2: Semi-Lagrangian

          qn+1=qn ? : qn+1(xn+1)=qn(xn)=qn(xn+1Δtvn)

          image-20211209105843129

          To find value of qn,xn+1Δtvn: -> Interpolation: qn+1(xn+1)=interpolate(qn,xn+1Δtvn)

          Usually use Bilinear interpolation in 2D:

          q=lerp(a,b,c,d)=lerp(lerp(a,b),lerp(c,d))=Da+Cb+Bc+ADA+B+C+D

          1-D Advection: Unconditionally Stable (The peak will stably move forward) => Required vnΔt<Δx

          image-20211209110616182

          Assuming: vnΔt<Δx: (equiv to a velocity-aware one-sided finite difference form)

          qin+1=ΔtvnΔxqi1n+(1ΔtvnΔx)qin=qinΔtvnqinqi1nΔxqin+1qinΔt+vnqinqi1n2Δx=0

          Problems:

          Projection

          Poisson’s Equation

          Possion’s Problem

          Want to solve vt=1ρp s.t.  v=0

          Use finite difference again:

          vxi1/2,jn+1vxi1/2,jnΔt=1ρpi,jpi1,jΔxs.t. vxi+1/2,jn+1vxi1/2,jn+1Δxvxx+vi,j+1/2n+1vyi,j1/2n+1Δxvxx=0

          image-20211209115951070

          The condition will not be divergence free => becomes Δtρ p= vn

          Δtρ p= vnor Δtρ4pi,jpi+1,jpi1,jpi,j+1pi,j1Δx2=vi+1/2,jnvxi1/2,jn+vyi,j1/2nvyi,j1/2nΔx

          Another way to achieve Possion’s problem:

          Pressure Solve

          Boundary Conditions

          image-20211209152406791

          Boundary Conditions in Possion’s Problem

          To solve Ap=d (linear solvers (see Lec. 9))

          image-20211209153331497