Time Series Forecasting

Materials INFO

Semester: Fall, 2025
This page provides links to the papers only.

[Week 1]
- MLP-Mixer: An all-MLP Architecture for Vision (2021) (link)
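  - Proposes MLP-Mixer
  - Reaches accuracy competitive with the state of the art on image classification without self-attention
  - A standard Transformer has $O(n^2)$ complexity in the number of tokens $n$, whereas MLP-Mixer is $O(n)$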
- MLP4Rec: A Pure MLP Architecture for Sequential Recommendation (2022) (link)
  - Applies the MLP-Mixer architecture to sequential recommendation
- SMLP4Rec: An Efficient All-MLP Architecture for Sequential Recommendations (2024) (link)
  - Improves both the efficiency and the accuracy of MLP4Rec
- Self-Attentive Sequential Recommendation (2018) (link)
  - The reference baseline for Transformer-based sequential recommendation
- Deep Learning Recommendation Model for Personalization and Recommendation Systems (2019) (link)
  - Facebook AI
  - Implementation of a large-scale recommendation model
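
As a rough illustration of the complexity comparison in the MLP-Mixer entry above, the sketch below counts multiply-adds for the two kinds of token mixing. The function names are hypothetical, and the counts assume that the token-mixing MLP keeps a fixed hidden width as the number of tokens grows; projections, normalization, and channel mixing are left out.

```python
def attention_mixing_ops(n_tokens: int, dim: int) -> int:
    """Multiply-adds for self-attention token mixing: Q K^T plus weights @ V."""
    scores = n_tokens * n_tokens * dim        # (n x d) @ (d x n)
    weighted_sum = n_tokens * n_tokens * dim  # (n x n) @ (n x d)
    return scores + weighted_sum


def mlp_token_mixing_ops(n_tokens: int, dim: int, hidden: int) -> int:
    """Multiply-adds for a two-layer token-mixing MLP applied to every channel."""
    per_channel = n_tokens * hidden + hidden * n_tokens  # n -> hidden -> n
    return dim * per_channel


if __name__ == "__main__":
    # Doubling the number of tokens quadruples the attention cost
    # but only doubles the token-mixing MLP cost.
    for n in (64, 128, 256):
        print(n, attention_mixing_ops(n, dim=32), mlp_token_mixing_ops(n, dim=32, hidden=16))
```

Under these assumptions the attention count grows with $n^2$ and the MLP count with $n$, which is the contrast the note refers to.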
[Week 2]
- Prediction with Time-Series Mixer for the S&P500 Index (2024) (link)
  - TS-Mixer performs competitively on S&P500 Index prediction
  - The plain MLP-Mixer is known to be weak at autoregressive prediction and at capturing long-horizon dependencies
- FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting (2022) (link)
  - A Transformer that captures long-term dependencies in the frequency domain; compensates for the limitations of Mixer models
- MP3Net: Multi-scale Patch Parallel Prediction Networks for Multivariate Time Series Forecasting (2024) (link)
  - Models long-term dependencies in time series
  - Uses a multi-scale patch module to extract local features and long-term correlations from the series
- TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis (2023) (link)
  - multi-scale prediction
- Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (2020) (link)
  - Develops an effective Transformer-based model for time series forecasting
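
The multi-scale patch idea mentioned in the MP3Net entry can be pictured with a generic sketch: the same series is cut into patches of several lengths, so short patches expose local features and long patches expose longer-range structure. This is not MP3Net's actual module; the function names, the non-overlapping patches, and the patch lengths are all assumptions made for illustration.

```python
def split_into_patches(series, patch_len):
    """Cut a 1D series into non-overlapping patches, dropping an incomplete tail."""
    n_full = len(series) // patch_len
    return [list(series[i * patch_len:(i + 1) * patch_len]) for i in range(n_full)]


def multi_scale_patches(series, patch_lens=(2, 4, 8)):
    """Return one list of patches per patch length (the 'scales')."""
    return {p: split_into_patches(series, p) for p in patch_lens}


if __name__ == "__main__":
    views = multi_scale_patches(list(range(10)))
    for patch_len, patches in views.items():
        print(patch_len, patches)
```

A model would then process each scale with its own branch and combine the results; that part is omitted here.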
[Week 3]
- FourierFormer: Transformer Meets Generalized Fourier Integral Theorem (2022) (link)
  - Interprets the standard Transformer as implicitly assuming that the queries follow a Gaussian mixture distribution
  - Extends this idea to propose a Transformer that works under relaxed assumptions
- Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting (2021) (link)
  - Long-term dependencies; progressive decomposition of the time series
- Perceiver: General Perception with Iterative Attention (2021) (link)
  - The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs.
- Perceiver IO: A General Architecture for Structured Inputs & Outputs (2021) (link)
  - Proposes Perceiver IO, a general-purpose architecture that scales linearly with the size of the inputs and outputs
- General-purpose, long-context autoregressive modeling with Perceiver AR (2022) (link)
  - Extends the Perceiver to autoregressive modeling over long contexts
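
The series decomposition mentioned in the Autoformer entry can be sketched as a moving-average split into a trend part and a seasonal remainder. The version below is a minimal stand-alone sketch, not the paper's code: the function names and the window size are assumptions, and the edges are padded by repeating the first and last values so the output keeps the input length. In the model this kind of block is applied repeatedly inside the layers, which is what makes the decomposition progressive.

```python
def moving_average(series, window):
    """Moving average with edge values repeated so output length equals input length."""
    front = (window - 1) // 2
    back = window - 1 - front
    padded = [series[0]] * front + list(series) + [series[-1]] * back
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window=5):
    """Split a series into a smooth trend and the remaining seasonal part."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


if __name__ == "__main__":
    data = [float(i) + (1.0 if i % 2 else -1.0) for i in range(12)]
    trend, seasonal = decompose(data, window=4)
    print(trend)
    print(seasonal)
```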
[Week 4]
- Efficiently Modeling Long Sequences with Structured State Spaces (2021) (link)
  - Structured state space model applied to sequence modeling (S4)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (2024) (link)
  - Combines gating with state space models to obtain selective memory
  - Performs well on sequences thousands of steps long
- RWKV: Reinventing RNNs for the Transformer Era (2023) (link)
  - A hybrid of RNN and Transformer that scales to large models
- Rethinking Attention with Performers (2021) (link)
  - Approximates the softmax attention kernel
  - Also efficiently models other kernelizable attention mechanisms beyond softmax
- Linformer: Self-Attention with Linear Complexity (2020) (link)
  - Low-rank attention that reduces memory use
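
The memory saving noted in the Linformer entry comes from projecting the keys and values along the sequence axis from $n$ rows down to $r$ rows, so the score matrix is $n \times r$ instead of $n \times n$. The pure-Python sketch below illustrates this for a single head. The helper names are hypothetical, and the fixed averaging projection in the usage example is an assumption standing in for the learned projection matrices.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def linformer_attention(q, k, v, proj_k, proj_v):
    """q, k, v: n x d matrices; proj_k, proj_v: r x n projections over the sequence axis."""
    d = len(q[0])
    k_small = matmul(proj_k, k)                       # r x d
    v_small = matmul(proj_v, v)                       # r x d
    k_small_t = [list(col) for col in zip(*k_small)]  # d x r
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_small_t)]  # n x r
    weights = softmax_rows(scores)
    return matmul(weights, v_small), weights          # output is n x d


if __name__ == "__main__":
    n = 6
    x = [[0.1 * i, 1.0] for i in range(n)]
    # Assumed projection: average the first and the second half of the sequence.
    proj = [[1 / 3] * 3 + [0.0] * 3, [0.0] * 3 + [1 / 3] * 3]
    out, weights = linformer_attention(x, x, x, proj, proj)
    print(len(weights), len(weights[0]))  # score matrix is n x r
```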
[Week 5]
- Pay Attention to MLPs (2021) (link)
  - Argues that self-attention is not essential in vision Transformers
  - Shows that gMLP can reach the same accuracy
- FFNet: Frequency Fusion Network for Semantic Scene Completion (2022) (link)
  - Many current methods use RGB-D images to capture the geometric and semantic information of objects
  - Proposes the Frequency Fusion Network (FFNet), a new method that makes better use of RGB-D data to improve semantic scene completion
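- Deep Equilibrium Models (2019) (link)
  - Motivated by the observation that the hidden layers of many existing deep sequence models converge to a fixed point, proposes the DEQ approach, which finds this equilibrium point directly via root-finding
  - Claims that training and prediction need only constant memory, regardless of the effective "depth" of the network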
- Kernel Neural Operators (KNOs) for Scalable, Memory-efficient, Geometrically-flexible Operator Learning (2024) (link)
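  - KNO uses parameterized, closed-form, finitely smooth, compactly supported kernels with trainable sparsity parameters inside its integral operators, which sharply reduces the number of parameters to learn compared with existing neural operators
  - Claims that KNO offers a new paradigm of low-memory, geometrically flexible, deep operator learning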
- Token Shift Transformer for Video Classification (2021) (link)
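  - The Transformer encoder relies on pair-wise self-attention, which imposes a heavy computational burden when applied to complex 3D video signals
  - Presents the Token Shift Module (TokShift), a new zero-parameter, zero-FLOPs operator for modeling temporal relations inside the Transformer encoder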
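
To see why a shift operator needs no parameters and no arithmetic, the sketch below moves part of each feature vector one step along the time axis by copying values only. It is a simplified illustration of the temporal-shift idea behind the TokShift entry, not the paper's module: the function name, the choice of which channels move, and the zero-filling at the sequence ends are assumptions, and it expects `2 * n_shift` to be at most the feature length.

```python
def temporal_shift(frames, n_shift):
    """Shift channels along time by pure copying.

    frames: list over time of equal-length feature vectors.
    The first n_shift channels take the value of the previous frame,
    the next n_shift channels take the value of the following frame,
    and the remaining channels are left unchanged.
    Positions that have no neighbor are filled with 0.0.
    """
    t_len = len(frames)
    dim = len(frames[0])
    out = [list(f) for f in frames]
    for t in range(t_len):
        for c in range(n_shift):
            out[t][c] = frames[t - 1][c] if t > 0 else 0.0
        for c in range(n_shift, min(2 * n_shift, dim)):
            out[t][c] = frames[t + 1][c] if t < t_len - 1 else 0.0
    return out


if __name__ == "__main__":
    clip = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(temporal_shift(clip, n_shift=1))
```

After the shift, each frame's features contain information from its neighbors, so a later per-frame layer mixes temporal context without any extra weights.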
[Week 6]
- Frequency-domain MLPs are More Effective Learners in Time Series Forecasting (2023) (link)
  - Applies MLPs in the frequency domain to capture global dependencies and compact key frequency components for forecasting
- Koopa: Learning Non-stationary Time Series Dynamics with Koopman Predictors (2023) (link)
  - Uses Koopman embedding and context-aware operators to disentangle and predict time-variant/invariant dynamics in real time
- Deep State Space Models for Time Series Forecasting (2018) (link)
- Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting (2020) (link)
  - Fuses Graph Fourier Transform and Discrete Fourier Transform in a single GNN to jointly learn inter-series correlations and temporal dependencies
- Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting (2023) (link)
  - Crossformer introduces Dimension-Segment-Wise embedding and two-stage (cross-time & cross-dimension) attention for hierarchical multivariate forecasting
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting (2024) (link)
  - iTransformer inverts tokenization to embed each variate separately, then applies vanilla Transformer to learn multivariate correlations and support arbitrary lookback windows
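
The inverted tokenization described in the iTransformer entry amounts to a transpose of the lookback window: instead of one token per time step, each variate's whole series becomes one token. The sketch below shows only this reshaping step; the function names are hypothetical, and the embedding layer and attention that would follow are not shown.

```python
def time_step_tokens(window):
    """window: T rows of N variate values -> T tokens, each of length N."""
    return [list(row) for row in window]


def variate_tokens(window):
    """Transpose the window -> N tokens, each holding one variate's T values."""
    return [list(col) for col in zip(*window)]


if __name__ == "__main__":
    # Lookback window with T = 3 time steps and N = 2 variates.
    window = [[1, 10], [2, 20], [3, 30]]
    print(time_step_tokens(window))  # 3 tokens of length 2
    print(variate_tokens(window))    # 2 tokens of length 3
```

Because the number of tokens is now the number of variates, a longer lookback window changes only the token length, not the number of tokens attention has to compare.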
[Week 7]
- Token Pooling in Vision Transformers (2021) (link)
  - Pools tokens at intermediate layers to reduce computation
- Efficient Time Series Processing for Transformers and State-Space Models through Token Merging (2024) (link)
  - Applies token merging to time series as well
- MetaFormer Is Actually What You Need for Vision (2022) (link)
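  - Describes every token-to-token operation within a single framework
  - Attributes the strong performance of attention to token mixing
  - Proposes PoolFormer, which replaces the attention module of the Transformer with a simple spatial pooling operation that performs only an extremely simple form of token mixing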
- Conditional Neural Processes (2018) (link)
  - Keeps the flexibility of stochastic processes such as GPs while using a neural network structure trained by gradient descent
- Linformer: Self-Attention with Linear Complexity (2020) (link)
  - Low-rank attention that reduces memory use
- Fusing Large Language Models with Temporal Transformers for Time Series Forecasting (2025) (link)
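  - Uses an LLM (GPT-2 with LoRA, numeric representation) to learn representation vectors of time series patterns
  - Transformer + ResNet; feeds both the LLM representation vectors and the time series values as input
  - (Further study) Survey methods for converting numeric values into tokens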
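
The pooling-based token mixing in the MetaFormer entry above can be sketched for a 1D token sequence as follows: each token is replaced by the average of its neighborhood, and the token itself is subtracted because the surrounding block already adds it back through a residual connection. PoolFormer applies this over 2D image patches; the 1D form, the function name, and the pool size here are assumptions made to fit a time series setting.

```python
def pooling_token_mixer(tokens, pool_size=3):
    """Average-pool each token's neighborhood and subtract the token itself.

    tokens: list of equal-length feature vectors in sequence order.
    At the sequence ends the average is taken over the existing neighbors only.
    """
    half = pool_size // 2
    n = len(tokens)
    dim = len(tokens[0])
    mixed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighborhood = tokens[lo:hi]
        avg = [sum(tok[c] for tok in neighborhood) / len(neighborhood) for c in range(dim)]
        mixed.append([a - x for a, x in zip(avg, tokens[i])])
    return mixed


if __name__ == "__main__":
    print(pooling_token_mixer([[0.0], [3.0], [6.0]]))
```

The operator has no trainable parameters, which is the point of the comparison with attention.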