[Capstone Design] Parallel Attention-based LSTM


시데브 2025. 4. 10. 14:23

PA-LSTM

https://www.sciencedirect.com/science/article/pii/S0263224121009969

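PEMS (Portable Emissions Measurement System): portable equipment for measuring the exhaust emissions a vehicle produces while driving; it serves as an institutional supplement to the emissions certification test procedures.
Its use for long-term measurement is limited by sensor drift and cost.
Vehicle emissions are ordinary time-series data -> a time-series prediction model can be applied.
To compensate for these limitations of PEMS, the paper proposes Parallel Attention-based LSTM, a prediction model that allows emissions to be predicted in advance and fed into control strategies.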

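Model selection process

Traditional prediction models
- ARIMA and NARX-based models built on SVM, Gaussian Process, etc. mostly assume a linear or pre-defined nonlinear relationship
- They are therefore a poor fit for emission data, which is complex and strongly time-varying
Deep learning-based models
- RNN, LSTM, GRU -> model nonlinear relationships; LSTM and GRU in particular handle the long-term dependency problem effectively
- However, information loss and performance degradation still occur on long sequences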

LSTM was designed to solve long-term dependency, so why does its performance degrade on long sequences?

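Problem that LSTM solved: vanishing gradient
- During backpropagation through time the gradient shrinks at every step, so information from early inputs fades or is lost by the time it reaches later steps
- LSTM introduces a gate structure, a mechanism for choosing what to remember and what to forget, to address this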
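
As a rough numerical illustration of the point above (a toy sketch, not taken from the paper), the snippet below multiplies a gradient by a fixed per-step factor. The factors 0.5 and 0.99 are hypothetical values: the first stands in for a plain RNN step, the second for an LSTM forget gate that stays close to 1 along the cell-state path.

```python
def backprop_scale(per_step_factor: float, steps: int) -> float:
    """Magnitude of a gradient after flowing back through `steps` time steps."""
    scale = 1.0
    for _ in range(steps):
        scale *= per_step_factor
    return scale

# Plain RNN: assume every step scales the gradient by 0.5 (hypothetical value).
rnn_scale = backprop_scale(0.5, 50)

# LSTM cell-state path: the per-step factor is roughly the forget gate,
# which the network can keep near 1 (0.99 is a hypothetical value).
lstm_scale = backprop_scale(0.99, 50)

print(f"plain RNN after 50 steps:      {rnn_scale:.3e}")
print(f"LSTM cell path after 50 steps: {lstm_scale:.3f}")
```

The first value is vanishingly small while the second stays at a usable magnitude, which is the effect the gate structure is meant to achieve.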

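Problem that arises on long sequences
- Long-term memory is accumulated in the hidden state
- In the end, all past information is packed into a single fixed-size hidden state
-> The longer the sequence, the more the information from past time steps is diluted
-> Either every time step is treated equally, or a lot of information is forgotten
- No distinction between important and unimportant time steps -> performance degradation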

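Attention-based model proposed by Bahdanau
- Assigns a weight to each time step of the input sequence so that important information is emphasized
- However, this approach focuses mainly on time steps and does not consider the importance of each feature
- ex) In weather prediction with features such as humidity, temperature, wind speed, and cloud cover, attention that looks only at time-step importance gives every feature within a time step the same weight -> features that matter little receive the same weight as those that matter
DA-RNN
- Applies two stages of attention to learn both which exogenous inputs and which time steps are important
  - input attention: learns which features are important
  - temporal attention: learns which time steps are important
- However, it applies attention to data from different sensors using the same criterion -> the characteristics of each sensor are not fully reflected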
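
The difference between time-step-only attention and the two-stage idea can be sketched as follows. This is a simplified illustration, not the DA-RNN implementation: in the actual model the scores come from the encoder and decoder hidden states and the input attention changes per time step, whereas here `step_scores` and `feature_scores` are fixed hypothetical numbers and the weather window is made-up data.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weather window: 3 time steps x 4 features
# (humidity, temperature, wind speed, cloud cover), already normalized.
window = [
    [0.2, 0.5, 0.1, 0.3],
    [0.4, 0.5, 0.6, 0.7],
    [0.9, 0.4, 0.8, 0.9],
]

# Hypothetical scores; a trained model would compute these itself.
step_scores = [0.1, 0.5, 2.0]          # one score per time step
feature_scores = [1.5, 0.2, 1.0, 0.3]  # one score per feature

# Temporal attention only: every feature at step t shares the weight alpha[t].
alpha = softmax(step_scores)
temporal_context = [
    sum(alpha[t] * window[t][f] for t in range(3)) for f in range(4)
]

# Two-stage idea: first reweight features with beta, then apply alpha over time.
beta = softmax(feature_scores)
weighted = [[beta[f] * row[f] for f in range(4)] for row in window]
dual_context = [
    sum(alpha[t] * weighted[t][f] for t in range(3)) for f in range(4)
]

print("alpha:", [round(a, 3) for a in alpha])
print("beta: ", [round(b, 3) for b in beta])
```

With temporal attention alone, all four features of a time step are scaled by the same `alpha[t]`; adding `beta` lets a less relevant feature be down-weighted independently of the time step.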

PA-LSTM
- Runs parallel branches so that the features of different input data groups are learned separately
- This is a different concept from parallelizing computation to speed up training
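
A minimal sketch of the "one branch per input group" idea, under loud assumptions: the exact PA-LSTM architecture is not reproduced here, the LSTM inside each branch is omitted for brevity, and the group names, feature values, and attention scores are all hypothetical. The only point shown is that each group gets its own attention weights before the results are joined.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_branch(sequence, step_scores):
    """One branch: attention-weighted sum of a group's feature vectors over time."""
    alpha = softmax(step_scores)
    n_features = len(sequence[0])
    return [
        sum(a * row[f] for a, row in zip(alpha, sequence))
        for f in range(n_features)
    ]

# Hypothetical input groups measured over the same 3 time steps.
pems_group = [[0.10, 0.30], [0.20, 0.35], [0.40, 0.50]]          # 2 features
obd_group = [[0.5, 0.2, 0.1], [0.6, 0.3, 0.1], [0.9, 0.7, 0.2]]  # 3 features

# Each group uses its own (hypothetical) attention scores.
pems_summary = attention_branch(pems_group, step_scores=[0.0, 0.5, 1.5])
obd_summary = attention_branch(obd_group, step_scores=[1.0, 0.2, 0.1])

# The branch outputs are joined before a final prediction layer (not shown).
fused = pems_summary + obd_summary
print(len(fused), fused)
```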

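Parallel training of Attention-based LSTM

Why parallel training has been hard for a plain LSTM
- In an LSTM each time step uses the hidden state of the previous time step -> the whole sequence must be computed in order, which is unfavorable for parallelization
Attention can be computed for all positions of the input sequence at once -> favorable for parallelizing computation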
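
The contrast can be seen in a toy example (hypothetical scalar weights `w` and `u`, a made-up 1-D sequence). The recurrence needs the previous hidden state, so its result depends on processing order, while each attention score depends only on its own input. Note that in an attention-based LSTM the attention is applied to the LSTM outputs, so the recurrent part still runs sequentially.

```python
import math

xs = [0.5, -1.0, 2.0, 0.3]  # hypothetical 1-D input sequence
w, u = 0.8, 0.5             # hypothetical scalar weights

def recurrent_pass(inputs):
    """Each step needs the previous hidden state, so steps run one after another."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w * x + u * h)
    return h

def attention_scores(inputs):
    """Each score depends only on its own input, so all could be computed at once."""
    return [math.tanh(w * x) for x in inputs]

# Attention scores do not depend on the order in which they are evaluated ...
forward = attention_scores(xs)
backward = list(reversed(attention_scores(list(reversed(xs)))))

# ... while the recurrence gives a different result if the order changes.
h_forward = recurrent_pass(xs)
h_reversed = recurrent_pass(list(reversed(xs)))

print(forward == backward)
print(h_forward, h_reversed)
```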

https://github.com/ningshixian/LSTM_Attention/tree/master

GitHub - ningshixian/LSTM_Attention: attention-based LSTM/Dense implemented by Keras

X = input sequence of length n.
H = LSTM(X); note that the LSTM here has return_sequences = True,
    so H is a sequence of n vectors.
s is the final state of the LSTM (h and c)

h is a weighted sum over H:
h = sum(j = 1 to n)  alpha(j) * H(j)

The weight alpha(j) for each hj is computed as follows:
H = [h1,h2,...,hn]
M = tanh(H)
alpha = softmax(w.transpose * M)

h# = tanh(h)
y = softmax(W * h# + b)

J(theta) = negative_log_likelihood + regularization
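
The attention pooling described above can be written out in plain Python as a minimal sketch. It is not the code from the linked repository: `attention_pool` is a hypothetical helper name, and `H` and `w` are made-up values standing in for the LSTM outputs and the learned weight vector.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(H, w):
    """H: list of n hidden vectors (each of length d); w: weight vector of length d."""
    # M = tanh(H), applied element-wise
    M = [[math.tanh(v) for v in h_j] for h_j in H]
    # alpha = softmax(w^T M): one scalar score per time step
    alpha = softmax([sum(wi * mi for wi, mi in zip(w, m_j)) for m_j in M])
    # h = sum_j alpha_j * H_j
    d = len(H[0])
    h = [sum(a * h_j[k] for a, h_j in zip(alpha, H)) for k in range(d)]
    # h# = tanh(h)
    h_sharp = [math.tanh(v) for v in h]
    return alpha, h_sharp

H = [[0.1, -0.2], [0.4, 0.3], [0.9, 0.8]]  # hypothetical LSTM outputs, n=3, d=2
w = [1.0, 0.5]                             # hypothetical learned vector
alpha, h_sharp = attention_pool(H, w)
print("alpha:", [round(a, 3) for a in alpha])
print("h#:   ", [round(v, 3) for v in h_sharp])
```

The final classification layer `y = softmax(W * h# + b)` and the loss are left out, since they do not involve the attention weights.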

Parallel training of Attention-based LSTM (Data Parallelism vs Model Parallelism)

- Model Parallelism

Splits the model parameters across multiple GPUs for computation
Used when the whole model does not fit in the memory of a single GPU
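
A toy sketch of the idea, with no real GPUs involved: the rows of one dense layer's weight matrix are split between two hypothetical "devices", each computes its slice of the output, and the slices are gathered. The matrix and input values are made up for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

# Hypothetical dense layer with 4 output units and 3 inputs.
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
x = [1.0, 2.0, 3.0]

# "Device 0" holds the first two output rows, "device 1" the last two.
W_dev0, W_dev1 = W[:2], W[2:]

# Each device computes its slice of the output from the same input.
y_dev0 = matvec(W_dev0, x)
y_dev1 = matvec(W_dev1, x)

# Gathering the slices reproduces the single-device result.
y_parallel = y_dev0 + y_dev1
y_single = matvec(W, x)
print(y_parallel == y_single)
```

Each device only has to store half of `W`, which is the reason this approach is used when the full model does not fit in one GPU's memory.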

- Data Parallelism

Replicates the model on multiple GPUs and trains each replica on its own split of the training data
Used when the training data is large and the model fits on a single GPU
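
A toy sketch of one data-parallel step, again without real GPUs: every "replica" holds the same weight, computes a gradient on its own shard, and the gradients are averaged before the update. The one-parameter model `y = w * x`, the data, and the learning rate are hypothetical; with equal-sized shards the averaged gradient matches the full-batch gradient.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

# Hypothetical (x, y) training pairs and a current weight.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 1.5

# Split the batch into equal shards, one per "GPU" replica.
shards = [data[:2], data[2:]]

# Every replica holds the same w and computes a gradient on its own shard.
local_grads = [grad_mse(w, shard) for shard in shards]

# All-reduce step: average the local gradients so every replica applies the same update.
avg_grad = sum(local_grads) / len(local_grads)

# Reference: gradient computed on the whole batch by a single worker.
full_grad = grad_mse(w, data)

w_new = w - 0.01 * avg_grad  # 0.01 is a hypothetical learning rate
print(avg_grad, full_grad, w_new)
```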

LSTM code reference

https://github.com/ningshixian/LSTM_Attention/tree/master

GitHub - ningshixian/LSTM_Attention: attention-based LSTM/Dense implemented by Keras

-> Parallel training will be applied to this code

References

Hao Xie, Yujun Zhang, Ying He, Kun You, Boqiang Fan, Dongqi Yu, Boen Lei, Wangchun Zhang, "Parallel attention-based LSTM for building a prediction model of vehicle emissions using PEMS and OBD", Measurement, Volume 185, 2021
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate"
Qin, Y., Song, D., et al., "A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction", 2017
https://kjwony.tistory.com/14

Conquering GPU Distributed Training for LLM Fine-Tuning, PART 1

https://lifeisenjoyable.tistory.com/21

What Is Distributed Training of Deep Learning Models? (Data parallelism and Model parallelism)
