Task queue[二] Celery

postedOn: 2024-1-27 updatedOn: 2024-1-27 includedIn: 程式

wordsCount: 684 readingTime: 2 mins viewers:

緣由

在工作上，遇到單機已經過多負荷，所以開始做實作分散式爬蟲

Celery定義

分散式任務隊列處理

架構

任務發啟=>Broker=>Worker處理任務=>Result Backend(有需要再回傳)=>任務結果

任務傳送

Broker

可選RabbitMq,Redis,…..

必需決定的就是Broker的選用，建議學習RabbitMq，開發快速選Redis，但作者是採用RabbitMq的術語來當作參考，如有需要進階設定，還是要了解

Result Backend

可選關聯式DB,Redis,rpc(回傳但不存)

無需結果回傳，就不需選擇

任務發起

Beat

定期任務發起的process

手動

在程式內調用

任務處理

Worker

處理任務的process

所有任務分發都經過這個主要的process，再做併發

任務監控

Flower

任務的監控網站

任務的搜尋要看一下官方範例，比較容易了解，網站寫的簡略

https://github.com/mher/flower/blob/master/tests/unit/utils/test_search.py

可經由nginx反向代理到外部，可簡單設定使用者驗證，可與prometheus和grafana結合

worker 參數

-n

命名

-l

log等級

–logfile

指定log位置與檔名，檔名會參考worker的命名

-c | –autoscale

指定併發數｜基礎併發數與最大併發數

如果使用autoscale，偵測到不足才會去scale啟動比較慢

-P

預設併發是 multiprocessing，可以改thread或Coroutine(支援gevent)，看任務取向，io類型用gevent，不用自己寫async/await，使用簡單，官方範例https://github.com/celery/celery/tree/main/examples/gevent

-Q

接收特定queue的任務，可指定多個

celery

distributed task queue

message queue

Task queue[一] Tq基礎

Task queue[三] Celery(二)

HankSky

Hello This World!

posts